|
Oracle9i Database Development
Oracle9i - Sorting Capabilities
Oracle provides linguistic sort
capabilities that handle the complex sorting requirements of different languages and
cultures. Different languages have different sort orders. What' s more, different cultures
or countries using the same alphabets may sort words differently.
For example, in Danish, the letter Æ is
after Z, while Y and Ü are considered to be variants of the same letter.
Sort order can becase sensitive or insensitive, and can ignore accents or not. It can also
be either phonetic or based on the appearance of the character, such as ordering by the
number of strokes or by radicals for East Asian ideographs.Another common sorting issue is
when letters are combined.
For example, in traditional Spanish, "ch"is a distinct character, which means
that the correct order would be: cerveza, Colorado, cheremoya, and so on. This means that
the letter "c" cannot be sorteduntil checking to see if the next letter is an
"h".
Oracle provides several different types of sort, and can achieve a linguistically correct
sort as well as the new multilingual ISO standard (10646) designed to handle many
languages at the same time.
Using Binary Sorts
Conventionally, when character data is stored, the sort sequence is based on thenumeric
values of the characters defined by the character encoding scheme. This is called a binary
sort.
Binary sorts are the fastest type of sort, and produce reasonable results for the English
alphabet because the ASCII and EBCDIC standards define the letters A to Z in ascending
numeric value.
Note, however, that in the ASCII standard, all uppercase letters appear before any
lowercase letters. In the EBCDIC standard, the opposite is true: all lowercase letters
appear before any uppercase letters. When characters used in other languages are present,
a binary sort generally does not produce reasonable results.
For example, an ascending ORDER BY query
would return the character strings ABC, ABZ, BCD, ÄBC, in the sequence, when the Ä has a
higher numeric value than B in the character encoding scheme.
For languages using Chinese characters, a binary sort is not linguistically meaningful.
Using Linguistic
Sorts
To produce a sort sequence that matches the alphabetic sequence of characters, another
sort technique must be used that sorts characters independently of their
numeric values in the character encoding scheme. This technique is called a linguistic
sort.
A linguistic sort operates by replacing characters with numeric values that reflect each
character' s proper linguistic order. These numeric values are found in a table containing
major and minor values.
Oracle makes two passes when comparing strings.
The first pass is to compare the major value of entire string from the major table.
The second pass is to compare the minor value from the minor table.
Each major table entry contains the Unicode codepoint and major value. Usually, letters
with the same appearance will have the same major value.
Oracle defines letters with diacritic and case differences for the same major value but
different minor values.
Oracle offers two
kinds of linguistic sort:
Monolingual, commonly used for European languages
Multilingual, commonly used for Asian languages.
|