Oracle9i - Unicode
What is Unicode?
Unicode is a universal encoded character set that allows you to store information from any
language using a single character set.
Extended Unicode Enablement
Unicode provides a unique code value for every character, regardless of the
platform, program, or language.
The Unicode standard has been adopted by
many software and hardware vendors, and many operating systems and browsers now support
Unicode.
Unicode is required by modern standards
such as XML, Java, JavaScript, LDAP, CORBA 3.0,
and WML, and it is also compliant with the ISO/IEC 10646 standard.
Oracle started supporting Unicode as a
database character set in Oracle7.
In Oracle9i, Unicode support has been
greatly expanded so that customers can find the right solution for their globalization
needs.
Oracle9i supports Unicode 3.0, the third
and most recent version of the Unicode standard.
Unicode Encoding
There are two common ways to encode Unicode 3.0 characters:
UTF-16 Encoding
UTF-8 Encoding
UTF-8 Encoding
This is the 8-bit encoding of Unicode. It
is a variable-width multibyte encoding in which the character codes 0x00 through 0x7F have
the same meaning as ASCII.
One Unicode character can be 1, 2, or 3 bytes
long in this encoding.
Generally characters from the European
scripts are represented in either 1 or 2 bytes, while characters from most Asian scripts
are represented in 3 bytes.
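The byte counts described above can be checked with a short Python sketch. It is illustrative only and has nothing to do with Oracle itself: the helper name utf8_length and the three sample characters are assumptions chosen for the example, and the encoding is done by Python's built-in "utf-8" codec.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Hypothetical sample characters, written as escapes to stay ASCII-safe.
samples = {
    "LATIN CAPITAL LETTER A": "\u0041",      # ASCII range
    "LATIN SMALL LETTER E ACUTE": "\u00e9",  # European script
    "CJK IDEOGRAPH 65E5": "\u65e5",          # Asian script
}

utf8_sizes = {name: utf8_length(ch) for name, ch in samples.items()}

# Codes 0x00 through 0x7F encode to the same single byte as in ASCII.
ascii_compatible = "\u0041".encode("utf-8") == "\u0041".encode("ascii")

for name, size in utf8_sizes.items():
    print(f"{name}: {size} byte(s)")
```

The ASCII letter takes one byte, the accented European letter two, and the CJK ideograph three, matching the ranges given above.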
UTF-16 Encoding
This is the 16-bit encoding of Unicode.
It is a 2-byte fixed-width encoding in
which the character codes 0x0000 through 0x007F have the same meaning as ASCII.
One Unicode character is 2 bytes in this
encoding. Characters from all scripts are represented in 2 bytes.
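As a rough illustration of the 2-byte layout, the following Python sketch encodes the same kinds of characters in UTF-16. The helper name utf16_hex and the sample characters are assumptions made for the example. So is the choice of the big-endian "utf-16-be" codec, which is used here only because it emits no byte order mark and so keeps the byte counts easy to read.

```python
def utf16_hex(ch: str) -> str:
    """Return the big-endian UTF-16 bytes of one character as hex digits."""
    return ch.encode("utf-16-be").hex()


# Hypothetical sample characters, all defined in Unicode 3.0.
utf16_samples = {
    "LATIN CAPITAL LETTER A": "\u0041",
    "LATIN SMALL LETTER E ACUTE": "\u00e9",
    "CJK IDEOGRAPH 65E5": "\u65e5",
}

utf16_encoded = {name: utf16_hex(ch) for name, ch in utf16_samples.items()}

for name, hex_bytes in utf16_encoded.items():
    # Two hex digits per byte, so four digits means 2 bytes.
    print(f"{name}: 0x{hex_bytes} ({len(hex_bytes) // 2} bytes)")
```

Each sample occupies exactly 2 bytes, and the ASCII letter keeps its ASCII value (0x41) in the low byte with a zero high byte.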