GB18030
GB18030 can be considered a Chinese equivalent of UTF-8. Like UTF-8, it is a superset of ASCII and can represent the whole range of Unicode code points. However, unlike UTF-8, GB18030 also maintains compatibility with GB2312/GBK, the preexisting standard character encodings used in the PRC. Part of the mapping to Unicode comes from a lookup table (as in GBK); the rest is calculated algorithmically.
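As a rough illustration of the points above, the following Python sketch checks ASCII and GB2312 compatibility using Python's built-in gb18030 codec, and reproduces the algorithmic part of the mapping for code points above the BMP, where four-byte sequences run linearly from 0x90 0x30 0x81 0x30 (U+10000) upwards. The function name gb18030_supplementary is made up for this example, and the sketch deliberately ignores the table-driven parts of the standard.

```python
def gb18030_supplementary(code_point: int) -> bytes:
    """Four-byte GB18030 form of a code point in U+10000..U+10FFFF.

    Hypothetical helper: covers only the linear, algorithmic range.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points are handled here")
    linear = code_point - 0x10000
    # Byte ranges: b1 from 0x90, b2 in 0x30-0x39, b3 in 0x81-0xFE, b4 in 0x30-0x39.
    linear, b4 = divmod(linear, 10)
    linear, b3 = divmod(linear, 126)
    b1, b2 = divmod(linear, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


# ASCII characters keep their single-byte values.
ascii_bytes = "A".encode("gb18030")

# Characters already in GB2312 keep their two-byte GB2312 encoding.
han_gb18030 = "中".encode("gb18030")
han_gb2312 = "中".encode("gb2312")

# A supplementary-plane character: computed by hand vs. Python's codec.
computed = gb18030_supplementary(0x20000)
from_codec = "\U00020000".encode("gb18030")

print(ascii_bytes, han_gb18030 == han_gb2312, computed == from_codec)
```

The arithmetic is a mixed-radix expansion of the offset from U+10000, which is why no lookup table is needed for this range.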
Most major Western computer companies had already standardised on some variant of Unicode as the primary format for their binary formats and OS calls. However, most of them supported only code points in the Basic Multilingual Plane (BMP). In a move of historical significance, the PRC decided to mandate support for certain code points outside the BMP. This means that operating systems can no longer get away with treating characters as 16-bit fixed-width entities (UCS-2). They must instead either process the data in a variable-width format (such as UTF-8 or UTF-16) or move to a larger fixed-width format (such as UCS-4 or UTF-32). Microsoft made the change from UCS-2 to UTF-16 with Windows 2000.
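To make the fixed-width problem concrete, here is a small Python sketch that encodes one code point outside the BMP (U+20000, chosen arbitrarily for illustration) in the formats mentioned above. It shows that a single 16-bit unit cannot hold the character: UTF-16 needs a surrogate pair, while UTF-32 holds it in one 32-bit unit.

```python
ch = "\U00020000"  # an ideograph outside the BMP (assumed example character)

utf8 = ch.encode("utf-8")       # variable width, 8-bit units
utf16 = ch.encode("utf-16-be")  # variable width, 16-bit units
utf32 = ch.encode("utf-32-be")  # fixed width, one 32-bit unit

# Split the UTF-16 form into its two 16-bit code units (a surrogate pair).
high = int.from_bytes(utf16[:2], "big")
low = int.from_bytes(utf16[2:], "big")

print(f"UTF-8:  {len(utf8)} bytes")
print(f"UTF-16: {len(utf16)} bytes, units {high:#06x} {low:#06x}")
print(f"UTF-32: {len(utf32)} bytes")
```

Software that assumes one 16-bit unit per character would treat the two surrogates as separate characters, which is the assumption the mandate rules out.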
The SimSun 18030 font enables the display of the GB 18030 characters, which include all the characters in Unicode 2.1 plus the new characters found in the Unicode CJK Unified Ideographs Extension A section.
References
- IANA Charset Registration for GB18030
- English language summary of GB 18030-2000
- Authoritative mapping table between GB18030 and Unicode
- ICU Converter Explorer: GB18030
- Unicode CJK Unified Ideographs Extension A (PDF, 1.5MB)
- Unicode CJK Unified Ideographs Extension B (PDF, 13 MB)