I70 Character Encoding

Identifies a scheme for representing a set of graphical characters with bit patterns.

Type

Identifier (ID)

Length

Min 1 / Max 2

Codes

Code	Description
1	US-ASCII Latin, Swahili, Hawaiian and American English without most typographic frills.
2	EBCDIC-US Extended Binary Coded Decimal Interchange Code, US variant, used on IBM mainframes.
3	ISO 646
4	ISO 8859-1 North America, Western Europe, Latin America, the Caribbean, Canada, Africa.
5	ISO 8859-2 Eastern Europe.
6	ISO 8859-5 The Cyrillic alphabet. Bulgarian, Belarusian, Russian and Macedonian.
7	ISO 8859-7 The modern Greek alphabet and mathematical symbols derived from the Greek.
8	ISO 8859-3 SE Europe, Esperanto, miscellaneous others.
9	ISO 8859-4 Scandinavia/Baltics (and others not in ISO-8859-1).
10	ISO 8859-6 The Arabic alphabet.
11	ISO 8859-8 The Hebrew alphabet.
12	ISO 8859-9 The Turkish alphabet. Same as ISO-8859-1 except Turkish characters replace Icelandic.
13	ISO 8859-15 Latin-9. Revision of ISO 8859-1 that adds the euro sign.
14	ISO 2022
15	ISO 2375
16	ISO 10646
17	GB18030 Simplified and traditional Chinese characters.
18	EUC-JP Japanese character set standards, namely JIS X 0208, JIS X 0212, and JIS X 0201.
19	ISO-2022-JP JIS Encodings.
20	ISO-2022-JP-2 Multilingual extension of ISO-2022-JP.
21	ISO-2022-KR Encodes ASCII and the Korean double-byte character set KS X 1001.
22	EUC-KR Extended Unix Code for Korean. Encodes ASCII and KS X 1001.
23	ISO-2022-CN-EXT Extends ISO-2022-CN with additional Guobiao (GB) standards.
24	ISO-2022-CN Supports the character sets GB 2312 (for simplified Chinese) and CNS 11643 (for traditional Chinese).
25	Big5 Chinese character encoding method used in Taiwan, Hong Kong, and Macau for Traditional Chinese characters.
26	ISO-10646-UCS-2 Universal Character Set. Fixed-length 16 bits (2 bytes). Replaced by UTF-16.
27	ISO-10646-UCS-4 Universal Character Set coded in 4 octets. It is now treated simply as a synonym for UTF-32.
28	SCSU Standard Compression Scheme for Unicode. A Unicode Technical Standard for reducing the number of bytes needed to represent Unicode text.
29	UTF-7 Universal Character Set. 7-bit Unicode Transformation Format.
30	UTF-16BE Universal Character Set. 16-bit Unicode Transformation Format Big Endian Byte Order.
31	UTF-16LE Universal Character Set. 16-bit Unicode Transformation Format Little Endian Byte Order.
32	UTF-16 Universal Character Set. 16-bit Unicode Transformation Format.
33	UTF-8/CESU-8 Universal Character Set. 8-bit Unicode Transformation Format.
34	UTF-32 Universal Character Set. 32-bit Unicode Transformation Format.
35	UTF-32BE Universal Character Set. 32-bit Unicode Transformation Format Big Endian Byte Order.
36	UTF-32LE Universal Character Set. 32-bit Unicode Transformation Format Little Endian Byte Order.
37	BOCU-1 Binary Ordered Compression for Unicode (BOCU), a MIME-compatible Unicode compression scheme. Combines the wide applicability of UTF-8 with the compactness of the Standard Compression Scheme for Unicode (SCSU).
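
Example

A receiving application may want to turn an I70 code value into a decoder for the accompanying data. The following is a minimal sketch in Python, not part of this element's definition. The table I70_TO_PYTHON_CODEC and the functions codec_for_i70 and decode_with_i70 are hypothetical names introduced for illustration, and the table covers only a subset of the codes above, namely those with an obvious counterpart among Python's standard codecs. The pairing of code 33 with Python's utf_8 codec is an assumption: that codec handles UTF-8 but not the CESU-8 form.

```python
import codecs

# Hypothetical mapping from I70 code values to Python standard codec names.
# Only a subset of the code list is covered; codes without an entry here
# (for example SCSU or BOCU-1) have no codec in the Python standard library.
I70_TO_PYTHON_CODEC = {
    "1": "ascii",
    "4": "iso8859_1",
    "5": "iso8859_2",
    "6": "iso8859_5",
    "7": "iso8859_7",
    "17": "gb18030",
    "18": "euc_jp",
    "22": "euc_kr",
    "25": "big5",
    "30": "utf_16_be",
    "31": "utf_16_le",
    "33": "utf_8",  # assumption: plain UTF-8 only, CESU-8 is not handled
    "35": "utf_32_be",
    "36": "utf_32_le",
}


def codec_for_i70(code: str) -> str:
    """Return the Python codec name for an I70 code value."""
    # The element is an identifier of length 1 to 2; the codes are numeric.
    if not (1 <= len(code) <= 2) or not code.isdigit():
        raise ValueError(f"I70 code must be 1 or 2 digits, got {code!r}")
    try:
        name = I70_TO_PYTHON_CODEC[code]
    except KeyError:
        raise LookupError(f"no codec mapped for I70 code {code!r}") from None
    codecs.lookup(name)  # raises LookupError if the codec is unavailable
    return name


def decode_with_i70(data: bytes, code: str) -> str:
    """Decode raw bytes using the encoding named by an I70 code value."""
    return data.decode(codec_for_i70(code))
```

With this sketch, decode_with_i70(b"caf\xe9", "4") treats the bytes as ISO 8859-1, while an unmapped code such as "28" raises LookupError so the caller can decide how to handle it.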