DBCS

A double-byte character set (DBCS) is a character set that represents each character with 2 bytes. A DBCS supports national languages that contain a large number of unique characters or symbols: 1 byte can represent at most 256 distinct characters, while 2 bytes can represent up to 65,536. Examples of such languages include Japanese, Korean, and Chinese.
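The figures above follow directly from the number of bits available. A minimal sketch of the arithmetic, assuming 8-bit bytes (the helper name `code_space` is hypothetical); real code pages reserve parts of this range, so the number of usable characters is smaller:

```python
BITS_PER_BYTE = 8  # assumption: 8-bit bytes


def code_space(num_bytes: int) -> int:
    """Return how many distinct values num_bytes bytes can take."""
    return 2 ** (BITS_PER_BYTE * num_bytes)


one_byte = code_space(1)   # upper bound for a single-byte character set
two_bytes = code_space(2)  # upper bound for a double-byte character set
print(one_byte, two_bytes)
```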

The term DBCS has two basic meanings:

In CJK (Chinese, Japanese and Korean) computing, the term "DBCS" traditionally means a character set in which every graphic character not representable by an accompanying SBCS is encoded in two bytes; Han characters would generally comprise most of these two-byte characters.
The term "DBCS" can also mean a character set in which all characters (including all control characters) are encoded in two bytes.
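The difference between the two meanings shows up in how many bytes each character occupies. The sketch below is only an illustration under stated assumptions: it uses Python's `shift_jis` codec as an example of the first meaning, and `utf-16-le` (limited to characters in the Basic Multilingual Plane) purely as a stand-in for an encoding where every character takes two bytes. The helper `bytes_per_char` is hypothetical.

```python
def bytes_per_char(text: str, encoding: str) -> list:
    """Return the encoded length, in bytes, of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]


sample = "A漢"  # one Latin letter, one Han character

# First meaning: an SBCS paired with two-byte graphic characters.
mixed = bytes_per_char(sample, "shift_jis")

# Second meaning: every character takes two bytes.
# Assumption: utf-16-le is only a stand-in here, not a traditional DBCS.
fixed = bytes_per_char(sample, "utf-16-le")

print(mixed, fixed)
```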

1 The DBCS in CJK computing
2 Controversy
3 See also
4 External links

The DBCS in CJK computing

The term DBCS traditionally refers to a character set where each graphic character is encoded in two bytes. The lead bytes of a DBCS always have the most significant bit set (i.e., they have values of 0x80 or above, outside the 7-bit range), and the DBCS is always paired with a single-byte character set (SBCS). Furthermore, for the practical reason of maintaining compatibility with unmodified, off-the-shelf software, the SBCS is associated with halfwidth characters and the DBCS with fullwidth characters.

Sometimes, the use of the term "DBCS" can imply an underlying structure that does not comply with ISO 2022. For example, "DBCS" can sometimes mean a double-byte encoding that is specifically not EUC.

Note that this original meaning of DBCS is different from what some consider correct usage today. Some insist that these character sets be properly called either multi-byte character sets (MBCS) or variable-width encodings, because character sets like EUC-JP, EUC-TW, GB18030 and UTF-8 use more than 2 bytes for some characters and only 1 byte for others.
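The variable widths can be observed by encoding a few sample characters. This is a small sketch using codecs from the Python standard library; EUC-TW is left out because the standard library does not ship a codec for it. The helper `encoded_length` and the choice of sample characters are assumptions made for illustration.

```python
SAMPLES = ["A", "中", "😀"]  # Latin letter, Han character, emoji
ENCODINGS = ["euc_jp", "gb18030", "utf-8"]


def encoded_length(ch: str, encoding: str):
    """Return the encoded length in bytes, or None if ch is unsupported."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


widths = {
    enc: [encoded_length(ch, enc) for ch in SAMPLES]
    for enc in ENCODINGS
}

for enc, row in widths.items():
    print(enc, row)
```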

Controversy

Some people use DBCS to mean the UTF-16 and UTF-8 encodings, while others use it to mean older (pre-Unicode) code pages that use more than one byte per character. Shift-JIS, GB2312 and Big5 are a few code pages in which a character can occupy more than one byte, but even for these code pages the term DBCS is incorrect, because they are really multi-byte character sets (MBCS). Some IBM mainframes do have true DBCS code pages, which contain only the double-byte portion of a multi-byte code page.

If a person uses the term "DBCS enablement" for software internationalization, the terminology is ambiguous. They mean either that they want to write software for East Asian markets using older code-page technology, or that they plan to use Unicode. Sometimes the term also implies translation into an East Asian language. Usually, "Unicode enablement" means internationalizing software by using Unicode, and "DBCS enablement" means internationalizing it with the mutually incompatible code pages used in the various countries of East Asia. Since Unicode, unlike those code pages, supports all the major languages of East Asia, it is generally easier to enable and maintain software that uses Unicode. DBCS (non-Unicode) enablement is usually desired only when much older operating systems or applications do not support Unicode.
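The incompatibility between East Asian code pages can be seen in a short experiment. The sketch below, which assumes the `shift_jis`, `gb2312` and `big5` codecs from the Python standard library, encodes one Han character under each code page and then checks whether a mixed Japanese and Korean string fits in a single code page. The helper `can_encode` is hypothetical.

```python
LEGACY = ["shift_jis", "gb2312", "big5"]

# The same character is assigned different bytes by each code page.
legacy_bytes = {enc: "中".encode(enc) for enc in LEGACY}
for enc, raw in legacy_bytes.items():
    print(enc, raw.hex())


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


mixed_text = "日本語 한국어"  # Japanese and Korean in one string
fits_shift_jis = can_encode(mixed_text, "shift_jis")
fits_utf8 = can_encode(mixed_text, "utf-8")
print(fits_shift_jis, fits_utf8)
```

Because the byte values differ, text stored under one code page has to be tagged with that code page or converted before software expecting another one can read it; a Unicode encoding avoids this step.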

External links
