Halfwidth and fullwidth forms

A command prompt (cmd.exe) with Korean Localisation showing halfwidth and fullwidth characters.

In CJK (Chinese, Japanese and Korean) computing, graphic characters are traditionally classed into fullwidth (in Taiwan and Hong Kong: 全形; in CJK: 全角) and halfwidth (in Taiwan and Hong Kong: 半形; in CJK: 半角) characters. With fixed-width fonts, a halfwidth character occupies half the width of a fullwidth character, hence the name.

In the days of computer terminals and text mode computing, characters were normally laid out in a grid, often 80 columns by 24 or 25 lines. Each character was displayed as a small dot matrix, often about 8 pixels wide, and an SBCS (single-byte character set) was generally used to encode characters of Western languages.

For a number of practical and aesthetic reasons, Han characters needed to be twice as wide as these fixed-width SBCS characters. These "fullwidth characters" were typically encoded in a DBCS (double-byte character set), although some less common systems used other variable-width character sets with more bytes per character.
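As a rough illustration of the grid arithmetic described above, the following Python sketch counts the columns a string would occupy in a fixed-width layout. It relies on the East_Asian_Width property exposed by the standard unicodedata module. The function name display_width is an assumption made for this example, and the sketch deliberately ignores combining marks, control characters and ambiguous-width characters, which a real terminal may treat differently.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate the number of fixed-width columns needed to show text.

    Characters whose East_Asian_Width is Wide ('W') or Fullwidth ('F')
    are counted as two columns; everything else is counted as one.
    This is a simplification, not a full terminal width algorithm.
    """
    columns = 0
    for ch in text:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2
        else:
            columns += 1
    return columns
```

Under these assumptions, a line of 40 Han characters fills the same 80 columns as 80 Latin letters.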

Halfwidth and Fullwidth Forms is also the name of a Unicode block U+FF00–FFEF.

In Unicode

Halfwidth and Fullwidth Forms
Range	U+FF00..U+FFEF (240 code points)
Plane	BMP
Scripts	Hangul (52 char.) Katakana (55 char.) Latin (52 char.) Common (66 char.)
Symbol sets	Variant width characters
Assigned	225 code points
Unused	15 reserved code points
Unicode version history

1.0.0	216 (+216)
1.0.1	223 (+7)
3.2	225 (+2)

Note: ^[1]^[2]^[3]

In Unicode, if a certain grapheme can be represented as either a fullwidth character or a halfwidth character, it is said to have both a fullwidth form and a halfwidth form.
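The relationship between a width variant and its ordinary counterpart is recorded in the Unicode Character Database as a compatibility decomposition tagged <wide> or <narrow>. The sketch below reads that tag through Python's standard unicodedata module; the helper name width_variant is hypothetical and exists only for this example.

```python
import unicodedata


def width_variant(ch: str):
    """Return (kind, base) for a width-variant character, else None.

    kind is 'wide' for fullwidth forms and 'narrow' for halfwidth forms;
    base is the character that the variant decomposes to.
    """
    decomposition = unicodedata.decomposition(ch)
    if not decomposition.startswith(("<wide>", "<narrow>")):
        return None
    tag, *code_points = decomposition.split()
    base = "".join(chr(int(cp, 16)) for cp in code_points)
    return tag.strip("<>"), base
```

For instance, the fullwidth Latin capital A (U+FF21) is reported as a wide form of "A", and the halfwidth katakana A (U+FF71) as a narrow form of U+30A2.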

Halfwidth and Fullwidth Forms is the name of Unicode block U+FF00–FFEF, the last of the Basic Multilingual Plane excepting the short Specials block at U+FFF0–FFFF.

Range U+FF01–FF5E reproduces the characters of ASCII 21 to 7E as fullwidth forms, that is, a fixed width form used in CJK computing. This is useful for typesetting Latin characters in a CJK environment. U+FF00 does not correspond to a fullwidth ASCII 20 (space character), since that role is already fulfilled by U+3000 "ideographic space."
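Because U+FF01–FF5E mirrors ASCII 21–7E in the same order, the two ranges differ by a constant offset (FF01 − 21 = FEE0 hexadecimal). The following Python sketch uses that offset to convert in both directions and handles the space separately through U+3000, as described above. The function and constant names are assumptions chosen for this example.

```python
# Distance between ASCII 0x21 and its fullwidth counterpart U+FF01.
FULLWIDTH_OFFSET = 0xFF01 - 0x21  # 0xFEE0
IDEOGRAPHIC_SPACE = "\u3000"


def to_fullwidth(text: str) -> str:
    """Map ASCII 0x21-0x7E to U+FF01-FF5E and the space to U+3000."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0x20:
            out.append(IDEOGRAPHIC_SPACE)  # U+FF00 is not a fullwidth space
        elif 0x21 <= cp <= 0x7E:
            out.append(chr(cp + FULLWIDTH_OFFSET))
        else:
            out.append(ch)  # leave everything else untouched
    return "".join(out)


def to_ascii(text: str) -> str:
    """Inverse of to_fullwidth for U+FF01-FF5E and U+3000."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == IDEOGRAPHIC_SPACE:
            out.append(" ")
        elif 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)
```

The sketch leaves characters outside these ranges unchanged, so halfwidth katakana and the symbols at U+FFE0–FFEE pass through as they are.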

Range U+FF65–FFDC encodes halfwidth forms of Katakana and Hangul characters – see half-width kana. Range U+FFE0–FFEE includes fullwidth and halfwidth symbols.
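Halfwidth katakana cannot be converted with a simple offset, partly because a voiced sound mark is stored as a separate halfwidth character. One common approach, shown here as a sketch rather than a method prescribed by the block itself, is Unicode compatibility normalization (NFKC) from Python's standard unicodedata module. The function name fold_width is an assumption for this example. Note that NFKC also rewrites other compatibility characters, such as ligatures and superscripts, so it is broader than pure width folding.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Fold halfwidth and fullwidth forms to their ordinary counterparts.

    NFKC first applies compatibility decomposition (which replaces width
    variants) and then canonical composition (which joins a kana with a
    following voiced sound mark into a single precomposed character).
    """
    return unicodedata.normalize("NFKC", text)
```

With this sketch, halfwidth KA (U+FF76) followed by the halfwidth voiced sound mark (U+FF9E) becomes the single character GA (U+30AC), and fullwidth Latin letters become plain ASCII.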

Block

Halfwidth and Fullwidth Forms^[1]^[2] Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+FF0x		！	＂	＃	＄	％	＆	＇	（	）	＊	＋	，	－	．	／
U+FF1x	０	１	２	３	４	５	６	７	８	９	：	；	＜	＝	＞	？
U+FF2x	＠	Ａ	Ｂ	Ｃ	Ｄ	Ｅ	Ｆ	Ｇ	Ｈ	Ｉ	Ｊ	Ｋ	Ｌ	Ｍ	Ｎ	Ｏ
U+FF3x	Ｐ	Ｑ	Ｒ	Ｓ	Ｔ	Ｕ	Ｖ	Ｗ	Ｘ	Ｙ	Ｚ	［	＼	］	＾	＿
U+FF4x	｀	ａ	ｂ	ｃ	ｄ	ｅ	ｆ	ｇ	ｈ	ｉ	ｊ	ｋ	ｌ	ｍ	ｎ	ｏ
U+FF5x	ｐ	ｑ	ｒ	ｓ	ｔ	ｕ	ｖ	ｗ	ｘ	ｙ	ｚ	｛	｜	｝	～	｟
U+FF6x	｠	｡	｢	｣	､	･	ｦ	ｧ	ｨ	ｩ	ｪ	ｫ	ｬ	ｭ	ｮ	ｯ
U+FF7x	ｰ	ｱ	ｲ	ｳ	ｴ	ｵ	ｶ	ｷ	ｸ	ｹ	ｺ	ｻ	ｼ	ｽ	ｾ	ｿ
U+FF8x	ﾀ	ﾁ	ﾂ	ﾃ	ﾄ	ﾅ	ﾆ	ﾇ	ﾈ	ﾉ	ﾊ	ﾋ	ﾌ	ﾍ	ﾎ	ﾏ
U+FF9x	ﾐ	ﾑ	ﾒ	ﾓ	ﾔ	ﾕ	ﾖ	ﾗ	ﾘ	ﾙ	ﾚ	ﾛ	ﾜ	ﾝ	ﾞ	ﾟ
U+FFAx	HW HF	ﾡ	ﾢ	ﾣ	ﾤ	ﾥ	ﾦ	ﾧ	ﾨ	ﾩ	ﾪ	ﾫ	ﾬ	ﾭ	ﾮ	ﾯ
U+FFBx	ﾰ	ﾱ	ﾲ	ﾳ	ﾴ	ﾵ	ﾶ	ﾷ	ﾸ	ﾹ	ﾺ	ﾻ	ﾼ	ﾽ	ﾾ
U+FFCx			ￂ	ￃ	ￄ	ￅ	ￆ	ￇ			ￊ	ￋ	ￌ	ￍ	ￎ	ￏ
U+FFDx			ￒ	ￓ	ￔ	ￕ	ￖ	ￗ			ￚ	ￛ	ￜ
U+FFEx	￠	￡	￢	￣	￤	￥	￦		￨	￩	￪	￫	￬	￭	￮
Notes
1.^ As of Unicode version 10.0
2.^ Empty cells indicate non-assigned code points

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Halfwidth and Fullwidth Forms block:

Version	Final code points^[a]	Count	L2 ID	WG2 ID	Document
1.0.0	U+FF01..FF5E, FF61..FFBE, FFC2..FFC7, FFCA..FFCF, FFD2..FFD7, FFDA..FFDC, FFE0..FFE6	216			(to be determined)
1.0.1	U+FFE8..FFEE	7			(to be determined)
3.2	U+FF5F..FF60	2	L2/99-052		Freytag, Asmus (1999-02-05), The math pieces from the symbol font
			L2/01-033		Karlsson, Kent; Freytag, Asmus (2001-01-16), Disunify braces/brackets for math, computing science, and Z notation from similar-looking CJK braces/brackets
			L2/01-159	N2344	Ad-hoc report on Mathematical Symbols, 2001-04-03
			L2/01-157	N2345R	Karlsson, Kent (2001-04-04), Proposal to disunify certain fencing CJK punctuation marks from similar-looking Math fences
			L2/01-168		Whistler, Ken (2001-04-10), Bracket Disunification & Normalization Hell
			L2/01-223		Suignard, Michel (2001-05-23), Discussion of Issues Regarding Bracket Disunification
			L2/01-317		Suignard, Michel (2001-08-14), Bracket Disunification & Normalization
			L2/01-295R		Moore, Lisa (2001-11-06), Minutes from the UTC/L2 meeting #88
a.^ Proposed code points and character names may differ from final code points and names
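The counts in the table can be checked against the listed ranges with simple arithmetic. The Python sketch below copies the ranges from the table and recomputes the per-version additions, the assigned total and the number of reserved code points in the block. The variable names are assumptions for this example.

```python
# Inclusive code point ranges added in each Unicode version,
# copied from the history table above.
ADDED = {
    "1.0.0": [
        (0xFF01, 0xFF5E), (0xFF61, 0xFFBE), (0xFFC2, 0xFFC7),
        (0xFFCA, 0xFFCF), (0xFFD2, 0xFFD7), (0xFFDA, 0xFFDC),
        (0xFFE0, 0xFFE6),
    ],
    "1.0.1": [(0xFFE8, 0xFFEE)],
    "3.2": [(0xFF5F, 0xFF60)],
}
BLOCK = (0xFF00, 0xFFEF)


def span(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


added_per_version = {
    version: sum(span(first, last) for first, last in ranges)
    for version, ranges in ADDED.items()
}
assigned = sum(added_per_version.values())
block_size = span(*BLOCK)
reserved = block_size - assigned

print(added_per_version)  # {'1.0.0': 216, '1.0.1': 7, '3.2': 2}
print(assigned, block_size, reserved)  # 225 240 15
```

The results agree with the figures given for the block: 240 code points, of which 225 are assigned and 15 are reserved.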

In OpenType

OpenType defines the feature tags fwid, halt, hwid and vhal, which are used to select fullwidth or halfwidth forms of a character.

References

1. "Unicode 1.0.1 Addendum" (PDF). The Unicode Standard. 1992-11-03. Retrieved 2016-07-09.
2. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
3. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.

External links

East Asian Width Unicode Standard Annex #11

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.