Half-width kana

From Wikipedia, the free encyclopedia

Half-width kana (半角カナ) are katakana characters displayed at half the width of their full-width forms. The term refers to the katakana portion of the character set specified by JIS X 0201.

Although the official name is JIS X 0201 katakana, half-width kana is the commonly used name, and it is the term used in this article.

History

ASCII is defined as a 7-bit character set and has room for 128 characters. However, since this standard was designed for the United States, it is Americentric in nature and does not contain the characters and symbols (for example, the ¥ yen currency symbol) needed to represent Japanese.

JIS X 0201 was developed in 1969. Computers at that time did not have the computational power and memory necessary to process the thousands of kanji (Chinese-derived characters) that exist in written Japanese, so as a simplification, text that would normally be written in kanji was represented in katakana instead.

Half-width kana were developed as "...the first Japanese characters encoded on computers because they are used for Japanese telegrams. As single-byte characters..." [1]

To make katakana fit into the space available, some compromises were made: the diacritical marks dakuten and handakuten are treated as separate characters instead of being part of the preceding character. This led to the so-called "half-width kana". These compromises still cause problems for computer programs today, and half-width kana are also frequently considered visually unattractive.
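The effect of storing dakuten and handakuten as separate characters can be sketched in Python. This is a minimal illustration, not part of JIS X 0201 itself: it assumes the text has already been decoded to Unicode, where half-width kana occupy U+FF61 to U+FF9F, and it uses NFKC normalization as one possible way to fold them into full-width forms. The function name `to_fullwidth` is hypothetical.

```python
import unicodedata


def to_fullwidth(text: str) -> str:
    """Fold half-width kana into full-width kana via Unicode NFKC normalization.

    Note: NFKC also rewrites other compatibility characters (for example
    full-width Latin letters), so it is broader than a kana-only conversion.
    """
    return unicodedata.normalize("NFKC", text)


# Half-width "ga" is two code points: KA (U+FF76) followed by a
# separate voiced sound mark (U+FF9E).
half_ga = "\uff76\uff9e"
full_ga = to_fullwidth(half_ga)

print(len(half_ga))           # 2: base letter plus separate dakuten
print(len(full_ga))           # 1: single precomposed letter
print(full_ga == "\u30ac")    # True: U+30AC KATAKANA LETTER GA
```

Because one visible syllable may span two code points, operations such as counting characters, truncating strings, or comparing text can give unexpected results unless the text is normalized first.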

Half-width table

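\Trailing 4 bits→
↓Leading 4 bits
    0 1 2 3 4 5 6 7 8 9 a b c d e f
0-9 (empty)
a     ｡ ｢ ｣ ､ ･ ｦ ｧ ｨ ｩ ｪ ｫ ｬ ｭ ｮ ｯ
b   ｰ ｱ ｲ ｳ ｴ ｵ ｶ ｷ ｸ ｹ ｺ ｻ ｼ ｽ ｾ ｿ
c   ﾀ ﾁ ﾂ ﾃ ﾄ ﾅ ﾆ ﾇ ﾈ ﾉ ﾊ ﾋ ﾌ ﾍ ﾎ ﾏ
d   ﾐ ﾑ ﾒ ﾓ ﾔ ﾕ ﾖ ﾗ ﾘ ﾙ ﾚ ﾛ ﾜ ﾝ ﾞ ﾟ
e-f (empty)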
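The table's layout, where the leading 4 bits of a byte select the row and the trailing 4 bits select the column, can be sketched as follows. The byte range 0xA1 to 0xDF and the Unicode block starting at U+FF61 are not stated in the text above; they are taken here as assumptions about JIS X 0201 and Unicode, and the function names are hypothetical.

```python
KANA_FIRST = 0xA1       # assumed first katakana-area byte in JIS X 0201
KANA_LAST = 0xDF        # assumed last katakana-area byte in JIS X 0201
HALFWIDTH_BASE = 0xFF61  # Unicode code point paired with byte 0xA1


def table_position(byte: int) -> tuple[int, int]:
    """Return (row, column) = (leading 4 bits, trailing 4 bits)."""
    return byte >> 4, byte & 0x0F


def jisx0201_kana_to_unicode(byte: int) -> str:
    """Map one katakana-area byte to its half-width Unicode character."""
    if not KANA_FIRST <= byte <= KANA_LAST:
        raise ValueError(f"0x{byte:02X} is outside the katakana area")
    return chr(HALFWIDTH_BASE + (byte - KANA_FIRST))


# Byte 0xBF sits in row b, column f of the table.
row, col = table_position(0xBF)
print(f"row {row:x}, column {col:x}")
print(jisx0201_kana_to_unicode(0xBF))
```

The mapping is a fixed offset because the half-width block in Unicode follows the same order as the single-byte katakana area.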

Half-width kana on the Internet

E-mail

SMTP and NNTP, the protocols used to deliver e-mail and Usenet messages respectively, were formerly able to transmit only 7-bit data, so the convention was to use ISO-2022-JP for sending e-mail in Japanese.

ISO-2022-JP does not contain half-width kana, so they cannot properly be included in such a message. When half-width kana were included anyway, for example by accident, they could become garbled during transmission.
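A program can check for this situation before sending. The sketch below assumes Python's built-in `iso2022_jp` codec, which rejects half-width kana, and uses NFKC normalization as one possible repair; both function names are hypothetical, and a real mail client may handle the case differently.

```python
import unicodedata


def fits_iso2022jp(text: str) -> bool:
    """Return True if the text can be encoded as plain ISO-2022-JP."""
    try:
        text.encode("iso2022_jp")
    except UnicodeEncodeError:
        return False
    return True


def make_mail_safe(text: str) -> str:
    """Fold half-width kana to full-width so ISO-2022-JP can carry them.

    NFKC also alters other compatibility characters, so this is a
    blunt tool rather than a kana-only conversion.
    """
    return unicodedata.normalize("NFKC", text)


body = "\uff83\uff7d\uff84"  # half-width "tesuto"
print(fits_iso2022jp(body))                  # False
print(fits_iso2022jp(make_mail_safe(body)))  # True
```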

This is much less of a problem today, since most e-mail servers support ESMTP with the 8BITMIME extension and therefore accept 8-bit characters. Alternatively, a transfer encoding such as Base64 can be used and declared in the message using MIME.
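The Base64 approach can be sketched with Python's standard `email` package. The choice of UTF-8 as the charset and the function name `build_message` are assumptions made for illustration; the text above does not prescribe a particular charset.

```python
from email.message import EmailMessage


def build_message(body: str) -> EmailMessage:
    """Build a MIME message whose body is Base64-encoded UTF-8 text."""
    msg = EmailMessage()
    msg["Subject"] = "half-width kana test"
    # cte="base64" keeps every byte on the wire within 7-bit ASCII,
    # and MIME headers tell the receiver how to decode the body.
    msg.set_content(body, charset="utf-8", cte="base64")
    return msg


msg = build_message("\uff83\uff7d\uff84")  # half-width "tesuto"
print(msg["Content-Transfer-Encoding"])
print(msg.as_bytes().isascii())
```

Because the serialized message contains only ASCII bytes, it can pass through a 7-bit transport unchanged, and the receiver recovers the original text from the declared charset and transfer encoding.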

Web pages

The problems that exist in e-mail do not exist with Web pages, since HTTP accepts 8-bit characters.

A problem that does exist is that programs can have difficulty determining whether a byte sequence should be treated as Shift JIS, EUC-JP, or UTF-8. The character encoding should therefore be specified in an HTTP response header or a meta tag.

Misunderstanding of JIS X 0201

In fact, JIS X 0201 katakana is not inherently half-width. The standard does not define character width; it defines only the code representation of the katakana characters. The term "half-width" is a holdover from older devices that displayed single-byte characters at half the width of double-byte ones. In the JIS X 0201 standard itself, the katakana characters in the code chart are printed at normal width, not half-width.

However, the misunderstanding that the standard defines "half-width" characters is widespread. People who know the standard will often say "so-called half-width kana."

References

  1. Lunde, Ken. CJKV Information Processing. 1st ed. O'Reilly, 1999. pp. 144-145.