Cangjie input method

[Figure: the Cangjie codes for "倉頡輸入法" (Cangjie input method), written in Traditional Chinese characters]
Traditional Chinese: 倉頡輸入法
Simplified Chinese: 仓颉输入法

The Cangjie input method (sometimes spelt “Changjie” or “Cang Jei”) is a system by which Chinese characters may be entered into a computer using a standard keyboard. Invented in 1976 by Chu Bong-Foo, the method is named after Cangjie, the figure traditionally credited with inventing the first Chinese writing system; the name was suggested by Chiang Wei-kuo, then Defence Minister of the Republic of China. Although the input method was originally designed for Traditional Chinese characters, it has since been revised so that it can also be used with the Simplified Chinese character set.

Sometimes, for example in filenames, the name Cangjie is abbreviated as cj.

Unlike pinyin, which is based on pronunciation, Cangjie is based on the graphological aspect of the characters: each basic graphical unit is represented by a basic character component, of which there are 24 in all, each mapped to a particular letter key on a standard QWERTY keyboard. An additional "difficult character" function is mapped to the X key. The keys are divided into four groups: the Philosophical Set (the letters A to G, representing the sun, the moon, and the five elements), the Strokes Set (H to N, representing brief and subtle strokes), the Body-Related Set (O to R, representing parts of the human anatomy), and the Shapes Set (S to Y, excluding X, representing complex and encompassing character forms).
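The key assignments and the four groups can be captured as a simple lookup table. The following Python sketch is illustrative only: the names `RADICALS`, `GROUPS`, `group_of`, and `to_radicals` are hypothetical, the radical for each key is taken from the table of keys in the next section, and the code yk for 文 comes from the description of the early Cangjie system later in this article.

```python
# Hypothetical lookup table: each letter key maps to its Cangjie radical.
# X (collision/difficult) and Z (special characters) are deliberately omitted,
# because they do not stand for a radical.
RADICALS = {
    # Philosophical group: sun, moon, and the five elements
    "A": "日", "B": "月", "C": "金", "D": "木", "E": "水", "F": "火", "G": "土",
    # Strokes group
    "H": "竹", "I": "戈", "J": "十", "K": "大", "L": "中", "M": "一", "N": "弓",
    # Body-related group
    "O": "人", "P": "心", "Q": "手", "R": "口",
    # Shapes group (S to Y, skipping X)
    "S": "尸", "T": "廿", "U": "山", "V": "女", "W": "田", "Y": "卜",
}

GROUPS = {
    "Philosophical": "ABCDEFG",
    "Strokes": "HIJKLMN",
    "Body-related": "OPQR",
    "Shapes": "STUVWY",
}


def group_of(key: str) -> str:
    """Return the name of the group that a single radical key belongs to."""
    key = key.upper()
    for name, keys in GROUPS.items():
        if len(key) == 1 and key in keys:
            return name
    raise KeyError(f"{key!r} is not a radical key")


def to_radicals(code: str) -> str:
    """Spell out a Cangjie code (e.g. 'yk') as the radicals it names."""
    return "".join(RADICALS[c] for c in code.upper())


print(to_radicals("yk"))  # 卜大 (文 is typed yk)
print(group_of("q"))      # Body-related
```

Listing the four groups as contiguous letter ranges makes the one irregularity visible: the Shapes group skips X, which is reserved for the collision/difficult function.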

The basic character components in Cangjie are usually called "radicals"; however, Cangjie decomposition is based neither on the traditional Kangxi radicals nor on standard stroke order; it is in fact a simple geometric decomposition.


Overview of the input method

The keys and "radicals"

The basic character components in Cangjie are called "radicals" (字根) or "letters" (字母). There are 24 radicals but 26 keys; the 24 radicals (the basic shapes, 基本字形) are associated with roughly 76 auxiliary shapes (輔助字形), which in many cases are rotated or transposed versions of components of the basic shapes. For instance, the letter A (日) can represent itself, the slightly wider 曰, or a 90° rotation of itself. (For a more complete account of the 76-odd transpositions and rotations than the one given below, see the Chinese Wikibooks entry on the Cangjie input method.)

Group | Key | Name | Primary meaning | Secondary meanings (auxiliary shapes)
Philosophical group | A | 日 | sun | 日, 曰, 90° rotated 日 (as in 巴)
Philosophical group | B | 月 | moon | the top four strokes of 目, 冂, 爫, 冖, the top and top-left part of 炙, 然, and 祭, the top-left four strokes of 豹 and 貓, and the top four strokes of 骨
Philosophical group | C | 金 | gold | itself, 丷, 八, and the penultimate two strokes of 四 and 匹
Philosophical group | D | 木 | wood | itself, the first two strokes of 寸 and 才, and the first two strokes of 也 and 皮
Philosophical group | E | 水 | water | 氵, the last five strokes of 暴 and 康, and 又
Philosophical group | F | 火 | fire | the shape 小, 灬, and the first three strokes in 當 and 光
Philosophical group | G | 土 | earth | (none listed)
Stroke group | H | 竹 | bamboo | the slant and short slant; the Kangxi radical 竹, namely the first four strokes in 笨 and 節
Stroke group | I | 戈 | weapon | the dot, the first three strokes in 床 and 庫, and the shape 厶
Stroke group | J | 十 | ten | the cross shape and the shape 宀
Stroke group | K | 大 | big | the X shape, including 乂 and the first two strokes of 右, as well as 疒
Stroke group | L | 中 | centre | the vertical stroke, as well as 衤 and the first four strokes of 書 and 盡
Stroke group | M | 一 | one | the horizontal stroke, as well as the final stroke of 孑 and 刁, the shape 厂, and the shape 工
Stroke group | N | 弓 | bow | the crossbow and the hook
Body parts group | O | 人 | person | the dismemberment, the Kangxi radical 人, the first two strokes of 丘 and 乓, the first two strokes of 知, 攻, and 氣, and the final two strokes of 兆
Body parts group | P | 心 | heart | the Kangxi radical 忄, the second stroke in 心, the last four strokes in 恭, 慕, and 忝, the shape 匕, the shape 七, the penultimate two strokes in 代, and the shape 勹
Body parts group | Q | 手 | hand | the Kangxi radical 手
Body parts group | R | 口 | mouth | the Kangxi radical 口
Character shapes group | S | 尸 | corpse | 匚, the first two strokes of 己, the first stroke of 司 and 刀, the third stroke of 成 and 豕, and the first four strokes of 長 and 髟
Character shapes group | T | 廿 | twenty | two vertical strokes connected by a horizontal stroke; the Kangxi radical 艸 when written as 艹 (whether the horizontal stroke is connected or broken)
Character shapes group | U | 山 | mountain | a three-sided enclosure with an opening at the top
Character shapes group | V | 女 | woman | a hook to the right, a V shape, and the last three strokes in 艮, 衣, and 長
Character shapes group | W | 田 | field | itself, as well as any four-sided enclosure with something inside it, including the first two strokes in 母 and 毋
Character shapes group | Y | 卜 | fortune telling | the 卜 shape and its rotated forms, the shape 辶, and the first two strokes in 斗
Collision/difficult key* | X | 重/難 | collision/difficult | (1) disambiguation of Cangjie decomposition collisions; (2) code for a part that is difficult to decompose
Special character key* | Z | (see note) | none | auxiliary code used for entering special characters, with no meaning on its own; in most cases, combining it with other keys produces Chinese punctuation marks such as 。, 、, 「 」, and 『 』

Note: Some variants use Z instead of X as the collision key; in those systems Z is named "collision" (重) and X "difficult" (難). Using Z as a collision key is neither part of the original Cangjie nor used in current mainstream implementations. In some other variants, Z may be named "user-defined" (造) or something else.
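Because many auxiliary shapes look nothing like the radical they are typed with, a learner effectively memorizes a second mapping, from shape to key. A minimal Python sketch of that mapping, using only the standalone shapes named in the table above; partial-stroke entries (such as "the first two strokes of 寸") cannot be keyed this way, version differences are ignored, and the names `AUXILIARY_KEY` and `keys_by_radical` are hypothetical:

```python
# Hypothetical reverse index: standalone auxiliary shape -> key used to type it.
AUXILIARY_KEY = {
    "曰": "A",
    "冂": "B", "爫": "B", "冖": "B",
    "丷": "C", "八": "C",
    "氵": "E", "又": "E",
    "小": "F", "灬": "F",
    "厶": "I",
    "宀": "J",
    "乂": "K", "疒": "K",
    "衤": "L",
    "厂": "M", "工": "M",
    "忄": "P", "匕": "P", "七": "P", "勹": "P",
    "匚": "S",
    "艹": "T",
    "辶": "Y",
}


def keys_by_radical(index: dict[str, str]) -> dict[str, list[str]]:
    """Invert the index to list the auxiliary shapes grouped under each key."""
    grouped: dict[str, list[str]] = {}
    for shape, key in index.items():
        grouped.setdefault(key, []).append(shape)
    return grouped


print(AUXILIARY_KEY["氵"])                   # E, the same key as 水
print(keys_by_radical(AUXILIARY_KEY)["P"])  # ['忄', '匕', '七', '勹']
```

The P row shows why the auxiliary shapes are a common stumbling block: four visually unrelated shapes share the key of 心.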

The auxiliary shapes of each Cangjie radical have changed slightly between different versions of the Cangjie method; this is one reason why different versions of the Cangjie method are not completely compatible.

Keyboard layout

The basic rules

The typist must be familiar with several decomposition rules (拆字規則) that define how to analyse a character to arrive at its Cangjie code.
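Once a character has been decomposed, its code is also capped in length. The Python sketch below assumes the truncation rule as it is commonly described for Cangjie: a single-unit character keeps at most four codes (first, second, third, and last), while a compound character keeps at most two codes for its first part (first and last) and three for the remainder (first, second, and last), for at most five in total. The function names are hypothetical, and the inputs must already be the full radical sequences produced by the decomposition rules. The 頡 example assumes 吉 decomposes to 土 口 and 頁 to 一 月 山 金.

```python
def _pick(codes: list[str], positions: int) -> list[str]:
    """Keep codes as-is if short enough, else the first (positions - 1) plus the last."""
    if len(codes) <= positions:
        return codes
    return codes[: positions - 1] + [codes[-1]]


def simple_code(codes: list[str]) -> str:
    """Single-unit character: at most four codes (1st, 2nd, 3rd, last)."""
    return "".join(_pick(codes, 4))


def compound_code(head: list[str], body: list[str]) -> str:
    """Compound character: head keeps 1st and last; body keeps 1st, 2nd, and last."""
    return "".join(_pick(head, 2) + _pick(body, 3))


# 頡 = 吉 (head: 土 口) + 頁 (body: 一 月 山 金, assumed full decomposition)
print(compound_code(["G", "R"], ["M", "B", "U", "C"]))  # GRMBC
# A hypothetical five-code single-unit sequence is cut to four codes.
print(simple_code(["A", "B", "C", "D", "E"]))           # ABCE
```

Keeping the last code rather than the fourth or fifth preserves the end of the shape, which is why two characters that differ only in their middle strokes can collide under this rule.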

The rules are subject to various principles:

The short list of exceptions

Some forms are always decomposed in the same way, whether the rules say they should be decomposed this way or not. The number of such exceptions is small:

Form Fixed decomposition
Version 2 Version 3 Version 5
門 (door) 日 弓 (AN) 日 弓 (AN) 日 弓 (AN)
目 (eye) 月 山 (BU) 月 山 (BU) 月 山 (BU)
鬼 (ghost) 竹 戈 (HI) 竹 戈 (HI)
几 (small table) 竹 山 (HU) 竹 弓 (HN) 竹 弓 (HN)
贏 (win) 卜 弓 月 山 金 (YNBUC)
虍 (tiger [radical]) 卜 心 (YP) 卜 心 (YP) 卜 心 (YP)
亡 on top of 口 卜 口 (YR) 卜 口 (YR)
隹 (fowl) 人 土 (OG) 人 土 (OG) 人 土 (OG)
气 (vapor) 人 山 (OU) 人 弓 (ON) 人 一 弓 (OMN)
畿 minus the 田 女 戈 (VI) 女 戈 (VI)
鬥 (compete) 中 弓 (LN) 中 弓 (LN) 中 弓 (LN)
引 (pull) 弓 中 (NL) 弓 中 (NL) 弓 中 (NL)
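The exceptions table can also be read as data. The hypothetical Python sketch below stores only the rows for which all three versions have an entry, and lists the forms whose fixed decomposition changes between two versions; 几 and 气 are concrete examples of the version incompatibilities mentioned earlier.

```python
# Fixed decompositions from the table, keyed by form, then by version.
# Only rows with an entry for all three versions are included.
FIXED = {
    "門": {2: "AN", 3: "AN", 5: "AN"},
    "目": {2: "BU", 3: "BU", 5: "BU"},
    "几": {2: "HU", 3: "HN", 5: "HN"},
    "虍": {2: "YP", 3: "YP", 5: "YP"},
    "隹": {2: "OG", 3: "OG", 5: "OG"},
    "气": {2: "OU", 3: "ON", 5: "OMN"},
    "鬥": {2: "LN", 3: "LN", 5: "LN"},
    "引": {2: "NL", 3: "NL", 5: "NL"},
}


def changed_forms(old: int, new: int) -> dict[str, tuple[str, str]]:
    """Return the forms whose fixed decomposition differs between two versions."""
    return {
        form: (codes[old], codes[new])
        for form, codes in FIXED.items()
        if codes[old] != codes[new]
    }


print(changed_forms(2, 3))  # {'几': ('HU', 'HN'), '气': ('OU', 'ON')}
print(changed_forms(3, 5))  # {'气': ('ON', 'OMN')}
```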

Examples

Early Cangjie system

In the beginning, the Cangjie input method was not a way to produce a character in any character set. It was, instead, an integrated system consisting of the Cangjie input rules and a Cangjie controller board. The controller board contained character-generator firmware, which dynamically generated Chinese characters from Cangjie codes at output time, using the hi-res graphics mode of an Apple II computer. In the preface of the Cangjie user's manual, Mr. Chu wrote in 1982:

[Translation]
In terms of output: The output and input, in fact, [form] an integrated whole; there is no reason that [they be] dogmatically separated into two different facilities.… This is in fact necessary.…

In this early system, when the user types "yk " (for example) to get the Chinese character 文, the Cangjie codes do not get converted to any character encoding; the actual string "yk " is stored. In a very real sense, the Cangjie code of each character (string of 1 to 5 lowercase letters plus a space) was the encoding of that particular character.
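Because each stored character was simply its code followed by a space, recovering the codes from stored text amounts to splitting on those terminators. A minimal Python sketch, assuming exactly the format described above (one to five lowercase letters, then a space); the function name is hypothetical, and this is not a reconstruction of the original firmware:

```python
import re

# One stored character: 1 to 5 lowercase letters terminated by a space.
CODE = re.compile(r"[a-z]{1,5} ")


def split_stored_text(text: str) -> list[str]:
    """Split early-Cangjie-style stored text into per-character codes."""
    codes = []
    pos = 0
    while pos < len(text):
        match = CODE.match(text, pos)  # anchored at the current position
        if match is None:
            raise ValueError(f"malformed code at position {pos}")
        codes.append(match.group().rstrip())
        pos = match.end()
    return codes


print(split_stored_text("yk "))  # ['yk'], the stored form of 文
```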

A peculiar "feature" of this early system is that if random lowercase words are sent to the character generator, it attempts to construct Chinese characters according to the Cangjie decomposition rules, sometimes producing strange, unknown characters. This unusual feature, "automatic generation of characters", is actually described in the manual, and it is responsible for producing more than 10,000 of the roughly 15,000 characters that the system can handle. The name Cangjie, evocative of the creation of new characters, was therefore especially apt for this early version.

The presence of the integrated character generator also explains why the "X" key was historically needed to disambiguate decomposition collisions: because characters are "chosen" when the codes are output, every character that can be displayed must have one and only one Cangjie decomposition. It would make no sense, nor would it be practical, for the system to offer a choice of candidate characters when some arbitrary text file is displayed; the user would not know which of the candidates is correct.
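The uniqueness requirement amounts to saying that the mapping from codes to characters must be a function: no code may be shared by two characters. A hypothetical Python check for that property is sketched below; the entries in the usage example are placeholders, not real Cangjie assignments.

```python
def find_collisions(table: dict[str, str]) -> dict[str, list[str]]:
    """Given character -> code, return each code shared by more than one character."""
    by_code: dict[str, list[str]] = {}
    for char, code in table.items():
        by_code.setdefault(code, []).append(char)
    return {code: chars for code, chars in by_code.items() if len(chars) > 1}


# Placeholder entries: two hypothetical characters that decompose to the same code.
table = {"char1": "abc", "char2": "abc", "char3": "abd"}
print(find_collisions(table))  # {'abc': ['char1', 'char2']}
```

In the early system, every code reported by a check like this would have needed disambiguation with the X key before the character generator could display the text unambiguously.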

Issues

Cangjie was designed to be an easy-to-use system to help promote the use of Chinese computing; nevertheless, many users find Cangjie to be a difficult method. Many of the perceived difficulties arise from poor instruction.

Perceived difficulties

Enough practice, however, can overcome the above problems. With sufficient practice, a Cangjie typist touch-types much as an English typist does; it is entirely possible for a touch typist to type 25 characters per minute or more in Cangjie yet have difficulty remembering the list of auxiliary shapes or even the decomposition rules. Experienced Cangjie typists reportedly reach speeds from 60 to over 200 characters per minute.

Cangjie, however, also has some "real" problems:

Actual difficulties

In some situations it cannot be used at all. Because Cangjie uses all 26 letter keys of a US English keyboard, it cannot be used to input Chinese on mobile phones that have only a numeric keypad. On such phones, pinyin, 5-stroke (or Motorola's 9-stroke), and the Q9 input method are the norm, because they are designed specifically for numeric keypads.

Versions of Cangjie

The Cangjie input method is commonly said to have gone through five generations (usually called “versions” in English), each slightly incompatible with the others. Version 3 (第三代倉頡) is currently the most common, as it is the version supported natively by Microsoft Windows. Version 5 (第五代倉頡), supported by the Free Cangjie IME and formerly the only Cangjie version supported by SCIM, has a significant minority of users.

The early Cangjie system supported by the Zero One card on the Apple II was Version 2; Version 1 was never released.

The Cangjie input method supported on the Mac OS is somewhat like Version 3 and somewhat like Version 5.

Besides the original Cangjie input method, Version 5 was also created directly by Mr Chu, the inventor. Version 5 was originally slated for release as Version 6, and Mr Chu had hoped that its release would bring an end to the “more than ten versions of Cangjie input method” (slightly incompatible versions created by different vendors).

Version 6 has been developed by Mr Chu's longtime assistant Shen Honglian (沈紅蓮). It serves as the encoding for a character set of about 100,000 characters extracted from historical literature. This character set was developed independently of Unicode, which Mr Chu has strongly criticized as poorly designed. Version 6 has not been released to the public, but it is being used to build a database intended to store historical Chinese texts accurately.

Variants of Cangjie

Most modern implementations of Cangjie IMEs provide various convenience features:

Apart from the wildcard key, many of the above features are convenient for casual users but unsuitable for touch typists, because they make the Cangjie IME unpredictable.

There are also various attempts to "simplify" Cangjie one way or another:

Applications

Many researchers have discussed ways to decompose Chinese characters into major components and have tried to build applications based on such decomposition systems. This line of work is sometimes called the study of the "genes of Chinese characters" (漢字基因) [1]. Cangjie codes offer a natural basis for such an endeavour. Academia Sinica in Taiwan [2] and Shanghai Jiao Tong University [3] have similar projects as well.

Computing the visual similarity of Chinese characters is a direct application of character decomposition, e.g. [4]. The Cangjie input method offers a good starting point for this kind of application: by relaxing the five-code limit per character and adopting more detailed Cangjie codes, visually similar characters can be computed. When combined with pronunciation information about Chinese characters, this makes computer-assisted learning of Chinese characters possible [5].
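As a rough illustration of the idea, and not the method used in [4] or [5], visual similarity can be approximated by the edit distance between Cangjie codes. The Python sketch below uses plain Levenshtein distance as a deliberately simple stand-in; the codes come from the table of fixed decompositions (門 AN, 鬥 LN, 目 BU) and from 日 being typed A.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two Cangjie code strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("AN", "LN"))  # 1: 門 vs 鬥
print(edit_distance("A", "BU"))   # 2: 日 vs 目
```

The 日/目 pair shows the limitation of short codes: two visually close characters end up two edits apart because 目 uses a fixed decomposition, which is one reason to prefer longer, more detailed codes for this purpose.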


References

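  1. 漢字基因 [Genes of Chinese characters]. Chinese Wikipedia. http://zh.wikipedia.org/zh-hant/漢字基因
  2. 漢字構形資料庫 [Chinese Character Structure Database]. Academia Sinica. http://cdp.sinica.edu.tw/cdphanzi/
  3. 上海交通大學漢字編碼組, 上海漢語拼音文字研究組 [Chinese Character Coding Group, Shanghai Jiao Tong University, and Shanghai Chinese Pinyin Script Research Group] (eds.). 漢字信息字典 [Dictionary of Chinese Character Information]. Beijing: Science Press, 1988.
  4. 宋柔, 林民, 葛詩利. 漢字字形計算及其在校對系統中的應用 [Computation of Chinese character forms and its application in proofreading systems]. 小型微型計算機系統 [Journal of Chinese Computer Systems], 29(10): 1964-1968, 2008.
  5. Chao-Lin Liu, Min-Hua Lai, Kan-Wen Tien, Yi-Hsuan Chuang, Shih-Hung Wu, and Chia-Ying Lee. Visually and phonologically similar characters in incorrect Chinese words: Analyses, identification, and applications. ACM Transactions on Asian Language Information Processing, 10(2), 10:1-39. Association for Computing Machinery, USA, 2011. http://dx.doi.org/10.1145/1967293.1967297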
