Precomposed character
From Wikipedia, the free encyclopedia
A precomposed character (alternatively decomposable character) is a Unicode entity that can be decomposed into an equivalent string of several other characters. Typically, a precomposed character is decomposed into a base character and one or more combining diacritical marks.
Precomposed characters are included in the character set to aid computer systems with incomplete Unicode support, on which the equivalent decomposed sequences may render incorrectly.
Similarly, ligatures are precompositions of their constituent letters or graphemes.
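As a minimal sketch of how such decompositions can be inspected, the following uses Python's standard unicodedata module. The helper name describe_decomposition is hypothetical and chosen only for illustration. Note that in the Unicode data used by Python, a ligature such as U+FB01 carries a compatibility mapping rather than a canonical one, so it is left unchanged by canonical decomposition (NFD) and only expands under compatibility decomposition (NFKD).

```python
import unicodedata


def describe_decomposition(ch):
    """Return the names of the code points in the full canonical
    decomposition (NFD) of ch. Hypothetical helper for illustration."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


# U+1E53 (o with macron and acute) decomposes in two steps:
# first to U+014D + U+0301, then U+014D to U+006F + U+0304.
print(unicodedata.decomposition("\u1E53"))  # one-step mapping: "014D 0301"
print(describe_decomposition("\u1E53"))

# The "fi" ligature has only a compatibility mapping.
ligature = "\uFB01"
print(unicodedata.decomposition(ligature))  # "<compat> 0066 0069"
print(unicodedata.normalize("NFD", ligature) == ligature)  # canonical: unchanged
print(unicodedata.normalize("NFKD", ligature))  # compatibility: "fi"
```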
For example, the two strings
- ḱṷṓn (U+006B U+0301 U+0075 U+032D U+006F U+0304 U+0301 U+006E) and
- ḱṷṓn (U+1E31 U+1E77 U+1E53 U+006E)
are canonically equivalent and should render identically. In practice, however, some Unicode implementations still have difficulty rendering the decomposed sequences correctly.
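The equivalence of the two strings above can be checked with Unicode normalization. This is a small sketch using Python's standard unicodedata module; the function name canonically_equivalent is an illustrative assumption, not part of any library. The strings differ code point by code point, but compare equal once both are brought to the same normalization form.

```python
import unicodedata

# The two spellings from the example above, written as explicit code points.
decomposed = "\u006B\u0301\u0075\u032D\u006F\u0304\u0301\u006E"
precomposed = "\u1E31\u1E77\u1E53\u006E"


def canonically_equivalent(a, b):
    """Compare two strings after canonical decomposition (NFD).
    Hypothetical helper for illustration."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(decomposed == precomposed)            # False: raw code points differ
print(len(decomposed), len(precomposed))    # 8 4
print(canonically_equivalent(decomposed, precomposed))           # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
```

Software that compares or searches text would typically normalize both sides first, since a plain code point comparison treats the two spellings as different strings.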
OpenType provides the ccmp feature tag (Glyph Composition/Decomposition), which lets a font define glyphs that are compositions or decompositions involving combining characters.
In theory, most Chinese characters as encoded by Han unification and similar schemes could be treated as precomposed characters, since they can be decomposed into their constituent strokes and components, for example by means of ideographic descriptions. Unicode does not take this approach, which would be a far more ambitious model of text storage and layout. Such an approach could potentially reduce the number of characters in the character set from tens of thousands to just a few hundred.
External links
- Free Idg Serif, a derivative of the FreeSerif font with added declarations of precomposed characters.