Variant form (Unicode)
A variant form is an alternative glyph for a character, encoded in Unicode through variation sequences: sequences that consist of a base character followed by a variation selector character.
A variant form usually has an appearance and meaning very similar to those of its base form. The mechanism is intended for variants where, if the variant form is unavailable, displaying the base character generally does not change the meaning of the text and may not even be noticed by many readers.
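The structure described above (a base character followed by a single selector, with the base alone as a fallback) can be sketched in a few lines of Python. This is only an illustration: the helper names `make_variation_sequence` and `base_of` are hypothetical, and the pair U+2229 U+FE00 is used as a sample sequence without checking it against the Unicode data files.

```python
import unicodedata


def make_variation_sequence(base: str, selector: str) -> str:
    """Return a variation sequence: one base character followed by one selector."""
    if len(base) != 1 or len(selector) != 1:
        raise ValueError("expected exactly one base character and one selector")
    return base + selector


def base_of(sequence: str) -> str:
    """Fallback when the variant is unavailable: keep only the base character."""
    return sequence[0]


# Sample pair: U+2229 INTERSECTION followed by U+FE00 VARIATION SELECTOR-1.
seq = make_variation_sequence("\u2229", "\uFE00")

# The sequence is two code points; list them with their character names.
for ch in seq:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

A renderer that does not support the variant could display `base_of(seq)` instead, which matches the fallback behavior the mechanism is designed for.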
Unicode defines two types of variation sequences:[1]
- Standardized variation sequences defined in StandardizedVariants.txt[1][2]
- Ideographic variation sequences defined in the Ideographic Variation Database (IVD)[3][4]
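Both kinds of sequence are published as plain-text data files. The sketch below reads one line of such a file, assuming a layout of semicolon-separated fields in which the first field holds the two code points in hexadecimal and `#` starts a comment. The sample lines, the `VariationSequence` type, and the `parse_line` helper are illustrative assumptions and should be checked against the actual files before use.

```python
from typing import NamedTuple, Optional


class VariationSequence(NamedTuple):
    base: int        # code point of the base character
    selector: int    # code point of the variation selector
    fields: tuple    # remaining non-empty fields (description, collection, ...)


def parse_line(line: str) -> Optional[VariationSequence]:
    """Parse one data line; return None for blank or comment-only lines."""
    data = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    code_points = [int(cp, 16) for cp in fields[0].split()]
    if len(code_points) != 2:
        raise ValueError(f"expected a base and a selector: {line!r}")
    rest = tuple(f for f in fields[1:] if f)
    return VariationSequence(code_points[0], code_points[1], rest)


# Sample lines modeled on the assumed layout of the two data files.
standardized = parse_line("2229 FE00; with serifs; # INTERSECTION")
ideographic = parse_line("3402 E0100; Adobe-Japan1; CID+13698")
```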
Variation selector characters reside in several Unicode blocks:
- Variation Selectors (16 characters abbreviated VS1–VS16)
- Variation Selectors Supplement (240 characters abbreviated VS17–VS256)
- Mongolian (3 characters abbreviated FVS1–FVS3)
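The three ranges listed above are enough to decide whether a given character is a variation selector and which block it comes from. A minimal sketch follows; the function name `selector_block` is hypothetical, and the Mongolian range covers only the three selectors named in this article (U+180B to U+180D).

```python
from typing import Optional

# (first code point, last code point, block name)
SELECTOR_RANGES = (
    (0xFE00, 0xFE0F, "Variation Selectors"),
    (0xE0100, 0xE01EF, "Variation Selectors Supplement"),
    (0x180B, 0x180D, "Mongolian"),
)


def selector_block(ch: str) -> Optional[str]:
    """Return the block name if ch is a variation selector, else None."""
    cp = ord(ch)
    for first, last, block in SELECTOR_RANGES:
        if first <= cp <= last:
            return block
    return None
```

For example, `selector_block("\uFE0F")` returns `"Variation Selectors"`, while an ordinary letter such as `"A"` returns `None`.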
Variation selectors are not required for Arabic and cursive Latin characters, where glyphs are substituted based on context: a glyph may be connected to its neighbors depending on whether the character is initial, medial, or final in a word, or isolated. This kind of substitution is determined entirely by the character's context and needs no further input from the author. Authors may also use special-purpose characters such as joiners and non-joiners to force an alternate glyph form where it would not otherwise appear. Ligatures are a similar case: glyphs can be substituted simply by turning ligatures on or off as a rich text attribute.
For other glyph substitutions, the author's intent cannot be determined from context and may need to be encoded with the text. This is the case with the characters or glyphs referred to as gaiji, where different glyphs are used for the same character, either historically or in ideographs for family names. This is one of the gray areas in distinguishing a glyph from a character: if a family name differs slightly from the ideograph it derives from, is that a simple glyph variant or a character variant?
Character substitutions may also occur outside of Unicode, for example with OpenType Layout tags.[5]
Blocks with standardized variation sequences
As of Unicode 8.0, standardized variation sequences specifically for emoji/text presentation are defined for base characters in seventeen blocks:[1][2]
- Arrows
- Basic Latin
- CJK Symbols and Punctuation
- Dingbats
- Enclosed Alphanumeric Supplement
- Enclosed Alphanumerics
- Enclosed CJK Letters and Months
- Enclosed Ideographic Supplement
- General Punctuation
- Geometric Shapes
- Latin-1 Supplement
- Letterlike Symbols
- Mahjong Tiles
- Miscellaneous Symbols
- Miscellaneous Symbols and Arrows
- Miscellaneous Technical
- Supplemental Arrows-B
Other standardized variation sequences are formed with base characters in the following eight blocks:[1][2]
- CJK Unified Ideographs
- CJK Unified Ideographs Extension A
- CJK Unified Ideographs Extension B
- Manichaean
- Mathematical Operators
- Mongolian
- Phags-pa
- Supplemental Mathematical Operators
Blocks with ideographic variation sequences
As of 16 May 2014, ideographic variation sequences are defined for base characters in six blocks:[3][4]
- CJK Compatibility Ideographs
- CJK Unified Ideographs
- CJK Unified Ideographs Extension A
- CJK Unified Ideographs Extension B
- CJK Unified Ideographs Extension C
- CJK Unified Ideographs Extension D
Variation Selectors block
Variation Selectors | |
---|---|
Range | U+FE00..U+FE0F (16 code points) |
Plane | BMP |
Scripts | Inherited |
Assigned | 16 code points |
Unused | 0 reserved code points |
Unicode version history | |
3.2 | 16 (+16) |
Note: [6] |
Variation Selectors is a Unicode block containing 16 variation selector characters. Each one selects a particular glyph variant of the character that immediately precedes it, such as the Japanese, Chinese, Korean, or Taiwanese form of a CJK ideograph.
These combining characters are named variation selector-1 (U+FE00) through variation selector-16 (U+FE0F), abbreviated VS1–VS16.
- CJK compatibility ideograph variation sequences contain VS1–VS3 (U+FE00–U+FE02)
- CJK Unified Ideographs Extension A and B variation sequences contain VS1 (U+FE00) and VS2 (U+FE01)
- Emoji variation sequences contain VS16 (U+FE0F) for emoji-style or VS15 (U+FE0E) for text style
- Manichaean, Phags-pa, and mathematical variation sequences contain only VS1 (U+FE00)
- VS4–VS14 (U+FE03–U+FE0D) are not used for any variation sequences
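The emoji and text presentation selectors can be applied programmatically by appending VS16 or VS15 to a base character. The sketch below is a simplified illustration: `with_presentation` is a hypothetical helper, it does not check whether a sequence is actually defined for the given base, and U+2764 (a Dingbats character) is used only as a sample.

```python
VS15 = "\uFE0E"  # requests text presentation
VS16 = "\uFE0F"  # requests emoji presentation


def with_presentation(base: str, emoji: bool) -> str:
    """Return base followed by VS16 (emoji style) or VS15 (text style).

    Any presentation selector already trailing the base is replaced.
    """
    stripped = base.rstrip(VS15 + VS16)
    return stripped + (VS16 if emoji else VS15)


heart_emoji = with_presentation("\u2764", emoji=True)      # U+2764 U+FE0F
heart_text = with_presentation(heart_emoji, emoji=False)   # U+2764 U+FE0E
```

Whether the two results render differently depends on the fonts and the rendering system in use.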
Variation Selectors[1] Official Unicode Consortium code chart (PDF)
  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+FE0x | VS1 | VS2 | VS3 | VS4 | VS5 | VS6 | VS7 | VS8 | VS9 | VS10 | VS11 | VS12 | VS13 | VS14 | VS15 | VS16 |
Variation Selectors Supplement block
Variation Selectors Supplement | |
---|---|
Range | U+E0100..U+E01EF (240 code points) |
Plane | SSP |
Scripts | Inherited |
Assigned | 240 code points |
Unused | 0 reserved code points |
Unicode version history | |
4.0 | 240 (+240) |
Note: [6] |
Variation Selectors Supplement is a Unicode block containing additional Variation Selectors beyond those found in the Variation Selectors block.
These combining characters are named variation selector-17 (U+E0100) through variation selector-256 (U+E01EF), abbreviated VS17–VS256.
As of 16 May 2014, VS17 (U+E0100) to VS48 (U+E011F) are used in ideographic variation sequences in the Unicode Ideographic Variation Database (IVD).[3][4] However, as of Unicode 8.0, they are not found in any standardized variation sequence.
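Because the selectors are numbered consecutively across the two blocks, a VS number can be converted to a code point and back with simple arithmetic. The following sketch assumes only the ranges stated in this article; the function names `vs_code_point` and `vs_number` are hypothetical.

```python
def vs_code_point(n: int) -> int:
    """Return the code point of variation selector-n (1 <= n <= 256)."""
    if 1 <= n <= 16:
        return 0xFE00 + (n - 1)       # Variation Selectors block
    if 17 <= n <= 256:
        return 0xE0100 + (n - 17)     # Variation Selectors Supplement block
    raise ValueError(f"no variation selector numbered {n}")


def vs_number(cp: int) -> int:
    """Return n such that cp is the code point of variation selector-n."""
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 17
    raise ValueError(f"U+{cp:04X} is not in either variation selector block")


# VS17 through VS48, the range cited above for the IVD.
ivd_selectors = [vs_code_point(n) for n in range(17, 49)]
```

Here `ivd_selectors` runs from U+E0100 to U+E011F, matching the range given in the text.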
Variation Selectors Supplement[1] Official Unicode Consortium code chart (PDF)
  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+E010x | VS17 | VS18 | VS19 | VS20 | VS21 | VS22 | VS23 | VS24 | VS25 | VS26 | VS27 | VS28 | VS29 | VS30 | VS31 | VS32 |
U+E011x | VS33 | VS34 | VS35 | VS36 | VS37 | VS38 | VS39 | VS40 | VS41 | VS42 | VS43 | VS44 | VS45 | VS46 | VS47 | VS48 |
U+E012x | VS49 | VS50 | VS51 | VS52 | VS53 | VS54 | VS55 | VS56 | VS57 | VS58 | VS59 | VS60 | VS61 | VS62 | VS63 | VS64 |
U+E013x | VS65 | VS66 | VS67 | VS68 | VS69 | VS70 | VS71 | VS72 | VS73 | VS74 | VS75 | VS76 | VS77 | VS78 | VS79 | VS80 |
U+E014x | VS81 | VS82 | VS83 | VS84 | VS85 | VS86 | VS87 | VS88 | VS89 | VS90 | VS91 | VS92 | VS93 | VS94 | VS95 | VS96 |
U+E015x | VS97 | VS98 | VS99 | VS100 | VS101 | VS102 | VS103 | VS104 | VS105 | VS106 | VS107 | VS108 | VS109 | VS110 | VS111 | VS112 |
U+E016x | VS113 | VS114 | VS115 | VS116 | VS117 | VS118 | VS119 | VS120 | VS121 | VS122 | VS123 | VS124 | VS125 | VS126 | VS127 | VS128 |
U+E017x | VS129 | VS130 | VS131 | VS132 | VS133 | VS134 | VS135 | VS136 | VS137 | VS138 | VS139 | VS140 | VS141 | VS142 | VS143 | VS144 |
U+E018x | VS145 | VS146 | VS147 | VS148 | VS149 | VS150 | VS151 | VS152 | VS153 | VS154 | VS155 | VS156 | VS157 | VS158 | VS159 | VS160 |
U+E019x | VS161 | VS162 | VS163 | VS164 | VS165 | VS166 | VS167 | VS168 | VS169 | VS170 | VS171 | VS172 | VS173 | VS174 | VS175 | VS176 |
U+E01Ax | VS177 | VS178 | VS179 | VS180 | VS181 | VS182 | VS183 | VS184 | VS185 | VS186 | VS187 | VS188 | VS189 | VS190 | VS191 | VS192 |
U+E01Bx | VS193 | VS194 | VS195 | VS196 | VS197 | VS198 | VS199 | VS200 | VS201 | VS202 | VS203 | VS204 | VS205 | VS206 | VS207 | VS208 |
U+E01Cx | VS209 | VS210 | VS211 | VS212 | VS213 | VS214 | VS215 | VS216 | VS217 | VS218 | VS219 | VS220 | VS221 | VS222 | VS223 | VS224 |
U+E01Dx | VS225 | VS226 | VS227 | VS228 | VS229 | VS230 | VS231 | VS232 | VS233 | VS234 | VS235 | VS236 | VS237 | VS238 | VS239 | VS240 |
U+E01Ex | VS241 | VS242 | VS243 | VS244 | VS245 | VS246 | VS247 | VS248 | VS249 | VS250 | VS251 | VS252 | VS253 | VS254 | VS255 | VS256 |
Mongolian free variation selectors (FVS)
The Mongolian Unicode block contains its own variation selectors (listed as format controls) for use with the traditional Mongolian alphabet:[7]
- U+180B Mongolian free variation selector one (FVS1)
- U+180C Mongolian free variation selector two (FVS2)
- U+180D Mongolian free variation selector three (FVS3)
Additional variations may also be available for traditional Mongolian script characters according to the context of the character, or by using a zero-width joiner (ZWJ, U+200D) and/or a zero-width non-joiner (ZWNJ, U+200C) to select the specific form. The block also contains a format control named "Mongolian vowel separator" (MVS, U+180E).
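As with the other selectors, a Mongolian free variation selector applies to the character before it. The sketch below scans a string and reports each base character that is followed by one of the three selectors listed above. The helper name `find_fvs_sequences` is hypothetical, and the sample string (U+1820 followed by U+180B) is chosen for illustration only.

```python
FVS = {
    "\u180B": "FVS1",
    "\u180C": "FVS2",
    "\u180D": "FVS3",
}


def find_fvs_sequences(text: str) -> list:
    """Return (index, base character, selector abbreviation) for each pair.

    A selector that follows another selector, or that starts the string,
    has no base character and is skipped.
    """
    found = []
    for i in range(1, len(text)):
        if text[i] in FVS and text[i - 1] not in FVS:
            found.append((i - 1, text[i - 1], FVS[text[i]]))
    return found


# U+1820 (Mongolian letter A) + FVS1, then U+1828 with no selector.
sample = "\u1820\u180B\u1828"
pairs = find_fvs_sequences(sample)
```

This only locates the sequences; choosing the actual glyph also depends on positional context and any joiners present, which is left to the font and shaping engine.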
References
- [1] "UCD: Standardized Variants". Unicode Consortium.
- [2] "UCD: Standardized Variation Sequences". Unicode Consortium.
- [3] "Ideographic Variation Database". Unicode Consortium.
- [4] "UTS #37, Unicode Ideographic Variation Database". Unicode Consortium.
- [5] http://www.microsoft.com/typography/otspec/languagetags.htm
- [6] "Unicode Character Database (UCD)". The Unicode Standard. Retrieved 27 November 2015.
- [7] http://www.unicode.org/versions/Unicode7.0.0/ch13.pdf#G27882