Variant form (Unicode)

A variant form is a different glyph for a character, encoded in Unicode through variation sequences: sequences that consist of a base character followed by a variation selector character.

A variant form usually has a very similar appearance and meaning to its base form. The mechanism is intended for cases where, if the variant form is unavailable, displaying the base character instead does not change the meaning of the text and may not even be noticed by many readers.
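The fallback behaviour described above can be illustrated with a short Python sketch. It assumes only the two selector ranges given later in this article (U+FE00..U+FE0F and U+E0100..U+E01EF); the function name `strip_variation_selectors` is hypothetical and chosen for illustration, and the Mongolian free variation selectors are deliberately left out.

```python
import re

# Character class covering the two variation selector blocks named in this
# article: Variation Selectors and Variation Selectors Supplement.
# (Mongolian free variation selectors are intentionally not included.)
VARIATION_SELECTORS = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]")


def strip_variation_selectors(text: str) -> str:
    """Return text with variation selectors removed, leaving base characters."""
    return VARIATION_SELECTORS.sub("", text)


# A base character followed by a selector from the first block.
sequence = "\u2764\uFE0F"
base_only = strip_variation_selectors(sequence)

print([f"U+{ord(c):04X}" for c in sequence])
print([f"U+{ord(c):04X}" for c in base_only])
```

Removing the selector leaves the base character intact, which is roughly what a renderer falls back to when it has no glyph for the requested variant.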

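Unicode defines two types of variation sequences: standardized variation sequences, listed in the Unicode Character Database,[1] and ideographic variation sequences, registered in the Unicode Ideographic Variation Database (IVD).[2][3]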

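Variation selector characters reside in several Unicode blocks: the Variation Selectors block (U+FE00..U+FE0F), the Variation Selectors Supplement block (U+E0100..U+E01EF), and the Mongolian block, which holds the Mongolian free variation selectors.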

Variation selectors are not required for Arabic and Latin cursive characters, where glyph substitution depends on context: glyphs may be joined differently depending on whether the character is in initial, medial, final, or isolated position in a word. This kind of substitution is resolved from the character's context alone, with no further input from the author. Authors may also use special-purpose characters such as joiners and non-joiners to force an alternate glyph form where it would not otherwise appear. Ligatures are a similar case: glyphs may be substituted simply by turning ligatures on or off as a rich text attribute.

For other glyph substitutions, the author's intent cannot be determined from context and may need to be encoded with the text. This is the case with characters or glyphs referred to as gaiji, where different glyphs are used for the same character, either historically or in ideographs for family names. This is one of the gray areas in distinguishing between a glyph and a character: if the ideograph in a family name differs slightly from the character it derives from, is that a simple glyph variant or a character variant?

Character substitutions may also occur outside of Unicode, for example with OpenType Layout tags.[4]

Blocks with standardized variation sequences

As of Unicode 10.0, standardized variation sequences specifically for emoji/text presentation are defined for base characters in twenty blocks.[1]

Other standardized variation sequences are formed with base characters in ten further blocks.[1]
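As an illustration of the emoji/text presentation sequences mentioned in this section, the following Python sketch appends variation selector-15 (U+FE0E, text presentation) or variation selector-16 (U+FE0F, emoji presentation) to a base character. The helper names are hypothetical, and U+2764 is used only as an example base character. The sketch builds the sequence but does not check whether a given base character actually has a standardized sequence; that would require consulting the Unicode data files.

```python
import unicodedata

VS15 = "\uFE0E"  # variation selector-15: requests text presentation
VS16 = "\uFE0F"  # variation selector-16: requests emoji presentation


def text_presentation(base: str) -> str:
    """Build the sequence base + VS15 (hypothetical helper)."""
    return base + VS15


def emoji_presentation(base: str) -> str:
    """Build the sequence base + VS16 (hypothetical helper)."""
    return base + VS16


heart = "\u2764"  # example base character
for label, seq in (("text", text_presentation(heart)),
                   ("emoji", emoji_presentation(heart))):
    code_points = " ".join(f"U+{ord(c):04X}" for c in seq)
    names = ", ".join(unicodedata.name(c) for c in seq)
    print(f"{label}: {code_points} ({names})")
```

How the two sequences are drawn depends on the fonts and the renderer; the code points in the string are the only thing the sketch controls.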

Blocks with ideographic variation sequences

As of 15 August 2016, ideographic variation sequences are defined for base characters in six blocks.[2][3]

Variation Selectors block

Variation Selectors
Range: U+FE00..U+FE0F (16 code points)
Plane: BMP
Scripts: Inherited
Assigned: 16 code points
Unused: 0 reserved code points
Unicode version history: 3.2, 16 (+16)
Note: [5][6]

Variation Selectors is a Unicode block containing 16 variation selector format characters. They are used to select a particular glyph variant for a Unicode character, such as the Japanese, Chinese, Korean, or Taiwanese form of a given CJK ideograph.

They affect the glyph variant of the preceding character.
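Because a selector applies to the character immediately before it, a string can be split into (character, selector) pairs. The Python sketch below shows one way to do this under simplifying assumptions: it handles only the two selector ranges named in this article, treats a selector with no preceding base character as an ordinary standalone item, and does not check whether a pair is a defined variation sequence. The function names are hypothetical.

```python
from typing import Iterator, Optional, Tuple


def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def variation_pairs(text: str) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (character, selector) pairs; selector is None if none follows."""
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else None
        if (nxt is not None and is_variation_selector(nxt)
                and not is_variation_selector(ch)):
            # The selector modifies the character directly before it.
            yield ch, nxt
            i += 2
        else:
            yield ch, None
            i += 1


for ch, vs in variation_pairs("a\u2764\uFE0Fb"):
    selector = f"U+{ord(vs):04X}" if vs else "none"
    print(f"U+{ord(ch):04X} selector={selector}")
```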

These combining characters are named variation selector-1 (U+FE00) through variation selector-16 (U+FE0F), and are abbreviated VS1–VS16.

As of Unicode 10.0:[1]

Variation Selectors[1]
Official Unicode Consortium code chart (PDF)
         0     1     2     3     4     5     6     7     8     9     A     B     C     D     E     F
U+FE0x   VS1   VS2   VS3   VS4   VS5   VS6   VS7   VS8   VS9   VS10  VS11  VS12  VS13  VS14  VS15  VS16
Notes
1. As of Unicode version 10.0

Variation Selectors Supplement block

Variation Selectors Supplement
Range: U+E0100..U+E01EF (240 code points)
Plane: SSP
Scripts: Inherited
Assigned: 240 code points
Unused: 0 reserved code points
Unicode version history: 4.0, 240 (+240)
Note: [5][6]

Variation Selectors Supplement is a Unicode block containing additional Variation Selectors beyond those found in the Variation Selectors block.

These combining characters are named variation selector-17 (U+E0100) through variation selector-256 (U+E01EF), and are abbreviated VS17–VS256.
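The numbering of the selectors across the two blocks follows directly from the ranges given in this article, and can be sketched in Python as a pair of conversion functions. The function names are hypothetical; the arithmetic uses only the block boundaries stated above.

```python
def vs_code_point(n: int) -> int:
    """Return the code point of variation selector-n, for n in 1..256."""
    if 1 <= n <= 16:
        return 0xFE00 + (n - 1)      # Variation Selectors block
    if 17 <= n <= 256:
        return 0xE0100 + (n - 17)    # Variation Selectors Supplement block
    raise ValueError(f"no variation selector numbered {n}")


def vs_number(cp: int) -> int:
    """Return n such that cp is variation selector-n."""
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 17
    raise ValueError(f"U+{cp:04X} is not in either variation selector block")


for n in (1, 16, 17, 48, 256):
    print(f"VS{n} = U+{vs_code_point(n):04X}")
```

The two functions are inverses of each other over the full range VS1 to VS256.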

As of 15 August 2016, VS17 (U+E0100) to VS48 (U+E011F) are used in ideographic variation sequences in the Unicode Ideographic Variation Database (IVD).[2][3] However, as of Unicode 10.0, they are not found in any standardized variation sequence.
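Since the supplementary selectors lie outside the BMP, an ideographic variation sequence takes more storage than its two code points might suggest. The Python sketch below builds such a sequence and reports its size in several encodings. The base character U+8FBB is an arbitrary CJK ideograph chosen for illustration; whether a particular sequence is registered would have to be checked against the IVD itself, which this sketch does not do. The function name is hypothetical, and its range reflects only the VS17 to VS48 usage stated above.

```python
def is_ivd_selector_2016(ch: str) -> bool:
    """True for VS17..VS48 (U+E0100..U+E011F), the selectors this article
    reports as used in the IVD as of 15 August 2016."""
    return 0xE0100 <= ord(ch) <= 0xE011F


base = "\u8FBB"            # example CJK ideograph (illustrative choice)
selector = "\U000E0100"    # VS17, the first supplementary selector
ivs = base + selector

print("code points:      ", len(ivs))
print("UTF-8 bytes:      ", len(ivs.encode("utf-8")))
print("UTF-16 code units:", len(ivs.encode("utf-16-le")) // 2)
print("selector in VS17..VS48:", is_ivd_selector_2016(selector))
```

In UTF-16 the selector is stored as a surrogate pair, so code that indexes by code unit rather than by code point can accidentally split the sequence.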

Variation Selectors Supplement[1]
Official Unicode Consortium code chart (PDF)
         0     1     2     3     4     5     6     7     8     9     A     B     C     D     E     F
U+E010x  VS17  VS18  VS19  VS20  VS21  VS22  VS23  VS24  VS25  VS26  VS27  VS28  VS29  VS30  VS31  VS32
U+E011x  VS33  VS34  VS35  VS36  VS37  VS38  VS39  VS40  VS41  VS42  VS43  VS44  VS45  VS46  VS47  VS48
U+E012x  VS49  VS50  VS51  VS52  VS53  VS54  VS55  VS56  VS57  VS58  VS59  VS60  VS61  VS62  VS63  VS64
U+E013x  VS65  VS66  VS67  VS68  VS69  VS70  VS71  VS72  VS73  VS74  VS75  VS76  VS77  VS78  VS79  VS80
U+E014x  VS81  VS82  VS83  VS84  VS85  VS86  VS87  VS88  VS89  VS90  VS91  VS92  VS93  VS94  VS95  VS96
U+E015x  VS97  VS98  VS99  VS100 VS101 VS102 VS103 VS104 VS105 VS106 VS107 VS108 VS109 VS110 VS111 VS112
U+E016x  VS113 VS114 VS115 VS116 VS117 VS118 VS119 VS120 VS121 VS122 VS123 VS124 VS125 VS126 VS127 VS128
U+E017x  VS129 VS130 VS131 VS132 VS133 VS134 VS135 VS136 VS137 VS138 VS139 VS140 VS141 VS142 VS143 VS144
U+E018x  VS145 VS146 VS147 VS148 VS149 VS150 VS151 VS152 VS153 VS154 VS155 VS156 VS157 VS158 VS159 VS160
U+E019x  VS161 VS162 VS163 VS164 VS165 VS166 VS167 VS168 VS169 VS170 VS171 VS172 VS173 VS174 VS175 VS176
U+E01Ax  VS177 VS178 VS179 VS180 VS181 VS182 VS183 VS184 VS185 VS186 VS187 VS188 VS189 VS190 VS191 VS192
U+E01Bx  VS193 VS194 VS195 VS196 VS197 VS198 VS199 VS200 VS201 VS202 VS203 VS204 VS205 VS206 VS207 VS208
U+E01Cx  VS209 VS210 VS211 VS212 VS213 VS214 VS215 VS216 VS217 VS218 VS219 VS220 VS221 VS222 VS223 VS224
U+E01Dx  VS225 VS226 VS227 VS228 VS229 VS230 VS231 VS232 VS233 VS234 VS235 VS236 VS237 VS238 VS239 VS240
U+E01Ex  VS241 VS242 VS243 VS244 VS245 VS246 VS247 VS248 VS249 VS250 VS251 VS252 VS253 VS254 VS255 VS256
Notes
1. As of Unicode version 10.0

Mongolian free variation selectors (FVS)

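The Mongolian Unicode block contains its own variation selectors (listed as format controls) for use with the traditional Mongolian alphabet: Mongolian free variation selector one (FVS1, U+180B), two (FVS2, U+180C), and three (FVS3, U+180D).[7]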

Additional variations may also be available for traditional Mongolian script characters according to the context of the character, or by using a zero-width joiner (ZWJ, U+200D) and/or a zero-width non-joiner (ZWNJ, U+200C) to select the specific form. The block also contains a format control named "Mongolian vowel separator" (MVS, U+180E).
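The controls that can influence the form of a Mongolian letter can be located in a string with a short Python sketch. It assumes that the free variation selectors are U+180B, U+180C, and U+180D, which the code confirms against the character names in Python's `unicodedata` module; the other code points are the ones given above. The function name `find_form_controls` and the sample string (the Mongolian letter U+1820 followed by FVS1) are illustrative only, and the sketch does not check whether that pair is a defined variation sequence.

```python
import unicodedata

# Short labels for the controls discussed in this section.
FORM_CONTROLS = {
    "\u180B": "FVS1",
    "\u180C": "FVS2",
    "\u180D": "FVS3",
    "\u180E": "MVS",
    "\u200C": "ZWNJ",
    "\u200D": "ZWJ",
}


def find_form_controls(text: str):
    """Return (index, label, Unicode name) for each control found in text."""
    return [
        (i, FORM_CONTROLS[ch], unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in FORM_CONTROLS
    ]


sample = "\u1820\u180B"  # a Mongolian letter followed by FVS1 (illustrative)
for index, label, name in find_form_controls(sample):
    print(index, label, name)
```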

References

  1. "UCD: Standardized Variation Sequences". Unicode Consortium.
  2. "Ideographic Variation Database". Unicode Consortium.
  3. "UTS #37, Unicode Ideographic Variation Database". Unicode Consortium.
  4. "Language system tags". Microsoft.
  5. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  6. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
  7. "Free Variation Selectors" (PDF). www.unicode.org.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.