KOI8-RU

KOI8-RU is an 8-bit character encoding designed to cover Russian, Ukrainian, and Belarusian, all of which use a Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces ten box-drawing graphic characters with the five Ukrainian and Belarusian letters Ґ, Є, І, Ї, and Ў, each in upper case and lower case.

IBM assigns KOI8-RU code page 1167.[1][2]

KOI8 remains much more commonly used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251. Both may eventually give way to Unicode.

KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit".

The KOI8 character sets arrange the Russian Cyrillic letters in pseudo-Roman order rather than in the natural Cyrillic alphabetical order used by ISO 8859-5. Although this may seem unnatural, it means that if the eighth bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-RU becomes rUSSKIJ tEKST ("Russian Text") if the 8th bit is stripped.
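The bit-stripping behaviour can be checked with a short Python sketch. It assumes Python's built-in `koi8_r` codec as a stand-in, because the standard library does not appear to ship a KOI8-RU codec and the Russian letters occupy the same positions in both encodings. The helper name `strip_high_bit` is hypothetical.

```python
def strip_high_bit(text: str) -> str:
    """Encode Russian text as KOI8 bytes and clear the eighth bit of each byte.

    Assumption: koi8_r is used in place of KOI8-RU; the two encodings
    agree on the positions of the Russian letters.
    """
    raw = text.encode("koi8_r")
    # Masking with 0x7F drops the high bit, leaving a 7-bit ASCII byte.
    stripped = bytes(b & 0x7F for b in raw)
    return stripped.decode("ascii")


print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

The case reversal comes from the layout: lower-case Cyrillic letters sit in rows C_ and D_, which map to the upper-case ASCII rows 4_ and 5_ once the high bit is cleared, and upper-case Cyrillic letters in rows E_ and F_ map to the lower-case ASCII rows 6_ and 7_.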

Character set

The following table shows the KOI8-RU encoding.[1] Each character is shown with its equivalent Unicode code point and its decimal code point.

KOI8-RU
    _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F

0_

1_

2_  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
    0020 0021 0022 0023 0024 0025 0026 0027 0028 0029 002A 002B 002C 002D 002E 002F
    32   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47

3_  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
    0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 003A 003B 003C 003D 003E 003F
    48   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63

4_  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
    0040 0041 0042 0043 0044 0045 0046 0047 0048 0049 004A 004B 004C 004D 004E 004F
    64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79

5_  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
    0050 0051 0052 0053 0054 0055 0056 0057 0058 0059 005A 005B 005C 005D 005E 005F
    80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95

6_  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
    0060 0061 0062 0063 0064 0065 0066 0067 0068 0069 006A 006B 006C 006D 006E 006F
    96   97   98   99   100  101  102  103  104  105  106  107  108  109  110  111

7_  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~
    0070 0071 0072 0073 0074 0075 0076 0077 0078 0079 007A 007B 007C 007D 007E
    112  113  114  115  116  117  118  119  120  121  122  123  124  125  126

8_  ─    │    ┌    ┐    └    ┘    ├    ┤    ┬    ┴    ┼    ▀    ▄    █    ▌    ▐
    2500 2502 250C 2510 2514 2518 251C 2524 252C 2534 253C 2580 2584 2588 258C 2590
    128  129  130  131  132  133  134  135  136  137  138  139  140  141  142  143

9_  ░    ▒    ▓    “    ■    ∙    ”    —    №    ™    NBSP »    ®    «    ·    ¤
    2591 2592 2593 201C 25A0 2219 201D 2014 2116 2122 00A0 00BB 00AE 00AB 00B7 00A4
    144  145  146  147  148  149  150  151  152  153  154  155  156  157  158  159

A_  ═    ║    ╒    ё    є    ╔    і    ї    ╗    ╘    ╙    ╚    ╛    ґ    ў    ╞
    2550 2551 2552 0451 0454 2554 0456 0457 2557 2558 2559 255A 255B 0491 045E 255E
    160  161  162  163  164  165  166  167  168  169  170  171  172  173  174  175

B_  ╟    ╠    ╡    Ё    Є    ╣    І    Ї    ╦    ╧    ╨    ╩    ╪    Ґ    Ў    ©
    255F 2560 2561 0401 0404 2563 0406 0407 2566 2567 2568 2569 256A 0490 040E 00A9
    176  177  178  179  180  181  182  183  184  185  186  187  188  189  190  191

C_  ю    а    б    ц    д    е    ф    г    х    и    й    к    л    м    н    о
    044E 0430 0431 0446 0434 0435 0444 0433 0445 0438 0439 043A 043B 043C 043D 043E
    192  193  194  195  196  197  198  199  200  201  202  203  204  205  206  207

D_  п    я    р    с    т    у    ж    в    ь    ы    з    ш    э    щ    ч    ъ
    043F 044F 0440 0441 0442 0443 0436 0432 044C 044B 0437 0448 044D 0449 0447 044A
    208  209  210  211  212  213  214  215  216  217  218  219  220  221  222  223

E_  Ю    А    Б    Ц    Д    Е    Ф    Г    Х    И    Й    К    Л    М    Н    О
    042E 0410 0411 0426 0414 0415 0424 0413 0425 0418 0419 041A 041B 041C 041D 041E
    224  225  226  227  228  229  230  231  232  233  234  235  236  237  238  239

F_  П    Я    Р    С    Т    У    Ж    В    Ь    Ы    З    Ш    Э    Щ    Ч    Ъ
    041F 042F 0420 0421 0422 0423 0416 0412 042C 042B 0417 0428 042D 0429 0427 042A
    240  241  242  243  244  245  246  247  248  249  250  251  252  253  254  255

In the table above, 0x20 is the regular SPACE character, and 0x9A is the NO-BREAK SPACE.

KOI8-RU differs from KOI8-R at positions 0xA4, 0xA6, 0xA7, 0xAD, 0xAE and 0xB4, 0xB6, 0xB7, 0xBD, 0xBE, which hold the extra letters that do not exist in Russian. Several symbols in the range 0x93 to 0x9F of the table above also differ from KOI8-R.
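Because the encoding is so close to KOI8-R, a decoder can be sketched by patching a KOI8-R table. The following Python example is a minimal illustration, not a reference implementation: it assumes Python's built-in `koi8_r` codec as the base, and the names `OVERRIDES`, `decode_koi8_ru`, and `encode_koi8_ru` are hypothetical. The override entries are read off the table above wherever it departs from `koi8_r`.

```python
# Positions where the table above differs from Python's koi8_r codec.
# Literal characters are used so the mapping can be compared with the table by eye.
OVERRIDES = {
    0x93: "“", 0x96: "”", 0x97: "—", 0x98: "№", 0x99: "™",
    0x9B: "»", 0x9C: "®", 0x9D: "«", 0x9F: "¤",
    0xA4: "є", 0xA6: "і", 0xA7: "ї", 0xAD: "ґ", 0xAE: "ў",
    0xB4: "Є", 0xB6: "І", 0xB7: "Ї", 0xBD: "Ґ", 0xBE: "Ў",
}

# Start from KOI8-R (every byte value decodes), then patch the differences.
DECODE_TABLE = [bytes([b]).decode("koi8_r") for b in range(256)]
for byte_value, char in OVERRIDES.items():
    DECODE_TABLE[byte_value] = char

# Reverse mapping for encoding; each character appears exactly once.
ENCODE_TABLE = {char: byte_value for byte_value, char in enumerate(DECODE_TABLE)}


def decode_koi8_ru(data: bytes) -> str:
    """Decode bytes using the patched table."""
    return "".join(DECODE_TABLE[b] for b in data)


def encode_koi8_ru(text: str) -> bytes:
    """Encode text using the patched table; raises KeyError for unmapped characters."""
    return bytes(ENCODE_TABLE[ch] for ch in text)


sample = "Ґанак і ўсмешка"
assert decode_koi8_ru(encode_koi8_ru(sample)) == sample
```

A table-driven approach keeps the differences from KOI8-R in one place, so a disputed position such as 0x95 can be changed by adding a single entry to `OVERRIDES`.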

Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.

Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).

References

  1. "SBCS code page information - CPGID: 01167 / Name: Ukrainian KOI8-U". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. IBM. C-H 3-3220-050. Archived from the original on 2017-02-18. Retrieved 2017-02-18.
  2. "CCSID information document; CCSID 1167; KOI8-U". IBM. Archived from the original on 2017-02-18. Retrieved 2017-02-18.


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.