Western Latin character sets (computing)
From Wikipedia, the free encyclopedia
This article compares several binary representations of the character sets used for Western European languages in computers. These languages include the Romance and Germanic languages, which use the Latin alphabet with a limited set of specialized letters and diacritics.
Summary
- The two ISO-8859 encodings have a section of rarely used control codes in the range 0x80–0x9F.
- In terms of printable characters, Windows-1252 has everything that ISO-8859-1 and ISO-8859-15 have, and more.
- IBM437, being intended for English only, has very little in the way of accented letters but has far more graphics characters than the others and also some Greek characters that are useful as technical symbols.
- IBM850 has all the printable characters that ISO-8859-1 has (albeit arranged differently) and still manages to have enough graphics characters to build a decent text-mode user interface.
- The Mac OS Roman character set (often referred to as MacRoman and known by the IANA simply as MACINTOSH) has most, but not all, of the same characters as ISO-8859-1 but in a very different arrangement, and it also adds many technical and mathematical characters and more diacritics. Older Macintosh web browsers were known to corrupt the few characters that are in ISO-8859-1 but not in the native Macintosh character set when editing text from websites. Conversely, in web material prepared on an older Macintosh, many characters displayed incorrectly when read on other operating systems.
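The repertoire claims in this summary can be checked mechanically. The following Python sketch is an illustration only: it assumes that the standard-library codecs `latin-1`, `iso8859-15`, `cp1252`, `cp437`, `cp850` and `mac_roman` correspond to the six character sets compared here, and the helper name `repertoire` is hypothetical. Python's codec tables may differ in detail from the mappings used elsewhere in this article.

```python
import unicodedata

# Assumed correspondence between IANA names and Python codec names.
CODECS = {
    "ISO-8859-1": "latin-1",
    "ISO-8859-15": "iso8859-15",
    "WINDOWS-1252": "cp1252",
    "IBM437": "cp437",
    "IBM850": "cp850",
    "MACINTOSH": "mac_roman",
}


def repertoire(codec):
    """Return the non-control characters that `codec` assigns to bytes 0x80-0xFF."""
    chars = set()
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # unassigned position in this codec
        if unicodedata.category(ch) != "Cc":  # skip C1 control codes
            chars.add(ch)
    return chars


sets = {name: repertoire(codec) for name, codec in CODECS.items()}
latin1 = sets["ISO-8859-1"]

# Is each ISO repertoire contained in Windows-1252?
print(latin1 <= sets["WINDOWS-1252"])
print(sets["ISO-8859-15"] <= sets["WINDOWS-1252"])
# Does IBM850 cover ISO-8859-1?
print(latin1 <= sets["IBM850"])
# Which ISO-8859-1 characters does the Macintosh set lack?
print(sorted(latin1 - sets["MACINTOSH"]))
```

Comparing sets of decoded characters rather than byte values sidesteps the different arrangements of the code pages.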
Notes
- The mappings for the IBM code pages are those supplied by Microsoft on the Unicode site. Refer to the Unicode Consortium's document on the differences between IBM's and Microsoft's mappings for these code pages.
- The old PC code pages actually defined printable characters for the control code ranges. These could not be used when printing text through DOS, because the control codes were trapped before reaching the screen, but applications that wrote to screen memory directly could use them.
- Position 0xF0 was used in the Macintosh character sets for the Apple logo. The Apple logo was not accepted into Unicode due to its trademarked nature, so Apple mapped it to a code point (U+F8FF) in the Private Use Area. It may therefore not display correctly in the table.
- In Windows-1252, positions 0x81, 0x8D, 0x8F, 0x90, and 0x9D are unused according to the mapping tables on the Unicode site. However, the conversion routines in Windows seem to convert them to the C1 control codes that are at those positions in ISO-8859-1.
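The handling of the five unused Windows-1252 positions can be illustrated in Python, whose standard `cp1252` codec follows the mapping table and rejects those bytes. The sketch below is not the Windows conversion routine; `decode_1252_lenient` is a hypothetical helper that only mimics the behaviour described in the last note by passing undefined bytes through as the C1 control codes at the same positions.

```python
# Positions left unassigned in the Windows-1252 mapping table.
UNUSED_POSITIONS = (0x81, 0x8D, 0x8F, 0x90, 0x9D)


def decode_1252_lenient(data):
    """Decode Windows-1252, mapping undefined bytes to same-numbered C1 controls.

    Hypothetical helper: an approximation of the behaviour described above,
    not a reimplementation of any Windows API.
    """
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(byte))  # e.g. 0x81 becomes U+0081
    return "".join(chars)


for position in UNUSED_POSITIONS:
    try:
        bytes([position]).decode("cp1252")
    except UnicodeDecodeError:
        print(f"0x{position:02X} is undefined in the strict codec")

print(repr(decode_1252_lenient(b"\x80\x81")))
```

Decoding one byte at a time is slow but keeps the fallback logic explicit; a production decoder would more likely register a custom error handler.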
History
The earlier seven-bit U.S. ASCII encoding has characters sufficient to properly represent only English, Latin, and Swahili. It is missing some letters and letter-diacritic combinations used in other Latin-alphabet languages. However, since most U.S.-supplied computer platforms offered no other choice, ASCII was unavoidable in most of the non-English-speaking world (seven-bit encoding was necessitated by the limitations of early computing networks). The ISO 646 group of encodings replaced some of the symbols in ASCII with local characters, but space was very limited, and some of the replaced symbols were quite common in things like programming languages.
Although seven-bit communication was the norm, most computers internally used eight-bit bytes, and they mostly put some form of characters in the 128 higher byte positions. In the early days most of these were system specific, but gradually a few standards were settled on.
In recent years, as storage and memory costs have fallen, the problems caused by a given eight-bit code having multiple meanings (there are seven ISO-Latin code sets alone) have ceased to be justified. All major operating systems have moved to Unicode as their main internal representation. However, at least on Windows, many applications continue to use the non-Unicode versions of the API calls.
The euro sign
The coming of the euro introduced significant pressure to support the euro sign (€), and most character sets had to be adapted in some way.
- MacRoman simply replaced the generic currency sign (¤) with the euro sign. This caused significant difficulty because organisations had found other uses for that position, such as a company logo.
- ISO introduced ISO 8859-15, which replaced the generic currency sign with the euro sign and also replaced some other symbols with letters with diacritics.
- Windows-1252 simply placed the euro sign in a gap (position 0x80) in the existing C1 control codes.
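The three approaches listed above lead to three different byte positions for the same character. The following Python sketch looks them up, assuming that the standard-library codecs named in `CODECS` correspond to the character sets discussed here (Python's `mac_roman` is assumed to reflect the post-euro mapping); `euro_position` is a hypothetical helper name.

```python
EURO = "\u20ac"

# Assumed correspondence between IANA names and Python codec names.
CODECS = {
    "ISO-8859-1": "latin-1",
    "ISO-8859-15": "iso8859-15",
    "WINDOWS-1252": "cp1252",
    "IBM437": "cp437",
    "IBM850": "cp850",
    "MACINTOSH": "mac_roman",
}


def euro_position(codec):
    """Return the euro sign's byte position as two hex digits, or None if absent."""
    try:
        return EURO.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return None


for name, codec in CODECS.items():
    print(name, euro_position(codec))

# The same byte decodes differently under the two ISO encodings.
print(b"\xa4".decode("latin-1"), b"\xa4".decode("iso8859-15"))
```

The last line shows why mislabelled text is a practical problem: a byte written as the euro sign under ISO-8859-15 is read back as the generic currency sign under ISO-8859-1.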
Comparison table
Headers in this table are repeated after every 16 rows so they remain visible as the page is scrolled. Code points U+0000 to U+007F are not shown in this table currently, as they are directly mapped in all character sets listed here.
The table is arranged by Unicode code point. Character sets are referred to here by their IANA names in upper case.
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
---|---|---|---|---|---|---|---|
PAD | U+0080 | 80 | 80 | ||||
HOP | U+0081 | 81 | 81 | ||||
BPH | U+0082 | 82 | 82 | ||||
NBH | U+0083 | 83 | 83 | ||||
IND | U+0084 | 84 | 84 | ||||
NEL | U+0085 | 85 | 85 | ||||
SSA | U+0086 | 86 | 86 | ||||
ESA | U+0087 | 87 | 87 | ||||
HTS | U+0088 | 88 | 88 | ||||
HTJ | U+0089 | 89 | 89 | ||||
VTS | U+008A | 8A | 8A | ||||
PLD | U+008B | 8B | 8B | ||||
PLU | U+008C | 8C | 8C | ||||
RI | U+008D | 8D | 8D | ||||
SS2 | U+008E | 8E | 8E | ||||
SS3 | U+008F | 8F | 8F | ||||
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
DCS | U+0090 | 90 | 90 | ||||
PU1 | U+0091 | 91 | 91 | ||||
PU2 | U+0092 | 92 | 92 | ||||
STS | U+0093 | 93 | 93 | ||||
CCH | U+0094 | 94 | 94 | ||||
MW | U+0095 | 95 | 95 | ||||
SPA | U+0096 | 96 | 96 | ||||
EPA | U+0097 | 97 | 97 | ||||
SOS | U+0098 | 98 | 98 | ||||
SGCI | U+0099 | 99 | 99 | ||||
SCI | U+009A | 9A | 9A | ||||
CSI | U+009B | 9B | 9B | ||||
ST | U+009C | 9C | 9C | ||||
OSC | U+009D | 9D | 9D | ||||
PM | U+009E | 9E | 9E | ||||
APC | U+009F | 9F | 9F | ||||
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
NBSP | U+00A0 | A0 | A0 | A0 | FF | FF | CA |
¡ | U+00A1 | A1 | A1 | A1 | AD | AD | C1 |
¢ | U+00A2 | A2 | A2 | A2 | 9B | BD | A2 |
£ | U+00A3 | A3 | A3 | A3 | 9C | 9C | A3 |
¤ | U+00A4 | A4 | | A4 | | CF | |
¥ | U+00A5 | A5 | A5 | A5 | 9D | BE | B4 |
¦ | U+00A6 | A6 | | A6 | | DD | |
§ | U+00A7 | A7 | A7 | A7 | | F5 | A4 |
¨ | U+00A8 | A8 | | A8 | | F9 | AC |
© | U+00A9 | A9 | A9 | A9 | | B8 | A9 |
ª | U+00AA | AA | AA | AA | A6 | A6 | BB |
« | U+00AB | AB | AB | AB | AE | AE | C7 |
¬ | U+00AC | AC | AC | AC | AA | AA | C2 |
SHY | U+00AD | AD | AD | AD | | F0 | |
® | U+00AE | AE | AE | AE | | A9 | A8 |
¯ | U+00AF | AF | AF | AF | | EE | F8 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
° | U+00B0 | B0 | B0 | B0 | F8 | F8 | A1 |
± | U+00B1 | B1 | B1 | B1 | F1 | F1 | B1 |
² | U+00B2 | B2 | B2 | B2 | FD | FD | |
³ | U+00B3 | B3 | B3 | B3 | | FC | |
´ | U+00B4 | B4 | | B4 | | EF | AB |
µ | U+00B5 | B5 | B5 | B5 | E6 | E6 | B5 |
¶ | U+00B6 | B6 | B6 | B6 | | F4 | A6 |
· | U+00B7 | B7 | B7 | B7 | FA | FA | E1 |
¸ | U+00B8 | B8 | | B8 | | F7 | FC |
¹ | U+00B9 | B9 | B9 | B9 | | FB | |
º | U+00BA | BA | BA | BA | A7 | A7 | BC |
» | U+00BB | BB | BB | BB | AF | AF | C8 |
¼ | U+00BC | BC | | BC | AC | AC | |
½ | U+00BD | BD | | BD | AB | AB | |
¾ | U+00BE | BE | | BE | | F3 | |
¿ | U+00BF | BF | BF | BF | A8 | A8 | C0 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
À | U+00C0 | C0 | C0 | C0 | | B7 | CB |
Á | U+00C1 | C1 | C1 | C1 | | B5 | E7 |
 | U+00C2 | C2 | C2 | C2 | | B6 | E5 |
à | U+00C3 | C3 | C3 | C3 | | C7 | CC |
Ä | U+00C4 | C4 | C4 | C4 | 8E | 8E | 80 |
Å | U+00C5 | C5 | C5 | C5 | 8F | 8F | 81 |
Æ | U+00C6 | C6 | C6 | C6 | 92 | 92 | AE |
Ç | U+00C7 | C7 | C7 | C7 | 80 | 80 | 82 |
È | U+00C8 | C8 | C8 | C8 | | D4 | E9 |
É | U+00C9 | C9 | C9 | C9 | 90 | 90 | 83 |
Ê | U+00CA | CA | CA | CA | | D2 | E6 |
Ë | U+00CB | CB | CB | CB | | D3 | E8 |
Ì | U+00CC | CC | CC | CC | | DE | ED |
Í | U+00CD | CD | CD | CD | | D6 | EA |
Î | U+00CE | CE | CE | CE | | D7 | EB |
Ï | U+00CF | CF | CF | CF | | D8 | EC |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
Ð | U+00D0 | D0 | D0 | D0 | | D1 | |
Ñ | U+00D1 | D1 | D1 | D1 | A5 | A5 | 84 |
Ò | U+00D2 | D2 | D2 | D2 | | E3 | F1 |
Ó | U+00D3 | D3 | D3 | D3 | | E0 | EE |
Ô | U+00D4 | D4 | D4 | D4 | | E2 | EF |
Õ | U+00D5 | D5 | D5 | D5 | | E5 | CD |
Ö | U+00D6 | D6 | D6 | D6 | 99 | 99 | 85 |
× | U+00D7 | D7 | D7 | D7 | | 9E | |
Ø | U+00D8 | D8 | D8 | D8 | | 9D | AF |
Ù | U+00D9 | D9 | D9 | D9 | | EB | F4 |
Ú | U+00DA | DA | DA | DA | | E9 | F2 |
Û | U+00DB | DB | DB | DB | | EA | F3 |
Ü | U+00DC | DC | DC | DC | 9A | 9A | 86 |
Ý | U+00DD | DD | DD | DD | | ED | |
Þ | U+00DE | DE | DE | DE | | E8 | |
ß | U+00DF | DF | DF | DF | E1 | E1 | A7 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
à | U+00E0 | E0 | E0 | E0 | 85 | 85 | 88 |
á | U+00E1 | E1 | E1 | E1 | A0 | A0 | 87 |
â | U+00E2 | E2 | E2 | E2 | 83 | 83 | 89 |
ã | U+00E3 | E3 | E3 | E3 | | C6 | 8B |
ä | U+00E4 | E4 | E4 | E4 | 84 | 84 | 8A |
å | U+00E5 | E5 | E5 | E5 | 86 | 86 | 8C |
æ | U+00E6 | E6 | E6 | E6 | 91 | 91 | BE |
ç | U+00E7 | E7 | E7 | E7 | 87 | 87 | 8D |
è | U+00E8 | E8 | E8 | E8 | 8A | 8A | 8F |
é | U+00E9 | E9 | E9 | E9 | 82 | 82 | 8E |
ê | U+00EA | EA | EA | EA | 88 | 88 | 90 |
ë | U+00EB | EB | EB | EB | 89 | 89 | 91 |
ì | U+00EC | EC | EC | EC | 8D | 8D | 93 |
í | U+00ED | ED | ED | ED | A1 | A1 | 92 |
î | U+00EE | EE | EE | EE | 8C | 8C | 94 |
ï | U+00EF | EF | EF | EF | 8B | 8B | 95 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
ð | U+00F0 | F0 | F0 | F0 | | D0 | |
ñ | U+00F1 | F1 | F1 | F1 | A4 | A4 | 96 |
ò | U+00F2 | F2 | F2 | F2 | 95 | 95 | 98 |
ó | U+00F3 | F3 | F3 | F3 | A2 | A2 | 97 |
ô | U+00F4 | F4 | F4 | F4 | 93 | 93 | 99 |
õ | U+00F5 | F5 | F5 | F5 | | E4 | 9B |
ö | U+00F6 | F6 | F6 | F6 | 94 | 94 | 9A |
÷ | U+00F7 | F7 | F7 | F7 | F6 | F6 | D6 |
ø | U+00F8 | F8 | F8 | F8 | | 9B | BF |
ù | U+00F9 | F9 | F9 | F9 | 97 | 97 | 9D |
ú | U+00FA | FA | FA | FA | A3 | A3 | 9C |
û | U+00FB | FB | FB | FB | 96 | 96 | 9E |
ü | U+00FC | FC | FC | FC | 81 | 81 | 9F |
ý | U+00FD | FD | FD | FD | | EC | |
þ | U+00FE | FE | FE | FE | | E7 | |
ÿ | U+00FF | FF | FF | FF | 98 | 98 | D8 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
ı | U+0131 | | | | | D5 | F5 |
Œ | U+0152 | | BC | 8C | | | CE |
œ | U+0153 | | BD | 9C | | | CF |
Š | U+0160 | | A6 | 8A | | | |
š | U+0161 | | A8 | 9A | | | |
Ÿ | U+0178 | | BE | 9F | | | D9 |
Ž | U+017D | | B4 | 8E | | | |
ž | U+017E | | B8 | 9E | | | |
ƒ | U+0192 | | | 83 | 9F | 9F | C4 |
ˆ | U+02C6 | | | 88 | | | F6 |
ˇ | U+02C7 | | | | | | FF |
˘ | U+02D8 | | | | | | F9 |
˙ | U+02D9 | | | | | | FA |
˚ | U+02DA | | | | | | FB |
˛ | U+02DB | | | | | | FE |
˜ | U+02DC | | | 98 | | | F7 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
˝ | U+02DD | | | | | | FD |
Γ | U+0393 | | | | E2 | | |
Θ | U+0398 | | | | E9 | | |
Σ | U+03A3 | | | | E4 | | |
Φ | U+03A6 | | | | E8 | | |
Ω | U+03A9 | | | | EA | | BD |
α | U+03B1 | | | | E0 | | |
δ | U+03B4 | | | | EB | | |
ε | U+03B5 | | | | EE | | |
π | U+03C0 | | | | E3 | | B9 |
σ | U+03C3 | | | | E5 | | |
τ | U+03C4 | | | | E7 | | |
φ | U+03C6 | | | | ED | | |
– | U+2013 | | | 96 | | | D0 |
— | U+2014 | | | 97 | | | D1 |
‗ | U+2017 | | | | | F2 | |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
‘ | U+2018 | | | 91 | | | D4 |
’ | U+2019 | | | 92 | | | D5 |
‚ | U+201A | | | 82 | | | E2 |
“ | U+201C | | | 93 | | | D2 |
” | U+201D | | | 94 | | | D3 |
„ | U+201E | | | 84 | | | E3 |
† | U+2020 | | | 86 | | | A0 |
‡ | U+2021 | | | 87 | | | E0 |
• | U+2022 | | | 95 | | | A5 |
… | U+2026 | | | 85 | | | C9 |
‰ | U+2030 | | | 89 | | | E4 |
‹ | U+2039 | | | 8B | | | DC |
› | U+203A | | | 9B | | | DD |
⁄ | U+2044 | | | | | | DA |
ⁿ | U+207F | | | | FC | | |
₧ | U+20A7 | | | | 9E | | |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
€ | U+20AC | | A4 | 80 | | | DB |
™ | U+2122 | | | 99 | | | AA |
∂ | U+2202 | | | | | | B6 |
∆ | U+2206 | | | | | | C6 |
∏ | U+220F | | | | | | B8 |
∑ | U+2211 | | | | | | B7 |
∙ | U+2219 | | | | F9 | | |
√ | U+221A | | | | FB | | C3 |
∞ | U+221E | | | | EC | | B0 |
∩ | U+2229 | | | | EF | | |
∫ | U+222B | | | | | | BA |
≈ | U+2248 | | | | F7 | | C5 |
≠ | U+2260 | | | | | | AD |
≡ | U+2261 | | | | F0 | | |
≤ | U+2264 | | | | F3 | | B2 |
≥ | U+2265 | | | | F2 | | B3 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
⌐ | U+2310 | | | | A9 | | |
⌠ | U+2320 | | | | F4 | | |
⌡ | U+2321 | | | | F5 | | |
─ | U+2500 | | | | C4 | C4 | |
│ | U+2502 | | | | B3 | B3 | |
┌ | U+250C | | | | DA | DA | |
┐ | U+2510 | | | | BF | BF | |
└ | U+2514 | | | | C0 | C0 | |
┘ | U+2518 | | | | D9 | D9 | |
├ | U+251C | | | | C3 | C3 | |
┤ | U+2524 | | | | B4 | B4 | |
┬ | U+252C | | | | C2 | C2 | |
┴ | U+2534 | | | | C1 | C1 | |
┼ | U+253C | | | | C5 | C5 | |
═ | U+2550 | | | | CD | CD | |
║ | U+2551 | | | | BA | BA | |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
╒ | U+2552 | | | | D5 | | |
╓ | U+2553 | | | | D6 | | |
╔ | U+2554 | | | | C9 | C9 | |
╕ | U+2555 | | | | B8 | | |
╖ | U+2556 | | | | B7 | | |
╗ | U+2557 | | | | BB | BB | |
╘ | U+2558 | | | | D4 | | |
╙ | U+2559 | | | | D3 | | |
╚ | U+255A | | | | C8 | C8 | |
╛ | U+255B | | | | BE | | |
╜ | U+255C | | | | BD | | |
╝ | U+255D | | | | BC | BC | |
╞ | U+255E | | | | C6 | | |
╟ | U+255F | | | | C7 | | |
╠ | U+2560 | | | | CC | CC | |
╡ | U+2561 | | | | B5 | | |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
╢ | U+2562 | | | | B6 | | |
╣ | U+2563 | | | | B9 | B9 | |
╤ | U+2564 | | | | D1 | | |
╥ | U+2565 | | | | D2 | | |
╦ | U+2566 | | | | CB | CB | |
╧ | U+2567 | | | | CF | | |
╨ | U+2568 | | | | D0 | | |
╩ | U+2569 | | | | CA | CA | |
╪ | U+256A | | | | D8 | | |
╫ | U+256B | | | | D7 | | |
╬ | U+256C | | | | CE | CE | |
▀ | U+2580 | | | | DF | DF | |
▄ | U+2584 | | | | DC | DC | |
█ | U+2588 | | | | DB | DB | |
▌ | U+258C | | | | DD | | |
▐ | U+2590 | | | | DE | | |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
░ | U+2591 | | | | B0 | B0 | |
▒ | U+2592 | | | | B1 | B1 | |
▓ | U+2593 | | | | B2 | B2 | |
■ | U+25A0 | | | | FE | FE | |
◊ | U+25CA | | | | | | D7 |
 | U+F8FF | | | | | | F0 |
ﬁ | U+FB01 | | | | | | DE |
ﬂ | U+FB02 | | | | | | DF |
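
Individual rows of the table above can be regenerated programmatically. The following Python sketch is offered as an illustration only: it assumes that the standard-library codecs listed in `CODECS` correspond to the IANA names in the column headers, and `table_cells` and `table_row` are hypothetical helper names. Python's bundled mappings may differ in places from those used for this table.

```python
# Assumed correspondence between the column headers and Python codec names.
CODECS = {
    "ISO-8859-1": "latin-1",
    "ISO-8859-15": "iso8859-15",
    "WINDOWS-1252": "cp1252",
    "IBM437": "cp437",
    "IBM850": "cp850",
    "MACINTOSH": "mac_roman",
}


def table_cells(ch):
    """Return the eight cells of one table row for the character `ch`."""
    cells = [ch, f"U+{ord(ch):04X}"]
    for codec in CODECS.values():
        try:
            encoded = ch.encode(codec)
        except UnicodeEncodeError:
            cells.append("")  # character is absent from this set
        else:
            cells.append(f"{encoded[0]:02X}")
    return cells


def table_row(ch):
    """Format the cells as a pipe-separated row."""
    return " | ".join(table_cells(ch)) + " |"


for ch in "\u00e9\u0192\u20ac":
    print(table_row(ch))
```

Encoding from Unicode, rather than decoding each byte, matches the table's arrangement by code point: one character per row, one character set per column.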