ISO/IEC 6937
From Wikipedia, the free encyclopedia
ISO/IEC 6937 is a multibyte extension of ASCII, or rather of ISO/IEC 646-IRV. Certain byte codes are used as lead bytes for letters with diacritics (accents). The value of the lead byte often indicates which diacritic the letter has, and the follow byte then holds the ASCII value of the letter that carries the diacritic. Only certain combinations of lead byte and follow byte are allowed, and for some follow bytes there are exceptions to the usual interpretation of the lead byte. Note, however, that ISO/IEC 6937 encodes no combining characters at all. Some free-standing diacritics can nevertheless be represented, often by letting the follow byte be the code for the ASCII space.
ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, and Loek Zeckendorf.
ISO6937/2 defines 327 characters found in modern European languages that are written in the Latin alphabet. Characters from other scripts, such as Cyrillic and Greek, are not included in the standard.
Single-byte characters.
The primary set of ISO6937/2 is based on ISO646 (characters 0x00..0x7f), with the exception of character 0x24 ($), which is denoted as a "general currency sign" (¤):
!"#¤%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}
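The single exception in the primary set can be illustrated with a minimal Python sketch. The function name `decode_primary` and the constant `CURRENCY_SIGN` are hypothetical, chosen for this example only, and the sketch assumes that every other byte below 0x80 maps to the same character as in ASCII.

```python
CURRENCY_SIGN = "\u00a4"  # ¤, the "general currency sign"


def decode_primary(data: bytes) -> str:
    """Decode bytes of the primary set (hypothetical helper, not part of any library)."""
    out = []
    for b in data:
        if b > 0x7F:
            # Supplementary-set bytes are outside the scope of this sketch.
            raise ValueError(f"byte 0x{b:02X} is outside the primary set")
        # The position that ASCII assigns to '$' is read as the currency sign;
        # every other byte is assumed to keep its ASCII meaning.
        out.append(CURRENCY_SIGN if b == ord("$") else chr(b))
    return "".join(out)


print(decode_primary(b"Price: 5$"))  # Price: 5¤
```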
The supplementary set (characters 0x80..0xff) contains a selection of spacing and non-spacing graphic characters, additional symbols and some locations reserved for future standardisation.
Two-byte characters.
The characters which are not represented in the primary set are coded on two bytes. The first byte, the "non-spacing diacritical mark", is followed by a letter from the primary set, e.g.:
small e with acute accent (é) = [Acute]+e
In total, 13 diacritical marks can each be followed by selected characters from the primary set:
Accent | Code | Second character | Result |
---|---|---|---|
Grave | 0xC1 | AEIOUaeiou | ÀÈÌÒÙàèìòù |
Acute | 0xC2 | ACEILNORSUYZaceilnorsuyz | ÁĆÉÍĹŃÓŔŚÚÝŹáćéíĺńóŕśúýź |
Circumflex | 0xC3 | ACEGHIJOSUWYaceghijosuwy | ÂĈÊĜĤÎĴÔŜÛŴŶâĉêĝĥîĵôŝûŵŷ |
Tilde | 0xC4 | AINOUainou | ÃĨÑÕŨãĩñõũ |
Macron | 0xC5 | AEIOUaeiou | ĀĒĪŌŪāēīōū |
Breve | 0xC6 | AGUagu | ĂĞŬăğŭ |
Dot | 0xC7 | CEGIZcegz | ĊĖĠİŻċėġż |
Umlaut | 0xC8 | AEIOUYaeiouy | ÄËÏÖÜŸäëïöüÿ |
Ring | 0xCA | AUau | ÅŮåů |
Cedilla | 0xCB | CGKLNRSTcgklnrst | ÇĢĶĻŅŖŞŢçģķļņŗşţ |
DoubleAcute | 0xCD | OUou | ŐŰőű |
Ogonek | 0xCE | AEIUaeiu | ĄĘĮŲąęįų |
Caron | 0xCF | CDELNRSTZcdelnrstz | ČĎĚĽŇŘŠŤŽčďěľňřšťž |
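
The two-byte scheme in the table can be sketched as a small decoder to Unicode. This is an illustration under stated assumptions, not a complete or authoritative implementation: the names `DIACRITICS` and `decode` are hypothetical, only four rows of the table are included, and pairing each lead byte with a Unicode combining mark followed by NFC normalization is a convenience of this sketch rather than something the standard prescribes. Note that ISO/IEC 6937 puts the diacritic before the letter, whereas Unicode puts the combining mark after it.

```python
import unicodedata

# Lead byte -> (Unicode combining mark, follow letters allowed by the table).
# Only a subset of the table's rows is shown; the rest follow the same pattern.
DIACRITICS = {
    0xC1: ("\u0300", "AEIOUaeiou"),                # grave
    0xC2: ("\u0301", "ACEILNORSUYZaceilnorsuyz"),  # acute
    0xC8: ("\u0308", "AEIOUYaeiouy"),              # umlaut
    0xCF: ("\u030C", "CDELNRSTZcdelnrstz"),        # caron
}


def decode(data: bytes) -> str:
    """Decode a byte string using the rows above (hypothetical helper)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in DIACRITICS:
            if i + 1 >= len(data):
                raise ValueError("lead byte at end of input")
            mark, allowed = DIACRITICS[b]
            letter = chr(data[i + 1])
            if letter not in allowed:
                raise ValueError(f"0x{b:02X} cannot be followed by {letter!r}")
            # Reorder to letter + mark, then compose into one code point.
            out.append(unicodedata.normalize("NFC", letter + mark))
            i += 2
        elif b < 0x80:
            # Primary set: ASCII, except that the '$' position is ¤.
            out.append("\u00a4" if b == ord("$") else chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is not handled by this sketch")
    return "".join(out)


print(decode(b"Caf" + bytes([0xC2]) + b"e"))  # Café
```

Rejecting pairs that are not listed in the table reflects the rule that only certain combinations of lead byte and follow byte are allowed.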