ArmSCII
ARMSCII or ArmSCII is the acronym of the Armenian Standard Code for Information Interchange. It refers to several single-byte character encodings defined by Armenian national standard 166-97.
However, these encodings are not widely used. The standard was published one year after international standard ISO 10585, which defined another 7-bit encoding from which the encoding and mapping to the UCS (Universal Coded Character Set, as defined in the international ISO 10646 and Unicode 1.1 standards) were also derived, and the computer industry showed little support for adding ArmSCII.
The encodings defined in the ArmSCII standard
Very few systems support these encodings; Windows, for example, does not. It is usually better to use Unicode for interchange of Armenian text in web browsers and email, since most modern computers do not support ArmSCII by default.
The following three main variants are defined:
- ArmSCII-7, defined in AST 34.005, is a 7-bit encoding that does not contain Latin characters.
- ArmSCII-8, defined in AST 34.002, is an 8-bit encoding and a superset of ASCII.
- ArmSCII-8A, defined in AST 34.002, is an alternate 8-bit encoding and also a superset of ASCII.
Each ArmSCII encoding also has several minor variants, depending on the revision of the related Armenian standard, which may change the exact mapping and usage of a few punctuation characters and symbols. The standard was not made official before 1997 and was defined only informally before that, which has caused some confusion; the mappings described below follow best practice according to the latest (1997) revision of the Armenian standard.
None of the ArmSCII encodings has gained international approval, unlike the ISO 10585 standard, which was approved despite the criticism sent by the official Armenian standards body to ISO/DIS JTC 1/SC 2/WG 2 (working on single-byte coded character sets). All international efforts since then have focused on the UCS (in Unicode and ISO 10646).
ArmSCII-8 is intended for use on Unix and Windows systems, and for information interchange on the WWW and by email. It simply remaps ArmSCII-7 into the upper range, above the standard US-ASCII range. However, Microsoft wanted users to adopt Unicode rather than introduce a plethora of new code pages, so ArmSCII-8 is not supported natively on Windows.
ArmSCII-8A is intended for use on DOS and Mac systems. It is a rearrangement of ArmSCII-8 designed to work with existing DOS and Mac code that reserves a range of code values for characters intended for presentation layout rather than text, and it relies on modified fonts. It is, however, considered a "hack" of the code pages over which it is applied, because neither DOS (or Windows, in the "OEM" compatibility code page used by the text-only console) nor MacOS has ever supported this encoding natively, notably in their file systems; the same is true of the now-deprecated ISO 10585 standard. In addition, this encoding cannot map all the punctuation characters normally needed for Armenian, so the missing characters must be approximated using fallbacks to ASCII punctuation. Some Armenian fonts may display these ASCII punctuation characters using the rendering intended for the Armenian characters that the fallbacks stand in for.
ArmSCII-7
AST 34.005:1997 (ArmSCII-7) 7-bit coded character set for Armenian. | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | xA | xB | xC | xD | xE | xF | |
0x | unused | |||||||||||||||
1x | ||||||||||||||||
2x | SP | (eternity sign) | § | ։ | ) | ( | » | « | ― | · | ՝ | , | ‐ | ֊ | … | ՜ |
3x | ՛ | ՞ | Ա | ա | Բ | բ | Գ | գ | Դ | դ | Ե | ե | Զ | զ | Է | է |
4x | Ը | ը | Թ | թ | Ժ | ժ | Ի | ի | Լ | լ | Խ | խ | Ծ | ծ | Կ | կ |
5x | Հ | հ | Ձ | ձ | Ղ | ղ | Ճ | ճ | Մ | մ | Յ | յ | Ն | ն | Շ | շ |
6x | Ո | ո | Չ | չ | Պ | պ | Ջ | ջ | Ռ | ռ | Ս | ս | Վ | վ | Տ | տ |
7x | Ր | ր | Ց | ց | Ւ | ւ | Փ | փ | Ք | ք | Օ | օ | Ֆ | ֆ | ՚ |
In the table above, code value 21 is the eternity sign, which has no designated code point in Unicode. Some mappings incorrectly assign it to U+0530, but that code point has not been allocated.
Code value 20 is the regular SPACE character. Code values 00–1F and 7F are not assigned to characters by AST 34.005, though they may be the same as the ASCII control characters located in those positions.
Code value 22 was initially used to encode the Armenian ligature ew (և), but was later replaced by the section sign (§). It is strongly suggested to encode this ligature as the pair of normal Armenian small letters ech (yech) and yiwn (vyun) and to let the renderer generate the ligature, because software and fonts will render code value 22 differently depending on the version of ArmSCII-7 they assume.
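The recommendation to spell the ligature as two letters can be illustrated with a minimal Python sketch. The helper name `expand_ew` is hypothetical and introduced here only for illustration; it assumes the text has already been converted to Unicode, where the ligature is U+0587 and the letters ech and yiwn are U+0565 and U+0582.

```python
import unicodedata

EW_LIGATURE = "\u0587"      # ARMENIAN SMALL LIGATURE ECH YIWN
ECH_YIWN_PAIR = "\u0565\u0582"  # small letter ech followed by small letter yiwn


def expand_ew(text: str) -> str:
    """Replace every ew ligature with the two-letter sequence ech + yiwn."""
    return text.replace(EW_LIGATURE, ECH_YIWN_PAIR)


# Unicode's own compatibility decomposition of U+0587 gives the same pair,
# so NFKD normalization is an alternative when its other effects are acceptable.
same_as_nfkd = unicodedata.normalize("NFKD", EW_LIGATURE) == expand_ew(EW_LIGATURE)
```

A targeted replacement such as `expand_ew` changes only the ligature, whereas NFKD normalization also rewrites other compatibility characters in the text.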
Code value 7F is sometimes used as a substitute for the non-breaking space.
In ArmSCII-8 (below), this table is remapped to higher codes by a simple offset.
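The offset relationship between the two tables can be sketched in Python as follows. This is an illustration based only on the tables in this article, not a reference implementation: the function name is hypothetical, the offset of 0x80 is read off the tables (for example Ա moves from 32 to B2), and leaving SPACE and the unassigned codes unchanged is an assumption.

```python
ARMSCII8_OFFSET = 0x80  # read off the tables: 0x32 (ArmSCII-7) -> 0xB2 (ArmSCII-8)


def armscii7_to_armscii8(data: bytes) -> bytes:
    """Shift ArmSCII-7 graphic codes 0x21-0x7E into the ArmSCII-8 upper range."""
    out = bytearray()
    for code in data:
        if code > 0x7F:
            raise ValueError(f"not a 7-bit code: {code:#04x}")
        if 0x21 <= code <= 0x7E:
            out.append(code + ARMSCII8_OFFSET)
        else:
            # Assumption: SPACE (0x20) and the unassigned codes stay as they are.
            out.append(code)
    return bytes(out)
```

Because ArmSCII-7 has no Latin letters, the result uses only the upper half of ArmSCII-8 plus the space character.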
ArmSCII-8
AST 34.002:1997 (ArmSCII-8) 8-bit coded character set for Armenian. | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | xA | xB | xC | xD | xE | xF | |
0x | unused | |||||||||||||||
1x | ||||||||||||||||
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | |
8x | unused | |||||||||||||||
9x | ||||||||||||||||
Ax | NB SP | (eternity sign) | § | ։ | ) | ( | » | « | ― | · | ՝ | , | ‐ | ֊ | … | ՜ |
Bx | ՛ | ՞ | Ա | ա | Բ | բ | Գ | գ | Դ | դ | Ե | ե | Զ | զ | Է | է |
Cx | Ը | ը | Թ | թ | Ժ | ժ | Ի | ի | Լ | լ | Խ | խ | Ծ | ծ | Կ | կ |
Dx | Հ | հ | Ձ | ձ | Ղ | ղ | Ճ | ճ | Մ | մ | Յ | յ | Ն | ն | Շ | շ |
Ex | Ո | ո | Չ | չ | Պ | պ | Ջ | ջ | Ռ | ռ | Ս | ս | Վ | վ | Տ | տ |
Fx | Ր | ր | Ց | ց | Ւ | ւ | Փ | փ | Ք | ք | Օ | օ | Ֆ | ֆ | ՚ |
In the table above, code value 20 is reserved for the regular SPACE character, code value A0 is reserved for the non-breaking space, and code value A1 is assigned to the eternity sign, which currently has no designated code point in Unicode. Some mappings incorrectly assign it to U+0530, but that code point has not been allocated.
Code values 00–1F, and 7F–9F are not assigned to characters by AST 34.002, though they may be the same as the ISO-8859-1 control characters that are located in those positions.
The code value A2 was used for encoding the Armenian ligature ew (used as a symbol), but was later replaced by the section sign punctuation. Some Armenian fonts display this ligature at the position of the ASCII ampersand symbol, but it is strongly suggested to encode the ligature using the two standard Armenian small letters that compose it.
The code value FF may be assigned to the Armenian small-letter modifier apostrophe. That character has no mapping in Unicode and is represented by the ASCII apostrophe instead. For correct rendering with Unicode fonts, it is suggested that the small-letter modifier be represented using code value FE, with ligature control to change its position, because it occurs only after a small Armenian letter, whereas the Armenian apostrophe encoded at FE occurs only after a capital Armenian letter. Most implementations therefore do not encode anything at code value FF.
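Because the letters in the ArmSCII-8 table alternate between capital and small forms from B2 to FD, a decoder to Unicode can compute the letters arithmetically. The Python sketch below is illustrative only: the function name and the punctuation dictionary are assumptions built from the table in this article, the dictionary is deliberately incomplete (codes whose Unicode counterpart varies between revisions are left out), and treating codes below 80 as plain ASCII is also an assumption.

```python
# Partial punctuation map read off the ArmSCII-8 table; not exhaustive.
PUNCTUATION = {
    0xA0: "\u00A0",  # non-breaking space
    0xA2: "\u00A7",  # section sign
    0xA3: "\u0589",  # Armenian full stop
    0xA4: ")",
    0xA5: "(",
    0xA6: "\u00BB",  # right guillemet
    0xA7: "\u00AB",  # left guillemet
    0xAA: "\u055D",  # Armenian comma
    0xAD: "\u058A",  # Armenian hyphen
    0xAE: "\u2026",  # horizontal ellipsis
    0xAF: "\u055C",  # Armenian exclamation mark
    0xB0: "\u055B",  # Armenian emphasis mark
    0xB1: "\u055E",  # Armenian question mark
    0xFE: "\u055A",  # Armenian apostrophe
}


def decode_armscii8(data: bytes) -> str:
    """Decode ArmSCII-8 bytes to a Unicode string (sketch, partial coverage)."""
    chars = []
    for code in data:
        if code < 0x80:
            chars.append(chr(code))  # assumption: lower half is plain ASCII
        elif 0xB2 <= code <= 0xFD:
            # Even offsets are capitals (from U+0531), odd offsets small (from U+0561).
            index, is_small = divmod(code - 0xB2, 2)
            chars.append(chr((0x0561 if is_small else 0x0531) + index))
        else:
            chars.append(PUNCTUATION.get(code, "\uFFFD"))  # unknown -> U+FFFD
    return "".join(chars)
```

Codes that the sketch does not cover, including the eternity sign at A1, come out as the replacement character U+FFFD rather than being guessed.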
ArmSCII-8A
AST 34.001:1997 (ArmSCII-8A) 8-bit coded character set for Armenian. | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | xA | xB | xC | xD | xE | xF | |
0x | unused | |||||||||||||||
1x | ||||||||||||||||
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | |
8x | Ա | ա | Բ | բ | Գ | գ | Դ | դ | Ե | ե | Զ | զ | Է | է | Ը | ը |
9x | Թ | թ | Ժ | ժ | Ի | ի | Լ | լ | Խ | խ | Ծ | ծ | Կ | կ | Հ | հ |
Ax | Ձ | ձ | Ղ | ղ | Ճ | ճ | Մ | մ | Յ | յ | Ն | ն | Շ | շ | « | » |
Bx | unused | |||||||||||||||
Cx | ||||||||||||||||
Dx | unused (D0–DB) | (eternity sign) | ֊ | … | ՞ |
Ex | Ո | ո | Չ | չ | Պ | պ | Ջ | ջ | Ռ | ռ | Ս | ս | Վ | վ | Տ | տ |
Fx | Ր | ր | Ց | ց | Ւ | ւ | Փ | փ | Ք | ք | Օ | օ | Ֆ | ֆ | ՚ | NB SP |
In the table above, code value 20 is the regular SPACE character, and code value DC is the eternity sign, which has no designated codepoint in Unicode. Some mappings incorrectly claim that it has a codepoint of U+0530. This is incorrect, as that codepoint has not been allocated.
Code values 00–1F, 7F, and B0–DB are not assigned to characters by AST 34.002, though they may be the same as those used in a legacy DOS/OEM codepage 437 (box drawing characters) or Macintosh Roman.
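Comparing the ArmSCII-8 and ArmSCII-8A tables shows how the rearrangement affects the letters: the letters at B2–DF in ArmSCII-8 appear at 80–AD in ArmSCII-8A, while the letters at E0–FD keep the same codes. The Python sketch below captures only that observation; the function name is hypothetical, and punctuation is deliberately not handled because ArmSCII-8A lacks several of the ArmSCII-8 punctuation characters.

```python
def armscii8_letter_to_8a(code: int) -> int:
    """Map an ArmSCII-8 letter code (0xB2-0xFD) to its ArmSCII-8A code."""
    if not 0xB2 <= code <= 0xFD:
        raise ValueError(f"not an ArmSCII-8 letter code: {code:#04x}")
    if code <= 0xDF:
        # First 23 letter pairs move down by 0x32 (0xB2 -> 0x80, 0xDF -> 0xAD).
        return code - 0x32
    # Remaining letter pairs (0xE0-0xFD) occupy the same codes in both tables.
    return code
```

The gap left at B0–DF is what allows ArmSCII-8A to coexist with layout characters, such as box-drawing characters, in the underlying code page.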
Support for the Armenian script in other standards
ISO 10585:1996
ISO 10585:1996 7-bit coded character set for Armenian. | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | xA | xB | xC | xD | xE | xF | |
0x | not used | |||||||||||||||
1x | ||||||||||||||||
2x | SP | Ա | Բ | Գ | Դ | Ե | Զ | Է | Ը | Թ | Ժ | Ի | Լ | Խ | Ծ | Կ |
3x | Հ | Ձ | Ղ | Ճ | Մ | Յ | Ն | Շ | Ո | Չ | Պ | Ջ | Ռ | Ս | Վ | Տ |
4x | Ր | Ց | Ւ | Փ | Ք | Օ | Ֆ | ՝ | ՚ | ֊ | ։ | , | ՞ | ՟ | ||
5x | ա | բ | գ | դ | ե | զ | է | ը | թ | ժ | ի | լ | խ | ծ | կ | |
6x | հ | ձ | ղ | ճ | մ | յ | ն | շ | ո | չ | պ | ջ | ռ | ս | վ | տ |
7x | ր | ց | ւ | փ | ք | օ | ֆ | ― | ‐ | ″ | · | ՛ | ՜ |
For comparison, this is the 7-bit encoding defined in the international standard ISO/IEC 10585, which was used before the revision in the Armenian standard AST 34.002:1997 (ArmSCII-8).
In this standard (as well as in ISO/IEC 10646 and Unicode), only one Armenian apostrophe modifier letter is encoded, at 0x49, whereas Armenian uses two modifier letter apostrophes that are cased. U+055A represents the capital apostrophe but is not considered dual-cased in Unicode or in ISO 10585; the small letter apostrophe is absent but is generally represented by the ASCII apostrophe U+0027 in Unicode documents.
The left half-ring punctuation (a modifier letter) and the eternity symbol are also missing, and only one double quotation mark (U+2033) is encoded, at code value 7A, instead of the double guillemets of the three ArmSCII variants.
ISO/IEC 10646-1 and Unicode
Armenian Unicode.org chart (PDF) | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
U+053x | (empty) | Ա | Բ | Գ | Դ | Ե | Զ | Է | Ը | Թ | Ժ | Ի | Լ | Խ | Ծ | Կ |
U+054x | Հ | Ձ | Ղ | Ճ | Մ | Յ | Ն | Շ | Ո | Չ | Պ | Ջ | Ռ | Ս | Վ | Տ |
U+055x | Ր | Ց | Ւ | Փ | Ք | Օ | Ֆ | (empty) | (empty) | ՙ | ՚ | ՛ | ՜ | ՝ | ՞ | ՟ |
U+056x | (empty) | ա | բ | գ | դ | ե | զ | է | ը | թ | ժ | ի | լ | խ | ծ | կ |
U+057x | հ | ձ | ղ | ճ | մ | յ | ն | շ | ո | չ | պ | ջ | ռ | ս | վ | տ |
U+058x | ր | ց | ւ | փ | ք | օ | ֆ | և | (empty) | ։ | ֊ |
For comparison, the Unicode code points for Armenian are shown above.
This encoding, in place since Unicode 1.1 (except for the Armenian hyphen U+058A, the last character added, in Unicode 3.0), was based on the earlier ISO 10585 7-bit encoding standard with about a dozen characters added; the non-letters were reorganized to include most characters added in Armenian standard AST 34.002:1997.
Capital letters are encoded in the first half of the block (terminated by modifier letters).
Lowercase letters are encoded in the second half of the block (terminated by Armenian punctuation signs).
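The split of the block into a capital half and a lowercase half means that each capital letter and its small form are a fixed distance apart. The short Python sketch below illustrates this with the code points U+0531 (capital ayb) and U+0561 (small ayb); the constant and function names are introduced here for illustration only.

```python
CAPITAL_FIRST = 0x0531  # ARMENIAN CAPITAL LETTER AYB
SMALL_FIRST = 0x0561    # ARMENIAN SMALL LETTER AYB
LETTER_COUNT = 38       # AYB through FEH
CASE_OFFSET = SMALL_FIRST - CAPITAL_FIRST  # 0x30


def to_small(ch: str) -> str:
    """Lowercase one Armenian capital letter by adding the fixed block offset."""
    code = ord(ch)
    if CAPITAL_FIRST <= code < CAPITAL_FIRST + LETTER_COUNT:
        return chr(code + CASE_OFFSET)
    return ch  # anything else is returned unchanged


# The arithmetic agrees with Python's built-in case mapping for all 38 letters.
offset_matches_lower = all(
    to_small(chr(c)) == chr(c).lower()
    for c in range(CAPITAL_FIRST, CAPITAL_FIRST + LETTER_COUNT)
)
```

In practice `str.lower()` and `str.upper()` should be preferred; the arithmetic is shown only to make the block layout concrete.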
Unlike the ArmSCII encodings, this encoding is stable and portable across systems, and contains all characters needed for Armenian (with the exception of the Armenian eternity sign). Some Unicode-encoded fonts for Armenian map the eternity sign to code point U+0530. This is incorrect, as that code point has not been allocated.
However, no distinction is made for the mirrored parentheses, so the standard ASCII/Unicode punctuation must be used according to its usual rendering. The left half-ring mark (a modifier letter) is encoded here, and some other marks are unified with those of other scripts (notably the quotation marks, middle dot and dashes).
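The statement that U+0530 has not been allocated can be checked against the Unicode Character Database shipped with Python. The sketch below uses the standard `unicodedata` module; the helper name `is_assigned` is hypothetical, and the result reflects whichever Unicode version the running Python interpreter bundles.

```python
import unicodedata


def is_assigned(code_point: int) -> bool:
    """Return True if the code point has a general category other than Cn (unassigned)."""
    return unicodedata.category(chr(code_point)) != "Cn"


u0530_assigned = is_assigned(0x0530)  # first position of the Armenian block
u0531_assigned = is_assigned(0x0531)  # ARMENIAN CAPITAL LETTER AYB
```

A font that places a glyph at an unassigned code point produces text that other systems cannot interpret reliably, which is why such mappings are described as incorrect.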
Code mappings and classification
Note that some transcodings are shown below between parentheses. They are only approximate fallbacks and do not map exactly to the intended character.
External references
- [ArmSCII] Armenian Standard Code for Information Interchange -- Center of Humane Technologies "Armenian Computer", June 1991.
- [AST 34.001-97] Information Technologies -- Character Set And Information Encoding: Character Set -- State Standardization Committee of the Republic of Armenia, July 1997.
- [ArmSCII Version 2] Armenian Standard Code for Information Interchange, Version 2 -- ArmSCII Working Group, May 1999.
External links
- http://www.freenet.am/armscii/ basic information and utilities supporting the ArmSCII standard