ArmSCII


ARMSCII or ArmSCII is the acronym of the Armenian Standard Code for Information Interchange. It refers to several single-byte character encodings defined by Armenian national standard 166-97.

However, these encodings are not widely used. The standard was published one year after the international standard ISO 10585, which defined a different 7-bit encoding and from which the encoding and mapping to the UCS (Universal Coded Character Set, as defined in ISO 10646 and Unicode 1.1) were also derived, and the computer industry showed little interest in adding support for ArmSCII.

1 The encodings defined in the ArmSCII standard
2 Support for the Armenian script in other standards
- 2.1 ISO 10585:1996
- 2.2 ISO/IEC 10646-1 and Unicode
3 Code mappings and classification
4 See also

The encodings defined in the ArmSCII standard

Very few systems support these encodings; Windows, for example, does not. Because most modern computers do not support ArmSCII by default, Unicode is usually the better choice for interchanging Armenian text in web browsers and email.

The following three main variants are defined:

ArmSCII-7, defined in AST 34.005, is a 7-bit encoding that does not contain Latin characters.
ArmSCII-8, defined in AST 34.002, is an 8-bit encoding and a superset of ASCII.
ArmSCII-8A, defined in AST 34.001, is an alternate 8-bit encoding and also a superset of ASCII.

Each ArmSCII encoding also has several minor variants, depending on the revision of the related Armenian standard, which may change the exact mapping and usage of a few punctuation characters and symbols. The standard was not made official before 1997 and was defined only informally before that, which has caused some confusion; the mappings described below follow best practice according to the latest 1997 revision.

None of the ArmSCII encodings has gained international approval, unlike ISO 10585, which was approved despite the criticisms sent by the official Armenian standards body to ISO/DIS JTC 1/SC 2/WG 2 (the group working on single-byte coded character sets). Since then, all international efforts have gone into the UCS (Unicode and ISO 10646).

ArmSCII-8 is intended for use on Unix and Windows systems, and for information interchange on the WWW and by email. However, Microsoft wanted users to adopt Unicode rather than introduce a plethora of new code pages, so ArmSCII-8 is not supported natively on Windows. The encoding simply remaps ArmSCII-7 into the upper range, above the standard US-ASCII range.

ArmSCII-8A is intended for use on DOS and Mac systems. It rearranges ArmSCII-8 to work with existing DOS and Mac code that reserves a range of code values for presentation-layout characters rather than text, and it relies on modified fonts. It is nevertheless considered a "hack" of the code pages over which it is applied: neither DOS (nor Windows, in the "OEM" compatibility code page used by the text-only console) nor MacOS has ever supported this encoding natively, notably in their filesystems (the same is true of the now-deprecated ISO 10585 standard). In addition, ArmSCII-8A cannot map all the punctuation characters normally needed for Armenian, so the missing characters must be approximated with fallbacks to ASCII punctuation. Some Armenian fonts display these ASCII punctuation characters with the rendering intended for the Armenian characters mapped to them.

ArmSCII-7

AST 34.005:1997 (ArmSCII-7) 7-bit coded character set for Armenian.
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	unused
1x	unused
2x	SP		§	։	)	(	»	«	―	·	՝	,	‐	֊	…	՜
3x	՛	՞	Ա	ա	Բ	բ	Գ	գ	Դ	դ	Ե	ե	Զ	զ	Է	է
4x	Ը	ը	Թ	թ	Ժ	ժ	Ի	ի	Լ	լ	Խ	խ	Ծ	ծ	Կ	կ
5x	Հ	հ	Ձ	ձ	Ղ	ղ	Ճ	ճ	Մ	մ	Յ	յ	Ն	ն	Շ	շ
6x	Ո	ո	Չ	չ	Պ	պ	Ջ	ջ	Ռ	ռ	Ս	ս	Վ	վ	Տ	տ
7x	Ր	ր	Ց	ց	Ւ	ւ	Փ	փ	Ք	ք	Օ	օ	Ֆ	ֆ	՚

In the table above, code value 21 is the eternity sign, which has no designated code point in Unicode. Some mappings incorrectly assign it U+0530, but that code point has not been allocated.

Code value 20 is the regular SPACE character. Code values 00–1F and 7F are not assigned to characters by AST 34.005, though they may be the same as the ASCII control characters located at those positions.

Code value 22 was initially used to encode the Armenian ligature ew (և), but was later reassigned to the section sign (§). Because software and fonts render this code value differently depending on the version of ArmSCII-7 they assume, it is strongly suggested to encode the ligature as the pair of ordinary Armenian small letters ech (yech) and yiwn (vyun) and let the renderer generate the ligature.
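Once text has been converted to Unicode, the same advice can be applied mechanically. The following is a minimal sketch, not part of any ArmSCII tooling: the helper name `expand_ew` is hypothetical, and the code points U+0587, U+0565 and U+0582 are taken from the mapping table later in this article.

```python
import unicodedata

EW_LIGATURE = "\u0587"      # ARMENIAN SMALL LIGATURE ECH YIWN (և)
ECH_YIWN = "\u0565\u0582"   # ARMENIAN SMALL LETTER ECH + ARMENIAN SMALL LETTER YIWN


def expand_ew(text: str) -> str:
    """Replace every ew ligature with its two component letters (hypothetical helper)."""
    return text.replace(EW_LIGATURE, ECH_YIWN)


# Unicode compatibility normalization performs the same expansion,
# because U+0587 has a compatibility decomposition to U+0565 U+0582.
assert expand_ew(EW_LIGATURE) == ECH_YIWN
assert unicodedata.normalize("NFKC", EW_LIGATURE) == ECH_YIWN
```

The explicit replacement touches only the ligature, whereas NFKC also rewrites other compatibility characters, so the narrower helper may be preferable when the rest of the text must stay untouched.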

Code value 7F is sometimes used as a substitute for the non-breaking space.

ArmSCII-8 (below) remaps this table to higher code values by adding a fixed offset of 0x80.
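The remapping can be sketched in a few lines. This is an illustration based on the two tables in this article, not a reference implementation: the function name is hypothetical, and passing code values 00–20 through unchanged is an assumption (the standard leaves 00–1F unassigned, and SPACE stays at 20 in both encodings).

```python
ARMSCII8_OFFSET = 0x80  # distance between the ArmSCII-7 and ArmSCII-8 positions


def armscii7_to_armscii8(data: bytes) -> bytes:
    """Shift ArmSCII-7 code values 21-7E into the A1-FE range of ArmSCII-8."""
    out = bytearray()
    for b in data:
        if b <= 0x20:
            # Assumption: controls and SPACE keep their code value.
            out.append(b)
        elif b <= 0x7E:
            out.append(b + ARMSCII8_OFFSET)
        else:
            # 7F and above have no defined ArmSCII-7 meaning in this sketch.
            raise ValueError(f"not an ArmSCII-7 code value: {b:#04x}")
    return bytes(out)


# 32 33 are capital and small ayb in ArmSCII-7; B2 B3 in ArmSCII-8.
print(armscii7_to_armscii8(bytes([0x32, 0x33, 0x20])).hex())  # b2b320
```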

ArmSCII-8

AST 34.002:1997 (ArmSCII-8) 8-bit coded character set for Armenian.
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	unused
1x	unused
2x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	\|	}	~
8x	unused
9x	unused
Ax	NB SP		§	։	)	(	»	«	―	·	՝	,	‐	֊	…	՜
Bx	՛	՞	Ա	ա	Բ	բ	Գ	գ	Դ	դ	Ե	ե	Զ	զ	Է	է
Cx	Ը	ը	Թ	թ	Ժ	ժ	Ի	ի	Լ	լ	Խ	խ	Ծ	ծ	Կ	կ
Dx	Հ	հ	Ձ	ձ	Ղ	ղ	Ճ	ճ	Մ	մ	Յ	յ	Ն	ն	Շ	շ
Ex	Ո	ո	Չ	չ	Պ	պ	Ջ	ջ	Ռ	ռ	Ս	ս	Վ	վ	Տ	տ
Fx	Ր	ր	Ց	ց	Ւ	ւ	Փ	փ	Ք	ք	Օ	օ	Ֆ	ֆ	՚

In the table above, code value 20 is reserved for the regular SPACE character, code value A0 is reserved for the non-breaking space, and code value A1 is assigned to the eternity sign, which currently has no designated code point in Unicode. Some mappings incorrectly assign it U+0530, but that code point has not been allocated.

Code values 00–1F, and 7F–9F are not assigned to characters by AST 34.002, though they may be the same as the ISO-8859-1 control characters that are located in those positions.

The code value A2 was used for encoding the Armenian ligature ew (used as a symbol), but was later replaced by the section sign punctuation. Some Armenian fonts display this ligature at the position of the ASCII ampersand symbol, but it is strongly suggested to encode the ligature using the two standard Armenian small letters that compose it.

The code value FF may be filled with the Armenian small-letter modifier apostrophe. That character has no mapping in Unicode, so it is shown here using the ASCII apostrophe instead, for correct rendering with Unicode fonts. The small-letter form occurs only after a small Armenian letter, while the Armenian apostrophe encoded at FE occurs only after a capital Armenian letter, so it is suggested to represent the small-letter modifier with code value FE as well, using ligature control to change its position. Most implementations therefore do not encode anything at code value FF.
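Because the letters in the table above alternate capital and small forms in alphabetical order, a decoder to Unicode can be generated rather than typed out. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the Unicode code points come from the mapping table later in this article, code values 00–7F are treated as plain ASCII, and anything without a Unicode mapping in this article (A1, FF, 80–9F) becomes U+FFFD.

```python
def build_armscii8_table() -> dict:
    """Build a byte -> Unicode character table for ArmSCII-8 (hypothetical helper)."""
    # Assumption: the lower half is decoded as ASCII, controls included.
    table = {b: chr(b) for b in range(0x00, 0x80)}
    # 38 letters: capitals at B2, B4, ... FC and small letters at B3, B5, ... FD.
    for i in range(38):
        table[0xB2 + 2 * i] = chr(0x0531 + i)  # capital letters U+0531..U+0556
        table[0xB3 + 2 * i] = chr(0x0561 + i)  # small letters U+0561..U+0586
    # Punctuation and modifier letters, per the mapping table in this article.
    table.update({
        0xA0: "\u00A0", 0xA2: "\u00A7", 0xA3: "\u0589", 0xA4: ")",
        0xA5: "(", 0xA6: "\u00BB", 0xA7: "\u00AB", 0xA8: "\u2015",
        0xA9: "\u00B7", 0xAA: "\u055D", 0xAB: ",", 0xAC: "\u2010",
        0xAD: "\u058A", 0xAE: "\u2026", 0xAF: "\u055C", 0xB0: "\u055B",
        0xB1: "\u055E", 0xFE: "\u055A",
    })
    return table


ARMSCII8_TO_UNICODE = build_armscii8_table()


def decode_armscii8(data: bytes) -> str:
    """Decode ArmSCII-8 bytes; unmapped code values become U+FFFD."""
    return "".join(ARMSCII8_TO_UNICODE.get(b, "\ufffd") for b in data)


print(decode_armscii8(bytes([0xD0, 0xB3, 0xDB])))  # Հայ
```

A lookup table was chosen over arithmetic at decode time so that the punctuation exceptions and the letter pattern are handled by the same code path.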

ArmSCII-8A

AST 34.001:1997 (ArmSCII-8A) 8-bit coded character set for Armenian.
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	unused
1x	unused
2x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	\|	}	~
8x	Ա	ա	Բ	բ	Գ	գ	Դ	դ	Ե	ե	Զ	զ	Է	է	Ը	ը
9x	Թ	թ	Ժ	ժ	Ի	ի	Լ	լ	Խ	խ	Ծ	ծ	Կ	կ	Հ	հ
Ax	Ձ	ձ	Ղ	ղ	Ճ	ճ	Մ	մ	Յ	յ	Ն	ն	Շ	շ	«	»
Bx	unused
Cx	unused
Dx	unused													֊	…	՞
Ex	Ո	ո	Չ	չ	Պ	պ	Ջ	ջ	Ռ	ռ	Ս	ս	Վ	վ	Տ	տ
Fx	Ր	ր	Ց	ց	Ւ	ւ	Փ	փ	Ք	ք	Օ	օ	Ֆ	ֆ	՚	NB SP

In the table above, code value 20 is the regular SPACE character, and code value DC is the eternity sign, which has no designated codepoint in Unicode. Some mappings incorrectly claim that it has a codepoint of U+0530. This is incorrect, as that codepoint has not been allocated.

Code values 00–1F, 7F, and B0–DB are not assigned to characters by AST 34.001, though they may be the same as those used in the legacy DOS/OEM code page 437 (box-drawing characters) or in Macintosh Roman.
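Comparing the ArmSCII-8 and ArmSCII-8A tables shows that the letters move in one block: the first 23 letter pairs (B2–DF in ArmSCII-8) sit 0x32 lower in ArmSCII-8A (80–AD), while the remaining pairs (E0–FD) keep their positions. The sketch below illustrates that observation for letters only. The function name is hypothetical, and punctuation is deliberately excluded because it does not follow a fixed offset.

```python
def armscii8_letter_to_8a(code: int) -> int:
    """Convert the ArmSCII-8 code value of a letter to ArmSCII-8A (letters only)."""
    if 0xB2 <= code <= 0xDF:
        # Ayb .. Sha: moved down below the range reserved by DOS/Mac code pages.
        return code - 0x32
    if 0xE0 <= code <= 0xFD:
        # Vo .. Feh: same position in both encodings.
        return code
    raise ValueError(f"not an ArmSCII-8 letter: {code:#04x}")


# Capital ayb: B2 in ArmSCII-8, 80 in ArmSCII-8A.
print(hex(armscii8_letter_to_8a(0xB2)))  # 0x80
```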

Support for the Armenian script in other standards

ISO 10585:1996

ISO 10585:1996 7-bit coded character set for Armenian.
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
0x	not used
1x	not used
2x	SP	Ա	Բ	Գ	Դ	Ե	Զ	Է	Ը	Թ	Ժ	Ի	Լ	Խ	Ծ	Կ
3x	Հ	Ձ	Ղ	Ճ	Մ	Յ	Ն	Շ	Ո	Չ	Պ	Ջ	Ռ	Ս	Վ	Տ
4x	Ր	Ց	Ւ	Փ	Ք	Օ	Ֆ		՝	՚	֊		։	,	՞	՟
5x		ա	բ	գ	դ	ե	զ	է	ը	թ	ժ	ի	լ	խ	ծ	կ
6x	հ	ձ	ղ	ճ	մ	յ	ն	շ	ո	չ	պ	ջ	ռ	ս	վ	տ
7x	ր	ց	ւ	փ	ք	օ	ֆ		―	‐	″		·	՛	՜

For comparison, this is the 7-bit encoding defined in the international standard ISO/IEC 10585, which was in use before the revision of the Armenian standard AST 34.002:1997 (ArmSCII-8).

In this standard (as in ISO/IEC 10646 and Unicode), only one Armenian apostrophe modifier letter is encoded, at 0x49, although Armenian uses two modifier letter apostrophes that differ by case. U+055A represents the capital apostrophe but is not treated as cased in Unicode or in ISO 10585; the small-letter apostrophe is absent and is generally represented by the ASCII apostrophe U+0027 in Unicode documents.

The left half-ring punctuation (a modifier letter) and the eternity symbol are also missing, and a single double quotation mark (U+2033) is encoded at code value 7A in place of the pair of guillemets found in the three ArmSCII variants.

ISO/IEC 10646-1 and Unicode

Armenian (Unicode.org chart)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+053x		Ա	Բ	Գ	Դ	Ե	Զ	Է	Ը	Թ	Ժ	Ի	Լ	Խ	Ծ	Կ
U+054x	Հ	Ձ	Ղ	Ճ	Մ	Յ	Ն	Շ	Ո	Չ	Պ	Ջ	Ռ	Ս	Վ	Տ
U+055x	Ր	Ց	Ւ	Փ	Ք	Օ	Ֆ			ՙ	՚	՛	՜	՝	՞	՟
U+056x		ա	բ	գ	դ	ե	զ	է	ը	թ	ժ	ի	լ	խ	ծ	կ
U+057x	հ	ձ	ղ	ճ	մ	յ	ն	շ	ո	չ	պ	ջ	ռ	ս	վ	տ
U+058x	ր	ց	ւ	փ	ք	օ	ֆ	և		։	֊

For comparison, the Unicode code points for Armenian are shown above.

The Unicode encoding has, since Unicode 1.1, been based on the earlier ISO 10585 7-bit encoding with about a dozen characters added; the exception is the Armenian hyphen U+058A, the last character to be added, in Unicode 3.0. The non-letters were reorganized to include most of the characters added in the Armenian standard AST 34.002:1997.

Capital letters are encoded in the first half of the block (terminated by modifier letters).

Lowercase letters are encoded in the second half of the block (terminated by Armenian punctuation signs).
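The relationship between the two layouts is visible in the tables: every ISO 10585 letter sits exactly 0x510 below its Unicode code point (21 maps to U+0531, 51 maps to U+0561). The following sketch illustrates that arithmetic for letters only. The function and constant names are hypothetical, and punctuation is excluded because it was reorganized in Unicode.

```python
ISO10585_LETTER_OFFSET = 0x0510  # U+0531 - 0x21, derived from the two tables


def iso10585_letter_to_unicode(code: int) -> str:
    """Map an ISO 10585 letter code value to its Unicode character (letters only)."""
    is_capital = 0x21 <= code <= 0x46   # Ayb .. Feh
    is_small = 0x51 <= code <= 0x76     # ayb .. feh
    if not (is_capital or is_small):
        raise ValueError(f"not an ISO 10585 letter: {code:#04x}")
    return chr(code + ISO10585_LETTER_OFFSET)


print(iso10585_letter_to_unicode(0x21), iso10585_letter_to_unicode(0x51))  # Ա ա
```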

Unlike the ArmSCII encodings, this encoding is stable and portable across systems, and contains all the characters needed for Armenian (with the exception of the Armenian eternity sign). Some Unicode-encoded Armenian fonts map the eternity sign to code point U+0530. This is incorrect, as that code point has not been allocated.
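Whether a code point is allocated can be checked against the Unicode Character Database that ships with Python. This is a small illustrative check, with a hypothetical helper name; the result reflects the Unicode version bundled with the interpreter in use.

```python
import unicodedata


def is_assigned(ch: str) -> bool:
    """Return True if the character has an assigned general category (not 'Cn')."""
    return unicodedata.category(ch) != "Cn"


print(is_assigned("\u0530"))  # False: U+0530 is unallocated
print(is_assigned("\u0531"))  # True: ARMENIAN CAPITAL LETTER AYB
```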

However, Unicode makes no distinction for the mirrored parentheses, so the standard ASCII/Unicode punctuation must be used with its usual rendering. The left half-ring mark (a modifier letter) is encoded here, and some other marks are unified with those of other scripts (notably the quotation marks, middle dot and dashes).

Code mappings and classification

Some transcodings below are shown in parentheses. These are approximate fallbacks only and do not map exactly to the intended character.

Subset	Character	Armenian description or usage	Short name	ArmSCII-7	ArmSCII-8	ArmSCII-8A	ISO 10585	Unicode / ISO/IEC 10646	Notes
General purpose		space	space	20	20	20	20	0020	same as ASCII and Unicode
General purpose		non-breaking space	nbsp	(20)	A0	FF	(20)	00A0	missing in ArmSCII-7 and ISO 10585
Armenian symbols		eternity sign	armeternity	21	A1	DC	—	—	missing in Unicode
	և	ligature ech yiwn (ew)	armew	(3B,75)	(26) (or BB,F5)	(26) (or 89,F5)	(55,72)	0587 (or 0565,0582)	specific to Armenian : compatibility ligature of Armenian ech (yech) and yiwn (vyun) small letters, used as a symbol (similar to ampersand symbol in ASCII)
	§	section sign	armsection	22	A2	—	—	00A7	from ISO 8859; missing in ArmSCII-8A and ISO 10585
Armenian punctuation	։	full stop (vertsaket)	armfullstop	23	A3	(3A)	4C	0589	specific to Armenian : looks mostly like ASCII colon, but distinct usage ; missing in ArmSCII-8A (approximated by ASCII colon)
	)	right parenthesis	armparenright	24	A4	29	(79)	0029	from ASCII; name and usage differ from ASCII and Unicode; missing in ISO 10585 (suggested substitution uses dashes)
	(	left parenthesis	armparenleft	25	A5	28	(79)	0028	from ASCII; name and usage differ from ASCII and Unicode; missing in ISO 10585 (suggested substitution uses dashes)
	»	right quotation mark	armquotright	26	A6	AF	(7A)	00BB	from ISO-8859; name and usage differ from ISO-8859 and Unicode
	«	left quotation mark	armquotleft	27	A7	AE	(7A)	00AB	from ISO-8859; name and usage differ from ISO-8859 and Unicode
	″	quotation mark	—	—	—	—	7A	2033	used for either left or right quotation mark in ISO 10585
	―	em-dash	armemdash	28	A8	(5F)	78	2015	from ISO-8859; missing in ArmSCII-8A (approximated by ASCII underscore)
	·	middle dot (mijaket)	armdot	29	A9	(2E)	7C	00B7	sometimes similar to ASCII full stop, but usage different in Armenian where the middle dot is preferred; missing in ArmSCII-8A (approximated by ASCII full stop)
	՝	separation mark (but)	armsep	2A	AA	(60)	48	055D	usage specific to Armenian : used as a comma ; = bowt ; missing in ArmSCII-8A (approximated by ASCII backquote)
	,	comma	armcomma	2B	AB	2C	4D	002C	same as ASCII and Unicode comma
	‐	dash	armendash	2C	AC	(2D)	79	2010	similar to the short variant of the ASCII and Unicode minus-hyphen (shorter than the general purpose minus sign used in ASCII) ; missing in ArmSCII-8A (approximated by ASCII minus-hyphen)
Armenian modifier letters	֊	hyphen (yentamna)	armyentamna	2D	AD	DD	4A	058A	specific to Armenian : a modifier letter that modifies another Armenian normal letter (possibly with combining punctuation between them)
	…	ellipsis	armellipsis	2E	AE	DE	(7C,7C,7C)	2026	from ISO-8859, but not a punctuation : a modifier letter that follows and modifies another normal Armenian letter (possibly with combining punctuation between them)
	ՙ	numeric mark (left half-ring)	armnum	—	—	—	—	0559	specific to Armenian : a modifier letter that modifies another Armenian normal letter (possibly with combining punctuation between them) ; missing in all ArmSCII variants
	՚	apostrophe (right half-ring)	armapostrophe	7E	FE	FE	49	055A	specific to Armenian : a modifier letter that modifies another Armenian normal letter (possibly with combining punctuation between them)
Armenian combining punctuation	՜	exclamation mark (amanak)	armexclam	2F	AF	(7E)	7E	055C	specific to Armenian : these diacritics encode punctuation but may appear on top of a letter in the middle of any word (it may be ignored in searches); Unicode handles them as modifier letters however they are normally not spacing ; = batsaganchakan nshan ; missing in ArmSCII-8A (approximated by ASCII tilde symbol)
	՛	emphasis mark (shesht)	armaccent	30	B0	(27)	7D	055B	specific to Armenian : these diacritics encode punctuation but may appear on top of a letter in the middle of any word (it may be ignored in searches); Unicode handles them as modifier letters however they are normally not spacing ; missing in ArmSCII-8A (approximated by ASCII single quote)
	՞	question mark (paruyk)	armquestion	31	B1	DF	4E	055E	specific to Armenian : these diacritics encode punctuation but may appear on top of a letter in the middle of any word (it may be ignored in searches); Unicode handles them as modifier letters however they are normally not spacing ; = hartsakan nshan
	՟	abbreviation mark (patiw)	armabbrev	—	—	—	4F	055F	specific to Armenian : these diacritics encode punctuation but may appear on top of a letter in the middle of any word (it may be ignored in searches); Unicode handles them as modifier letters however they are normally not spacing
Armenian capital letters	Ա	Ayb	Armayb	32	B2	80	21	0531
	Բ	Ben	Armben	34	B4	82	22	0532
	Գ	Gim	Armgim	36	B6	84	23	0533
	Դ	Da	Armda	38	B8	86	24	0534
	Ե	Ech (Yech)	Armyech	3A	BA	88	25	0535
	Զ	Za	Armza	3C	BC	8A	26	0536
	Է	Eh (E)	Arme	3E	BE	8C	27	0537
	Ը	Et (At)	Armat	40	C0	8E	28	0538
	Թ	To	Armto	42	C2	90	29	0539
	Ժ	Zhe	Armzhe	44	C4	92	2A	053A
	Ի	Ini	Armini	46	C6	94	2B	053B
	Լ	Liwn (Lyun)	Armlyun	48	C8	96	2C	053C
	Խ	Xeh (Khe)	Armkhe	4A	CA	98	2D	053D
	Ծ	Ca (Tsa)	Armtsa	4C	CC	9A	2E	053E
	Կ	Ken	Armken	4E	CE	9C	2F	053F
	Հ	Ho	Armho	50	D0	9E	30	0540
	Ձ	Ja (Dza)	Armdza	52	D2	A0	31	0541
	Ղ	Ghad (Ghat)	Armghat	54	D4	A2	32	0542
	Ճ	Cheh (Tche)	Armtche	56	D6	A4	33	0543
	Մ	Men	Armmen	58	D8	A6	34	0544
	Յ	Yi (Hi)	Armhi	5A	DA	A8	35	0545
	Ն	Now (Nu)	Armnu	5C	DC	AA	36	0546
	Շ	Sha	Armsha	5E	DE	AC	37	0547
	Ո	Vo	Armvo	60	E0	E0	38	0548
	Չ	Cha	Armcha	62	E2	E2	39	0549
	Պ	Peh (Pe)	Armpe	64	E4	E4	3A	054A
	Ջ	Jheh (Je)	Armje	66	E6	E6	3B	054B
	Ռ	Ra	Armra	68	E8	E8	3C	054C
	Ս	Seh (Se)	Armse	6A	EA	EA	3D	054D
	Վ	Vew (Vev)	Armvev	6C	EC	EC	3E	054E
	Տ	Tiwn (Tyun)	Armtyun	6E	EE	EE	3F	054F
	Ր	Reh (Re)	Armre	70	F0	F0	40	0550
	Ց	Co (Tso)	Armtso	72	F2	F2	41	0551
	Ւ	Yiwn (Vyun)	Armvyun	74	F4	F4	42	0552
	Փ	Piwr (Pyur)	Armpyur	76	F6	F6	43	0553
	Ք	Keh (Ke)	Armke	78	F8	F8	44	0554
	Օ	Oh (O)	Armo	7A	FA	FA	45	0555
	Ֆ	Feh (Fe)	Armfe	7C	FC	FC	46	0556
Armenian small letters	ա	ayb	armayb	33	B3	81	51	0561
	բ	ben	armben	35	B5	83	52	0562
	գ	gim	armgim	37	B7	85	53	0563
	դ	da	armda	39	B9	87	54	0564
	ե	ech (yech)	armyech	3B	BB	89	55	0565
	զ	za	armza	3D	BD	8B	56	0566
	է	eh (e)	arme	3F	BF	8D	57	0567
	ը	et (at)	armat	41	C1	8F	58	0568
	թ	to	armto	43	C3	91	59	0569
	ժ	zhe	armzhe	45	C5	93	5A	056A
	ի	ini	armini	47	C7	95	5B	056B
	լ	liwn (lyun)	armlyun	49	C9	97	5C	056C
	խ	xeh (khe)	armkhe	4B	CB	99	5D	056D
	ծ	ca (tsa)	armtsa	4D	CD	9B	5E	056E
	կ	ken	armken	4F	CF	9D	5F	056F
	հ	ho	armho	51	D1	9F	60	0570
	ձ	ja (dza)	armdza	53	D3	A1	61	0571
	ղ	ghad (ghat)	armghat	55	D5	A3	62	0572
	ճ	cheh (tche)	armtche	57	D7	A5	63	0573
	մ	men	armmen	59	D9	A7	64	0574
	յ	yi (hi)	armhi	5B	DB	A9	65	0575
	ն	now (nu)	armnu	5D	DD	AB	66	0576
	շ	sha	armsha	5F	DF	AD	67	0577
	ո	vo	armvo	61	E1	E1	68	0578
	չ	cha	armcha	63	E3	E3	69	0579
	պ	peh (pe)	armpe	65	E5	E5	6A	057A
	ջ	jheh (je)	armje	67	E7	E7	6B	057B
	ռ	ra	armra	69	E9	E9	6C	057C
	ս	seh (se)	armse	6B	EB	EB	6D	057D
	վ	vew (vev)	armvev	6D	ED	ED	6E	057E
	տ	tiwn (tyun)	armtyun	6F	EF	EF	6F	057F
	ր	reh (re)	armre	71	F1	F1	70	0580
	ց	co (tso)	armtso	73	F3	F3	71	0581
	ւ	yiwn (vyun)	armvyun	75	F5	F5	72	0582
	փ	piwr (pyur)	armpyur	77	F7	F7	73	0583
	ք	keh (ke)	armke	79	F9	F9	74	0584
	օ	oh (o)	armo	7B	FB	FB	75	0585
	ֆ	feh (fe)	armfe	7D	FD	FD	76	0586
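The parenthesized entries in the ArmSCII-8A column can be treated as an explicit fallback table when encoding from Unicode. The sketch below covers punctuation only and is an illustration, not a complete encoder: the dictionary and function names are hypothetical, and the code values are copied from the table above.

```python
# Punctuation with an exact ArmSCII-8A code value.
EXACT_8A = {
    "\u00A0": 0xFF, "\u00AB": 0xAE, "\u00BB": 0xAF, "\u058A": 0xDD,
    "\u2026": 0xDE, "\u055E": 0xDF, "\u055A": 0xFE,
    "(": 0x28, ")": 0x29, ",": 0x2C,
}

# Punctuation missing from ArmSCII-8A, approximated by ASCII punctuation.
FALLBACK_8A = {
    "\u0589": 0x3A,  # full stop (vertsaket) -> colon
    "\u2015": 0x5F,  # em-dash               -> underscore
    "\u00B7": 0x2E,  # middle dot            -> full stop
    "\u055D": 0x60,  # separation mark       -> backquote
    "\u2010": 0x2D,  # dash                  -> minus-hyphen
    "\u055C": 0x7E,  # exclamation mark      -> tilde
    "\u055B": 0x27,  # emphasis mark         -> single quote
}


def encode_punctuation_8a(ch: str) -> tuple:
    """Return (code value, is_exact) for one punctuation character."""
    if ch in EXACT_8A:
        return EXACT_8A[ch], True
    if ch in FALLBACK_8A:
        return FALLBACK_8A[ch], False
    raise KeyError(f"no ArmSCII-8A mapping in this sketch for U+{ord(ch):04X}")


print(encode_punctuation_8a("\u0589"))  # (58, False)
```

Returning the `is_exact` flag lets a caller warn about lossy conversions, since a fallback byte decodes back to the ASCII character rather than to the original Armenian one.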

See also

External references

[ArmSCII] Armenian Standard Code for Information Interchange -- Center of Humane Technologies "Armenian Computer", June 1991.
[AST 34.001-97] Information Technologies -- Character Set And Information Encoding: Character Set -- State Standardization Committee of the Republic of Armenia, July 1997.
[ArmSCII Version 2] Armenian Standard Code for Information Interchange, Version 2 -- ArmSCII Working Group, May 1999.

External links

http://www.freenet.am/armscii/ basic information and utilities supporting the ArmSCII standard

Related articles

Armenian alphabet
Armenian language
Romanization of Armenian (including ISO 9985 standard)
Traditional Armenian orthography
Reformed Armenian orthography
Armenian calendar
