ISO/IEC 646

From Wikipedia, the free encyclopedia

ISO 646 is an ISO standard that specifies a 7-bit character code from which several national standards are derived, the best known of which is ASCII. Because the portion of ISO 646 shared by all countries specified only the letters of the English alphabet, countries whose languages use the Latin alphabet with additional letters had to create national variants of ISO 646 in order to write their native languages. The 8-bit byte was not yet universally accepted at that time, so the national characters had to fit within 7 bits, which means that some characters that appear in ASCII do not appear in other national variants of ISO 646.

1 History
2 National variants
3 Variants of ASCII that are not ISO 646
4 See also
5 External links

History

ISO/IEC 646 and its predecessor ASCII (ANSI X3.4) largely endorse existing practice regarding character encodings in the telecommunications industry.

During the 1960s, there was debate regarding whether character encoding standards (at either the national or international levels) for computers should follow 1) existing practice in the telecommunications industry (which was largely paper-tape based, but which was commonly transmitted on-line digitally over wires) or, conversely, 2) existing practice in the punched-card portion of the computer industry, whose heritage was especially the off-line storage of World War II-era electro-mechanical punched-card machines predating electronic computers. For obvious corporate-history reasons regarding Hollerith punched cards, IBM sided with the punched-card character encodings, embodied by EBCDIC, whereas many other computer manufacturers sided with the telecommunications industry's character encodings.

The ISO 8859 series of standards governing 8-bit character encodings supersedes the ISO 646 international standard and its national variants. The ISO 10646 standard, which is directly related to Unicode, in turn supersedes all of the national-variant character encodings of ISO 646 and ISO 8859 with what is arguably a single unified character set.

National variants

Some national variants of ISO 646 are:

Code	ISO-IR	Standard	Used in
CA-1	121	CSA Z243.4-1985	Canada (nr. 1 alternative, with “î”) (French, classical)
CA-2	122	CSA Z243.4-1985	Canada (nr. 2 alternative, with “É”) (French, reformed orthography)
CN	057	GB/T 1988-80	People's Republic of China (Basic Latin)
CU	151	NC 99-10:81	Cuba (Spanish)
DE	021	DIN 66083	Germany (German)
DK	—	DS 2089	Denmark (Danish)
FR	069	AFNOR NF Z 62010-1982	France (French)
FR-0	025	AFNOR NF Z 62010-1973	France (obsolete since April 1985)
GB	004	BSI 4730	United Kingdom (English)
GR	088	HOS ELOT	Greece (obsolete)
HU	086	MSZ 7795/3	Hungary (Hungarian)
IE	207	NSAI 433:1996	Ireland (Irish Goidelic)
INV	—	ISO 646:1983	international (Invariant subset)
IRV	002	ISO 646:1983	International Reference Variant
JA	014	JIS C 6220-1969	Japan (Romaji)
JA-O	092	JIS C 6229-1984	Japan (OCR-B)
KR	—	?	South Korea
MT	—	?	Malta (Maltese, English)
NO	060	NS 4551 version 1	Norway
NO-2	061	NS 4551 version 2	Norway (obsolete since June 1987)
SE	010	SEN 85 02 00 Annex B	Sweden (basic Swedish)
SE-C	011	SEN 85 02 00 Annex C	Sweden (extended Swedish for names)
T.61	102	ITU/CCITT T.61 Recommendation	International (Teletex)
US	006	ANSI X3.4-1968	United States (ASCII)
YU	141	JUS I.B1.002	former Yugoslavia (Croatian, Slovenian, Serbian, Latin)

Other proprietary standards approved later for international use by some standard committees:

Code	ISO-IR	Approved by	Origin	Used in
ES	085	ECMA	IBM	Spain (Basque, Castilian, Catalan, Galician)
esp	017	ECMA	Olivetti	Spanish (international)
DK-SE	009-1	SSK	NATS, main set	Sweden and Denmark (journalistic texts)
FI-SE	008-1	SSK	NATS, main set	Sweden and Finland (journalistic texts)
ita	015	ECMA	Olivetti	Italian
PT	084	ECMA	IBM	Portugal (Portuguese, Spanish)
por	016	ECMA	Olivetti	Portuguese (international)

The specifics of the changes for some of these variants are given in this table:

Codes			Characters for each ISO 646 compatible charset
binary	decimal	hexa	INV	US	T.61	JA	JA-O	CN	IRV	GB	DK	NO	NO-2	SE	SE-C	DE	HU	FR	FR-0	CA-1	CA-2	IE	IS	ita	por	PT	esp	ES	CU	MT	YU
010 0010	34	22	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"	"
010 0011	35	23		#	#	#	#	#	#	£	#	#	§	#	#	#	#	£	£	#	#	£	#	£	#	£	#	#	#	#	#
010 0100	36	24		$	¤	$	$	¥	$	$	$	$	$	¤	¤	$	¤	$	$	$	$	$	$	$	$	$	$	$	¤	$	$
010 1001	39	27	'	'	'	'	'	'	’	’	’	’	’	’	’	’	’	’	’	’	’	’	’	’	’	’	’	’	’	’	’
010 1100	44	2C	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,	,
010 1101	45	2D	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
010 1111	47	2F	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/	/
100 0000	64	40		@	@	@	@	@	@	@	@	@	@	@	É	§	Á	à	à	à	à	Ó	Ð	§	§	´	§	·	@	@	Ž
101 1011	91	5B		[	[	[	[	[	[	[	Æ	Æ	Æ	Ä	Ä	Ä	É	°	°	â	â	É	Þ	°	Ã	Ã	¡	¡	¡	ġ	Š
101 1100	92	5C		\		¥	¥	\	\	\	Ø	Ø	Ø	Ö	Ö	Ö	Ö	ç	ç	ç	ç	Í	\	ç	Ç	Ç	Ñ	Ñ	Ñ	ż	Đ
101 1101	93	5D		]	]	]	]	]	]	]	Å	Å	Å	Å	Å	Ü	Ü	§	§	ê	ê	Ú	Æ	é	Õ	Õ	¿	Ç	]	ħ	Ć
101 1110	94	5E		^		^	^	^	ˆ	ˆ	ˆ	ˆ	ˆ	ˆ	Ü	ˆ	ˆ	^	ˆ	î	É	Á	Ö	ˆ	ˆ	ˆ	ˆ	¿	¿	ˆ	Č
101 1111	95	5F	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_	_
110 0000	96	60		`		`		`	`	`	`	`	`	`	é	`	á	µ	µ	ô	ô	ó	ð	ù	`	`	`	`	`	ċ	ž
111 1011	123	7B		{		{	{	{	{	{	æ	æ	æ	ä	ä	ä	é	é	é	é	é	é	þ	à	ã	ã	°	´	´	Ġ	š
111 1100	124	7C		|	|	|	|	|	|	|	ø	ø	ø	ö	ö	ö	ö	ù	ù	ù	ù	í	|	ò	ç	ç	ñ	ñ	ñ	Ż	đ
111 1101	125	7D		}		}	}	}	}	}	å	å	å	å	å	ü	ü	è	è	è	è	ú	æ	è	õ	õ	ç	ç	[	Ħ	ć
111 1110	126	7E		~		‾		~	˜	˜	˜	¯	|	˜	ü	ß	˝	¨	¨	û	û	á	ö	ì	°	˜	˜	¨	¨	Ċ	č

In the table above, the cells with non-white background emphasize the differences from the US variant used in the Basic Latin subset of ISO/IEC 10646 and Unicode.
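
As an illustration of how such a variant reassigns code positions, the following Python sketch decodes 7-bit data according to the DE column of the table. The function name `decode_iso646_de` and the dictionary `DE_OVERRIDES` are hypothetical names chosen for this example, and the typographic glyph variants shown for the apostrophe and circumflex are treated as their ASCII equivalents, which is a simplifying assumption.

```python
# Code positions where the DE column of the table differs from the US column.
# Values are taken from the table; the names are illustrative only.
DE_OVERRIDES = {
    0x40: "§",
    0x5B: "Ä",
    0x5C: "Ö",
    0x5D: "Ü",
    0x7B: "ä",
    0x7C: "ö",
    0x7D: "ü",
    0x7E: "ß",
}


def decode_iso646_de(data: bytes) -> str:
    """Decode 7-bit bytes using the DE substitutions, ASCII otherwise."""
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is outside the 7-bit range")
        # Positions not listed in DE_OVERRIDES are assumed to match ASCII.
        chars.append(DE_OVERRIDES.get(b, chr(b)))
    return "".join(chars)


print(decode_iso646_de(b"Gr}~e"))  # Grüße
```

The same bytes read as US-ASCII would display as `Gr}~e`, which is why text exchanged between systems using different variants could appear garbled.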

The characters displayed in cells with red background could be used as combining diacritics when preceded or followed by a backspace C0 control. This encoding method is deprecated or not recommended, as it was part of some national standards that have since been withdrawn. Without such complex encoding, these characters are no different from the symbols used in the US variant, although glyph variants are still possible, especially for the quotation marks and the circumflex and tilde symbols.
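
The backspace technique can be sketched in Python by rewriting each "letter, backspace, symbol" sequence as the letter followed by a Unicode combining mark. This is only a rough sketch under stated assumptions: the pairing of ASCII symbols with diacritics in `COMBINING` is an assumption made for illustration rather than a list taken from the standard, the function name `compose_overstrikes` is hypothetical, and only the order in which the letter comes first is handled.

```python
import unicodedata

# Assumed pairing of ASCII symbols with Unicode combining marks.
COMBINING = {
    '"': "\u0308",  # combining diaeresis
    "'": "\u0301",  # combining acute accent
    ",": "\u0327",  # combining cedilla
    "^": "\u0302",  # combining circumflex accent
    "`": "\u0300",  # combining grave accent
    "~": "\u0303",  # combining tilde
}


def compose_overstrikes(text: str) -> str:
    """Turn 'letter BS symbol' sequences into precomposed letters."""
    out = []
    i = 0
    while i < len(text):
        is_overstrike = (
            i + 2 < len(text)
            and text[i].isalpha()
            and text[i + 1] == "\b"
            and text[i + 2] in COMBINING
        )
        if is_overstrike:
            out.append(text[i] + COMBINING[text[i + 2]])
            i += 3
        else:
            out.append(text[i])
            i += 1
    # NFC merges base letter and combining mark where a composed form exists.
    return unicodedata.normalize("NFC", "".join(out))


print(compose_overstrikes("nai\b\"ve"))  # naïve
```

Symbols that are not part of a backspace sequence pass through unchanged, which matches the statement that without such encoding they behave like the ordinary US symbols.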

Later, when 8-bit character sets gained more acceptance, ISO 8859-1, ISO 8859-2, and ISO 8859-3 became the preferred method of coding most of these variants.

Variants of ASCII that are not ISO 646

There are also some 7-bit character sets that are not officially part of the ISO 646 standard. Examples include:

7-bit Greek, ELOT 927. The Greek alphabet is mapped to positions 0x61–0x71 and 0x73–0x79, on top of the Latin lowercase letters. This mapping with the high bit set is ISO 8859-7.

7-bit Cyrillic, KOI-7 or Short KOI. The Cyrillic characters are mapped to positions 0x60–0x7E, on top of the Latin lowercase letters. Superseded by the KOI-8 variants.

7-bit Hebrew, SI 960. The Hebrew alphabet is mapped to positions 0x60–0x7A, on top of the lowercase Latin letters (and grave accent for aleph). 7-bit Hebrew was always stored in visual order. This mapping with the high bit set, i.e. with the Hebrew letters in 0xE0–0xFA, is ISO 8859-8.

7-bit Arabic, ASMO 449. The Arabic alphabet is mapped to positions 0x41–0x5A and 0x60–0x6A, on top of both uppercase and lowercase Latin letters. This mapping with the high bit set is ISO 8859-6.
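
The relationship "this mapping with the high bit set" can be sketched for the Hebrew case, where the ranges are stated explicitly above. The Python sketch below sets bit 7 on bytes in 0x60–0x7A and decodes the result with Python's built-in `iso8859_8` codec. The function name `si960_to_unicode` is hypothetical, and treating all other positions as plain ASCII is a simplifying assumption.

```python
def si960_to_unicode(data: bytes) -> str:
    """Decode 7-bit Hebrew by mapping 0x60-0x7A onto ISO 8859-8."""
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is outside the 7-bit range")
        if 0x60 <= b <= 0x7A:
            # Setting the high bit moves the letter to 0xE0-0xFA,
            # where ISO 8859-8 places the Hebrew alphabet.
            chars.append(bytes([b | 0x80]).decode("iso8859_8"))
        else:
            # Assumed to be identical to ASCII outside the Hebrew range.
            chars.append(chr(b))
    return "".join(chars)


print(si960_to_unicode(b"\x60\x61\x7a"))  # alef, bet, tav
```

Because 7-bit Hebrew was stored in visual order, the resulting string would still need reordering before being treated as logical-order Unicode text; the sketch does not attempt this.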

See also

External links

Zeichensatz nach ISO 646 (ASCII) (in German)
History at GNU Aspell website
Character Tables by Koichi Yasuoka (see Domestic ISO646 Character Tables and Quasi-ISO646 Character Tables)
Turkish Text Deasciifier, a tool (based on statistical pentagram analysis of the Turkish language) that restores ASCII-fied Turkish text by determining the appropriate (but ambiguous) diacritics that Turkish normally requires but that are missing from the US-ASCII set.