Tamil Script Code for Information Interchange

Tamil Script Code for Information Interchange (TSCII) is an 8-bit coding scheme for representing the Tamil script. The lower 128 code points are plain ASCII; the upper 128 code points are TSCII-specific. After many years of use on the Internet by private agreement only, it was registered with the IANA in 2007.^[1]

TSCII encodes characters in visual (written) order, paralleling the layout of the Tamil typewriter.

Unicode, following ISCII, encodes Tamil in logical order. Thai is the contrasting case: there Unicode adopted the visual-order encoding inherited from TIS-620.
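The difference between visual and logical order can be illustrated with a minimal Python sketch. It is not a full TSCII decoder: the three small tables cover only a handful of bytes read off the codepage table in this article, and the function name `visual_to_logical` is hypothetical. It also assumes that TSCII text writes a two-part vowel sign as a left-side sign, then the consonant, then a right-side sign, as on a typewriter.

```python
import unicodedata

# Partial byte -> Unicode tables, read off the codepage table in this article.
# Only a few entries are included; a real converter needs the full table.
CONSONANTS = {0xB8: "\u0B95", 0xBE: "\u0BA4", 0xC1: "\u0BAE"}    # ka, ta, ma
PREFIX_SIGNS = {0xA6: "\u0BC6", 0xA7: "\u0BC7", 0xA8: "\u0BC8"}  # e, ee, ai signs
SUFFIX_SIGNS = {0xA1: "\u0BBE", 0xA2: "\u0BBF", 0xA3: "\u0BC0"}  # aa, i, ii signs


def visual_to_logical(data: bytes) -> str:
    """Reorder visual-order bytes into logical-order Unicode text."""
    out = []
    pending = None  # left-side vowel sign waiting for its consonant
    for b in data:
        if b in PREFIX_SIGNS:
            if pending is not None:
                out.append(pending)  # stray sign with no consonant: keep as-is
            pending = PREFIX_SIGNS[b]
            continue
        if b in CONSONANTS:
            out.append(CONSONANTS[b])
            if pending is not None:
                out.append(pending)  # logical order: the sign follows the consonant
                pending = None
            continue
        if pending is not None:
            out.append(pending)
            pending = None
        if b < 0x80:
            out.append(chr(b))  # the lower half is plain ASCII
        elif b in SUFFIX_SIGNS:
            out.append(SUFFIX_SIGNS[b])
        else:
            raise ValueError(f"byte 0x{b:02X} is outside this sketch's partial table")
    if pending is not None:
        out.append(pending)
    # NFC merges the two halves of a two-part vowel sign into one code point.
    return unicodedata.normalize("NFC", "".join(out))
```

For example, `visual_to_logical(bytes([0xA6, 0xB8]))` moves the vowel sign behind the consonant and returns U+0B95 followed by U+0BC6. With the bytes A6 B8 A1, the NFC step combines the two sign halves into the single code point U+0BCA after the consonant.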

The government of Tamil Nadu endorses its own TAB/TAM standards for 8-bit encoding, and other, older encoding schemes can still be found on the Web.

The free e-text collection at Project Madurai uses the TSCII encoding but has started to provide Unicode versions as well.

Codepage layout

TSCII
	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	xA	xB	xC	xD	xE	xF
8x	௦	௧	ஸ்ரீ	ஜ	ஷ	ஸ	ஹ	க்ஷ	ஜ்‌	ஷ்‌	ஸ்‌	ஹ்‌	க்ஷ்‌	௨	௩	௪
9x	௫	‘	’	“	”	௬	௭	௮	௯	ஙு	ஞு	ஙூ	ஞூ	௰	௱	௲
Ax	NBSP	ா	ி	ீ	ு	ூ	ெ	ே	ை	©	ௗ	அ	ஆ		ஈ	உ
Bx	ஊ	எ	ஏ	ஐ	ஒ	ஓ	ஔ	ஃ	க	ங	ச	ஞ	ட	ண	த	ந
Cx	ப	ம	ய	ர	ல	வ	ழ	ள	ற	ன	டி	டீ	கு	சு	டு	ணு
Dx	து	நு	பு	மு	யு	ரு	லு	வு	ழு	ளு	று	னு	கூ	சூ	டூ	ணூ
Ex	தூ	நூ	பூ	மூ	யூ	ரூ	லூ	வூ	ழூ	ளூ	றூ	னூ	க்‌	ங்‌	ச்‌	ஞ்‌
Fx	ட்‌	ண்‌	த்‌	ந்‌	ப்‌	ம்‌	ய்‌	ர்‌	ல்‌	வ்‌	ழ்‌	ள்‌	ற்‌	ன்‌	இ

In the table above, code 80 (hexadecimal) is U+0BE6 TAMIL DIGIT ZERO, which was added to Unicode in version 4.1. A0 is the NO-BREAK SPACE. The codes AD and FF are unassigned.
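The notes on the table can be turned into a small byte classifier. The following Python sketch uses only facts stated in this article: the ASCII lower half, the two unassigned codes, and the two mappings named above. The names `classify` and `decode_known` are hypothetical, and the mapping table is deliberately incomplete.

```python
# Codes the article lists as unassigned.
UNASSIGNED = frozenset({0xAD, 0xFF})

# The two upper-half mappings spelled out in the notes on the table.
NOTED_MAPPINGS = {
    0x80: "\u0BE6",  # TAMIL DIGIT ZERO
    0xA0: "\u00A0",  # NO-BREAK SPACE
}


def classify(byte: int) -> str:
    """Return 'ascii', 'unassigned', or 'tscii' for a single byte value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    if byte < 0x80:
        return "ascii"
    if byte in UNASSIGNED:
        return "unassigned"
    return "tscii"


def decode_known(byte: int):
    """Decode ASCII bytes and the two noted codes; return None for the rest."""
    kind = classify(byte)
    if kind == "unassigned":
        raise ValueError(f"0x{byte:02X} is unassigned in TSCII")
    if kind == "ascii":
        return chr(byte)
    return NOTED_MAPPINGS.get(byte)  # None: not covered by this partial table
```

A full decoder would replace `NOTED_MAPPINGS` with the complete table. Many TSCII codes, such as the consonant-vowel ligatures in rows Cx to Ex, correspond to more than one Unicode code point, so the values would need to be strings rather than single characters.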