Basic Latin (Unicode block)

C0 Controls and Basic Latin
Range	U+0000..U+007F (128 code points)
Plane	BMP
Scripts	Latin (52 char.) Common (76 char.)
Major alphabets	English French Spanish German Vietnamese
Symbol sets	Arabic numerals Punctuation
Assigned	128 code points 33 Control or Format
Unused	0 reserved code points
Source standards	ISO/IEC 8859, ISO 646
Unicode version history

1.0.0	128 (+128)

Note: ^[1]^[2]

The Basic Latin or C0 Controls and Basic Latin Unicode block is the first block of the Unicode standard, and the only block whose code points are each encoded in one byte in UTF-8. The block contains all the characters and control codes of the ASCII encoding.

The Basic Latin block was included in its present form from version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.^[3]
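
The one-byte UTF-8 property described above can be checked directly. The following is a minimal sketch using Python's built-in UTF-8 codec; the names `BLOCK_START`, `BLOCK_END`, and `utf8_length` are illustrative and not part of any standard.

```python
# Sketch: check the one-byte UTF-8 property for U+0000..U+007F.
BLOCK_START, BLOCK_END = 0x0000, 0x007F  # range given in the block summary


def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for a single code point."""
    return len(chr(code_point).encode("utf-8"))


# Each code point in the block encodes to one byte equal to its own value.
one_byte = all(
    chr(cp).encode("utf-8") == bytes([cp])
    for cp in range(BLOCK_START, BLOCK_END + 1)
)

# The first code point after the block (U+0080) already needs two bytes.
next_length = utf8_length(BLOCK_END + 1)
```

Here `one_byte` evaluates to `True` and `next_length` to `2`, which is consistent with Basic Latin being the only block encoded in a single byte.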

Table of characters

Code	Result	Description	Acronym
C0 controls
U+0000		Null character	NUL
U+0001		Start of Heading	SOH
U+0002		Start of Text	STX
U+0003		End-of-text character	ETX
U+0004		End-of-transmission character	EOT
U+0005		Enquiry character	ENQ
U+0006		Acknowledge character	ACK
U+0007		Bell character	BEL
U+0008		Backspace	BS
U+0009		Horizontal tab	HT
U+000A		Line feed	LF
U+000B		Vertical tab	VT
U+000C		Form feed	FF
U+000D		Carriage return	CR
U+000E		Shift Out	SO
U+000F		Shift In	SI
U+0010		Data Link Escape	DLE
U+0011		Device Control 1	DC1
U+0012		Device Control 2	DC2
U+0013		Device Control 3	DC3
U+0014		Device Control 4	DC4
U+0015		Negative-acknowledge character	NAK
U+0016		Synchronous Idle	SYN
U+0017		End of Transmission Block	ETB
U+0018		Cancel character	CAN
U+0019		End of Medium	EM
U+001A		Substitute character	SUB
U+001B		Escape character	ESC
U+001C		File Separator	FS
U+001D		Group Separator	GS
U+001E		Record Separator	RS
U+001F		Unit Separator	US
ASCII punctuation and symbols
U+0020		Space	SP
U+0021	!	Exclamation mark
U+0022	"	Quotation mark
U+0023	#	Number sign
U+0024	$	Dollar sign
U+0025	%	Percent sign
U+0026	&	Ampersand
U+0027	'	Apostrophe
U+0028	(	Left parenthesis
U+0029	)	Right parenthesis
U+002A	*	Asterisk
U+002B	+	Plus sign
U+002C	,	Comma
U+002D	-	Hyphen-minus
U+002E	.	Full stop or period
U+002F	/	Solidus or Slash
ASCII digits
U+0030	0	Digit Zero
U+0031	1	Digit One
U+0032	2	Digit Two
U+0033	3	Digit Three
U+0034	4	Digit Four
U+0035	5	Digit Five
U+0036	6	Digit Six
U+0037	7	Digit Seven
U+0038	8	Digit Eight
U+0039	9	Digit Nine
ASCII punctuation and symbols
U+003A	:	Colon
U+003B	;	Semicolon
U+003C	<	Less-than sign
U+003D	=	Equal sign
U+003E	>	Greater-than sign
U+003F	?	Question mark
U+0040	@	At sign or Commercial at
Uppercase Latin alphabet
U+0041	A	Latin Capital Letter A
U+0042	B	Latin Capital Letter B
U+0043	C	Latin Capital Letter C
U+0044	D	Latin Capital Letter D
U+0045	E	Latin Capital Letter E
U+0046	F	Latin Capital Letter F
U+0047	G	Latin Capital Letter G
U+0048	H	Latin Capital Letter H
U+0049	I	Latin Capital Letter I
U+004A	J	Latin Capital Letter J
U+004B	K	Latin Capital Letter K
U+004C	L	Latin Capital Letter L
U+004D	M	Latin Capital Letter M
U+004E	N	Latin Capital Letter N
U+004F	O	Latin Capital Letter O
U+0050	P	Latin Capital Letter P
U+0051	Q	Latin Capital Letter Q
U+0052	R	Latin Capital Letter R
U+0053	S	Latin Capital Letter S
U+0054	T	Latin Capital Letter T
U+0055	U	Latin Capital Letter U
U+0056	V	Latin Capital Letter V
U+0057	W	Latin Capital Letter W
U+0058	X	Latin Capital Letter X
U+0059	Y	Latin Capital Letter Y
U+005A	Z	Latin Capital Letter Z
ASCII punctuation and symbols
U+005B	[	Left Square Bracket
U+005C	\	Backslash ^[A]
U+005D	]	Right Square Bracket
U+005E	^	Circumflex accent
U+005F	_	Low line
U+0060	`	Grave accent
Lowercase Latin alphabet
U+0061	a	Latin Small Letter A
U+0062	b	Latin Small Letter B
U+0063	c	Latin Small Letter C
U+0064	d	Latin Small Letter D
U+0065	e	Latin Small Letter E
U+0066	f	Latin Small Letter F
U+0067	g	Latin Small Letter G
U+0068	h	Latin Small Letter H
U+0069	i	Latin Small Letter I
U+006A	j	Latin Small Letter J
U+006B	k	Latin Small Letter K
U+006C	l	Latin Small Letter L
U+006D	m	Latin Small Letter M
U+006E	n	Latin Small Letter N
U+006F	o	Latin Small Letter O
U+0070	p	Latin Small Letter P
U+0071	q	Latin Small Letter Q
U+0072	r	Latin Small Letter R
U+0073	s	Latin Small Letter S
U+0074	t	Latin Small Letter T
U+0075	u	Latin Small Letter U
U+0076	v	Latin Small Letter V
U+0077	w	Latin Small Letter W
U+0078	x	Latin Small Letter X
U+0079	y	Latin Small Letter Y
U+007A	z	Latin Small Letter Z
ASCII punctuation and symbols
U+007B	{	Left Curly Bracket
U+007C	|	Vertical bar
U+007D	}	Right Curly Bracket
U+007E	~	Tilde
Control character
U+007F		Delete	DEL

^A The character U+005C (\) may be displayed as a yen or won sign by Japanese or Korean fonts that treat Unicode text (especially UTF-8) as if it were in a legacy character set in which those signs replaced the backslash.^[4]

Subheadings

The C0 Controls and Basic Latin block contains six subheadings.^[5]

C0 controls

The C0 Controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The Alias names for C0 controls are taken from the ISO/IEC 6429:1992 standard.^[5]

ASCII punctuation and symbols

This subheading refers to standard punctuation characters, simple mathematical operators, and symbols like the dollar sign, percent, ampersand, underscore, and pipe.^[5]

ASCII digits

The ASCII Digits subheading contains the ten standard European digit characters, 0–9.^[5]

Uppercase Latin alphabet

The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the majuscule.^[5]

Lowercase Latin alphabet

The Lowercase Latin Alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the minuscule.^[5]

Control character

The Control Character subheading contains the "Delete" character.^[5]
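
The six subheadings partition the block into contiguous ranges, which can be read off the table of characters. As an illustrative sketch, the hypothetical helper `subheading` below maps a code point to the subheading it falls under; the function name and error handling are assumptions made for this example.

```python
def subheading(code_point: int) -> str:
    """Return the Basic Latin subheading that a code point falls under."""
    if not 0x0000 <= code_point <= 0x007F:
        raise ValueError("code point is outside the Basic Latin block")
    if code_point <= 0x001F:
        return "C0 controls"
    if code_point == 0x007F:
        return "Control character"  # the single Delete character
    if 0x0030 <= code_point <= 0x0039:
        return "ASCII digits"
    if 0x0041 <= code_point <= 0x005A:
        return "Uppercase Latin alphabet"
    if 0x0061 <= code_point <= 0x007A:
        return "Lowercase Latin alphabet"
    # Everything left in U+0020..U+007E is punctuation or a symbol,
    # including the space at U+0020.
    return "ASCII punctuation and symbols"


# Collect the distinct subheadings seen across the whole block.
names = {subheading(cp) for cp in range(0x80)}
```

Iterating over all 128 code points yields exactly six distinct names, matching the count stated at the start of this section.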

Compact table

C0 Controls and Basic Latin^[1] Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+000x	NUL	SOH	STX	ETX	EOT	ENQ	ACK	BEL	BS	HT	LF	VT	FF	CR	SO	SI
U+001x	DLE	DC1	DC2	DC3	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS	RS	US
U+002x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
U+003x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
U+004x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
U+005x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
U+006x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
U+007x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	DEL
Notes 1.^ As of Unicode version 10.0
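
The compact table arranges code points sixteen to a row, so the row label comes from the upper bits of the code point and the column from its lowest hexadecimal digit. A small sketch of that lookup follows; `chart_cell` is a hypothetical name chosen for this example.

```python
from typing import Tuple


def chart_cell(code_point: int) -> Tuple[str, str]:
    """Return the (row label, column label) of a code point in the compact table."""
    if not 0x0000 <= code_point <= 0x007F:
        raise ValueError("code point is outside the Basic Latin block")
    row = f"U+{code_point >> 4:03X}x"   # drop the lowest hex digit
    column = f"{code_point & 0xF:X}"    # keep only the lowest hex digit
    return row, column


cell_of_a = chart_cell(ord("A"))  # U+0041
```

For U+0041 this gives row `U+004x` and column `1`, which is where the table shows the letter A.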

Emoji

The Basic Latin block contains twelve emoji: U+0023, U+002A and U+0030–U+0039.^[6]^[7] They are keycap base characters; for example, #️⃣ is the sequence U+0023 NUMBER SIGN, U+FE0F VS16, U+20E3 COMBINING ENCLOSING KEYCAP.

A standardized variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0︀).

The block has 24 standardized variants defined to specify emoji-style (U+FE0F VS16) or text presentation (U+FE0E VS15) for the following twelve base characters: U+0023, U+002A and U+0030–U+0039.^[8]

All of these base characters default to a text presentation.

Emoji variation sequences
U+	0023	002A	0030	0031	0032	0033	0034	0035	0036	0037	0038	0039
base code point	#	*	0	1	2	3	4	5	6	7	8	9
base+VS15 (text)	#︎	*︎	0︎	1︎	2︎	3︎	4︎	5︎	6︎	7︎	8︎	9︎
base+VS16 (emoji)	#️	*️	0️	1️	2️	3️	4️	5️	6️	7️	8️	9️
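
The sequences in the table above are ordinary strings formed by appending a variation selector to a base character. The sketch below builds them in Python; the constant and function names are illustrative assumptions, and whether a sequence renders as an emoji depends on the font and platform.

```python
import unicodedata

VS15 = "\uFE0E"    # variation selector requesting text presentation
VS16 = "\uFE0F"    # variation selector requesting emoji presentation
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP

# The twelve base characters: U+0023, U+002A and U+0030..U+0039.
BASES = ["#", "*"] + [str(digit) for digit in range(10)]


def text_sequence(base: str) -> str:
    """Base character followed by VS15 (text presentation)."""
    return base + VS15


def emoji_sequence(base: str) -> str:
    """Base character followed by VS16 (emoji presentation)."""
    return base + VS16


def keycap_sequence(base: str) -> str:
    """Keycap emoji: base, VS16, then the combining enclosing keycap."""
    if base not in BASES:
        raise ValueError("not a keycap base character")
    return base + VS16 + KEYCAP


# Two variation sequences per base character.
variation_sequences = [
    build(base) for base in BASES for build in (text_sequence, emoji_sequence)
]
keycap_name = unicodedata.name(KEYCAP)
```

The list `variation_sequences` has 24 entries, two for each of the twelve base characters, and `keycap_sequence("#")` produces the three code points U+0023, U+FE0F, U+20E3.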

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Basic Latin block:

Version	Final code points^[a]	Count	L2 ID	WG2 ID	Document
1.0.0	U+0000..007F	128			(to be determined)
			L2/04-145		Starner, David (2004-04-30), C with stroke character examples from BAE report 1884 (Dorsey)
			L2/04-202		Anderson, Deborah (2004-06-07), Slashed C Feedback
			L2/11-043		Freytag, Asmus; Karlsson, Kent (2011-02-02), Proposal to correct mistakes and inconsistencies in certain property assignments for super and subscripted letters
			L2/11-160		PRI #181 Changing General Category of Twelve Characters, 2011-05-02
			L2/11-438^[b]^[c]	N4182	Edberg, Peter (2011-12-22), Emoji Variation Sequences (Revision of L2/11-429)
			L2/15-268		Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30), Proposal to Represent the Slashed Zero Variant of Empty Set
			L2/15-301^[d]^[c]		Pournader, Roozbeh (2015-11-01), A proposal for 278 standardized variation sequences for emoji
^a Proposed code points and character names may differ from final code points and names.
^b See also L2/10-458, L2/11-414, L2/11-415, and L2/11-429.
^c Refer to the history section of the Miscellaneous Symbols and Pictographs block for additional emoji-related documents.
^d See also L2/15-198 and L2/15-275.

References

1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
3. The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
4. "Sorting it all Out: When is a backslash not a backslash?"
5. "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 1 April 2013.
6. "UTR #51: Unicode Emoji". Unicode Consortium. 2017-05-18.
7. "UCD: Emoji Data for UTR #51". Unicode Consortium. 2017-03-27.
8. "UTS #51 Emoji Variation Sequences". The Unicode Consortium.

External links

Unicode chart U0000 (pdf)
