Unicode subscripts and superscripts

Unicode has subscripted and superscripted versions of a number of characters, including a full set of Arabic numerals. These characters allow polynomials, chemical formulas, and certain other equations to be represented in plain text without any form of markup such as HTML or TeX.

The World Wide Web Consortium and the Unicode Consortium have made recommendations on the choice between using markup and using superscript and subscript characters: "When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts...However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic or phonemic transcription."^[1]

Uses

Most fonts that include these characters design them for mathematical numerator and denominator glyphs, which are smaller than normal characters but are aligned with the cap line and the baseline, respectively. When used with the solidus, these glyphs are useful for making arbitrary diagonal fractions (similar to the ½ glyph).

This was not the intended use of these characters when Unicode was designed. The intended use was to allow chemical and algebraic formulas to be written without markup. Proper appearance of such formulas requires true superscripts and subscripts: H2O written with subscript markup (H<sub>2</sub>O in HTML) will probably look better than the same formula written with these characters, H₂O.
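As a rough illustration of writing formulas without markup, the sketch below maps ordinary digits and signs to their superscript and subscript counterparts. The names `to_superscript`, `to_subscript`, and `chemical_formula` are made up for this example, and `chemical_formula` naively subscripts every digit, so a leading coefficient such as the 2 in 2H2O would be subscripted as well.

```python
# Translation tables from ASCII characters to Unicode superscripts and
# subscripts. Superscript 1, 2 and 3 come from Latin-1; the others come
# from the U+2070 to U+209F block.
SUPERSCRIPT = str.maketrans("0123456789+-=()n", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ")
SUBSCRIPT = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def to_superscript(text: str) -> str:
    """Replace every character that has a superscript form; keep the rest."""
    return text.translate(SUPERSCRIPT)


def to_subscript(text: str) -> str:
    """Replace every character that has a subscript form; keep the rest."""
    return text.translate(SUBSCRIPT)


def chemical_formula(formula: str) -> str:
    """Subscript all digits in a simple formula such as 'C6H12O6'."""
    return formula.translate(SUBSCRIPT_DIGITS)


print(chemical_formula("H2O"))          # H₂O
print("x" + to_superscript("2"))        # x²
print("a" + to_subscript("n+1"))        # aₙ₊₁ is NOT produced: 'n' has no entry in SUBSCRIPT
```

The last line prints the letter n unchanged because the subscript table above only covers digits and signs; subscript letters exist only for a limited set of Latin letters.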

Another Unicode character, the fraction slash (U+2044), is visually similar to the solidus, but when used with ordinary digits (not the superscripts and subscripts) it was intended to tell a layout system that a fraction such as ¹¹⁄₁₂ is preferred^[2]. Most font layout systems do not actually produce this; a typical browser, for instance, renders the sequence as 11⁄12.
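The two approaches to fractions described in this section can be sketched as follows. The function names are hypothetical; the first builds a diagonal fraction from superscript and subscript digits, and the second emits ordinary digits around U+2044 and leaves the rendering to the layout system. Both assume non-negative integers.

```python
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
FRACTION_SLASH = "\u2044"  # FRACTION SLASH
SOLIDUS = "/"              # ordinary slash, U+002F


def diagonal_fraction(numerator: int, denominator: int,
                      slash: str = FRACTION_SLASH) -> str:
    """Fake a diagonal fraction from superscript and subscript digits."""
    top = str(numerator).translate(SUPERSCRIPT_DIGITS)
    bottom = str(denominator).translate(SUBSCRIPT_DIGITS)
    return f"{top}{slash}{bottom}"


def fraction_slash_request(numerator: int, denominator: int) -> str:
    """Ordinary digits joined by U+2044; a layout system may or may not
    turn this into a proper fraction."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


print(diagonal_fraction(11, 12))            # ¹¹⁄₁₂
print(diagonal_fraction(11, 12, SOLIDUS))   # ¹¹/₁₂
print(fraction_slash_request(11, 12))       # 11⁄12
```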

Superscripts and subscripts block

The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into the same positions in the Latin-1 range of Unicode. The rest were placed in a dedicated block at U+2070 to U+209F. The two tables below show these characters. Each superscript or subscript character is preceded by a normal x to show the subscripting or superscripting. The first table contains the actual Unicode characters; the second contains the equivalents using HTML markup for the subscript or superscript. Empty cells in the U+207x to U+209x rows are reserved for future use; empty cells in the U+00Bx row are other characters from Latin-1.

Unicode characters
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+00Bx			x²	x³						x¹
U+207x	x⁰	xⁱ			x⁴	x⁵	x⁶	x⁷	x⁸	x⁹	x⁺	x⁻	x⁼	x⁽	x⁾	xⁿ
U+208x	x₀	x₁	x₂	x₃	x₄	x₅	x₆	x₇	x₈	x₉	x₊	x₋	x₌	x₍	x₎
U+209x	xₐ	xₑ	xₒ	xₓ	xₔ	xₕ	xₖ	xₗ	xₘ	xₙ	xₚ	xₛ	xₜ

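Equivalent HTML markup
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+00Bx			x<sup>2</sup>	x<sup>3</sup>						x<sup>1</sup>
U+207x	x<sup>0</sup>	x<sup>i</sup>			x<sup>4</sup>	x<sup>5</sup>	x<sup>6</sup>	x<sup>7</sup>	x<sup>8</sup>	x<sup>9</sup>	x<sup>+</sup>	x<sup>-</sup>	x<sup>=</sup>	x<sup>(</sup>	x<sup>)</sup>	x<sup>n</sup>
U+208x	x<sub>0</sub>	x<sub>1</sub>	x<sub>2</sub>	x<sub>3</sub>	x<sub>4</sub>	x<sub>5</sub>	x<sub>6</sub>	x<sub>7</sub>	x<sub>8</sub>	x<sub>9</sub>	x<sub>+</sub>	x<sub>-</sub>	x<sub>=</sub>	x<sub>(</sub>	x<sub>)</sub>
U+209x	x<sub>a</sub>	x<sub>e</sub>	x<sub>o</sub>	x<sub>x</sub>	x<sub>ə</sub>	x<sub>h</sub>	x<sub>k</sub>	x<sub>l</sub>	x<sub>m</sub>	x<sub>n</sub>	x<sub>p</sub>	x<sub>s</sub>	x<sub>t</sub>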
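The layout of the block can also be inspected programmatically. The sketch below uses Python's standard `unicodedata` module to list every code point from U+2070 to U+209F with its character name; `block_names` is a helper invented for this example. Which code points count as assigned depends on the Unicode version bundled with the Python interpreter, so the reserved positions shown may differ on very old or future versions.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x2070, 0x209F


def block_names(start=BLOCK_START, end=BLOCK_END):
    """Map each code point in the range to its character name,
    or to None when the code point has no name (unassigned)."""
    return {cp: unicodedata.name(chr(cp), None)
            for cp in range(start, end + 1)}


for cp, name in block_names().items():
    sample = "x" + chr(cp) if name else "  "
    print(f"U+{cp:04X}  {sample}  {name or '(reserved)'}")

# The three Latin-1 superscripts sit outside the block.
for ch in "\u00b9\u00b2\u00b3":
    print(f"U+{ord(ch):04X}  x{ch}  {unicodedata.name(ch)}")
```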

Other superscript and subscript characters

Unicode also includes subscript and superscript characters that are intended for semantic usage, in the following blocks:

the Latin-1 Supplement block contains the feminine and masculine ordinal indicators ª and º.
the Combining Diacritical Marks block contains medieval superscript letter diacritics. These letters are written directly above other letters appearing in medieval Germanic manuscripts, and so these glyphs do not include spacing, for example uͤ. They are shown here over a long string of periods: ....ͣ...ͤ...ͥ...ͦ...ͧ...ͨ...ͩ...ͪ...ͫ...ͬ...ͭ...ͮ...ͯ..
the Spacing Modifier Letters block has superscripted letters and symbols used for phonetic transcription: ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ˀ ˁ ˠ ˡ ˢ ˣ ˤ
the Phonetic Extensions block has several sub- and super-scripted letters and symbols: ᴬ ᴭ ᴮ ᴯ ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴻ ᴼ ᴽ ᴾ ᴿ ᵀ ᵁ ᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵍ ᵎ ᵏ ᵐ ᵑ ᵒ ᵓ ᵔ ᵕ ᵖ ᵗ ᵘ ᵙ ᵚ ᵛ ᵜ ᵝ ᵞ ᵟ ᵠ ᵡ ᵢ ᵣ ᵤ ᵥ ᵦ ᵧ ᵨ ᵩ ᵪ ᵸ
the Phonetic Extensions Supplement block has a few more: ᶛ ᶜ ᶝ ᶞ ᶟ ᶠ ᶡ ᶢ ᶣ ᶤ ᶥ ᶦ ᶧ ᶨ ᶩ ᶪ ᶫ ᶬ ᶭ ᶮ ᶯ ᶰ ᶱ ᶲ ᶳ ᶴ ᶵ ᶶ ᶷ ᶸ ᶹ ᶺ ᶻ ᶼ ᶽ ᶾ ᶿ

Gathered here for copying and pasting, Unicode defines a complete set of superscripts and subscripts for digits and common mathematical symbols ( ⁰ ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁺ ⁻ ⁼ ⁽ ⁾ ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ ₊ ₋ ₌ ₍ ₎ ), a full superscript Latin lowercase alphabet except q ( ᵃ ᵇ ᶜ ᵈ ᵉ ᶠ ᵍ ʰ ⁱ ʲ ᵏ ˡ ᵐ ⁿ ᵒ ᵖ ʳ ˢ ᵗ ᵘ ᵛ ʷ ˣ ʸ ᶻ ), a limited uppercase Latin alphabet ( ᴬ ᴮ ᴰ ᴱ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴼ ᴾ ᴿ ᵀ ᵁ ⱽ ᵂ ), a few subscripted lowercase letters ( ₐ ₑ ₕ ᵢ ₖ ₗ ₘ ₙ ₒ ₚ ᵣ ₛ ₜ ᵤ ᵥ ₓ ), and some Greek letters ( ᵅ ᵝ ᵞ ᵟ ᵋ ᶿ ᶥ ᶲ ᵠ ᵡ ᵦ ᵧ ᵨ ᵩ ᵪ ). Because these glyphs come from different ranges, they may not share the same size and position, depending on the typeface.
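Although these characters are scattered across several blocks, many of them carry a compatibility decomposition tagged `<super>` or `<sub>` in the Unicode Character Database, which gives a uniform way to recognise them. The following sketch relies on Python's standard `unicodedata` module; the helper names `script_position` and `flatten` are assumptions made for this example, and characters without such a decomposition are simply reported as `None`.

```python
import unicodedata


def script_position(ch: str):
    """Return 'super', 'sub', or None, based on the tag of the
    character's compatibility decomposition."""
    decomposition = unicodedata.decomposition(ch)
    if decomposition.startswith("<super>"):
        return "super"
    if decomposition.startswith("<sub>"):
        return "sub"
    return None


def flatten(text: str) -> str:
    """Replace compatibility characters with their plain equivalents (NFKC)."""
    return unicodedata.normalize("NFKC", text)


# One character each from Latin-1, the U+2070 block, and Spacing Modifier Letters.
for ch in "\u00b2\u2082\u02b0x":
    print(f"U+{ord(ch):04X}", ch, script_position(ch))

print(flatten("x\u00b2 + H\u2082O"))   # x2 + H2O
```

Note that flattening discards the superscript or subscript distinction entirely, so it is only suitable where that distinction is presentational.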

Composite characters

Primarily for compatibility with earlier character sets, Unicode contains a number of characters that combine superscripts and subscripts with other symbols. In most fonts these render much better than attempts to construct the same symbols from the characters above or with markup.

the Latin-1 Supplement block contains the precomposed diagonal fractions ½, ¼, and ¾. The copyright © and registered trademark signs ® are also in this block.
the General Punctuation block contains the permille sign ‰ and the per-ten-thousand sign ‱.
the Number Forms block contains several pre-composed diagonal fractions: ⅐ ⅑ ⅒ ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞ ⅟ ↉
the Letterlike Symbols block contains a few symbols composed of subscript and superscript characters: ℀ ℁ ℅ ℆ № ℠ ™ ⅍
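The relationship between these composite characters and their parts is recorded in the Unicode Character Database. As a small sketch using Python's standard `unicodedata` module (the helper name `expand` is made up here), the compatibility decomposition shows what each character is built from, and the precomposed fractions also carry a numeric value:

```python
import unicodedata


def expand(ch: str) -> str:
    """Return the compatibility (NFKC) expansion of a composite character."""
    return unicodedata.normalize("NFKC", ch)


# ½ ¾ ⅓ ™ ℠ № ℅
for ch in "\u00bd\u00be\u2153\u2122\u2120\u2116\u2105":
    print(ch, unicodedata.decomposition(ch), repr(expand(ch)))

# Precomposed fractions have a numeric value; ½ expands to 1, U+2044, 2.
print(unicodedata.numeric("\u00bd"))   # 0.5
```

The expansion of a fraction uses the fraction slash U+2044 between ordinary digits, not superscript and subscript digits.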

References

"Superscripts and Subscripts" (PDF file)

[1] Martin Dürst, Asmus Freytag (16 May 2007). "Unicode in XML and other Markup Languages". W3C. http://www.w3.org/TR/unicode-xml/#Superscripts. Retrieved 13 September 2010.
[2] Martin Dürst, Asmus Freytag (16 May 2007). "Fraction Slash". W3C. http://www.w3.org/TR/unicode-xml/#Fraction. Retrieved 13 September 2010.
