List of XML and HTML character entity references

From Wikipedia, the free encyclopedia

Note: This article contains special characters.

In SGML, HTML, and XML documents, the logical constructs known as character data and attribute values consist of sequences of characters. Each character can appear directly (representing itself) or be represented by a series of characters called a character reference. There are two types of character reference: numeric character references and character entity references. This article lists the character entity references that are valid in HTML and XML documents.


Character reference overview

A numeric character reference refers to a character by its Universal Character Set/Unicode code point, and uses the format

&#nnnn;

or

&#xhhhh;

where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form. In XML documents the x must be lowercase. The nnnn or hhhh may have any number of digits and may include leading zeros. The hexadecimal digits in hhhh may be uppercase, lowercase, or a mix of both, though uppercase is the usual style.
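Both numeric forms can be illustrated with a short Python 3 sketch. The helper names numeric_refs and decode_numeric_refs are hypothetical, chosen for this illustration; the decoder accepts any number of digits (including leading zeros) and only a lowercase x, following the XML rule above, and it does not check whether the resulting code point is a character XML actually permits.

```python
import re

def numeric_refs(ch):
    """Return the decimal (&#nnnn;) and hexadecimal (&#xhhhh;) references for one character."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"  # uppercase hex digits, lowercase x

# Any number of digits, leading zeros allowed; only a lowercase x, as XML requires.
NUMERIC_REF = re.compile(r"&#(?:([0-9]+)|x([0-9A-Fa-f]+));")

def decode_numeric_refs(text):
    """Replace each numeric character reference in text with the character it denotes."""
    def repl(m):
        dec, hexa = m.groups()
        cp = int(dec) if dec is not None else int(hexa, 16)
        return chr(cp)  # no range check: chr() raises ValueError above U+10FFFF
    return NUMERIC_REF.sub(repl, text)

print(numeric_refs("é"))                                 # ('&#233;', '&#xE9;')
print(decode_numeric_refs("caf&#233; &#x00e9; &#xE9;"))  # café é é
```

Note that "&#x00e9;" and "&#xE9;" decode to the same character: leading zeros and letter case in the hexadecimal digits do not matter.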

In contrast, a character entity reference refers to a character by the name of an entity which has the desired character as its replacement text. The entity must either be predefined (built-in to the markup language) or explicitly declared in a Document Type Definition (DTD). The format is the same as for any entity reference:

&name;

where name is the name of the entity. The semicolon is required.
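Resolving a character entity reference amounts to looking the name up in a table of known entities. The following Python 3 sketch is a simplified illustration, not a conforming XML parser: expand_entities and its declared mapping (a stand-in for DTD declarations) are hypothetical names, entity names are limited to a small ASCII pattern, references nested inside replacement text are not expanded, and a reference missing its semicolon is left untouched rather than reported as an error.

```python
import re

# The five XML built-in entities.
BUILTIN = {"quot": '"', "amp": "&", "apos": "'", "lt": "<", "gt": ">"}

# &name; with a mandatory semicolon; real XML names allow far more characters than this.
ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")

def expand_entities(text, declared=None):
    """Replace each &name; in text with its replacement text.

    declared stands in for entities declared in a DTD. The built-ins are merged last,
    so a redeclaration cannot change them (XML requires identical replacement text anyway).
    """
    table = {**(declared or {}), **BUILTIN}

    def repl(m):
        name = m.group(1)
        if name not in table:
            raise ValueError(f"undefined entity: &{name};")
        return table[name]

    # A single pass: replacement text is not rescanned, so "&amp;lt;" yields "&lt;".
    return ENTITY_REF.sub(repl, text)

print(expand_entities("a &lt; b &amp;&amp; c"))           # a < b && c
print(expand_entities("&copy; 2024", {"copy": "\u00a9"}))  # © 2024
```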

Character entities in XML

The XML specification defines five built-in character entities and requires all XML processors to honor them. These entities may also be declared explicitly in a DTD, but if so, their replacement text must be the same as the built-in definitions. XML also allows other named entities, with replacement text of any length, to be defined on a per-document basis.

The table below lists the five XML built-in character entities. The "Name" column gives the entity's name. The "Character" column shows the character, if it is renderable. To produce the character in a document, the format &name; is used; for example, &amp; renders as &. The "Unicode code point" column cites the character in standard UCS/Unicode "U+" notation, which gives the code point in hexadecimal, followed by its decimal equivalent in parentheses. The "Standard" column indicates the first version of XML that includes the entity. The "Description" column gives the character's canonical UCS/Unicode name, in English.

Name Character Unicode code point Standard Description
quot " U+0022 (34) XML 1.0 quotation mark
amp & U+0026 (38) XML 1.0 ampersand
apos ' U+0027 (39) XML 1.0 apostrophe
lt < U+003C (60) XML 1.0 less-than sign
gt > U+003E (62) XML 1.0 greater-than sign
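Because these five entities are built in, an XML parser resolves them without any DTD, in both text and attribute values. A minimal sketch using Python 3's standard xml.etree.ElementTree module; the sample document is invented for illustration:

```python
import xml.etree.ElementTree as ET

# No DTD is needed: the parser resolves the five built-ins in text and attribute values.
doc = ET.fromstring('<p title="&quot;q&quot;">&lt;tag&gt; &amp; &apos;s</p>')
print(doc.text)           # <tag> & 's
print(doc.get("title"))   # "q"

# When serializing, ElementTree escapes the markup-significant characters again.
serialized = ET.tostring(doc, encoding="unicode")
print(serialized)
```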

Character entities in HTML

The HTML 4 DTD explicitly declares 252 character entities. HTML processors must honor the HTML DTD's declarations, even if the DTD is not mentioned in the HTML document. HTML does not allow other named entities to be defined.

HTML document authors who have been exposed to XML and XHTML often overlook the fact that the apos entity is not defined in HTML.
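Python's standard library ships the HTML 4 entity set as the html.entities.name2codepoint dictionary, which makes the absence of apos easy to confirm. A minimal sketch, assuming Python 3:

```python
from html.entities import name2codepoint

# name2codepoint maps each HTML 4 entity name to its Unicode code point.
print(len(name2codepoint))        # 252
print("apos" in name2codepoint)   # False: apos is not an HTML 4 entity
print(name2codepoint["quot"])     # 34
```

A document meant to be read as HTML 4 can use the numeric reference &#39; for the apostrophe instead.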

In the table below, the HTML built-in character entities are listed. The columns are as in the XML entity table above, except that the "Standard" column indicates the first version of HTML that includes the entity: one of the major releases of the HTML specification, 2.0, 3.2, or 4.0. HTML 4.01 did not introduce any new entities.

Name Character Unicode code point Standard Description
quot " U+0022 (34) HTML 2.0 quotation mark
amp & U+0026 (38) HTML 2.0 ampersand
lt < U+003C (60) HTML 2.0 less-than sign
gt > U+003E (62) HTML 2.0 greater-than sign
nbsp   U+00A0 (160) HTML 3.2 no-break space
iexcl ¡ U+00A1 (161) HTML 3.2 inverted exclamation mark
cent ¢ U+00A2 (162) HTML 3.2 cent sign
pound £ U+00A3 (163) HTML 3.2 pound sign
curren ¤ U+00A4 (164) HTML 3.2 currency sign
yen ¥ U+00A5 (165) HTML 3.2 yen sign
brvbar ¦ U+00A6 (166) HTML 3.2 broken bar
sect § U+00A7 (167) HTML 3.2 section sign
uml ¨ U+00A8 (168) HTML 3.2 diaeresis
copy © U+00A9 (169) HTML 3.2 copyright sign
ordf ª U+00AA (170) HTML 3.2 feminine ordinal indicator
laquo « U+00AB (171) HTML 3.2 left-pointing double angle quotation mark
not ¬ U+00AC (172) HTML 3.2 not sign
shy ­ U+00AD (173) HTML 3.2 soft hyphen
reg ® U+00AE (174) HTML 3.2 registered sign
macr ¯ U+00AF (175) HTML 3.2 macron
deg ° U+00B0 (176) HTML 3.2 degree sign
plusmn ± U+00B1 (177) HTML 3.2 plus-minus sign
sup2 ² U+00B2 (178) HTML 3.2 superscript two
sup3 ³ U+00B3 (179) HTML 3.2 superscript three
acute ´ U+00B4 (180) HTML 3.2 acute accent
micro µ U+00B5 (181) HTML 3.2 micro sign
para ¶ U+00B6 (182) HTML 3.2 pilcrow sign
middot · U+00B7 (183) HTML 3.2 middle dot
cedil ¸ U+00B8 (184) HTML 3.2 cedilla
sup1 ¹ U+00B9 (185) HTML 3.2 superscript one
ordm º U+00BA (186) HTML 3.2 masculine ordinal indicator
raquo » U+00BB (187) HTML 3.2 right-pointing double angle quotation mark
frac14 ¼ U+00BC (188) HTML 3.2 vulgar fraction one quarter
frac12 ½ U+00BD (189) HTML 3.2 vulgar fraction one half
frac34 ¾ U+00BE (190) HTML 3.2 vulgar fraction three quarters
iquest ¿ U+00BF (191) HTML 3.2 inverted question mark
Agrave À U+00C0 (192) HTML 2.0 latin capital letter a with grave
Aacute Á U+00C1 (193) HTML 2.0 latin capital letter a with acute
Acirc Â U+00C2 (194) HTML 2.0 latin capital letter a with circumflex
Atilde Ã U+00C3 (195) HTML 2.0 latin capital letter a with tilde
Auml Ä U+00C4 (196) HTML 2.0 latin capital letter a with diaeresis
Aring Å U+00C5 (197) HTML 2.0 latin capital letter a with ring above
AElig Æ U+00C6 (198) HTML 2.0 latin capital letter ae
Ccedil Ç U+00C7 (199) HTML 2.0 latin capital letter c with cedilla
Egrave È U+00C8 (200) HTML 2.0 latin capital letter e with grave
Eacute É U+00C9 (201) HTML 2.0 latin capital letter e with acute
Ecirc Ê U+00CA (202) HTML 2.0 latin capital letter e with circumflex
Euml Ë U+00CB (203) HTML 2.0 latin capital letter e with diaeresis
Igrave Ì U+00CC (204) HTML 2.0 latin capital letter i with grave
Iacute Í U+00CD (205) HTML 2.0 latin capital letter i with acute
Icirc Î U+00CE (206) HTML 2.0 latin capital letter i with circumflex
Iuml Ï U+00CF (207) HTML 2.0 latin capital letter i with diaeresis
ETH Ð U+00D0 (208) HTML 2.0 latin capital letter eth
Ntilde Ñ U+00D1 (209) HTML 2.0 latin capital letter n with tilde
Ograve Ò U+00D2 (210) HTML 2.0 latin capital letter o with grave
Oacute Ó U+00D3 (211) HTML 2.0 latin capital letter o with acute
Ocirc Ô U+00D4 (212) HTML 2.0 latin capital letter o with circumflex
Otilde Õ U+00D5 (213) HTML 2.0 latin capital letter o with tilde
Ouml Ö U+00D6 (214) HTML 2.0 latin capital letter o with diaeresis
times × U+00D7 (215) HTML 3.2 multiplication sign
Oslash Ø U+00D8 (216) HTML 2.0 latin capital letter o with stroke
Ugrave Ù U+00D9 (217) HTML 2.0 latin capital letter u with grave
Uacute Ú U+00DA (218) HTML 2.0 latin capital letter u with acute
Ucirc Û U+00DB (219) HTML 2.0 latin capital letter u with circumflex
Uuml Ü U+00DC (220) HTML 2.0 latin capital letter u with diaeresis
Yacute Ý U+00DD (221) HTML 2.0 latin capital letter y with acute
THORN Þ U+00DE (222) HTML 2.0 latin capital letter thorn
szlig ß U+00DF (223) HTML 2.0 latin small letter sharp s (German Eszett)
agrave à U+00E0 (224) HTML 2.0 latin small letter a with grave
aacute á U+00E1 (225) HTML 2.0 latin small letter a with acute
acirc â U+00E2 (226) HTML 2.0 latin small letter a with circumflex
atilde ã U+00E3 (227) HTML 2.0 latin small letter a with tilde
auml ä U+00E4 (228) HTML 2.0 latin small letter a with diaeresis
aring å U+00E5 (229) HTML 2.0 latin small letter a with ring above
aelig æ U+00E6 (230) HTML 2.0 latin small letter ae
ccedil ç U+00E7 (231) HTML 2.0 latin small letter c with cedilla
egrave è U+00E8 (232) HTML 2.0 latin small letter e with grave
eacute é U+00E9 (233) HTML 2.0 latin small letter e with acute
ecirc ê U+00EA (234) HTML 2.0 latin small letter e with circumflex
euml ë U+00EB (235) HTML 2.0 latin small letter e with diaeresis
igrave ì U+00EC (236) HTML 2.0 latin small letter i with grave
iacute í U+00ED (237) HTML 2.0 latin small letter i with acute
icirc î U+00EE (238) HTML 2.0 latin small letter i with circumflex
iuml ï U+00EF (239) HTML 2.0 latin small letter i with diaeresis
eth ð U+00F0 (240) HTML 2.0 latin small letter eth
ntilde ñ U+00F1 (241) HTML 2.0 latin small letter n with tilde
ograve ò U+00F2 (242) HTML 2.0 latin small letter o with grave
oacute ó U+00F3 (243) HTML 2.0 latin small letter o with acute
ocirc ô U+00F4 (244) HTML 2.0 latin small letter o with circumflex
otilde õ U+00F5 (245) HTML 2.0 latin small letter o with tilde
ouml ö U+00F6 (246) HTML 2.0 latin small letter o with diaeresis
divide ÷ U+00F7 (247) HTML 3.2 division sign
oslash ø U+00F8 (248) HTML 2.0 latin small letter o with stroke
ugrave ù U+00F9 (249) HTML 2.0 latin small letter u with grave
uacute ú U+00FA (250) HTML 2.0 latin small letter u with acute
ucirc û U+00FB (251) HTML 2.0 latin small letter u with circumflex
uuml ü U+00FC (252) HTML 2.0 latin small letter u with diaeresis
yacute ý U+00FD (253) HTML 2.0 latin small letter y with acute
thorn þ U+00FE (254) HTML 2.0 latin small letter thorn
yuml ÿ U+00FF (255) HTML 2.0 latin small letter y with diaeresis
OElig Œ U+0152 (338) HTML 4.0 latin capital ligature oe
oelig œ U+0153 (339) HTML 4.0 latin small ligature oe
Scaron Š U+0160 (352) HTML 4.0 latin capital letter s with caron
scaron š U+0161 (353) HTML 4.0 latin small letter s with caron
Yuml Ÿ U+0178 (376) HTML 4.0 latin capital letter y with diaeresis
fnof ƒ U+0192 (402) HTML 4.0 latin small letter f with hook
circ ˆ U+02C6 (710) HTML 4.0 modifier letter circumflex accent
tilde ˜ U+02DC (732) HTML 4.0 small tilde
Alpha Α U+0391 (913) HTML 4.0 greek capital letter alpha
Beta Β U+0392 (914) HTML 4.0 greek capital letter beta
Gamma Γ U+0393 (915) HTML 4.0 greek capital letter gamma
Delta Δ U+0394 (916) HTML 4.0 greek capital letter delta
Epsilon Ε U+0395 (917) HTML 4.0 greek capital letter epsilon
Zeta Ζ U+0396 (918) HTML 4.0 greek capital letter zeta
Eta Η U+0397 (919) HTML 4.0 greek capital letter eta
Theta Θ U+0398 (920) HTML 4.0 greek capital letter theta
Iota Ι U+0399 (921) HTML 4.0 greek capital letter iota
Kappa Κ U+039A (922) HTML 4.0 greek capital letter kappa
Lambda Λ U+039B (923) HTML 4.0 greek capital letter lamda
Mu Μ U+039C (924) HTML 4.0 greek capital letter mu
Nu Ν U+039D (925) HTML 4.0 greek capital letter nu
Xi Ξ U+039E (926) HTML 4.0 greek capital letter xi
Omicron Ο U+039F (927) HTML 4.0 greek capital letter omicron
Pi Π U+03A0 (928) HTML 4.0 greek capital letter pi
Rho Ρ U+03A1 (929) HTML 4.0 greek capital letter rho
Sigma Σ U+03A3 (931) HTML 4.0 greek capital letter sigma
Tau Τ U+03A4 (932) HTML 4.0 greek capital letter tau
Upsilon Υ U+03A5 (933) HTML 4.0 greek capital letter upsilon
Phi Φ U+03A6 (934) HTML 4.0 greek capital letter phi
Chi Χ U+03A7 (935) HTML 4.0 greek capital letter chi
Psi Ψ U+03A8 (936) HTML 4.0 greek capital letter psi
Omega Ω U+03A9 (937) HTML 4.0 greek capital letter omega
alpha α U+03B1 (945) HTML 4.0 greek small letter alpha
beta β U+03B2 (946) HTML 4.0 greek small letter beta
gamma γ U+03B3 (947) HTML 4.0 greek small letter gamma
delta δ U+03B4 (948) HTML 4.0 greek small letter delta
epsilon ε U+03B5 (949) HTML 4.0 greek small letter epsilon
zeta ζ U+03B6 (950) HTML 4.0 greek small letter zeta
eta η U+03B7 (951) HTML 4.0 greek small letter eta
theta θ U+03B8 (952) HTML 4.0 greek small letter theta
iota ι U+03B9 (953) HTML 4.0 greek small letter iota
kappa κ U+03BA (954) HTML 4.0 greek small letter kappa
lambda λ U+03BB (955) HTML 4.0 greek small letter lamda
mu μ U+03BC (956) HTML 4.0 greek small letter mu
nu ν U+03BD (957) HTML 4.0 greek small letter nu
xi ξ U+03BE (958) HTML 4.0 greek small letter xi
omicron ο U+03BF (959) HTML 4.0 greek small letter omicron
pi π U+03C0 (960) HTML 4.0 greek small letter pi
rho ρ U+03C1 (961) HTML 4.0 greek small letter rho
sigmaf ς U+03C2 (962) HTML 4.0 greek small letter final sigma
sigma σ U+03C3 (963) HTML 4.0 greek small letter sigma
tau τ U+03C4 (964) HTML 4.0 greek small letter tau
upsilon υ U+03C5 (965) HTML 4.0 greek small letter upsilon
phi φ U+03C6 (966) HTML 4.0 greek small letter phi
chi χ U+03C7 (967) HTML 4.0 greek small letter chi
psi ψ U+03C8 (968) HTML 4.0 greek small letter psi
omega ω U+03C9 (969) HTML 4.0 greek small letter omega
thetasym ϑ U+03D1 (977) HTML 4.0 greek theta symbol
upsih ϒ U+03D2 (978) HTML 4.0 greek upsilon with hook symbol
piv ϖ U+03D6 (982) HTML 4.0 greek pi symbol
ensp U+2002 (8194) HTML 4.0 en space [1]
emsp U+2003 (8195) HTML 4.0 em space [2]
thinsp U+2009 (8201) HTML 4.0 thin space [3]
zwnj U+200C (8204) HTML 4.0 zero width non-joiner
zwj U+200D (8205) HTML 4.0 zero width joiner
lrm U+200E (8206) HTML 4.0 left-to-right mark
rlm U+200F (8207) HTML 4.0 right-to-left mark
ndash – U+2013 (8211) HTML 4.0 en dash
mdash — U+2014 (8212) HTML 4.0 em dash
lsquo ‘ U+2018 (8216) HTML 4.0 left single quotation mark
rsquo ’ U+2019 (8217) HTML 4.0 right single quotation mark
sbquo ‚ U+201A (8218) HTML 4.0 single low-9 quotation mark
ldquo “ U+201C (8220) HTML 4.0 left double quotation mark
rdquo ” U+201D (8221) HTML 4.0 right double quotation mark
bdquo „ U+201E (8222) HTML 4.0 double low-9 quotation mark
dagger † U+2020 (8224) HTML 4.0 dagger
Dagger ‡ U+2021 (8225) HTML 4.0 double dagger
bull • U+2022 (8226) HTML 4.0 bullet
hellip … U+2026 (8230) HTML 4.0 horizontal ellipsis
permil ‰ U+2030 (8240) HTML 4.0 per mille sign
prime ′ U+2032 (8242) HTML 4.0 prime
Prime ″ U+2033 (8243) HTML 4.0 double prime
lsaquo ‹ U+2039 (8249) HTML 4.0 single left-pointing angle quotation mark
rsaquo › U+203A (8250) HTML 4.0 single right-pointing angle quotation mark
oline ‾ U+203E (8254) HTML 4.0 overline
frasl ⁄ U+2044 (8260) HTML 4.0 fraction slash
euro € U+20AC (8364) HTML 4.0 euro sign
image ℑ U+2111 (8465) HTML 4.0 black-letter capital i
weierp ℘ U+2118 (8472) HTML 4.0 script capital p (Weierstrass p)
real ℜ U+211C (8476) HTML 4.0 black-letter capital r
trade ™ U+2122 (8482) HTML 4.0 trademark sign
alefsym ℵ U+2135 (8501) HTML 4.0 alef symbol
larr ← U+2190 (8592) HTML 4.0 leftwards arrow
uarr ↑ U+2191 (8593) HTML 4.0 upwards arrow
rarr → U+2192 (8594) HTML 4.0 rightwards arrow
darr ↓ U+2193 (8595) HTML 4.0 downwards arrow
harr ↔ U+2194 (8596) HTML 4.0 left right arrow
crarr ↵ U+21B5 (8629) HTML 4.0 downwards arrow with corner leftwards
lArr ⇐ U+21D0 (8656) HTML 4.0 leftwards double arrow
uArr ⇑ U+21D1 (8657) HTML 4.0 upwards double arrow
rArr ⇒ U+21D2 (8658) HTML 4.0 rightwards double arrow
dArr ⇓ U+21D3 (8659) HTML 4.0 downwards double arrow
hArr ⇔ U+21D4 (8660) HTML 4.0 left right double arrow
forall ∀ U+2200 (8704) HTML 4.0 for all
part ∂ U+2202 (8706) HTML 4.0 partial differential
exist ∃ U+2203 (8707) HTML 4.0 there exists
empty ∅ U+2205 (8709) HTML 4.0 empty set
nabla ∇ U+2207 (8711) HTML 4.0 nabla
isin ∈ U+2208 (8712) HTML 4.0 element of
notin ∉ U+2209 (8713) HTML 4.0 not an element of
ni ∋ U+220B (8715) HTML 4.0 contains as member
prod ∏ U+220F (8719) HTML 4.0 n-ary product
sum ∑ U+2211 (8721) HTML 4.0 n-ary summation
minus − U+2212 (8722) HTML 4.0 minus sign
lowast ∗ U+2217 (8727) HTML 4.0 asterisk operator
radic √ U+221A (8730) HTML 4.0 square root
prop ∝ U+221D (8733) HTML 4.0 proportional to
infin ∞ U+221E (8734) HTML 4.0 infinity
ang ∠ U+2220 (8736) HTML 4.0 angle
and ∧ U+2227 (8743) HTML 4.0 logical and
or ∨ U+2228 (8744) HTML 4.0 logical or
cap ∩ U+2229 (8745) HTML 4.0 intersection
cup ∪ U+222A (8746) HTML 4.0 union
int ∫ U+222B (8747) HTML 4.0 integral
there4 ∴ U+2234 (8756) HTML 4.0 therefore
sim ∼ U+223C (8764) HTML 4.0 tilde operator
cong ≅ U+2245 (8773) HTML 4.0 congruent to
asymp ≈ U+2248 (8776) HTML 4.0 almost equal to
ne ≠ U+2260 (8800) HTML 4.0 not equal to
equiv ≡ U+2261 (8801) HTML 4.0 identical to (equivalent to)
le ≤ U+2264 (8804) HTML 4.0 less-than or equal to
ge ≥ U+2265 (8805) HTML 4.0 greater-than or equal to
sub ⊂ U+2282 (8834) HTML 4.0 subset of
sup ⊃ U+2283 (8835) HTML 4.0 superset of
nsub ⊄ U+2284 (8836) HTML 4.0 not a subset of
sube ⊆ U+2286 (8838) HTML 4.0 subset of or equal to
supe ⊇ U+2287 (8839) HTML 4.0 superset of or equal to
oplus ⊕ U+2295 (8853) HTML 4.0 circled plus
otimes ⊗ U+2297 (8855) HTML 4.0 circled times
perp ⊥ U+22A5 (8869) HTML 4.0 up tack
sdot ⋅ U+22C5 (8901) HTML 4.0 dot operator
lceil ⌈ U+2308 (8968) HTML 4.0 left ceiling
rceil ⌉ U+2309 (8969) HTML 4.0 right ceiling
lfloor ⌊ U+230A (8970) HTML 4.0 left floor
rfloor ⌋ U+230B (8971) HTML 4.0 right floor
lang U+2329 (9001) HTML 4.0 left-pointing angle bracket
rang U+232A (9002) HTML 4.0 right-pointing angle bracket
loz ◊ U+25CA (9674) HTML 4.0 lozenge
spades ♠ U+2660 (9824) HTML 4.0 black spade suit
clubs ♣ U+2663 (9827) HTML 4.0 black club suit
hearts ♥ U+2665 (9829) HTML 4.0 black heart suit
diams ♦ U+2666 (9830) HTML 4.0 black diamond suit

Notes [1], [2], [3]: A blue background is used in the rendered table to display each space's width.
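The "Unicode code point" column can be reproduced from an entity name with Python's standard html.entities.name2codepoint dictionary, which holds the HTML 4 entity set. A minimal sketch, assuming Python 3; describe is a hypothetical helper name, and the output mirrors only the Name, Character, and code point columns:

```python
from html.entities import name2codepoint

def describe(name):
    """Format an HTML 4 entity as name, character, U+ notation, and decimal code point."""
    cp = name2codepoint[name]
    return f"{name} {chr(cp)} U+{cp:04X} ({cp})"

for name in ("eacute", "euro", "hearts"):
    print(describe(name))
# eacute é U+00E9 (233)
# euro € U+20AC (8364)
# hearts ♥ U+2665 (9829)
```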

Character entities in XHTML

The XHTML DTDs explicitly declare the same 252 character entities as HTML. In addition, because XHTML is XML, XHTML documents may reference the apos entity, and additional entities of any size may be defined on a per-document basis. However, whether a given entity reference can be used safely in XHTML depends on how the document is processed:

  • If the document is read by a conforming HTML processor, then only the 252 HTML character entities can safely be used. The use of &apos; or custom entity references may not be supported and may produce unpredictable results.
  • If the document is read by an XML parser that does not or cannot read external entities, then only the five built-in XML character entities can safely be used, although other entities may be used if they are declared in the internal DTD subset.
  • If the document is read by an XML parser that does read external entities, then the five built-in XML character entities can safely be used. The other 248 HTML character entities can be used as long as the XHTML DTD is accessible to the parser at the time the document is read. Other entities may also be used if they are declared in the internal DTD subset.
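The difference between an undeclared entity and one declared in the internal DTD subset can be observed with Python 3's xml.etree.ElementTree, which does not fetch external DTDs such as the XHTML ones. A minimal sketch; the one-element documents are invented for illustration, and the exact error message depends on the underlying expat library:

```python
import xml.etree.ElementTree as ET

# Case 1: without the XHTML DTD, the parser knows only the five built-in entities.
try:
    ET.fromstring("<p>caf&eacute;</p>")
except ET.ParseError as err:
    print("rejected:", err)  # expat reports an undefined entity

# Case 2: the same reference works once it is declared in the internal DTD subset.
doc = ET.fromstring(
    '<!DOCTYPE p [<!ENTITY eacute "&#233;">]>'
    "<p>caf&eacute;</p>"
)
print(doc.text)  # café

# A numeric character reference never depends on any DTD.
print(ET.fromstring("<p>caf&#233;</p>").text)  # café
```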

Only &quot;, &amp;, &lt;, and &gt; will work in all processing situations.
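A text escaper meant to produce markup that works in every processing situation can therefore restrict itself to these four entity references, using a numeric reference for the apostrophe. A minimal sketch, assuming Python 3; escape_portable is a hypothetical name. The standard library's html.escape(s, quote=True) takes a similar approach, emitting &#x27; for the apostrophe.

```python
def escape_portable(text):
    """Escape markup-significant characters using only references every processor understands."""
    return (
        text.replace("&", "&amp;")  # must run first so later replacements are not re-escaped
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace('"', "&quot;")
            .replace("'", "&#39;")  # &apos; is not defined in HTML 4, so use a numeric reference
    )

print(escape_portable("""It's <b>"bold"</b> & more"""))
# It&#39;s &lt;b&gt;&quot;bold&quot;&lt;/b&gt; &amp; more
```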


References
