XHTML
From Wikipedia, the free encyclopedia
XHTML | |
---|---|
File extension: | .xhtml, .xht, .html, .htm |
MIME type: | application/xhtml+xml |
Developed by: | World Wide Web Consortium |
Type of format: | Markup language |
Extended from: | XML, HTML |
Standard(s): | 1.0 (Recommendation), 2.0 (Working Draft) |
The Extensible HyperText Markup Language, or XHTML, is a markup language that has the same depth of expression as HTML, but a stricter syntax. Whereas HTML is an application of SGML, a very flexible markup language, XHTML is an application of XML, a more restrictive subset of SGML. Because they need to be well-formed, XHTML documents allow for automated processing to be performed using a standard XML library—unlike HTML, which requires a relatively complex, lenient, and generally custom parser (though an SGML parser library could possibly be used). XHTML can be thought of as the intersection of HTML and XML in many respects, since it is a reformulation of HTML in XML. XHTML 1.0 became a World Wide Web Consortium (W3C) Recommendation on January 26, 2000. XHTML 1.1 became a W3C recommendation May 31, 2001.
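As a minimal sketch of the automated processing described above, the following Python snippet reads an XHTML document with the standard library's generic XML parser (`xml.etree.ElementTree`). The sample document and the helper name `extract_title_and_links` are invented for illustration; only the XHTML 1.x namespace URI comes from the specification.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Sample page</title></head>
  <body>
    <p>See the <a href="articles.html">articles</a> and
       the <a href="about.html">about page</a>.</p>
  </body>
</html>"""


def extract_title_and_links(source):
    """Return the document title and the href of every <a> element."""
    root = ET.fromstring(source)
    ns = {"x": XHTML_NS}  # the prefix "x" is local to these queries
    title = root.findtext("x:head/x:title", namespaces=ns)
    links = [a.get("href") for a in root.iterfind(".//x:a", ns)]
    return title, links


title, links = extract_title_and_links(SAMPLE)
print(title)
print(links)
```

No HTML-specific parser is involved; the same code would work on any well-formed XML document that uses these element names in the XHTML namespace.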
Overview
XHTML is the successor to HTML. As such, many consider XHTML to be the current or latest version of HTML. However, XHTML is a separate recommendation; the W3C continues to recommend the use of XHTML 1.1, XHTML 1.0, and HTML 4.01 for web publishing.
Motivation
The need for a stricter version of HTML arose primarily because World Wide Web content now has to be delivered to many devices besides traditional computers, such as mobile devices, which cannot spare extra resources to support the additional complexity of HTML syntax.
Another goal for XHTML and XML was to reduce the demands on parsers and user agents in general. With HTML, user agents increasingly took on the burden of “correcting” errant documents. XML instead requires user agents to fail when they encounter malformed XML. This means an XHTML browser can theoretically be faster and run more easily on miniaturized devices than a comparable HTML browser. The recommendation that browsers report an error rather than attempt to render malformed content should help eliminate such content: even when authors do not validate their code and simply test against an XML browser, errors will be revealed.
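The fail-on-error behaviour can be observed with any conforming XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the helper name `check_well_formed` and the sample fragments are assumptions made for this illustration, not part of any specification.

```python
import xml.etree.ElementTree as ET


def check_well_formed(source):
    """Return (True, None) if source parses as XML, else (False, (line, column))."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        # The parser stops at the first error and reports where it occurred.
        return False, err.position
    return True, None


good = "<p>First paragraph.</p>"
unclosed = "<p>First paragraph.<p>Second paragraph."  # HTML-style omitted end tags
misnested = "<p><em>Some text.</p></em>"               # end tags in the wrong order

for fragment in (good, unclosed, misnested):
    print(check_well_formed(fragment))
```

An HTML parser would typically recover from both faulty fragments silently; the XML parser rejects them outright.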
An especially useful feature XHTML inherits from its XML underpinnings is XML namespaces. With namespaces, authors or communities of authors can define their own XML elements, attributes and content models to mix within XHTML documents. This is similar to the semantic flexibility of the ‘class’ attribute from HTML, but with much more power. Some W3C XML namespaces/schema that can be mixed with XHTML include MathML for semantic math markup, Scalable Vector Graphics for markup of vector graphics, and RDFa for embedding RDF data.
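To illustrate how vocabularies are mixed, the sketch below embeds a small MathML fragment in an XHTML document and counts the elements belonging to each namespace. The document itself and the helper `count_by_namespace` are hypothetical; the two namespace URIs are those published by the W3C for XHTML 1.x and MathML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical document mixing two vocabularies.
MIXED = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Mixed vocabularies</title></head>
  <body>
    <p>The sum is
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>a</mi><mo>+</mo><mi>b</mi>
      </math>
    </p>
  </body>
</html>"""


def count_by_namespace(source):
    """Count elements per namespace URI."""
    counts = {}
    for element in ET.fromstring(source).iter():
        # ElementTree stores qualified names as "{namespace-uri}localname".
        if element.tag.startswith("{"):
            uri = element.tag[1:].split("}")[0]
        else:
            uri = ""
        counts[uri] = counts.get(uri, 0) + 1
    return counts


counts = count_by_namespace(MIXED)
print(counts)
```

Because every element carries its namespace, an `<mi>` from MathML can never be confused with a same-named element from another vocabulary.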
Differences from HTML
The changes from HTML to first-generation XHTML 1.0 are minor and are mainly to achieve conformance with XML. The most important change is the requirement that the document must be well-formed and that all elements must be explicitly closed as required in XML. In XML, all element and attribute names are case-sensitive, so the XHTML approach has been to define all tag names to be lowercase. This contrasts with some earlier established traditions which began around the time of HTML 2.0, when many used uppercase tags. In XHTML, all attribute values must be enclosed by quotes (either 'single' or "double" quotes may be used). In contrast, this was sometimes optional in SGML, and hence in HTML, where numeric or boolean attributes can dispense with quotes (quoted attributes are assumed to be strings). All elements must also be explicitly closed, including empty (aka singleton) elements such as img and br. This can be done by adding a closing slash to the start tag: <img /> and <br />. Attribute minimization (e.g., <option selected>) is also prohibited as the attribute “selected” contains no explicit value; instead, use <option selected="selected">. HTML elements which are optional in the content model will not appear in the DOM tree unless they are explicitly specified. For example, an XHTML page must have a <body> element, and a table will not have a <tbody> element unless the author specifies one. More differences are detailed in the W3C XHTML 1.0 recommendation [1].
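The last point, that optional elements are not implied, can be seen by inspecting the tree an XML parser builds. This is a sketch using Python's `xml.etree.ElementTree` on an invented table fragment; `child_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical table fragment with no explicit <tbody>.
TABLE = """<table xmlns="http://www.w3.org/1999/xhtml">
  <tr><td>1</td><td>2</td></tr>
  <tr><td>3</td><td>4</td></tr>
</table>"""


def child_names(source):
    """Return the local names of the root element's direct children."""
    root = ET.fromstring(source)
    # Tags look like "{namespace-uri}localname"; keep only the local part.
    return [child.tag.split("}")[1] for child in root]


print(child_names(TABLE))
```

The rows are direct children of the table element. Scripts or style sheets that assume a tbody element between the table and its rows therefore need the author to write that element explicitly.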
Adoption
Adoption of XHTML continues at an uneven pace. The similarities between HTML 4.01 and XHTML 1.0 led many web authors, content management systems and web sites eagerly to adopt the initial W3C XHTML 1.0 recommendations. To aid authors in the transition, the W3C included an appendix to the XHTML 1.0 recommendations describing how to publish XHTML 1.0 documents as HTML-compatible documents and serve them to HTML browsers that were not designed for XHTML.
Browser adoption remains incomplete even though many years have passed since XHTML 1.0 reached Recommendation status (January 2000). The foremost obstacle is Internet Explorer by Microsoft (MSIE). Though MSIE has had XML parsing capabilities since version 5.0 in 1999, and can render XHTML documents which follow the HTML compatibility recommendations, it currently has no support for documents served with the "application/xhtml+xml" MIME type[1].
While most other browsers respond properly to all of the possible XHTML MIME types, they are not yet feature complete; for example Firefox and other Gecko-based browsers do not yet incrementally render XHTML as they receive it over the network in the way they do with HTML (this is fixed in Gecko 1.9, which should become widely available in Firefox 3.0).
Obstacles from browser vendors have slowed the effective rate of adoption. Without broader browser support, XHTML documents must continue to be served with the MIME type ‘text/html’, and therefore some of the advantages of XML (namespaces, faster parsing and smaller-footprint browsers) remain unattainable on a wide scale. Recently, some[2] have begun to question why web authors ever made the leap into authoring in XHTML and are suggesting that the W3C’s Appendix C HTML Compatibility Guidelines are a hack[3].
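One common workaround for uneven browser support is to choose the MIME type per request from the HTTP Accept header. The function below is a simplified sketch of that idea, not a complete implementation of HTTP content negotiation: `choose_media_type` is a hypothetical name, wildcards such as `*/*` are deliberately ignored, and the relative order of quality values is not compared. It is only appropriate for documents that follow the HTML compatibility guidelines, since the same bytes may be sent under either type.

```python
def choose_media_type(accept_header):
    """Pick a Content-Type for an XHTML 1.0 document from an Accept header."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # HTTP default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q value as a refusal
        # Only an explicit, non-refused mention of the XHTML type counts.
        if media_type == "application/xhtml+xml" and quality > 0:
            return "application/xhtml+xml"
    return "text/html"


print(choose_media_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_media_type("image/gif, image/jpeg, */*"))
```

A browser that never lists application/xhtml+xml explicitly falls through to text/html, which is the conservative choice.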
Versions of XHTML
XHTML 1.0
The original XHTML W3C Recommendation, XHTML 1.0, was simply a reformulation of HTML 4.01 in XML. There are three different "flavors" of XHTML 1.0, each equal in scope to its respective HTML 4.01 version.
- XHTML 1.0 Strict is the same as HTML 4.01 Strict, but follows XML syntax rules.
- XHTML 1.0 Transitional allows some common deprecated elements and attributes to be used, which are not permitted in XHTML 1.0 Strict. These include <center>, <u>, <strike>, and <applet>. It supports everything found in XHTML 1.0 Strict, but also permits the use of a number of elements and attributes that are judged presentational. [2]
- XHTML 1.0 Frameset: Allows the use of HTML framesets.
XHTML 1.1
The most recent XHTML W3C Recommendation is XHTML 1.1: Module-based XHTML, which is a reformulation of XHTML 1.0 Strict using a set of modules selected from a larger set defined in Modularization of XHTML, a W3C Recommendation which provides a modularization framework, a standard set of modules, and various conformance definitions. All deprecated features of HTML, such as presentational elements and framesets, have been removed from this version. Presentation is controlled purely by Cascading Style Sheets (CSS). This version also allows for ruby markup support, needed for East-Asian languages (especially CJK).
Although Modularization of XHTML allows small chunks of XHTML to be re-used by other XML applications in a well-defined manner, and for XHTML to be extended for specialized purposes, XHTML 1.1 adds the concept of a "strictly conforming" document: such a document cannot employ such features—it must be a complete document containing only elements defined in the modules required by XHTML 1.1. For example, if a document is extended by using elements from the XHTML Frames (frameset) module, it may still be described as XHTML 1.1, but not strictly conforming XHTML 1.1. Instead, it might be described as an XHTML Host Language Conforming Document, if the relevant criteria are satisfied.
The XHTML 2.0 draft specification
Work on XHTML 2.0 is, as of 2007, still ongoing. The current XHTML 2.0 Working Draft is controversial because it breaks backward compatibility with all previous versions, and is therefore, in effect, a new markup language created to circumvent (X)HTML's limitations rather than being simply a new version. Many issues with compatibility are easily addressed, however, by parsing XHTML 2.0 the same way a user agent would parse XHTML 1.1: via an XML parser and a default CSS document conforming to the current XHTML 2.0 Working Draft.
New features brought into the HTML family of markup languages by XHTML 2.0:
- HTML forms will be replaced by XForms, an XML-based user input specification allowing forms to be displayed appropriately for different rendering devices.
- HTML frames will be replaced by XFrames.
- The DOM Events will be replaced by XML Events, which uses the XML Document Object Model.
- A new list element type, the nl element type, will be included to specifically designate a list as a navigation list. This will be useful in creating nested menus, which are currently created by a wide variety of means like nested unordered lists or nested definition lists.
- Any element will be able to act as a hyperlink, e.g., <li href="articles.html">Articles</li>, similar to XLink. However, XLink itself is not compatible with XHTML due to design differences.
- Any element will be able to reference alternative media with the src attribute, e.g., <p src="lbridge.jpg" type="image/jpeg">London Bridge</p> is the same as <object src="lbridge.jpg" type="image/jpeg"><p>London Bridge</p></object>.
- The alt attribute of the img element has been removed: alternative text will be given in the content of the img element, much like the object element, e.g., <img src="hms_audacious.jpg">HMS <em>Audacious</em></img>.
- A single heading element (h) will be added. The level of these headings will be indicated by the nested section elements, each with their own h heading.
- The remaining presentational elements i, b and tt, still allowed in XHTML 1.x (even Strict), will be absent from XHTML 2.0. The only somewhat presentational elements remaining will be sup and sub for superscript and subscript respectively, because they have significant non-presentational uses and are required by certain languages. All other tags are meant to be semantic instead (e.g. <strong> for strong or bolded text) while allowing the user agent to control the presentation of elements via CSS.
- The addition of RDF triples with the property and about attributes, to facilitate the conversion from XHTML to RDF/XML.
Other members of the XHTML family
- XHTML Basic: A special "light" version of XHTML for devices that cannot support the larger, richer XHTML dialects, intended for use in handhelds and mobile phones. This is the intended replacement for WML and C-HTML.
- XHTML Mobile Profile: Based on XHTML Basic, this OMA (Open Mobile Alliance) effort targets mobile phones specifically by adding phone-specific elements to XHTML Basic.
Valid XHTML documents
An XHTML document that conforms to the XHTML specification is said to be a valid document. In a perfect world, all browsers would follow the web standards and valid documents would predictably render on every browser and platform. Although validating XHTML does not ensure cross-browser compatibility, it is a recommended first step. A document can be checked for validity with the W3C Markup Validation Service.
DOCTYPEs
In order to validate an XHTML document, a Document Type Declaration (or DOCTYPE) may be used. A DOCTYPE declares to the browser which Document Type Definition (DTD) the document conforms to. A Document Type Declaration should be placed before the root element.
The system identifier part of the DOCTYPE, which in these examples is the URL that begins with "http", need only point to a copy of the DTD to use if the validator cannot locate one based on the public identifier (the other quoted string). It does not need to be the specific URL that is in these examples; in fact, authors are encouraged to use local copies of the DTD files when possible. The public identifier, however, must be character-for-character the same as in the examples.
These are the most common XHTML Document Type Declarations:
- XHTML 1.0 Strict
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- XHTML 1.0 Transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
- XHTML 1.0 Frameset
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
- XHTML 1.1
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
- XHTML 2.0
XHTML 2.0 currently (August 2006) is in a draft phase. If an XHTML 2.0 Recommendation is published with the same document type declaration as in the current Working Draft, the declaration will appear as:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 2.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml2.dtd">
A placeholder DTD schema exists at the corresponding URI, though it currently only includes the character reference entities from previous recommendations. XHTML 2 contemplates both a version attribute and an xsi:schemaLocation attribute on the root HTML element that could possibly serve as a substitute for any DOCTYPE declaration.
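The different roles of the public and system identifiers can be sketched in a few lines of Python. The code below recognizes a document's XHTML version from the public identifier alone, so a local copy of the DTD in the system identifier does not affect the result. The regular expression is a simplification (it assumes double-quoted literals and the root element name html), and `identify_doctype` and the local path in the sample are hypothetical.

```python
import re

# Public identifiers of the published XHTML 1.x DTDs, as listed above.
KNOWN_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "XHTML 1.0 Frameset",
    "-//W3C//DTD XHTML 1.1//EN": "XHTML 1.1",
}

# Simplified pattern: double-quoted literals only.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*>')


def identify_doctype(source):
    """Return (version name or None, system identifier), or None if no DOCTYPE."""
    match = DOCTYPE_RE.search(source)
    if match is None:
        return None
    public_id, system_id = match.groups()
    # The lookup is exact: the public identifier must match character for character.
    return KNOWN_PUBLIC_IDS.get(public_id), system_id


# Hypothetical document pointing at a local copy of the DTD.
local_copy = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '"dtd/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"></html>'
)
print(identify_doctype(local_copy))
```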
XML namespaces and schemas
In addition to the DOCTYPE, all XHTML elements must be in the appropriate XML namespace for the version being used. This is usually done by declaring a default namespace on the root element using xmlns="namespace" as in the example below.
For XHTML 1.0 and XHTML 1.1, this is
<html xmlns="http://www.w3.org/1999/xhtml">
XHTML 2.0 requires both a namespace and an XML Schema instance declaration. These might be declared as
<html xmlns="http://www.w3.org/2002/06/xhtml2/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2/ http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd">
This example for XHTML 2.0 also demonstrates the use of multiple namespaces within a document. The first declaration, the default namespace declaration xmlns, indicates that elements whose names have no XML namespace prefix fall within the XHTML 2.0 namespace. The second declaration, the namespace prefix declaration xmlns:xsi, indicates that any elements or attributes prefixed with xsi: refer to the XMLSchema-Instance namespace. Through this namespace mechanism XML documents allow the use of a mixture of elements and attributes taken from various XML vocabularies while avoiding the potential for clashes of naming between items from independently developed vocabularies.
Similar to the case of DOCTYPE above, the actual URL to the XML Schema file can be changed, as long as the Uniform Resource Identifier (URI) before it (which indicates the XHTML 2.0 namespace) remains the same. The namespace URI is intended to be a persistent and universally unique identifier for the particular version of the specification. If treated as a URL, the actual content located at the site is of no significance.
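The following sketch shows how an XML parser resolves these declarations, using the draft root element quoted in this section. It reads the prefixed schemaLocation attribute and splits its value into namespace URI and schema URL pairs; `schema_locations` is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML2_NS = "http://www.w3.org/2002/06/xhtml2/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Root element from the XHTML 2.0 draft example, closed so that it parses.
ROOT = """<html xmlns="http://www.w3.org/2002/06/xhtml2/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2/ http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd">
</html>"""


def schema_locations(source):
    """Map each namespace URI in xsi:schemaLocation to its schema URL."""
    root = ET.fromstring(source)
    # Prefixed attributes are looked up by namespace URI, not by prefix.
    value = root.get("{%s}schemaLocation" % XSI_NS, "")
    tokens = value.split()
    return dict(zip(tokens[0::2], tokens[1::2]))


root_tag = ET.fromstring(ROOT).tag
locations = schema_locations(ROOT)
print(root_tag)
print(locations)
```

The prefix xsi never appears in the parsed tree; only the namespace URI does, which is why the prefix can be chosen freely while the URI must stay fixed.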
XML Declaration
A character encoding may be specified at the beginning of an XHTML document in the XML declaration when the document is served using the application/xhtml+xml MIME type. (If an XML document lacks encoding specification, an XML parser assumes that the encoding is UTF-8 or UTF-16, unless the encoding has already been determined by a higher protocol.)
For example:
<?xml version="1.0" encoding="UTF-8"?>
The declaration may be omitted here because the encoding it declares, UTF-8, is already the default. However, if the document instead makes use of XML 1.1 or another character encoding, a declaration is necessary. Internet Explorer prior to version 7 enters quirks mode if it encounters an XML declaration in a document served as text/html.
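The effect of the encoding declaration can be checked with a short experiment. In this sketch the same one-element document is encoded three ways and handed to Python's `xml.etree.ElementTree` as raw bytes; the sample text and the helper `read_text` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document containing a non-ASCII character.
text = '<p xmlns="http://www.w3.org/1999/xhtml">caf\u00e9</p>'

# UTF-8 bytes need no declaration: UTF-8 is one of the defaults.
utf8_bytes = text.encode("utf-8")

# ISO-8859-1 bytes with a matching declaration.
latin1_declared = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>' + text
).encode("iso-8859-1")

# ISO-8859-1 bytes with no declaration: the parser will assume UTF-8.
latin1_undeclared = text.encode("iso-8859-1")


def read_text(data):
    """Return the root element's text, or None if the bytes do not parse."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


for data in (utf8_bytes, latin1_declared, latin1_undeclared):
    print(read_text(data))
```

The undeclared ISO-8859-1 bytes are rejected because they do not form a valid UTF-8 sequence, which is the case where a declaration is necessary.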
Common errors
Some of the most common errors in the usage of XHTML are:
- Failing to realize that documents won’t be treated as XHTML unless they are served with an appropriate XML MIME type
- Not closing empty elements (elements without closing tags in HTML4)
  - Incorrect: <br>
  - Correct: <br />
  Note that any of these are acceptable in XHTML: <br></br>, <br/> and <br />. Older HTML-only browsers will generally accept <br> and <br />. Using <br /> gives some degree of backward and forward compatibility.
- Not closing non-empty elements
  - Incorrect: <p>This is a paragraph.<p>This is another paragraph.
  - Correct: <p>This is a paragraph.</p><p>This is another paragraph.</p>
- Improperly nesting elements (elements must be closed in reverse order)
  - Incorrect: <em><strong>This is some text.</em></strong>
  - Correct: <em><strong>This is some text.</strong></em>
- Not putting quotation marks around attribute values
  - Incorrect: <td rowspan=3>
  - Correct: <td rowspan="3">
  - Correct: <td rowspan='3'>
- Using the ampersand outside of entities (use &amp; to display the ampersand character)
  - Incorrect: <title>Cars & Trucks</title>
  - Correct: <title>Cars &amp; Trucks</title>
- Using the ampersand outside of entities in URLs (use &amp; instead of & in links also)
  - Incorrect: <a href="index.php?page=news&style=5">News</a>
  - Correct: <a href="index.php?page=news&amp;style=5">News</a>
- Failing to recognize that XHTML elements and attributes are case sensitive
  - Incorrect: <BODY><P ID="ONE">The Best Page Ever</P></BODY>
  - Correct: <body><p id="ONE">The Best Page Ever</p></body>
- Using attribute minimization
  - Incorrect: <textarea readonly>READ-ONLY</textarea>
  - Correct: <textarea readonly="readonly">READ-ONLY</textarea>
- Misusing CDATA, script comments and XML comments when embedding scripts and stylesheets.
  This problem can be avoided altogether by putting all script and stylesheet information into separate files and referring to them as follows in the XHTML head element:
<link rel="stylesheet" href="/style/screen.css" type="text/css" />
<script type="text/javascript" src="/script/site.js"></script>
  Note: The format <script ...></script>, rather than the more concise <script ... />, is required for HTML compatibility when served as MIME type text/html.
  If an author chooses to include script or style data inline within an XHTML document, different approaches are recommended depending on whether the author intends to serve the page as application/xhtml+xml and target only fully conformant browsers, or serve the page as text/html and try to obtain usability in Internet Explorer 6 and other non-conformant browsers.
  In the fully conformant application/xhtml+xml case, the non-XML code is wrapped in a CDATA section as follows [3]:
<style type="text/css">
<![CDATA[
p { color: green; }
]]>
</style>
<script type="text/javascript">
<![CDATA[
function nothing() { }
]]>
</script>
  If the same file may be served or processed as both XML (application/xhtml+xml) and HTML-compatible text/html to target Internet Explorer 6 and as many other historic and non-conforming browsers as possible, constructs as complex as the following may be necessary[4]:
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
p { color: green; }
/*]]>*/-->
</style>
<script type="text/javascript">
<!--//--><![CDATA[//><!--
function nothing() { }
//--><!]]>
</script>
However, the following is sufficient for relatively modern browsers (from 2000 onwards):
<style type="text/css">
/*<![CDATA[*/
p { color: green; }
/*]]>*/
</style>
<script type="text/javascript">
//<![CDATA[
function nothing() { }
//]]>
</script>
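Two of the errors listed above, unescaped ampersands and unprotected inline scripts, are easy to avoid when markup is generated by a program. The sketch below uses the escaping helpers in Python's `xml.sax.saxutils` and then confirms with an XML parser that the original text is recovered. The helper names `make_link` and `parses`, and the sample script, are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def make_link(url, label):
    """Build an <a> element with the URL and label safely escaped."""
    # quoteattr adds the surrounding quotes and escapes &, < and >.
    return "<a href=%s>%s</a>" % (quoteattr(url), escape(label))


def parses(source):
    """Return True if source is well-formed XML."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


markup = make_link("index.php?page=news&style=5", "Cars & Trucks")
link = ET.fromstring(markup)
print(markup)
print(link.get("href"), "|", link.text)

# Hypothetical inline script that contains "<" and "&&".
BODY = "function smaller(a, b) { return a < b && a > 0; }"
unprotected = '<script type="text/javascript">%s</script>' % BODY
protected = '<script type="text/javascript"><![CDATA[%s]]></script>' % BODY

print(parses(unprotected), parses(protected))
script_text = ET.fromstring(protected).text
```

The CDATA markers are consumed by the XML parser, so the script engine would receive the code exactly as written. This covers only the application/xhtml+xml case; the comment-wrapped variants shown earlier exist for parsers that do not understand CDATA.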
Backward compatibility
XHTML 1.0 documents are mostly backward compatible with HTML — that is, processible as HTML by a web browser that does not know how to properly handle XHTML — when authored according to certain guidelines given in the specification and served as text/html. Authors who follow the compatibility guidelines essentially create HTML documents that, while technically invalid, are processible by all modern web browsers.
XHTML 1.1's modularity features prevent it from being backward compatible with XHTML 1.0 and HTML. XHTML 2.0, likewise, is not backward compatible with its predecessors.
Examples
The following are examples of XHTML 1.0 Strict. Both have the same visual output. The first follows the HTML Compatibility Guidelines in Appendix C of the XHTML 1.0 Specification, while the second breaks backward compatibility but provides cleaner code. The table lists which media types each example should, may, or should not be served with.
Media type | Example 1 | Example 2 |
---|---|---|
application/xhtml+xml | SHOULD | SHOULD |
application/xml | MAY | MAY |
text/xml | MAY | MAY |
text/html | MAY | SHOULD NOT |
Example 1.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>XHTML 1.0 Example</title>
<script type="text/javascript">
<!--//--><![CDATA[//><!--
function loadpdf() {
document.getElementById("pdf-object").src="http://www.w3.org/TR/xhtml1/xhtml1.pdf";
}
//--><!]]>
</script>
</head>
<body onload="loadpdf()">
<p>This is an example of an
<abbr title="Extensible HyperText Markup Language">XHTML</abbr> 1.0 Strict document.<br />
<img id="validation-icon" src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0 Strict" /><br />
<object id="pdf-object" name="pdf-object" type="application/pdf" data="http://www.w3.org/TR/xhtml1/xhtml1.pdf" width="100%" height="500">
</object>
</p>
</body>
</html>
Example 2.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>XHTML 1.0 Example</title>
<script type="text/javascript">
<![CDATA[
function loadpdf() {
document.getElementById("pdf-object").src="http://www.w3.org/TR/xhtml1/xhtml1.pdf";
}
]]>
</script>
</head>
<body onload="loadpdf()">
<p>This is an example of an
<abbr title="Extensible HyperText Markup Language">XHTML</abbr> 1.0 Strict document.<br />
<img id="validation-icon" src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0 Strict" /><br />
<object id="pdf-object" name="pdf-object" type="application/pdf" data="http://www.w3.org/TR/xhtml1/xhtml1.pdf" width="100%" height="500" />
</p>
</body>
</html>
Notes:
- For further information on the media type recommendation, please refer to XHTML Media Types, a W3C Note issued on 2002-08-01.
- The "loadpdf" function is actually a workaround for Internet Explorer. It can be replaced by adding <param name="src" value="http://www.w3.org/TR/xhtml1/xhtml1.pdf" /> within <object>.
- The img element does not get a name attribute in the XHTML 1.0 Strict DTD.
External links
- W3C's Markup Home Page
- XHTML 1.0 Specification
- XHTML 1.1 Specification
- XHTML 2.0 Working Draft
- XHTML 1.0 Strict / 1.1 Online Reference
- Links dealing with the MIME type of XHTML documents:
- Sending XHTML as text/html Considered Harmful
- Serving up XHTML with the correct MIME type
- The Road to XHTML 2.0: MIME Types - Mark Pilgrim (3/19/2003). Includes examples for conditionally serving application/xhtml+xml using PHP, Python, and Apache (mod rewrite).
- Mozilla Web Author FAQ: How is the treatment of application/xhtml+xml documents different from the treatment of text/html documents? - summarizes one web browser's XHTML processing mode
Validators
- W3C's Markup Validator
- Firefox page validator Extension to Mozilla Firefox browser
- WDG HTML Validator
- Hermish Web Accessibility Validator
- Off-line HTML Validator A clipbook for NoteTab text editor
- Off-line HTML Validator v1.0 for Windows
- XHTML Validator Module for ASP.NET 2.0
- Validating XML/XHTML Editor
- Multipage Validator