Canonical XML

From Wikipedia, the free encyclopedia

Canonical XML is a normal form of XML, intended to allow relatively simple comparison of pairs of XML documents for equivalence; for this purpose, the Canonical XML transformation removes non-meaningful differences between the documents. Any XML document can be converted to Canonical XML.

For example, XML permits whitespace to occur at various points within start-tags, and attributes to be specified in any order. Such differences are seldom if ever used to convey meaning, and so these forms are generally considered equivalent:

   <p class="a" secure="1">

   <p     secure   = "1"
             class='a'   >

In converting an arbitrary XML document to Canonical XML, attributes are written in a normative order and with normative spacing and quoting. All namespace declarations are placed ahead of regular attributes. The regular attributes are then sorted alphabetically by name, except that namespaced attributes are sorted by namespace URI rather than by prefix or qualified name. Thus, the second form above would be converted to the first.
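
The attribute example can be checked with a short Python sketch. It assumes Python 3.8 or later, whose standard library provides xml.etree.ElementTree.canonicalize; that function implements Canonical XML 2.0, so some details (namespace handling in particular) may differ from other versions of the specification. The helper name same_canonical_form is hypothetical, and the end-tags are added only because the parser needs complete elements.

```python
from xml.etree.ElementTree import canonicalize

# The two start-tags from the example, closed so they form complete documents.
first = '<p class="a" secure="1"></p>'
second = """<p     secure   = "1"
             class='a'   ></p>"""


def same_canonical_form(doc_a: str, doc_b: str) -> bool:
    """Hypothetical helper: compare two XML documents by canonical form."""
    return canonicalize(doc_a) == canonicalize(doc_b)


print(canonicalize(second))                # <p class="a" secure="1"></p>
print(same_canonical_form(first, second))  # True
```

Comparing the canonical strings, rather than the raw text, is what makes the equivalence test a simple string comparison.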

Canonical XML specifies a number of other details, some of which are:

the UTF-8 encoding is used
line-ends are represented using the character 0x0A
whitespace in attribute values is normalized
entity references and non-special character references are expanded
CDATA marked sections are not used
empty elements are encoded as start/end pairs, not using the special empty-element syntax
default attributes are made explicit
superfluous namespace declarations are deleted
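
A few of the details listed above can be observed with the following sketch, again assuming Python 3.8 or later and its Canonical XML 2.0 implementation in xml.etree.ElementTree.canonicalize. The sample document is invented for illustration and covers only empty elements, CDATA sections, and character references.

```python
from xml.etree.ElementTree import canonicalize

# Invented sample: an empty-element tag, a CDATA section,
# and hexadecimal and decimal character references.
source = "<doc><empty/><t><![CDATA[a < b]]></t><c>&#x41;&#66;</c></doc>"

canonical = canonicalize(source)
print(canonical)
# <doc><empty></empty><t>a &lt; b</t><c>AB</c></doc>

# canonicalize() returns a str; encode it explicitly to get the UTF-8 bytes.
canonical_bytes = canonical.encode("utf-8")
```

The empty element becomes a start/end pair, the CDATA section is replaced by escaped character data, and the two character references are expanded to the literal characters.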

According to the W3C, if two XML documents have the same canonical form, then the two documents are logically equivalent within the given application context (except for limitations regarding a few unusual cases).

However, in some contexts users might care about semantics beyond the generic logical equivalence with which Canonical XML is associated. For example, a steganography system could conceal information in an XML document by varying whitespace, attribute quoting and order, the use of hexadecimal vs. decimal numeric character references, and so on. Converting such a file to Canonical XML would discard the concealed information. Conversely, XML files that differ in their use of upper- vs. lower-case, or that use archaic versus modern spelling, might be considered equivalent for certain purposes, yet they have different canonical forms. Such contexts are beyond the scope of Canonical XML.
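
Both limits can be illustrated with a small sketch under the same assumptions as before (Python 3.8 or later, Canonical XML 2.0 via xml.etree.ElementTree.canonicalize). The three documents are invented for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Same content, different quoting and character-reference style.
hex_form = '<m k="v">&#x48;i</m>'
dec_form = "<m k='v'>&#72;i</m>"
# Differs only in the case of the element name.
upper_form = '<M k="v">Hi</M>'

# Quoting and hex-vs-decimal choices do not survive canonicalization.
print(canonicalize(hex_form) == canonicalize(dec_form))    # True

# XML names are case-sensitive, so these stay distinct.
print(canonicalize(hex_form) == canonicalize(upper_form))  # False
```

An application that wants to treat the last pair as equivalent has to apply its own normalization on top of canonicalization.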

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.
