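The Java API for XML Processing, or JAXP (/ˈdʒækspiː/ JAKS-pee), is one of the Java XML programming APIs. It provides the capability of validating and parsing XML documents. The three basic parsing interfaces are the Document Object Model (DOM) parsing interface, the Simple API for XML (SAX) parsing interface, and the Streaming API for XML (StAX) interface.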
In addition to the parsing interfaces, the API provides an XSLT interface for performing data and structural transformations on an XML document. JAXP was developed under the Java Community Process as JSR 5 (JAXP 1.0) and JSR 63 (JAXP 1.1 and 1.2).
J2SE version | JAXP version bundled |
---|---|
1.4 | 1.1 |
1.5 | 1.3 |
1.6 | 1.4 |
JAXP version 1.4.4 was released on September 3, 2010. JAXP 1.3 reached end of life on February 12, 2008.
The DOM interface is perhaps the easiest to understand. It parses an entire XML document and constructs a complete in-memory representation of the document, using classes that model the concepts found in the Document Object Model (DOM) Level 2 Core Specification.
The DOM parser is called a DocumentBuilder, as it builds an in-memory Document representation. The javax.xml.parsers.DocumentBuilder is created by the javax.xml.parsers.DocumentBuilderFactory. The DocumentBuilder creates an org.w3c.dom.Document instance, which is a tree structure containing the nodes of the XML document. Each tree node in the structure implements the org.w3c.dom.Node interface. There are many different types of tree nodes, representing the types of data found in an XML document. The most important node types are element nodes, which may have attributes, and text nodes, which represent the text found between the start and end tags of a document element. Refer to the Javadoc documentation of the Java package org.w3c.dom for a complete list of node types.
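The DOM workflow described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name DomSketch, the method listBooks, and the sample document are hypothetical and were chosen only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomSketch {
    // Hypothetical sample document, not taken from any real data set.
    static final String XML =
        "<books><book id=\"1\">Dune</book><book id=\"2\">Emma</book></books>";

    /** Parses the XML text into a DOM tree and lists each book as "id=title". */
    static List<String> listBooks(String xml) {
        try {
            // The factory creates the DocumentBuilder, which builds the Document.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<String> result = new ArrayList<>();
            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                // Each matching item is an element node with an attribute
                // and a child text node.
                Element book = (Element) books.item(i);
                result.add(book.getAttribute("id") + "=" + book.getTextContent());
            }
            return result;
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(listBooks(XML)); // prints [1=Dune, 2=Emma]
    }
}
```

Because the whole tree is held in memory, the nodes can be visited in any order and as often as needed after parsing completes.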
The SAX parser is called the SAXParser and is created by the javax.xml.parsers.SAXParserFactory. Unlike the DOM parser, the SAX parser does not create an in-memory representation of the XML document, and so it is faster and uses less memory. Instead, the SAX parser informs clients of the XML document structure by invoking callbacks, that is, by invoking methods on an org.xml.sax.helpers.DefaultHandler instance provided to the parser. This way of accessing a document is called streaming XML.
The DefaultHandler class implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces. Most clients will be interested in the methods defined in the ContentHandler interface, which are called when the SAX parser encounters the corresponding elements in the XML document. The most important methods in this interface are:
- startDocument() and endDocument(), which are called at the start and end of an XML document.
- startElement() and endElement(), which are called at the start and end of a document element.
- characters(), which is called with the text data contained between the start and end tags of an XML document element.
Clients provide a subclass of DefaultHandler that overrides these methods and processes the data. This may involve storing the data in a database or writing it out to a stream.
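A DefaultHandler subclass of the kind described above might look like the following sketch. The names SaxSketch, RecordingHandler, and record are hypothetical, and the handler simply records each callback as a string. Text is buffered until the end tag because a parser is allowed to deliver one run of character data in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    /** Hypothetical handler that records every callback it receives. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override public void startDocument() { events.add("startDocument"); }

        @Override public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0);              // start collecting text for this element
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called more than once per text run
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String collected = text.toString().trim();
            if (!collected.isEmpty()) {
                events.add("text:" + collected);
            }
            text.setLength(0);
            events.add("end:" + qName);
        }
    }

    /** Parses the XML text and returns the recorded callback sequence. */
    static List<String> record(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            RecordingHandler handler = new RecordingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // prints [startDocument, start:greeting, text:hello, end:greeting, endDocument]
        System.out.println(record("<greeting lang=\"en\">hello</greeting>"));
    }
}
```

Note that the handler, not the parser, keeps whatever state it needs between callbacks; here that state is only the text buffer.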
During parsing, the parser may need to access external documents. A local cache of frequently used documents can be kept by using an XML Catalog.
This was introduced with Java 1.3 in May 2000.[1]
StAX was designed as a middle ground between the DOM and SAX interfaces. In its metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, "pulling" information from the parser as it needs it. This differs from an event-based API such as SAX, which "pushes" data to the application and so requires the application to maintain state between events, as necessary, to keep track of its location within the document.
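The cursor model can be illustrated with the javax.xml.stream.XMLStreamReader interface. The following is a minimal sketch; the class name StaxSketch, the method titles, and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxSketch {
    // Hypothetical sample document used only for this illustration.
    static final String XML =
        "<library><title>Dune</title><title>Emma</title></library>";

    /** Pulls events from the parser and collects the text of each title element. */
    static List<String> titles(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            List<String> titles = new ArrayList<>();
            // The application drives the loop: it asks for the next event
            // only when it is ready to handle it.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the element's text and leaves the cursor on its end tag.
                    titles.add(reader.getElementText());
                }
            }
            reader.close();
            return titles;
        } catch (Exception e) {
            // Checked stream exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("could not read sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles(XML)); // prints [Dune, Emma]
    }
}
```

Because the application controls the loop, it could stop reading as soon as it has found what it needs, which is harder to arrange with push-style callbacks.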
Extensible Stylesheet Language Transformations, or XSLT, allows an XML document to be converted into other forms of data. JAXP provides interfaces in the package javax.xml.transform that allow applications to invoke an XSLT transformation. This interface was originally called TrAX (Transformation API for XML) and was developed through an informal collaboration between the developers of a number of Java XSLT processors.
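The main features of the interface are a TransformerFactory class, which lets the application select an XSLT processor at run time; a Templates object, which represents the compiled form of a stylesheet; and a Transformer object, on which parameters are set and the transformation is invoked.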
Two abstract interfaces, Source and Result, are defined to represent the input and output of the transformation. This is a somewhat unconventional use of Java interfaces, since there is no expectation that a processor will accept any class that implements the interface; each processor can choose which kinds of Source or Result it is prepared to handle. In practice, all JAXP processors support the three standard kinds of Source (DOMSource, SAXSource, StreamSource) and the three standard kinds of Result (DOMResult, SAXResult, StreamResult), and possibly other implementations of their own.
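As an illustration of how Source and Result objects are passed to a transformation, the following sketch applies a small stylesheet using StreamSource and StreamResult. The class name XsltSketch, the method transform, the stylesheet, and the input document are all hypothetical and exist only for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Hypothetical stylesheet that turns <greeting to="..."/> into plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/greeting\">"
      + "Hello, <xsl:value-of select=\"@to\"/>!"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies STYLESHEET to the given XML text and returns the output as a string. */
    static String transform(String xml) {
        try {
            // The factory selects whichever XSLT processor is configured.
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(STYLESHEET)));

            // Input and output are both expressed through the Source/Result abstractions.
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Checked transformer exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<greeting to=\"world\"/>")); // prints Hello, world!
    }
}
```

Swapping StreamSource for DOMSource, or StreamResult for DOMResult, would let the same Transformer read from or write to an in-memory DOM tree instead of character streams.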