Fast Infoset

Fast Infoset (or FI) is an international standard that specifies a binary encoding format for the XML Information Set (XML Infoset) as an alternative to the XML document format. It aims to provide more efficient serialization than the text-based XML format.

FI can be thought of as a lossless compression scheme for XML, analogous to gzip: the original formatting is lost, but no information is lost in the conversion from XML to FI and back to XML. Whereas general-purpose compression aims only to reduce physical data size, FI aims to optimize both document size and processing performance.
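The distinction between formatting and information can be illustrated without any FI library. The following minimal sketch assumes Python 3.8 or later and uses XML canonicalization (C14N 2.0) from the standard library as a rough stand-in for comparing information content. The two sample documents are invented for illustration, and canonical XML is not the Fast Infoset encoding itself.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in formatting: quote style,
# whitespace inside the start tag, attribute order, and the
# empty-element shorthand.
doc_a = '<order id="7" status="new"><item/></order>'
doc_b = "<order   status='new'\n   id='7'><item></item></order>"

# Canonicalization discards these formatting choices, so both inputs
# reduce to the same canonical text.
canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

same_information = canon_a == canon_b
```

A round trip through a binary infoset encoding behaves similarly: the document that comes back carries the same elements, attributes and text, but not necessarily the same quoting or whitespace inside tags.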

The Fast Infoset specification is defined jointly by the ITU-T and ISO standards bodies, as ITU-T Rec. X.891 and ISO/IEC 24824-1, both entitled Fast Infoset. The standard was published by ITU-T on May 14, 2005, and by ISO on May 4, 2007. The Fast Infoset standard document can be downloaded from the ITU website. Although the document does not assert intellectual property (IP) restrictions on implementation or use, page ii warns that the ITU has received notices and that the subject matter may not be completely free of IP assertions.

A common misconception is that FI requires ASN.1 tool support. Although the formal specification uses ASN.1 notation, the standard also includes Encoding Control Notation (ECN), and implementations do not require ASN.1 tools.

An alternative to FI is FleXPath.[1]

Structure

The underlying file format is specified in ASN.1, with tag/length/value blocks. Text values of attributes and elements are stored with length prefixes rather than end delimiters, and data segments do not require escaping of special characters. The equivalent of end tags ("terminators") is needed only at the end of a list of child elements. Binary data is transmitted in native format, and need not be converted to a transmission format such as base64.
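The effect of length prefixes can be sketched with a toy format. The example below is an illustration only: the helper names write_block and read_block are hypothetical, and the fixed 4-byte length field is an assumption made for simplicity, not the octet-level encoding that the standard defines.

```python
import base64
import struct
from typing import Tuple
from xml.sax.saxutils import escape


def write_block(data: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(data)) + data


def read_block(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read one length-prefixed payload; return it with the next offset."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    end = start + length
    return buf[start:end], end


text = "a < b && c > d"

# One text value and one 300-byte binary value, stored back to back.
# No end delimiter is needed, so no character has to be escaped.
stream = write_block(text.encode("utf-8")) + write_block(bytes(300))
first, pos = read_block(stream)
second, pos = read_block(stream, pos)

# Textual XML has to escape the markup characters in the same value,
# and would typically carry the binary value as base64.
xml_text = escape(text)
base64_size = len(base64.b64encode(bytes(300)))
```

In this sketch the text value is stored byte for byte, whereas the XML form becomes "a &lt; b &amp;&amp; c &gt; d", and the 300 raw bytes would occupy 400 characters once base64-encoded.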

Fast Infoset is a higher-level format built on ASN.1 forms and notation. Element and attribute names are stored within the octet stream, unlike traditional ASN.1 encoding schemes. Consequently, the conventional XML file can be recovered from the binary stream without reference to the XML Schema, and the XML Schema need not be expressed as an ASN.1 definition. (ASN.1 "tags" are just type names, e.g. String, Integer, or complex types.) ASN.1 together with ECN is used to define the file format.

An index table is built for most strings, which includes element and attribute names, and their values. This means that the text of repeated tags and values only appears once per document.
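The indexing idea can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the ("lit", ...) and ("ref", ...) token shapes are hypothetical, indexes are zero-based for convenience, and the real format encodes its tables at the bit level in a different way.

```python
from typing import Dict, List, Tuple, Union

# A token is either a literal string (first occurrence) or a reference
# to an earlier string by its position in the table.
Token = Tuple[str, Union[str, int]]


def encode_with_table(strings: List[str]) -> List[Token]:
    """Emit each distinct string once; later occurrences become indexes."""
    table: Dict[str, int] = {}
    tokens: List[Token] = []
    for s in strings:
        if s in table:
            tokens.append(("ref", table[s]))
        else:
            table[s] = len(table)
            tokens.append(("lit", s))
    return tokens


def decode_with_table(tokens: List[Token]) -> List[str]:
    """Rebuild the table while reading, mirroring the encoder."""
    table: List[str] = []
    out: List[str] = []
    for kind, payload in tokens:
        if kind == "lit":
            table.append(payload)
            out.append(payload)
        else:
            out.append(table[payload])
    return out


# Element and attribute names as they might occur in a repetitive document.
names = ["item", "id", "item", "id", "item", "name"]
tokens = encode_with_table(names)
restored = decode_with_table(tokens)
```

Because the decoder rebuilds the same table as it reads, the table itself never has to be transmitted separately in this sketch; each repeated name costs only a small index.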

Implementations

Reference implementation

A Java implementation of the FI specification is available as part of the GlassFish project. The library is open source and is distributed under the terms of the Apache License 2.0. Several projects use this implementation, including the reference implementation for JAX-WS used in GlassFish Metro. A C++ implementation, QtitanFastInfoset, is available under a commercial license as a component for the Digia Qt Framework.

Performance

Because Fast Infoset documents are compressed as part of the XML generation process, producing them is much faster than applying Zip-style compression algorithms to an XML stream, although the output is not as well compressed.

SAX-type parsing performance of Fast Infoset is also much faster than parsing performance of XML 1.0, even without any Zip-style compression. Typical increases in parsing speed observed for the reference Java implementation are a factor of 10 over Java Xerces, and a factor of 4 over the Piccolo driver (one of the fastest Java-based XML parsers).[2][3][4]

Typical applications

Portable devices – Mobile devices typically have low-bandwidth data connections and slower CPUs. Fast Infoset uses less bandwidth than XML and is faster to process, which makes it well suited to such devices.

Storing large volumes of data – When storing XML to either file or database, the volume of data a system produces can often exceed reasonable limits, with a number of detriments: the access times go up as more data is read, CPU load goes up as XML data takes more power to process, and storage costs go up. By storing XML data in Fast Infoset format, data volume may be reduced by as much as 80 percent.

Passing XML through the Internet – When an application passes data over the internet, network bandwidth can be a major bottleneck, seriously degrading the performance of client applications and limiting the server's power to process requests. Reducing the size of data transferred across the internet reduces the time required to send or receive the message, and increases the number of transactions a server can process per hour.

References

  1. Amer-Yahia, Sihem, Laks VS Lakshmanan, and Shashank Pandit. "FleXPath: flexible structure and full-text querying for XML." Proceedings of the 2004 ACM SIGMOD international conference on Management of data. ACM, 2004.
  2. "Fast Infoset performance reports". 2005-10-06. Retrieved 2007-10-11.
  3. "Japex Report: ParsingPerformance". 2005-01-10. Retrieved 2007-10-11.
  4. "Japex Report: SizePerformance". 2005-01-10. Retrieved 2007-10-11.