Round-trip format conversion

From Wikipedia, the free encyclopedia

The term "round-trip" is commonly used in document conversion, particularly conversion involving markup languages such as XML and SGML. A successful round-trip consists of converting a document in format A (docA) to one in format B (docB) and then back again to format A (docA'). If docA and docA' are effectively identical, no information has been lost and the round-trip has been successful.

Information loss

When a document in one format is converted to another, some information is likely to be lost. For example, if an HTML document is saved as a text file ("*.txt"), all the markup (structure, formatting, superscripts, etc.) will be lost. Compound documents will frequently lose information on images and other embedded objects. If the text file is then converted back to the original format, that information will necessarily be missing.
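The HTML-to-text case can be illustrated with a minimal Python sketch. The helper names (`TextExtractor`, `html_to_text`, `text_to_html`) and the sample document are assumptions made for illustration; only the standard-library `html.parser` and `html.escape` are used. The return trip wraps the text in a single paragraph, which is a guess, because nothing in the text file records the original markup.

```python
from html import escape
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only character data, discarding every tag and attribute."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Lossy conversion (docA => docB): keep the text, drop all markup."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "".join(extractor.parts)


def text_to_html(text):
    """Best-effort return trip (docB => docA'): the lost markup cannot be rebuilt."""
    return "<p>" + escape(text) + "</p>"


doc_a = "<p>E = mc<sup>2</sup> is <b>famous</b>.</p>"
doc_b = html_to_text(doc_a)
doc_a_prime = text_to_html(doc_b)

print(doc_b)                 # the superscript and bold markup are gone
print(doc_a_prime == doc_a)  # False: the round-trip has lost information
```

The superscript is the telling detail: once "mc2" is plain text, no converter can know that the "2" was originally raised.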

A similar effect happens with image formats. Some formats, such as JPEG, achieve compression by discarding a small amount of information. If a bitmapped file (e.g. BMP) is converted to JPEG and back again, the result will differ from the original (although it may be visually very similar).
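The effect of a lossy step can be sketched without any image library. The code below is a toy stand-in, not JPEG's actual algorithm: it simply quantises 8-bit sample values into buckets, and the function names and the `step` parameter are assumptions made for illustration. It shows the same pattern as the paragraph above: the restored values are close to the originals but not identical.

```python
def lossy_encode(pixels, step=16):
    """Store only which bucket of width `step` each 8-bit value falls in."""
    return [p // step for p in pixels]


def lossy_decode(codes, step=16):
    """Reconstruct each value as the midpoint of its bucket."""
    return [c * step + step // 2 for c in codes]


original = [0, 17, 100, 101, 200, 255]
restored = lossy_decode(lossy_encode(original))

# Largest per-sample difference introduced by the round-trip.
max_error = max(abs(a - b) for a, b in zip(original, restored))

print(restored == original)  # False: the round-trip changed the data
print(max_error)             # small relative to the 0..255 range
```

Note that 100 and 101 land in the same bucket, so after decoding they can no longer be told apart; that distinction is the information that was lost.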

Markup languages

Markup languages such as XML can, in principle, hold any information, so the process docA => docX => docA' can be designed to avoid information loss. It is now common to convert legacy formats to XML formats because the latter have greater interoperability and a wider set of available tools. For example, it is possible to convert Word documents to an XML format and reimport them.

The XML document should contain the same information as the legacy format. An important condition is that the round-trip (legacy => XML => legacy') should result in effectively identical documents. Because some document formats allow flexibility in content order, whitespace, case-sensitivity, etc., it is useful to have a means of canonicalising the legacy format. The full round-trip may then be:

legacy => canonicalLegacy => XML => legacy' => canonicalLegacy'

If canonicalLegacy == canonicalLegacy' then the round-trip has been successful.
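The canonicalised round-trip above can be sketched in Python. Everything here is hypothetical: the "legacy" format is an invented one made of `key = value` lines in which key case, spacing and line order are not significant, and the XML layout (`document` and `field` elements) is likewise an assumption. Only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def parse_legacy(text):
    """Parse the hypothetical 'key = value' legacy format into a dict."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no information in this format
        key, _, value = line.partition("=")
        entries[key.strip().lower()] = value.strip()
    return entries


def canonicalise(text):
    """Lower-case keys, trim whitespace and sort entries by key."""
    entries = parse_legacy(text)
    return "\n".join(f"{k}={v}" for k, v in sorted(entries.items()))


def legacy_to_xml(text):
    """legacy => XML: one <field name="..."> element per entry."""
    root = ET.Element("document")
    for key, value in parse_legacy(text).items():
        field = ET.SubElement(root, "field", name=key)
        field.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_legacy(xml_text):
    """XML => legacy': write entries back with this tool's own spacing."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        f"{f.get('name')} = {f.text or ''}" for f in root.iter("field")
    )


legacy = "Title =  Round trip\nAUTHOR=someone\n\nyear = 2006"

canonical_legacy = canonicalise(legacy)
xml_doc = legacy_to_xml(canonical_legacy)
legacy_prime = xml_to_legacy(xml_doc)
canonical_legacy_prime = canonicalise(legacy_prime)

print(legacy_prime == legacy)                      # False: layout differs
print(canonical_legacy == canonical_legacy_prime)  # True: round-trip succeeded
```

Comparing the raw strings would report a failure even though nothing meaningful changed; comparing the canonical forms ignores exactly the variations the format declares insignificant.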

Usage

The term appears to be in common use but is not recorded in dictionaries. A typical usage occurs in [1], but the term is likely to have been used before this.

See also

Lossy data compression

External links

[1] Round-trip issues on the XML-DEV mailing list

