Comparison of OpenDocument and Microsoft Office Open XML formats

From Wikipedia, the free encyclopedia

Microsoft Office Open XML and OpenDocument are two competing XML-based formats for documents intended for use in office productivity software. Both formats combine XML content with other (media) files in compressed ZIP archives. In both formats, the main document content and presentation information are stored as XML, with the ability to reference embedded and external binary content such as BMP, GIF, and JPEG images. Both support a subset of the Dublin Core metadata standard.
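
As a rough illustration of the shared ZIP-plus-XML packaging, the following Python sketch guesses which of the two formats a package uses by looking at the names of its archive members. It assumes that an OpenDocument package carries a "mimetype" entry and a "content.xml" entry, and that an Office Open XML package carries a "[Content_Types].xml" entry. The names guess_format and make_package are hypothetical, and the check is a heuristic, not a conformance test.

```python
import io
import zipfile


def guess_format(source):
    """Guess the office format of a ZIP package from its member names.

    `source` may be a path or a binary file-like object.
    This is a heuristic sketch, not a conformance check.
    """
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        if "mimetype" in names and "content.xml" in names:
            # Assumption: OpenDocument packages store their media type
            # in a plain-text "mimetype" entry.
            media_type = archive.read("mimetype").decode("ascii", "replace").strip()
            return "OpenDocument (" + media_type + ")"
        if "[Content_Types].xml" in names:
            # Assumption: Office Open XML packages list their part types here.
            return "Office Open XML"
    return "unknown"


def make_package(members):
    """Build an in-memory ZIP archive from a {name: text} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in members.items():
            archive.writestr(name, text)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    odf = make_package({
        "mimetype": "application/vnd.oasis.opendocument.text",
        "content.xml": "<office:document-content/>",
    })
    ooxml = make_package({
        "[Content_Types].xml": "<Types/>",
        "word/document.xml": "<w:document/>",
    })
    print(guess_format(odf))
    print(guess_format(ooxml))
```

The demo builds tiny placeholder archives in memory, so it does not need real documents; passing a file path to guess_format works the same way.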

Supporters of each format debate their technical merits. The politics of adoption also affects the formats' success: as in other standards battles, the technical arguments could turn out to be less important than customer perception.

OpenDocument is defined by OASIS, a not-for-profit international consortium that drives the development, convergence, and adoption of e-business standards. It was approved as an ISO and IEC International Standard, designated ISO/IEC 26300, in May 2006 and has been a published ISO/IEC standard since November 2006.

Microsoft Office Open XML is defined by Microsoft and was approved as a standard in December 2006 by Ecma International, an industry association dedicated to the standardization of Information and Communication Technology (ICT) and Consumer Electronics (CE). Control of the Ecma standard will rest with Ecma International.

The OpenDocument format is implemented in several applications; at the time of writing Microsoft Office Open XML is being tested with the release candidate of Microsoft Office 2007.

The OpenDocument format is the native format of both OpenOffice.org 2.0 and KDE KOffice, and is targeted as a native format for multiple applications. Microsoft Office Open XML will be the native format for Microsoft Office 2007, and a compatible plug-in will be released for some earlier editions of the Microsoft Office suite. An open-source plug-in for Microsoft Office is being developed that will add support for opening and saving files in the OpenDocument format. It is not clear at this stage what level of interoperability either plug-in will provide.

The sections below summarize typical arguments about the technical merits of each format.

Advantages of OpenDocument over Microsoft XML formats

Alex Hudson, J. David Eisenberg, Bruce D'Arcus and Daniel Carrera of the OpenDocument Fellowship wrote an article, published by the online journal Groklaw, arguing that OpenDocument has several technical advantages over Office Open XML (Hudson, 2005[1]). The article examined problems in the original draft of the Office Open XML standard (which has since been superseded) and claimed the following differences:

  1. OpenDocument uses a mixed content model, whereas the Office Open XML format does not. "Non-mixed documents usually represent structured data; mixed documents are usually used to represent narrative. MS XML uses the non-mixed model to represent narrative (word processing). This sort of mismatch leads to awkward markup... The mixed-content model makes more sense, and is closer to what a developer will be familiar to."
  2. OpenDocument is similar to XHTML, while MS XML is not. OpenDocument uses mixed content and marks styles in a similar way. This makes it easier to transform data accurately between OpenDocument and XHTML, and also simplifies the reuse of existing skills.
  3. OpenDocument gives better separation of style and content. "Both formats give you some separation, and neither format gives you perfect separation. But OpenDocument goes much further in that direction."
  4. OpenDocument hyperlink URLs are embedded in the main file, whereas in Office Open XML the URL is placed in a separate file.
  5. OpenDocument reuses existing standards whenever possible. It uses SVG for drawings, MathML for equations, XLink for linking, Dublin Core for metadata, etc. "This makes the format infinitely more transparent to someone familiar with XML technologies. It also allows you to reuse existing tools that understand these standards", whereas "MS XML re-invents the wheel".

Since the OpenDocument Fellowship article was written, the Office Open XML specification has also incorporated Dublin Core metadata. However, OpenDocument proponents still claim the advantages of a mixed content model similar to XHTML and of better separation of content from presentation. Perhaps most importantly, OpenDocument continues to reuse existing standards wherever possible (such as SVG and MathML) instead of defining its own unique formats, which simplifies implementation and interoperability and builds on the significant work behind those standards.

Advantages of Office Open XML formats over OpenDocument

Proponents of the Office Open XML format have addressed some of the criticism arising from comparisons with OpenDocument, and have offered their own criticisms of the OpenDocument format. Much of this criticism has come from Brian Jones, a program manager for Microsoft Office who works on the XML functionality and file formats in the Office product.

  1. Microsoft has stated that a design goal for its formats was 100% compatibility with the existing base of documents and formatting used by its customers. In particular, Jones states that OpenDocument is not able to capture all the information potentially held by a binary Office file, whereas OpenXML can do that.[2]
  2. Microsoft also states, in connection with the previous point, that the OpenDocument format lacks support for the complete set of functionality in Microsoft Office applications (such as VBA and OLE support, support for highlighting[3], international numbering[4], tables in presentations[5], and other features), so any converter that saved information from an Office file (either binary or OpenXML) into an OpenDocument format would potentially be lossy.
  3. Microsoft Excel has a well-known formula language that has been defined in its entirety in the new XML formats, whereas the OpenDocument TC is still working on such a specification. MS Office program manager Brian Jones notes on his Open XML blog that the Open XML draft specification has about 200 pages on the subject, whereas the OpenDocument specification has a few lines.[6] Currently, ODF cannot be considered interoperable for spreadsheet documents, as it allows vendor-specific implementations of spreadsheet formulas.
  4. It has been suggested that Office Open XML supports several non-Western languages better than ODF, specifically that it has better Arabicization and internationalization.[7][8] These issues probably arose from the comments of the Egyptian ISO member during the OpenDocument ISO standardisation process. The OpenDocument TC has addressed these comments, though, and states "OpenDocument v1.0 has BiDi support, as well as support for text orientations, directions, numeric digits presentations and calendars [...] The TC intents to add a non-normative appendix which explains these features to a future version of the OpenDocument specification". Text for this appendix is available[9] and explains where the TC suggests this support is derived.
  5. The OpenXML spreadsheet format appears to be much faster than the ODF spreadsheet format. ZDNet has tested both XML formats in their native applications OpenOffice.org 2.0 and Microsoft Office 2003. Office Open XML takes a distinctly different approach from OpenDocument to the storage of spreadsheet data, and implements several optimizations (such as sparsely populating worksheets, and sharing strings). Microsoft's Brian Jones has added some information[10] on this subject as well. Note also that the native proprietary Excel XLS binary format appears to be much faster than both XML implementations.[11]
  6. All external references, such as hyperlinks or linked files, reside in a single relationships XML file contained in the document archive. This gives easy access to every external reference in the document, which makes link fix-up much easier when files are moved from one server to another. To remove all external references for security reasons, only the relationships need to be edited.[12]
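
The link fix-up described in point 6 can be sketched in Python. The sketch assumes the relationships part looks like the one in the published standard: a Relationships root in the namespace http://schemas.openxmlformats.org/package/2006/relationships, with Relationship elements carrying Id, Target and TargetMode attributes (drafts from the time of this article may differ). The sample XML, the example.com relationship types, the server names and the function names are all made up for illustration; the id rId4 simply mirrors the hyperlink example in this article.

```python
import xml.etree.ElementTree as ET

# Namespace of relationship parts in the published standard (an assumption here).
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Keep the default namespace when serializing instead of an "ns0:" prefix.
ET.register_namespace("", RELS_NS)

# Hypothetical relationships part; the Type values are placeholders.
SAMPLE_RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://example.com/rel/styles" Target="styles.xml"/>
  <Relationship Id="rId4" Type="http://example.com/rel/hyperlink"
                Target="http://old-server.example.com/report" TargetMode="External"/>
</Relationships>"""


def external_targets(rels_xml):
    """Return {relationship id: target} for relationships marked External."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.findall("{%s}Relationship" % RELS_NS)
        if rel.get("TargetMode") == "External"
    }


def rewrite_host(rels_xml, old_prefix, new_prefix):
    """Return the relationships XML with external targets moved to a new prefix."""
    root = ET.fromstring(rels_xml)
    for rel in root.findall("{%s}Relationship" % RELS_NS):
        target = rel.get("Target", "")
        if rel.get("TargetMode") == "External" and target.startswith(old_prefix):
            rel.set("Target", new_prefix + target[len(old_prefix):])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(external_targets(SAMPLE_RELS))
    moved = rewrite_host(
        SAMPLE_RELS, "http://old-server.example.com", "http://new-server.example.com"
    )
    print(external_targets(moved))
```

Because every external target sits in one small part, neither function has to touch the main document XML, which is the design point the argument relies on.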

Cross-platform interoperability

Microsoft Office 2007 for Windows, scheduled for general release in early 2007, is expected to use Microsoft Office Open XML as its native file format. Microsoft Office 2007 for Mac OS X, scheduled for release in late summer 2007, will also use Microsoft Office Open XML as its native file format.[13]

Corel has indicated that the WordPerfect Office X3 suite will include support for Microsoft Office Open XML once Microsoft Office 2007 is released.[14] Gnumeric has included preliminary support for the Microsoft Office Open XML spreadsheet format since version 1.7.0a.

No publicly available interoperability test suite exists for the Microsoft Office Open XML format. Since no currently released office suite provides native support for the format, it is not known to what extent documents saved in the Microsoft Office Open XML format will be properly formatted in other office suites.


OpenDocument Format is currently used as the native file format in several office suites and individual applications.[15] Support for OpenDocument was implemented independently, first in the KOffice suite[16] and later in OpenOffice.org. Office suites which natively support OpenDocument Format are available on Windows, Mac OS X, GNU/Linux, Solaris, and Symbian OS.

The ODF Test Suite is a publicly available interoperability test suite developed by Intel and the University of Central Florida. Automated results are available for interoperability testing of KOffice and OpenOffice.org.

Example XML comparisons

The first example, taken from the Groklaw comparison of the two formats, contrasts mixed and non-mixed content. Non-mixed documents usually represent structured data; mixed documents are usually used to represent narrative. MS XML uses the non-mixed model to represent narrative (word processing).

Non-mixed (Open XML):

<w:p>
 <w:r>
  <w:t>This is a </w:t>
 </w:r>
 <w:r>
  <w:rPr>
   <w:b />
  </w:rPr>
  <w:t>very basic</w:t>
 </w:r>
 <w:r>
  <w:t> document </w:t>
 </w:r>
 <w:r>
  <w:rPr>
   <w:i />
  </w:rPr>
  <w:t>with some</w:t>
 </w:r>
 <w:r>
  <w:t> formatting, and a </w:t>
 </w:r>
 <w:hyperlink w:rel="rId4" w:history="1">
  <w:r>
   <w:rPr>
    <w:rStyle w:val="Hyperlink" />
   </w:rPr>
   <w:t>hyperlink</w:t>
  </w:r>
 </w:hyperlink>
</w:p>

Mixed (ODF):

 
<text:p text:style-name="Standard">
   This is a
   <text:span text:style-name="T1">very basic</text:span>
   document
   <text:span text:style-name="T2"> with some </text:span>
   formatting, and a
   <text:a xlink:type="simple" xlink:href="http://example.com">hyperlink</text:a>
</text:p>
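
To make the difference concrete for a developer, the following Python sketch extracts the plain text from a shortened version of each paragraph. The namespace URIs are assumptions taken from the published standards, since the snippets here omit the declarations, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions taken from the published standards;
# the snippets in the article omit the declarations.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Shortened version of the non-mixed example: text only appears in w:t leaves.
OOXML_PARAGRAPH = (
    '<w:p xmlns:w="' + W_NS + '">'
    "<w:r><w:t>This is a </w:t></w:r>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>very basic</w:t></w:r>"
    "<w:r><w:t> document</w:t></w:r>"
    "</w:p>"
)

# Shortened version of the mixed example: text and inline elements interleave.
ODF_PARAGRAPH = (
    '<text:p xmlns:text="' + TEXT_NS + '" text:style-name="Standard">This is a '
    '<text:span text:style-name="T1">very basic</text:span> document</text:p>'
)


def ooxml_text(paragraph_xml):
    """Non-mixed model: collect the contents of every w:t element in order."""
    root = ET.fromstring(paragraph_xml)
    return "".join(t.text or "" for t in root.iter("{%s}t" % W_NS))


def odf_text(paragraph_xml):
    """Mixed model: walk all text nodes, including text between child elements."""
    root = ET.fromstring(paragraph_xml)
    return "".join(root.itertext())


if __name__ == "__main__":
    print(ooxml_text(OOXML_PARAGRAPH))
    print(odf_text(ODF_PARAGRAPH))
```

Both calls yield the same sentence. In the non-mixed paragraph the code has to pick out specific leaf elements, while in the mixed paragraph the text sits directly between the inline elements, much as in XHTML.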

The second example, taken from Brian Jones's weblog, is offered in support of Microsoft's choice of shorter tags. The first listing uses SpreadsheetML from the Ecma Office Open XML format; the second uses the OpenDocument format.

Short tag example (Open XML):

<row><c><v>1</v></c><c><v>2</v></c><c><v>3</v></c></row>
<row><c><v>4</v></c><c><v>5</v></c><c><v>6</v></c></row> 

Long tag example (ODF):

<table:table-row table:style-name="ro1">
 <table:table-cell office:value-type="float" office:value="1">
  <text:p>1</text:p>
 </table:table-cell>
 <table:table-cell office:value-type="float" office:value="2">
  <text:p>2</text:p>
 </table:table-cell>
 <table:table-cell office:value-type="float" office:value="3">
  <text:p>3</text:p>
 </table:table-cell>
</table:table-row>
<table:table-row table:style-name="ro1">
 <table:table-cell office:value-type="float" office:value="4">
  <text:p>4</text:p>
 </table:table-cell>
 <table:table-cell office:value-type="float" office:value="5">
  <text:p>5</text:p>
 </table:table-cell>
 <table:table-cell office:value-type="float" office:value="6">
  <text:p>6</text:p>
 </table:table-cell>
</table:table-row> 

In the second example, note that the length of the tags has only a marginal effect on document size, because OpenDocument files are usually compressed. However, according to Brian Jones, tag length does affect compression and parse time when manipulating large documents. It has also been remarked that, in the long run, mixed content (such as in OpenDocument) is likely to be more compact than non-mixed content.
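
The effect of tag length on compressed size can be explored with a rough experiment. The Python sketch below generates many rows in each tagging style and compares raw and DEFLATE-compressed sizes using zlib. The row templates are modelled on the two listings above; the row count and the helper names are arbitrary choices, the outcome depends on the data, and the experiment says nothing about parse time.

```python
import zlib


def short_rows(row_count, columns=3):
    """Build rows in the short-tag style of the Open XML example."""
    rows = []
    value = 1
    for _ in range(row_count):
        cells = []
        for _ in range(columns):
            cells.append("<c><v>%d</v></c>" % value)
            value += 1
        rows.append("<row>" + "".join(cells) + "</row>")
    return "\n".join(rows).encode("utf-8")


def long_rows(row_count, columns=3):
    """Build rows in the long-tag style of the ODF example."""
    rows = []
    value = 1
    for _ in range(row_count):
        cells = []
        for _ in range(columns):
            cells.append(
                '<table:table-cell office:value-type="float" office:value="%d">'
                "<text:p>%d</text:p></table:table-cell>" % (value, value)
            )
            value += 1
        rows.append(
            '<table:table-row table:style-name="ro1">'
            + "".join(cells)
            + "</table:table-row>"
        )
    return "\n".join(rows).encode("utf-8")


def sizes(data):
    """Return (raw size, DEFLATE-compressed size) in bytes."""
    return len(data), len(zlib.compress(data, 9))


if __name__ == "__main__":
    for label, data in (("short tags", short_rows(1000)), ("long tags", long_rows(1000))):
        raw, packed = sizes(data)
        print("%-10s raw=%d bytes, compressed=%d bytes" % (label, raw, packed))
```

Running it prints the raw and compressed byte counts for each style, so the gap before and after compression can be compared directly for the chosen row count.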

References

  1. ^ Alex Hudson; J. David Eisenberg, Bruce D'Arcus, Daniel Carrera (2005-11-25). Format comparison between ODF and MS XML. Groklaw. Retrieved on 2006-10-18.
  2. ^ Brian Jones (June 5, 2006). Thoughts on Open XML in ISO. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on 2006-06-26.
  3. ^ Brian Jones (June 1, 2006). Highlighting in a document. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on 2006-06-26.
  4. ^ Brian Jones (May 26, 2006). Numbering formats in ODF. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on 2006-07-22.
  5. ^ Brian Jones (July 20, 2006). Quick question for ODF experts. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on 2006-07-22.
  6. ^ Brian Jones (May 26, 2006). Numbering formats in ODF. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on 2006-06-26.
  7. ^ Rick Jelliffe (May 27, 2006). Open XML at ISO sideshow. O'Reilly network blog article. O'Reilly.com. Retrieved on 2006-06-27.
  8. ^ Brian Jones (May 26, 2006). Numbering formats in ODF. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on 2006-06-26.
  9. ^ Appendix to future OpenDocument version (in OpenDocument format).
  10. ^ Brian Jones (May 29, 2006). Spreadsheet performance - Shared Formulas. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on 2006-07-22.
  11. ^ George Ou (May 25, 2006). Does the OpenDocument religion make sense?. ZDNet Blog articles. ZDNet. Retrieved on 2006-06-25.
  12. ^ Brian Jones (June 20, 2005). Example Office 12 XML File. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on 2006-07-18.
  13. ^ First details of Office 2007 for Mac. APC (2006).
  14. ^ New Wordperfect will support Office 12 formats. Test Bed (2006).
  15. ^ Application support for the OpenDocument format. OpenDocument Fellowship (2006).
  16. ^ KOffice 1.4 Announcement. KOffice (2005).
