Comparison of OpenDocument and Office Open XML formats
From Wikipedia, the free encyclopedia
Office Open XML and OpenDocument are two competing XML-based formats for documents intended for use in office productivity software. Both formats combine XML content with other (media) files into compressed ZIP archives. In both formats, the main office document content and presentation information is stored as XML, with the ability to reference embedded and external binary content such as PNG, BMP, GIF, and JPEG. Both support a subset of the Dublin Core metadata standard.
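The shared packaging idea (XML parts plus media files inside one ZIP archive) can be illustrated with a short Python sketch. The archive built here is a made-up stand-in, not a valid document in either format, and the member names and helper functions are assumptions chosen for illustration.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Build a tiny in-memory ZIP that mimics the layout of an office document."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical member names: one XML part and one binary media file.
        archive.writestr("content.xml", "<doc>Hello</doc>")
        archive.writestr("media/image1.png", b"\x89PNG placeholder bytes")
    return buffer.getvalue()


def list_parts(package: bytes) -> dict[str, list[str]]:
    """Split archive members into XML parts and other (binary) parts."""
    parts: dict[str, list[str]] = {"xml": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        for name in archive.namelist():
            kind = "xml" if name.endswith(".xml") else "other"
            parts[kind].append(name)
    return parts


parts = list_parts(build_sample_package())
print(parts)
```

Passing the bytes of a real .odt or .docx file to `list_parts` should work the same way, since both are ordinary ZIP archives.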
Supporters of each format debate their technical merits. The politics of adoption also affect each format's success: as in other battles over standards, the technical arguments could turn out to be less important than customer perception.
Overview
The OpenDocument format was originally defined by StarDivision (later acquired by Sun Microsystems) for their StarOffice product and was brought to OASIS by Sun and IBM who wanted it ratified as a standard. OpenDocument was approved as an ISO and IEC International Standard in May 2006, designated ISO/IEC 26300. It has been a published ISO/IEC standard since November 2006.
Office Open XML is defined by Microsoft and was approved as a standard by Ecma International in December 2006[1], designated ECMA 376.[2] Control of the Ecma standard will rest with Ecma International. It has been submitted to ISO/IEC for adoption under the ISO/IEC JTC 1 process.
The OpenDocument format is the native format of both OpenOffice.org 2.0 and KDE KOffice 1.5, and is targeted as a native format for multiple applications. Office Open XML is the native format for Microsoft Office 2007. A compatible plug-in has also been released for some earlier editions of the Microsoft Office suite. At least three different open-source plug-ins for Microsoft Office are being developed that will add support for opening and saving files in the OpenDocument format.
Advantages of OpenDocument over Office Open XML formats
Alex Hudson, J. David Eisenberg, Bruce D'Arcus and Daniel Carrera of the OpenDocument Fellowship wrote an article, published by the online journal Groklaw, arguing that OpenDocument has several technical advantages over Office Open XML (Hudson, 2005[3]). The article examined some problems based on the original draft of the Office Open XML standard (which has since been superseded), and claimed the following differences:
- OpenDocument uses a mixed content model,[4] whereas the Office Open XML format does not. "Non-mixed documents usually represent structured data; mixed documents are usually used to represent narrative. MS XML uses the non-mixed model to represent narrative (word processing)." "This sort of mismatch leads to awkward markup [...] The mixed-content model makes more sense, and is closer to what a developer will be familiar to."
- OpenDocument is similar to XHTML, while MS XML is not. OpenDocument uses mixed content and marks styles in a similar way. "This makes it easier to transform data accurately between OpenDocument and XHTML, and also simplifies the reuse of existing skills."
- OpenDocument gives better separation of style and content. "Both formats give you some separation, and neither format gives you perfect separation. But OpenDocument goes much further in that direction."
- OpenDocument hyperlink URLs are embedded in the main file, whereas in Office Open XML the URL is placed in a separate file.
- OpenDocument reuses existing standards whenever possible. It uses parts of SVG for drawings, MathML for equations, XLink for linking, Dublin Core for metadata, etc. "This makes the format infinitely more transparent to someone familiar with XML technologies. It also allows you to reuse existing tools that understand these standards", whereas "MS XML re-invents the wheel" (date format incompatible with ISO 8601, language specification incompatible with ISO 639). ODF's use of SVG is limited to some attributes and SVG content as such is not supported, even though this is widely touted.
Since the OpenDocument Fellowship's article was written, the Office Open XML specification has incorporated Dublin Core metadata as well. However, OpenDocument supporters still claim the advantages of a mixed content model similar to XHTML and of better separation of content from presentation. Perhaps most importantly, OpenDocument continues to reuse existing standards wherever possible (such as SVG and MathML) instead of defining its own unique formats. This simplifies implementation and interoperability, and builds on the significant work that went into each of those pre-existing standards.
Advantages of Office Open XML formats over OpenDocument
Proponents of the Office Open XML format have addressed some of the criticism due to comparisons with OpenDocument, and offered their own criticisms of the OpenDocument format. Much of this criticism has been offered by Brian Jones, a program manager for Microsoft Office who works on the XML functionality and file formats in the Office product.
- Microsoft has stated that a design goal for its formats was 100% compatibility with the existing base of documents and formatting used by its customers. In particular, Jones states that OpenDocument is not able to capture all the information potentially held by a binary Office file, whereas OpenXML can do that.[5]
- Microsoft also states that the OpenDocument format lacks support for the complete set of functionality in Microsoft Office applications (such as VBA and OLE support, highlighting,[6] international numbering,[7] tables in presentations[8] and other features), so any converter that saves information from an Office file (either binary or OpenXML) into an OpenDocument format would potentially be lossy. Counterarguments raised on GrokDoc claim that such features are allowed in the OpenDocument format as namespace extensions, which would negate this argument. Although Microsoft asserts that ISO/IEC 26300 (ODF) does not fully support its legacy formats (the second point above) and gives this as the purpose of OOXML (the first point above), the Sun Microsystems plug-in for Microsoft Office and the OpenDocument Foundation's daVinci plug-in for Microsoft Office both advertise "seamless two-way conversion of Microsoft Office documents to ODF", which contradicts these points.
- Microsoft Excel has a well-known formula language that has been defined in its entirety in the new XML formats, whereas the OpenDocument TC is still working on such a specification. MS Office program manager Brian Jones notes on his Open XML blog that the Open XML draft specification has about 200 pages on the subject, whereas the OpenDocument specification has a few lines.[9] Currently ODF cannot be considered interoperable for spreadsheet documents, because it allows vendor-specific implementations of spreadsheet formulas.
- It has been suggested that Office Open XML supports several non-western languages better than ODF; specifically, that it has better Arabicization and Internationalization.[10][11] These issues have probably arisen from the comments of the Egyptian ISO member during the OpenDocument ISO standardisation process. The OpenDocument TC has addressed these comments, though, and states "OpenDocument v1.0 has BiDi support, as well as support for text orientations, directions, numeric digits presentations and calendars [...] The TC intents to add a non-normative appendix which explains these features to a future version of the OpenDocument specification". Text for this appendix is available[12] and explains where, according to the TC, this support comes from.
- The OpenXML spreadsheet format appears to be much faster than the ODF spreadsheet format. ZDNet has tested both XML formats in their native applications, OpenOffice.org 2.0 and Microsoft Office 2003. Office Open XML takes a distinctly different approach to the storage of spreadsheet data than OpenDocument, and implements several optimizations (such as sparsely populating worksheets, and sharing strings). Microsoft's Brian Jones has added some information[13] on this subject as well. Note also that the native proprietary Excel XLS binary format appears to be much faster than both XML implementations.[14]
- All external references, such as hyperlinks or linked files, reside in a single relationships XML file contained in the document archive. This gives easy access to all external references in the document, which makes link fix-up much easier when files are moved from one server to another. To remove all external references for security reasons, it is enough to edit the relationships.[15] However, this may cause problems when manipulating OOXML with standard tools such as XSLT (a W3C standard).[citation needed]
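The link fix-up scenario from the last point can be sketched in Python. The relationships markup below is a hand-written sample rather than a file taken from a real document, and the function names and server URLs are hypothetical; only the package relationships namespace and the `TargetMode="External"` attribute come from the Office Open XML packaging rules.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TYPES = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hand-written sample: one internal part and one external hyperlink.
SAMPLE_RELS = f"""<Relationships xmlns="{RELS_NS}">
  <Relationship Id="rId1" Type="{REL_TYPES}/styles" Target="styles.xml"/>
  <Relationship Id="rId4" Type="{REL_TYPES}/hyperlink"
                Target="http://old.example.com/page" TargetMode="External"/>
</Relationships>"""


def external_targets(rels_xml: str) -> dict[str, str]:
    """Map relationship Id to Target for every external reference."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.iter(f"{{{RELS_NS}}}Relationship")
        if rel.get("TargetMode") == "External"
    }


def move_links(rels_xml: str, old_prefix: str, new_prefix: str) -> str:
    """Rewrite external targets that start with old_prefix (link fix-up)."""
    ET.register_namespace("", RELS_NS)  # keep the default namespace on output
    root = ET.fromstring(rels_xml)
    for rel in root.iter(f"{{{RELS_NS}}}Relationship"):
        target = rel.get("Target", "")
        if rel.get("TargetMode") == "External" and target.startswith(old_prefix):
            rel.set("Target", new_prefix + target[len(old_prefix):])
    return ET.tostring(root, encoding="unicode")


moved = move_links(SAMPLE_RELS, "http://old.example.com", "http://new.example.org")
print(external_targets(SAMPLE_RELS))
print(external_targets(moved))
```

Because the URL lives only in the relationships part, the body markup that refers to `rId4` does not need to change when the link is retargeted.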
Shortcomings of OpenDocument
- OpenDocument has no macro language specification. See the OpenDocument article, section "Lack of standard macro/scripting".
- The specification is incomplete: it has no syntax description for formulas and no description of password hashing.
- No native support of tables in presentations
- ODF 1.1 has no digital signature support; this is expected only in version 1.2.
Shortcomings of Office Open XML
- The specification is incomplete: some parts reference the (not publicly specified) behaviour of other software, such as "autoSpaceLikeWord95", without further explanation.
- In SpreadsheetML, the markup language for spreadsheets used in Office Open XML, one of the two numeric formats used for storing dates interprets the number 60 as 1900-02-29, as if 1900 were a leap year. Any implementation of this date1900 format needs to skip the number 60 when interpreting the numeric date value. This issue originated in Lotus 1-2-3 and was preserved by Microsoft Excel for backwards compatibility.
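The date handling described in the last point can be sketched as a small Python function. It assumes that serial 1 corresponds to 1900-01-01 and handles whole-day serials only; the function name is hypothetical and this is not taken from any particular implementation.

```python
from datetime import date, timedelta


def date1900_serial_to_date(serial: int) -> date:
    """Convert a date1900 serial number to a real calendar date.

    Assumes serial 1 is 1900-01-01. Serial 60 stands for the nonexistent
    1900-02-29, so every later serial is shifted back by one day.
    """
    if serial < 1:
        raise ValueError("serial must be at least 1")
    if serial == 60:
        raise ValueError("serial 60 denotes 1900-02-29, which never existed")
    offset = serial - 1 if serial < 60 else serial - 2
    return date(1900, 1, 1) + timedelta(days=offset)


print(date1900_serial_to_date(59))  # last real day before the phantom leap day
print(date1900_serial_to_date(61))  # first day after it
```

Rejecting serial 60 outright is one possible design choice; an implementation could instead map it to a neighbouring date.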
Cross-platform interoperability
- Microsoft Office 2007 for Windows uses Office Open XML as its native file format. Microsoft Office 2008 for Mac OS X, scheduled for release in late summer 2007, will also use Office Open XML as its native file format.[16] An ODF converter plugin for Microsoft Office XP/2003/2007 for Windows allows one to open and save OpenDocument word processing (.odt) files.
- Corel has indicated that the WordPerfect Office X3 suite will include support for OpenDocument Format as well as Office Open XML by mid-2007.[17]
- Gnumeric has included support for OpenDocument spreadsheet and preliminary support for Microsoft Office Open XML spreadsheet format since version 1.7.
- IBM announced that Lotus Notes will use OpenDocument as the native format for its office productivity editors in the next release, due in 2007. IBM Workplace 2.6 already supports OpenDocument format.
- Google Docs and Spreadsheets supports OpenDocument word processing and spreadsheet formats.
- AbiWord 2.4 supports OpenDocument word processing format.
- Scribus 1.3.3, a multi-platform, open source, page layout application, supports import of OpenDocument word processing files.
- OpenDocument Format is currently supported in several office suites and individual applications[18], including as the native file format for KOffice 1.5, OpenOffice.org 2.0 and StarOffice 8. Support for OpenDocument was implemented independently, first in the KOffice 1.4 suite[19] and later in OpenOffice.org 2.0. Office suites which natively support OpenDocument Format are available on Windows, Mac OS X, Linux, BSD, Solaris, and Symbian OS.
Interoperability testing
The ODF Test Suite is a publicly available interoperability test suite developed by Intel and the University of Central Florida. Automated results are available for interoperability testing of KOffice and OpenOffice.org.
As of January 2007, no publicly available interoperability test suite exists for Office Open XML format. Since no currently released office suites provide native support for the format, it is not known to what extent documents saved in the Office Open XML format will be properly formatted in other office suites.
Example XML comparisons
The first example contrasts mixed and non-mixed content, as presented in the Groklaw comparison of the two formats. Non-mixed documents usually represent structured data; mixed documents are usually used to represent narrative. MS XML uses the non-mixed model to represent narrative (word processing).
Non-Mixed (Open XML)
    <w:p>
      <w:r><w:t>This is a </w:t></w:r>
      <w:r><w:rPr><w:b /></w:rPr><w:t>very basic</w:t></w:r>
      <w:r><w:t> document </w:t></w:r>
      <w:r><w:rPr><w:i /></w:rPr><w:t>with some</w:t></w:r>
      <w:r><w:t> formatting, and a </w:t></w:r>
      <w:hyperlink w:rel="rId4" w:history="1">
        <w:r><w:rPr><w:rStyle w:val="Hyperlink" /></w:rPr><w:t>hyperlink</w:t></w:r>
      </w:hyperlink>
    </w:p>
Mixed (ODF):
    <text:p text:style-name="Standard">
      This is a <text:span text:style-name="T1">very basic</text:span>
      document <text:span text:style-name="T2"> with some </text:span>
      formatting, and a <text:a xlink:type="simple" xlink:href="http://example.com">hyperlink</text:a>
    </text:p>
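The practical difference between the two content models shows up when extracting plain text. The Python sketch below uses shortened versions of the two paragraphs above, with namespace declarations added so that they parse on their own; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Shortened, namespace-qualified versions of the two example paragraphs.
OOXML_PARAGRAPH = (
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>This is a </w:t></w:r>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>very basic</w:t></w:r>"
    "<w:r><w:t> document</w:t></w:r>"
    "</w:p>"
)
ODF_PARAGRAPH = (
    f'<text:p xmlns:text="{TEXT}" text:style-name="Standard">'
    'This is a <text:span text:style-name="T1">very basic</text:span> document'
    "</text:p>"
)


def ooxml_text(paragraph_xml: str) -> str:
    """Non-mixed model: text only ever appears inside w:t leaf elements."""
    root = ET.fromstring(paragraph_xml)
    return "".join(t.text or "" for t in root.iter(f"{{{W}}}t"))


def odf_text(paragraph_xml: str) -> str:
    """Mixed model: text is interleaved with child elements, so walk all text nodes."""
    root = ET.fromstring(paragraph_xml)
    return "".join(root.itertext())


print(ooxml_text(OOXML_PARAGRAPH))
print(odf_text(ODF_PARAGRAPH))
```

Both calls recover the same sentence; what differs is whether the text sits in dedicated leaf elements or directly between the formatting elements.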
The second example, taken from Brian Jones's weblog, is offered in support of Microsoft's choice of shorter tag names. The first listing uses SpreadsheetML from the Office Open XML format; the second uses the OpenDocument format.
Short tag example (Open XML):
    <row><c><v>1</v></c><c><v>2</v></c><c><v>3</v></c></row>
    <row><c><v>4</v></c><c><v>5</v></c><c><v>6</v></c></row>
Long tag example (ODF):
    <table:table-row table:style-name="ro1">
      <table:table-cell office:value-type="float" office:value="1">
        <text:p>1</text:p>
      </table:table-cell>
      <table:table-cell office:value-type="float" office:value="2">
        <text:p>2</text:p>
      </table:table-cell>
      <table:table-cell office:value-type="float" office:value="3">
        <text:p>3</text:p>
      </table:table-cell>
    </table:table-row>
    <table:table-row table:style-name="ro1">
      <table:table-cell office:value-type="float" office:value="4">
        <text:p>4</text:p>
      </table:table-cell>
      <table:table-cell office:value-type="float" office:value="5">
        <text:p>5</text:p>
      </table:table-cell>
      <table:table-cell office:value-type="float" office:value="6">
        <text:p>6</text:p>
      </table:table-cell>
    </table:table-row>
In the second example, it is important to note that the size of the document is only marginally affected by the length of its tags, because OpenDocument files are usually compressed. However, according to Brian Jones, tag length does affect compression and parse time when manipulating big documents. Non-mixed content (as in OOXML) is likely to be more compact than mixed content.[citation needed]
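One way to explore the size claim is to generate many rows in each tagging style and compare raw and compressed sizes. The Python sketch below uses synthetic data and zlib's DEFLATE, the compression method commonly used inside ZIP archives. The row generators are hypothetical helpers modelled on the two listings above, and the sketch says nothing about parse time.

```python
import zlib


def ooxml_rows(row_count: int) -> str:
    """Rows in the short-tag style of the SpreadsheetML listing."""
    rows = []
    for r in range(row_count):
        cells = "".join(f"<c><v>{r * 3 + c}</v></c>" for c in (1, 2, 3))
        rows.append(f"<row>{cells}</row>")
    return "".join(rows)


def odf_rows(row_count: int) -> str:
    """Rows in the long-tag style of the OpenDocument listing."""
    rows = []
    for r in range(row_count):
        cells = "".join(
            f'<table:table-cell office:value-type="float" office:value="{r * 3 + c}">'
            f"<text:p>{r * 3 + c}</text:p></table:table-cell>"
            for c in (1, 2, 3)
        )
        rows.append(f'<table:table-row table:style-name="ro1">{cells}</table:table-row>')
    return "".join(rows)


def sizes(markup: str) -> tuple[int, int]:
    """Return (raw byte count, DEFLATE-compressed byte count)."""
    raw = markup.encode("utf-8")
    return len(raw), len(zlib.compress(raw))


ooxml_raw, ooxml_packed = sizes(ooxml_rows(1000))
odf_raw, odf_packed = sizes(odf_rows(1000))
print(f"raw size ratio (ODF / Open XML):        {odf_raw / ooxml_raw:.2f}")
print(f"compressed size ratio (ODF / Open XML): {odf_packed / ooxml_packed:.2f}")
```

Comparing the two printed ratios gives a rough indication of how much of the tag-length difference survives compression for this particular synthetic data set.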
Also note that, in the second example, ODF holds two extra attributes for each cell, office:value-type and office:value, which give the cell's type and the cell's value. The cell's type can be one of "float", "currency", "percentage", "date", or "time".[20] These attributes explicitly describe the textual representation kept in the <text:p /> element. This information is not captured by the OOXML tags shown in the example.
Example of an ODF spreadsheet value versus its textual representation, for a cell that displays "45.6%":
    <table:table-cell office:value-type="percentage" office:value="0.456">
      <text:p>45.6%</text:p>
    </table:table-cell>
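A consumer can read the typed value directly instead of parsing the display string. The Python sketch below wraps the cell above with the ODF namespace declarations so that it parses on its own; the `read_cell` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The example cell, with namespace declarations added for standalone parsing.
CELL = (
    f'<table:table-cell xmlns:table="{TABLE}" xmlns:office="{OFFICE}" '
    f'xmlns:text="{TEXT}" office:value-type="percentage" office:value="0.456">'
    "<text:p>45.6%</text:p>"
    "</table:table-cell>"
)


def read_cell(cell_xml: str) -> tuple[str, float, str]:
    """Return (value type, numeric value, displayed text) for one numeric cell."""
    cell = ET.fromstring(cell_xml)
    value_type = cell.get(f"{{{OFFICE}}}value-type")
    value = float(cell.get(f"{{{OFFICE}}}value"))
    shown = cell.findtext(f"{{{TEXT}}}p")
    return value_type, value, shown


print(read_cell(CELL))
```

The numeric value 0.456 is available without stripping the percent sign or guessing a locale-specific number format from the displayed text.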
References
- [1] Ecma International approves Office Open XML standard. Ecma International (2006-12-07). Retrieved on December 8, 2006.
- [2] http://www.ecma-international.org/publications/standards/Ecma-376.htm
- [3] Alex Hudson; J. David Eisenberg, Bruce D'Arcus, Daniel Carrera (2005-11-25). Format comparison between ODF and MS XML. Groklaw. Retrieved on October 18, 2006.
- [4] An XML element with mixed content may contain character data intermixed with child elements, see the formal specification.
- [5] Brian Jones (June 5, 2006). Thoughts on Open XML in ISO. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on June 26, 2006.
- [6] Brian Jones (June 1, 2006). Highlighting in a document. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on June 26, 2006.
- [7] Brian Jones (May 26, 2006). Numbering formats in ODF. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on July 22, 2006.
- [8] Brian Jones (July 20, 2006). Quick question for ODF experts. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on July 22, 2006.
- [9] Brian Jones (May 26, 2006). Numbering formats in ODF. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on June 26, 2006.
- [10] Rick Jelliffe (May 27, 2006). Open XML at ISO sideshow. O'Reilly network blog article. O'Reilly.com. Retrieved on June 27, 2006.
- [11] Brian Jones (May 26, 2006). Numbering formats in ODF. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on June 26, 2006.
- [12] Appendix to future OpenDocument version (in OpenDocument format).
- [13] Brian Jones (May 29, 2006). Spreadsheet performance - Shared Formulas. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on July 22, 2006.
- [14] George Ou (May 25, 2006). Does the OpenDocument religion make sense?. ZDNet Blog articles. ZDNet. Retrieved on June 25, 2006.
- [15] Brian Jones (June 20, 2005). Example Office 12 XML File. Brian Jones: Open XML Formats. MSDN Blogs. Retrieved on July 18, 2006.
- [16] First details of Office 2007 for Mac. APC (2006).
- [17] New Wordperfect will support Office 12 formats. Test Bed (2006).
- [18] Application support for the OpenDocument format. OpenDocument Fellowship (2006).
- [19] KOffice 1.4 Announcement. KOffice (2005).
- [20] OASIS OpenDocument Essentials Chapter 5. Spreadsheets
- Valoris (2004). Comparative Assessment of Open Documents Formats Market Overview aka the "Valoris Report".
- Jones (2005). Brian Jones: Office XML Formats, Microsoft.
- Groklaw (2005). Format comparison between ODF and MS XML
- Akass (2006). New Wordperfect will support Office 12 formats Personal Computer World