OpenFormula
From Wikipedia, the free encyclopedia
OpenFormula is the name of a draft open standard for exchanging recalculated formulas in spreadsheets, as well as the name of the project to refine this specification. OpenFormula is a draft addition to the OpenDocument standard (ISO/IEC 26300). OpenFormula was proposed and initially drafted by David A. Wheeler.
History
Discussion of Need
OpenDocument 1.0 is a specification for the exchange of office documents, and is fully capable of describing mathematical formulas that are displayed on the screen (through its reuse of the MathML standard). It is also fully capable of exchanging spreadsheet data, formats, pivot tables, and other information typically included in a spreadsheet. OpenDocument can exchange spreadsheet formulas (formulas that are recalculated in the spreadsheet); formulas are exchanged as values of the attribute table:formula.
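As an illustration of where such a formula lives in a document, the following minimal sketch reads the table:formula attribute from a fragment of spreadsheet XML. The namespace URI is the one OpenDocument uses for its table elements; the sample fragment, the formula text inside it, and the helper name cell_formulas are assumptions made for this example, not text taken from the specification.

```python
import xml.etree.ElementTree as ET

# Namespace used by OpenDocument for table elements and attributes.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"

# Hypothetical fragment: one cell carrying a formula, one empty cell.
# The formula string itself is illustrative only.
SAMPLE = f'''<table:table-row xmlns:table="{TABLE_NS}">
  <table:table-cell table:formula="=SUM([.A1:.A5])"/>
  <table:table-cell/>
</table:table-row>'''


def cell_formulas(xml_text):
    """Return the table:formula value of every cell (None if absent)."""
    root = ET.fromstring(xml_text)
    cells = root.iter(f"{{{TABLE_NS}}}table-cell")
    return [cell.get(f"{{{TABLE_NS}}}formula") for cell in cells]


if __name__ == "__main__":
    print(cell_formulas(SAMPLE))
```

The attribute value is an opaque string as far as the XML layer is concerned, which is why its syntax and semantics need a separate definition.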
However, many believed that the syntax and semantics of table:formula were not defined in sufficient detail. The OpenDocument version 1.0 specification defines spreadsheet formulas using a set of simple examples which show, for example, how to specify ranges and the SUM() function. Some critics argued that a more detailed, precise specification for spreadsheet functions, including syntax and semantics, should be created to augment these examples (Wheeler, 2004) (Fioretti, 2005) (Welinder, 2005) (Rathke, 2005).
The OpenDocument committee argued that this was outside their scope at that time. They declared, "A comment was submitted concerning the (inclusion) of a grammar for spreadsheet formulas which conforming implementations should support. While we think that having interoperability on that level would be of great benefit to users, we do not believe [sic] that this is in the scope of the current specification. Especially since it is not specifically related to the actual XML format the specification describes. The TC will work on a solution concerning the documentation of interoperabilty standards that go beyond what is defined in the specification" (OASIS, 2005).
Others have argued that, while the specification is less specific than one might like, its intent is fairly clear, both because formulas tend to follow decades-long traditions and because the vast majority of spreadsheets use only a small set of functions (such as SUM) that every spreadsheet implementation supports anyway. In practice, many developers look to OpenOffice.org as a "canonical implementation": its code is public for anyone to review and its XML output can be trivially inspected, which can resolve many questions.
OpenFormula Project
One of the external commenters on OpenDocument, David A. Wheeler, began drafting a specification for formulas; his first draft was released in February 2005. This began a process of discussion with various spreadsheet implementors and developers.
In October 2005 Wheeler publicly began an informal project, backed by the OpenDocument Fellowship, to create a draft formula specification, based on the initial draft and on discussions since that time with various implementors. By January 2006 the group had developed a lengthy specification, and implementors had begun changing their implementations to meet the draft specification.
OASIS Formula Subcommittee
In February 2006, OASIS formally created the formula subcommittee, naming Wheeler as the subcommittee chair. After discussion, the subcommittee agreed to use the OpenFormula project's document as their base document. Thus, by February 2006, OASIS had a draft formula specification with a detailed framework and over 100 functions defined.
Microsoft Response
In 2005, Microsoft's Brian Jones noted that OpenDocument did not define spreadsheet formulas in detail (Jones, 2005). However, at the time Microsoft's competing proprietary XML format also did not include this kind of detailed specification for formulas (Wheeler, November 7, 2005).
Microsoft continued to protest that OpenDocument could not be used because it did not define a format for spreadsheet formulas, yet its own specification continued to omit any specification about formulas through April 2006. Finally, in May 2006, Microsoft also began defining formulas in its XML format, 15 months after the first version of OpenFormula and 3 months after OASIS posted its first official draft of its specification.
OpenFormula Attributes
Key attributes of the OpenFormula specification and development process, many of which are unique to OpenFormula as a recalculated formula format, are:
- Developed by many different implementors. OpenFormula is being developed by representatives from many different implementors, working together, including OpenOffice.org and Sun StarOffice (Eike Rathke), KDE KOffice (David Faure and Tomas Mecir), Gnumeric (Dr. Andreas J. Guelzow and Jody Goldberg), IBM/Lotus 1-2-3 (Rob Weir), and wikiCalc (Dan Bricklin, co-creator of the spreadsheet).
- Developed with experienced users. Many experienced users (such as Tom Metcalf, a scientist specializing in the astrophysics of the Sun) take part. The group includes several mathematicians, both users and developers.
- Open development. The discussions of the group, and weekly drafts, are available to the public.
- Fully open standard. The specification meets all widely accepted definitions of being an "open standard", including those by Bruce Perens and the European Union. For example, (1) both open source software and proprietary software can implement it, and (2) the work is based on consensus, not domination by any single supplier.
- Implementors are already implementing it. Implementors have already made changes to their applications due to the work of this body, such as changing how they handle signed values in MOD, changing the associativity of exponentiation, and even implementing new functions to conform to the draft standard.
- Focused development. The subcommittee is a large group focused specifically on spreadsheet formulas, and nothing else.
- Not rushed. OpenFormula is based on specification work that was first released on 2005-02-26, as well as a large body of research into different applications.
- Future-proofed format. The syntax has been carefully designed to work indefinitely into the future. For example, it allows an arbitrary number of columns, while also allowing arbitrary names of values.
- Embedded test cases. OpenFormula includes a large number of test cases that test and demonstrate the specification, including "edge cases" that people often forget. More importantly, they are specially formatted so they can be automatically extracted and placed in a test spreadsheet to test applications. Rob Weir reports, "This gives us a self-testing specification, a great labor savings, as well as a demonstration of the innovative things you can do with ODF (OpenDocument format)."
- Rigorous definitions. The test cases (noted previously) make the specification far more rigorous. In addition, OpenFormula defines the types for each function (as prototypes of each function). Function definitions are examined deeply; for example, YEARFRAC() has subtle behavior in leap years, which was carefully examined and defined.
- Doesn't mandate mistakes. The specification is carefully written so that it does not require a bug merely because an existing application has it. For example, Excel incorrectly treats 1900 as a leap year, and at least draft version 1.3 of the Excel specification claims that compatible applications must make the same mistake; it also forbids applications from being more capable than Excel by supporting dates before 1900. By comparing many different independent implementations, the OpenFormula group can often detect when an application makes a mistake, and ensure that applications are not overly restricted.
- Innovations from many sources. OpenFormula covers the functions of Excel and OpenOffice.org, plus important functions found in neither but present in other spreadsheet applications, such as Gnumeric and KSpread. The sources include Excel, OpenOffice.org Calc, Sun StarOffice Calc, KDE KOffice KSpread, GNOME Gnumeric, IBM/Lotus 1-2-3, Corel WordPerfect Suite Quattro Pro, wikiCalc, and DocumentToGo's SheetToGo. For example, the specification includes the functions DECIMAL and BASE, which are much better ways to handle different bases than the old BIN2DEC (etc.) functions. It also includes bit operations like BITAND. The subcommittee argues that by including innovations from many independent applications around the world, it produces a better and far more inclusive result.
- Room for innovation by anyone. Application-specific "namespaces" are defined for functions. This allows spreadsheet applications to add new functions, without interfering with current standard functions, future standard functions, or functions defined by other applications. As a result, different applications can add new functions without interfering with others; once a consensus arises about the new function, it can be standardized. The namespace is based on the Internet's naming service (reversed domain names), so ORG.OPENOFFICE.STYLE would be an OpenOffice.org-unique function.
- Internationalization. The specification does not assume that everyone uses "." as the decimal point, and indeed does not constrain user interfaces at all. Named expressions can have names in local character sets.
- Subset support. Applications can implement a subset or superset. To prevent user confusion, various "groups" are defined so that users can request specific sets of capabilities.
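Several items in the list above name concrete behaviours on which implementations can disagree. The following minimal sketch illustrates them: the two sign conventions for MOD, the two ways to group chained exponentiation, the 1900 leap-year question, base conversion in the style of BASE and DECIMAL, and reversed-domain function names. All helper names and signatures here are hypothetical and chosen for illustration; the sketch does not state which convention the draft specification settles on.

```python
import math

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def mod_sign_of_divisor(a, b):
    """Remainder whose sign follows the divisor b (floored division)."""
    return a - b * math.floor(a / b)


def mod_sign_of_dividend(a, b):
    """Remainder whose sign follows the dividend a (truncated division)."""
    return a - b * math.trunc(a / b)


def is_leap_gregorian(year):
    """Gregorian rule: century years are leap only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def base(number, radix):
    """Render a non-negative integer as text in the given radix (2..36)."""
    if not 2 <= radix <= 36 or number < 0:
        raise ValueError("unsupported radix or negative number")
    if number == 0:
        return "0"
    out = []
    while number:
        number, digit = divmod(number, radix)
        out.append(DIGITS[digit])
    return "".join(reversed(out))


def decimal(text, radix):
    """Parse text written in the given radix back into an integer."""
    return int(text, radix)


def split_namespaced(name):
    """Split 'ORG.OPENOFFICE.STYLE' into ('ORG.OPENOFFICE', 'STYLE')."""
    namespace, _, local = name.rpartition(".")
    return namespace, local


if __name__ == "__main__":
    print(mod_sign_of_divisor(-7, 3), mod_sign_of_dividend(-7, 3))  # 2 -1
    print((2 ** 3) ** 2, 2 ** (3 ** 2))  # 64 512 (left vs right grouping)
    print(is_leap_gregorian(1900))  # False
    print(base(255, 16), decimal("FF", 16))  # FF 255
    print(split_namespaced("ORG.OPENOFFICE.STYLE"))
```

Each pair of results differs only by convention, which is why a formula standard has to choose one and say so explicitly.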
OpenFormula Groups
One important aspect of OpenFormula is that it provides a predefined set of "groups"; the most important of these groups are small, medium, and large:
- The small group includes a little over 100 functions, including functions for trigonometry, database, finance, and statistics. The vast majority of spreadsheet documents are ably handled by applications that implement the "small" group. At least one PDA application (SheetToGo) has this level of capability, and wikiCalc added the functions in the small group specifically to meet the set defined by OpenFormula.
- The medium group includes all the capabilities of the small group, and adds about 100 more functions.
- The large group includes all the capabilities of the medium group, adding around 130 more functions, as well as capabilities such as complex numbers.
It is expected that users will often request implementations that meet a particular group, based on their needs.
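Because each group contains the one before it, the groups can be modelled as nested sets, and a document's needs can be matched against them. The sketch below assumes tiny placeholder memberships; the real groups contain hundreds of functions, and the names MEDIUM_ONLY_FN and LARGE_ONLY_FN are invented stand-ins, not functions from the specification.

```python
# Placeholder memberships for illustration only.
SMALL = frozenset({"SUM", "SIN"})
MEDIUM = SMALL | {"MEDIUM_ONLY_FN"}   # medium includes everything in small
LARGE = MEDIUM | {"LARGE_ONLY_FN"}    # large includes everything in medium

# Ordered from least to most capable.
GROUPS = [("small", SMALL), ("medium", MEDIUM), ("large", LARGE)]


def smallest_group(used_functions):
    """Return the name of the smallest group covering every function used,
    or None if some function falls outside all groups."""
    used = set(used_functions)
    for name, functions in GROUPS:
        if used <= functions:
            return name
    return None


if __name__ == "__main__":
    print(smallest_group(["SUM", "SIN"]))
    print(smallest_group(["SUM", "MEDIUM_ONLY_FN"]))
    print(smallest_group(["NOT_IN_ANY_GROUP"]))
```

A user could apply the same check in reverse: given the group an application claims to implement, confirm that a document uses no function outside it.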
Expected Completion Time
OpenFormula is expected to have all functions defined by October 2006, and to complete its quality assurance review through its subcommittee by December 2006. Note that many implementors are implementing the specification while it is being written, modifying their applications where necessary to comply with the draft standard.
References
- Fioretti, Marco (September 20, 2005). OpenDocument office suites lack formula compatibility. NewsForge.
- Jones, Brian (October 4, 2005). Comments from Tim Bray on OpenDocument.
- OASIS (January 14, 2005). Open Office XML Format TC Meeting Minutes 10-Jan-05.
- Rathke, Eike (June 23, 2005). OpenDocument For Spreadsheets. (reply to Morten Welinder).
- Welinder, Morten (June 16, 2005). OpenDocument for Spreadsheets. (complains that the spreadsheet spec doesn't define anything about formulas).
- Wheeler, David A. (November 1, 2004). Proposal: More detailed specification for formulas.
- Wheeler, David A. (November 7, 2005). FYI: Formulas not specified by Microsoft XML, either.
External links
- About OpenFormula, a summary on the OASIS Wiki site
- OASIS OpenDocument Formula subcommittee, website of the subcommittee developing the specification