Open Packaging Convention
From Wikipedia, the free encyclopedia
The Open Packaging Conventions (OPC) is a file packaging format created by Microsoft for storing a combination of XML and non-XML files that together form a single entity, such as an XML Paper Specification (XPS) document, in a single compressed file container. The format keeps the independent files embedded in the document intact while producing much smaller files than plain, uncompressed XML.
The OPC is specified in Part 2 of the Ecma Office Open XML standard, Ecma 376.[1]
Usage
Both the XML Paper Specification (XPS)[2] and Office Open XML (OOXML) use the Open Packaging Conventions (OPC), which define a profile of the common ZIP format. As well as XML data and document files, the ZIP package can include other text and binary files in formats such as PNG, BMP, AVI, PDF, RTF or even an already packaged ODF file. OPC also defines naming conventions and an indirection method that make binary and XML files position-independent within the ZIP archive.
OPC files can be opened using common ZIP utilities, and open source libraries for .NET and Java are available for working with the Open Packaging Conventions. The Open Packaging Conventions are specified in Part 2 of Ecma 376 (131 pages) but do not depend on the other parts of Office Open XML. The specification also describes the ZIP format itself, since ZIP had not previously been specified by any international standard, although it has widespread community and developer acceptance.
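Because an OPC package is an ordinary ZIP archive, a generic ZIP library can enumerate its items. The following Python sketch builds a tiny, hypothetical package in memory and lists its contents with the standard zipfile module. The item names, the relationship type URI and the XML bodies are placeholders, not a valid XPS or Office Open XML document.

```python
import io
import zipfile

# Hypothetical minimal package: the names and XML bodies are illustrative
# only and do not form a complete OOXML or XPS document.
CONTENT_TYPES = (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    "</Types>"
)
PACKAGE_RELS = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://example.com/hypothetical/start" '
    'Target="doc/main.xml"/>'
    "</Relationships>"
)


def build_package() -> bytes:
    """Write three items into an in-memory, Deflate-compressed ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", PACKAGE_RELS)
        zf.writestr("doc/main.xml", "<main/>")
    return buf.getvalue()


def list_items(package: bytes) -> list:
    """List the ZIP item names using nothing but the standard zipfile module."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.namelist()
```

Calling list_items(build_package()) returns the three item names in the order they were written, with no OPC-specific code involved.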
Microsoft has submitted a draft to the Internet Engineering Task Force for a "pack" URI scheme (pack://) to be used for URI references to OPC-based packages.[3]
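As a rough illustration, a pack URI embeds the URI of the whole package as its authority component, with the package URI's "/" characters replaced by ",", and appends the part name as the path. The sketch below is simplified under that assumption: it performs only the comma substitution and ignores any further character escaping the draft may require. The function name pack_uri is hypothetical.

```python
def pack_uri(package_uri: str, part_name: str = "/") -> str:
    """Simplified sketch of building a pack:// URI.

    The package URI becomes the authority, with every '/' replaced by ','
    so it cannot be confused with the path; part_name becomes the path.
    Other escaping rules are deliberately not handled here.
    """
    authority = package_uri.replace("/", ",")
    return f"pack://{authority}{part_name}"
```

For example, pack_uri("http://www.microsoft.com/packages/p1.xps", "/page1.xaml") gives pack://http:,,www.microsoft.com,packages,p1.xps/page1.xaml, and the authority application:/// becomes application:,,, which is the pattern behind WPF's pack://application:,,,/ resource URIs.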
Parts and Relationships
An OPC-aware application uses relationship metadata, rather than directory names and file names, to locate individual files. In OPC terminology, a file is a part. Each part also has accompanying metadata, in particular its MIME content type.
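The relationship-based lookup can be sketched in Python: instead of hard-coding a path, the reader opens /_rels/.rels, picks the relationship of the wanted type and reads whatever part it targets. This is a minimal sketch assuming a well-formed package; the helper names find_start_part and read_part are hypothetical, and the officeDocument relationship type shown is the one Office Open XML uses for its main document part.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Relationship type Office Open XML uses to point at the main document part.
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)


def find_start_part(zf: zipfile.ZipFile, rel_type: str) -> str:
    """Return the part name targeted by the first internal package relationship of rel_type."""
    root = ET.fromstring(zf.read("_rels/.rels"))
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == rel_type and rel.get("TargetMode", "Internal") == "Internal":
            # Package relationship targets are resolved against the package root "/".
            return posixpath.normpath(posixpath.join("/", rel.get("Target")))
    raise KeyError(f"no package relationship of type {rel_type}")


def read_part(zf: zipfile.ZipFile, part_name: str) -> bytes:
    """Part names start with '/', ZIP item names do not."""
    return zf.read(part_name.lstrip("/"))


# Demo: the main part sits at an arbitrary path; only the relationship knows where.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "_rels/.rels",
        f'<Relationships xmlns="{REL_NS}">'
        f'<Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" Target="content/body.xml"/>'
        "</Relationships>",
    )
    zf.writestr("content/body.xml", "<body/>")

with zipfile.ZipFile(buf) as zf:
    start = find_start_part(zf, OFFICE_DOCUMENT)
    body = read_part(zf, start)
```

Moving the main part elsewhere would only require changing the Target attribute; the reading code stays the same.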
An OPC file can contain arbitrary data, but the ".rels" extension is reserved for relationship parts. The locations /_rels/.rels and /[Content_Types].xml are the only two reserved locations for parts in files that adhere to the Open Packaging Conventions.
- _rels
- The root-level _rels folder holds the relationships for the OPC file as a whole. It always contains a part called .rels, where the "package relationships" are stored. Whenever one opens a file that uses these conventions, one starts by reading the _rels/.rels file. All relationship files are XML; opened in a text editor, each one lists the relationships of its source part, one Relationship element per relationship.
- [Content_Types].xml file
- This file describes the content types of the parts in the ZIP package. It maps file extensions to default content types and contains overrides for specific part URIs.
- [part].rels
- Each part may have its own relationships, which are stored in _rels folders. To find the relationships for a specific part, one looks for the _rels folder that is a sibling of the part. If the part has relationships, that _rels folder contains a file named after the original part with .rels appended. For example, if the content types part had any relationships, they would be stored in a file called [Content_Types].xml.rels inside the _rels folder.
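The two lookup rules in the list above, locating a part's relationships part and resolving a part's content type, can be sketched with Python's standard library. The function names are hypothetical, and the sketch assumes part names are already absolute, normalized and unescaped; it compares names case-insensitively because OPC part names are case-insensitive.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import Optional

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def rels_part_name(part_name: str) -> str:
    """Relationships part for a source part: sibling _rels folder, name + '.rels'."""
    directory, filename = posixpath.split(part_name)
    return posixpath.join(directory, "_rels", filename + ".rels")


def content_type(types_xml: str, part_name: str) -> Optional[str]:
    """Look up a part's content type: an Override wins, else the extension Default."""
    root = ET.fromstring(types_xml)
    # An Override names one specific part, so it is checked first.
    for override in root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    # Otherwise fall back to the Default registered for the file extension.
    extension = posixpath.splitext(part_name)[1].lstrip(".").lower()
    for default in root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None
```

For instance, rels_part_name("/word/document.xml") gives /word/_rels/document.xml.rels, and applying the same rule to the package root "/" yields /_rels/.rels, the package relationships part.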
Advantages of using the Open Packaging Conventions
OPC offers several advantages, including indirection, chunking and relative indirection.[4]
Indirection
Consider a catalog in which a logo is repeated 1,000 times. With an indirection mechanism, changing the logo requires changing only one entry in one file, with no searching involved, because it is known where to look. This substantially improves maintainability. Changing the layout of the ZIP directories where the files are stored also becomes trivial, because there is no need to find every element that can point to a file: the references are all in one place.
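A minimal sketch of the logo example, assuming the catalog markup refers to the logo only through a relationship Id: swapping the logo means editing a single Target attribute in one relationships part, while the 1,000 references stay untouched. The rLogo Id, the logo element and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
IMAGE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
ET.register_namespace("", REL_NS)  # serialize without an "ns0:" prefix


def targets(rels_xml: str) -> dict:
    """Map relationship Id -> Target for one relationships part."""
    root = ET.fromstring(rels_xml)
    return {r.get("Id"): r.get("Target") for r in root.findall(f"{{{REL_NS}}}Relationship")}


def retarget(rels_xml: str, rel_id: str, new_target: str) -> str:
    """Point one relationship at a new part; the referring markup is untouched."""
    root = ET.fromstring(rels_xml)
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Id") == rel_id:
            rel.set("Target", new_target)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(rel_id)


rels = (
    f'<Relationships xmlns="{REL_NS}">'
    f'<Relationship Id="rLogo" Type="{IMAGE}" Target="media/logo.png"/>'
    "</Relationships>"
)
# Hypothetical catalog markup: 1,000 entries refer to the logo only by Id.
catalog = ['<logo ref="rLogo"/>'] * 1000

new_rels = retarget(rels, "rLogo", "media/logo-v2.png")
```

Every catalog entry now resolves to media/logo-v2.png through the single edited relationship.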
Chunking
OPC encourages documents to be split into small chunks. This reduces the effect of file corruption and improves data access: for example, all the style information can be kept in one XML part, and each worksheet or table in its own part. This allows faster access and less object creation for clients, and makes it easier for multiple processes to work on the same document.
Chunking also benefits programmers. Replacing one stylesheet with another becomes a ZIP file operation, not an XML operation. It also reduces what a programmer needs to understand, because each chunk can be approached on the assumption that it holds all the information on its topic, sparing the programmer the mental toil of searching through one large file full of extraneous elements.
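Replacing a stylesheet part can be sketched as a pure ZIP operation in which no XML is parsed. Python's zipfile module cannot overwrite an item in place, so this sketch copies every item into a new archive and substitutes the bytes of one item. The function name replace_part and the word/styles.xml item are illustrative assumptions.

```python
import io
import zipfile


def replace_part(package: bytes, item_name: str, new_data: bytes) -> bytes:
    """Return a copy of a ZIP package in which one item's bytes are swapped.

    The standard zipfile module cannot overwrite an entry in place, so every
    item is copied into a new archive; only item_name gets new contents.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, zipfile.ZipFile(
        out, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            data = new_data if info.filename == item_name else src.read(info.filename)
            dst.writestr(info.filename, data)
    return out.getvalue()


# Hypothetical package with the style information in its own part.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml", "<document/>")
    zf.writestr("word/styles.xml", "<styles version='1'/>")

updated = replace_part(buf.getvalue(), "word/styles.xml", b"<styles version='2'/>")
```

The sketch only swaps bytes; if the new stylesheet needed a different content type, [Content_Types].xml would have to be updated as well.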
Relative indirection
In the Open Packaging Conventions, each part that has references has its own _rels file containing its indirection list. In some cases this makes it easier to cut and paste a piece of information together with all its associated resources, and it provides name scoping that removes the chance of name clashes between files.
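The name scoping can be illustrated with a short Python sketch, assuming relationship targets are relative URIs without percent-encoding and ignoring external targets: each target is resolved against the folder of the part that owns the relationship, so two parts can both use the Id rId1 and still point at different resources. The part names and the resolve_target helper are hypothetical.

```python
import posixpath


def resolve_target(source_part: str, target: str) -> str:
    """Resolve a relationship Target against the part that owns the relationship."""
    if target.startswith("/"):
        # Absolute targets are already part names.
        return posixpath.normpath(target)
    # Relative targets are interpreted from the source part's own folder.
    return posixpath.normpath(posixpath.join(posixpath.dirname(source_part), target))


# Hypothetical per-part relationships: each part has its own Id namespace,
# so both parts can use "rId1" without clashing.
part_rels = {
    "/word/document.xml": {"rId1": "media/image1.png"},
    "/word/footer1.xml": {"rId1": "../shared/image2.png"},
}

resolved = {
    part: resolve_target(part, rels["rId1"]) for part, rels in part_rels.items()
}
```

Because targets are relative to their source part, moving a part together with its _rels file and the resources it points to keeps those references valid.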
References
- ^ Ecma International TC45 (2006-12). Standard ECMA-376 Office Open XML File Formats. Ecma International. Retrieved on 2007-04-04.
- ^ XPS team (2006-09-01). Open Packaging Conventions & Open XML Markup Compatibility. XPS team blog. Retrieved on 2007-04-04.
- ^ The "pack" URI Scheme
- ^ Rick Jelliffe (2007-07-29). Comment on Can a file be ODF and Open XML at the same time?. O'Reilly net XML blogs.
External links
- Working with OPC Parts
- OPC implementation test documents
- An OPC package explorer that allows you to edit XML parts.