Filename extension | .epub |
---|---|
Internet media type | application/epub+zip (unofficial[1]) |
Developed by | International Digital Publishing Forum (IDPF) |
Initial release | September 2007 |
Latest release | 3.0 / October 11, 2011[2] |
Type of format | e-book file format |
Contained by | OEBPS Container Format (OCF) (ZIP) |
Extended from | Open eBook, XHTML, CSS, DTBook |
Website | IDPF Home Page |
EPUB (short for electronic publication; alternatively capitalized as ePub, ePUB, EPub, or epub, with "EPUB" preferred by the vendor) is a free and open e-book standard by the International Digital Publishing Forum (IDPF). Files have the extension .epub.
EPUB is designed for reflowable content, meaning that the text display can be optimized for the particular display device used by the reader of the EPUB-formatted book, although EPUB now also supports fixed-layout content. The format is meant to function as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale. It supersedes the Open eBook standard.[3]
EPUB became an official standard of the International Digital Publishing Forum (IDPF) in September 2007, superseding the older Open eBook standard.[4]
In August 2009, the IDPF announced that it would begin work on maintenance tasks for the EPUB standard.[5] The working group defined two broad objectives: "One set of activities governs maintenance of the current EPUB Standards (i.e. OCF, OPF, and OPS), while another set of activities addresses the need to keep the Standards current and up-to-date." The working group was expected to be active through 2010, publishing updated standards throughout its lifetime.[6] On April 6, 2010, it was announced that this working group would complete its update in April 2010. The result was to be a minor revision, EPUB 2.0.1, which "corrects errors and inconsistencies and does not change functionality".[7] On July 2, 2010, drafts of the version 2.0.1 standards appeared on the IDPF website.[2]
On April 6, 2010, it was also announced that a working group would be formed to revise the EPUB specification.[7] The working group's draft charter identified 14 main problems with EPUB for the group to address. The group was chartered through May 2011 and was scheduled to submit a final draft on May 15, 2011.[8] An initial Editor's Draft of EPUB 3 was published on November 12, 2010,[9] and the first public draft was published on February 15, 2011.[10] On May 23, 2011, the IDPF released its proposed specification for final review.[11] On October 10, 2011, the IDPF announced that its membership had approved EPUB 3 as a final Recommended Specification.
EPUB 3 consists of a set of four specifications:[12]
Detailed descriptions of the differences between 3.0 and 2.0.1 can be found on the IDPF website.
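EPUB version 2.0.1 consists of three specifications: the Open Publication Structure (OPS) 2.0.1, which defines the formatting of the content; the Open Packaging Format (OPF) 2.0.1, which describes the structure of the .epub file in XML;[14] and the OEBPS Container Format (OCF) 2.0.1, which collects all files into a ZIP archive.
EPUB internally uses XHTML or DTBook (an XML standard provided by the DAISY Consortium) to represent the text and structure of the content document, and a subset of CSS to provide layout and formatting. XML is used to create the document manifest, table of contents, and EPUB metadata. Finally, the files are bundled in a ZIP file as a packaging format.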
As of version 2.0.1, an EPUB file uses XHTML 1.1 (or DTBook) to construct the content of a book. This differs from previous versions (OEBPS 1.2 and earlier), which used a subset drawn from XHTML. There are, however, a few restrictions on certain elements. The mimetype for XHTML documents in EPUB is application/xhtml+xml.[13] For a table of the required XHTML modules and a description of the restrictions, please see Section 2.2 of the specification.
Styling and layout are performed using a subset of CSS 2.0, referred to as OPS Style Sheets. This specialized syntax requires reading systems to support only a portion of the CSS properties and adds a few custom ones. Custom properties include oeb-page-head, oeb-page-foot, and oeb-column-number. Font embedding can be accomplished using the @font-face rule, together with listing the font file in the OPF's manifest (see below). The mimetype for CSS documents in EPUB is text/css.[13] For a table of supported properties and detailed information, please see Section 3.0 of the specification.
EPUB also requires that PNG, JPEG, GIF, and SVG images be supported, using the mimetypes image/png, image/jpeg, image/gif, and image/svg+xml. Other media types are allowed, but creators must include alternative renditions using supported types.[13] For a table of all required mimetypes, see Section 1.3.7 of the specification.
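The image rule can be expressed as a small lookup. The following is a minimal Python sketch, assuming that a file's media type is inferred from its extension; the table CORE_IMAGE_TYPES and the functions core_image_type and needs_fallback are hypothetical names, and the check covers images only.

```python
import os
from typing import Optional

# Core image media types that EPUB 2.0.1 reading systems must support.
# The extension-to-type mapping is an assumption made for illustration.
CORE_IMAGE_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".svg": "image/svg+xml",
}


def core_image_type(filename: str) -> Optional[str]:
    """Return the core media type for an image file name, or None."""
    ext = os.path.splitext(filename)[1].lower()
    return CORE_IMAGE_TYPES.get(ext)


def needs_fallback(media_type: str) -> bool:
    """True for image types outside the core set; such images need an
    alternative rendition in one of the supported types."""
    return (media_type.startswith("image/")
            and media_type not in CORE_IMAGE_TYPES.values())
```

For example, an image/tiff file would need a PNG or JPEG fallback, while image/jpeg would not.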
Unicode is required, and content producers must use either UTF-8 or UTF-16 encoding.[13] This is to support international and multilingual books. However, reading systems are not required to provide the fonts necessary to display every Unicode character, though they are required to display at least a placeholder for characters that cannot be displayed fully.[13]
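A producer could check the encoding of each XML document before packaging. This is a minimal Python sketch, assuming that a UTF-16 byte-order mark or the encoding named in the XML declaration is authoritative and that XML's default of UTF-8 applies otherwise; document_encoding and encoding_allowed are hypothetical helpers, not part of any EPUB tool.

```python
import codecs
import re

# Matches encoding="..." (or '...') inside an ASCII-compatible XML declaration.
_DECL = re.compile(rb"<\?xml[^>]*encoding=[\"']([A-Za-z0-9._-]+)[\"']")


def document_encoding(data: bytes) -> str:
    """Best-guess encoding of an XML document from its BOM or declaration."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = _DECL.match(data)
    # Without a BOM or an encoding declaration, XML defaults to UTF-8.
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def encoding_allowed(data: bytes) -> bool:
    """EPUB content documents must be encoded as UTF-8 or UTF-16."""
    return document_encoding(data) in ("utf-8", "utf-16")
```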
An example skeleton of an XHTML file for EPUB looks like this:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
    <title>Pride and Prejudice</title>
    <link rel="stylesheet" href="css/main.css" type="text/css" />
  </head>
  <body>
    ...
  </body>
</html>
```
The OPF specification's purpose is to "[define] the mechanism by which the various components of an OPS publication are tied together and provides additional structure and semantics to the electronic publication."[14] This is accomplished by two XML files with the extensions .opf and .ncx.
.opf file
The OPF file, traditionally named content.opf, houses the EPUB book's metadata, file manifest, and linear reading order. This file has a root element package and four child elements: metadata, manifest, spine, and guide. All of these except guide are required. Furthermore, the package node must have the unique-identifier attribute. The .opf file's mimetype is application/oebps-package+xml.[14]
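These structural rules can be checked with Python's standard xml.etree.ElementTree module. The sketch below is illustrative only: check_package is a hypothetical helper, and it assumes the OPF 2.0 namespace used in the example OPF file in this section.

```python
import xml.etree.ElementTree as ET

# Namespace of OPF 2.0 package documents, as used in the example OPF file.
OPF_NS = "http://www.idpf.org/2007/opf"


def check_package(opf_xml: str) -> list:
    """Return a list of structural problems in an OPF <package> element."""
    root = ET.fromstring(opf_xml)
    problems = []
    if root.tag != f"{{{OPF_NS}}}package":
        problems.append("root element is not <package>")
    if "unique-identifier" not in root.attrib:
        problems.append("<package> lacks the unique-identifier attribute")
    # Only direct children count; <guide> is optional, so it is not checked.
    present = {child.tag for child in root}
    for name in ("metadata", "manifest", "spine"):
        if f"{{{OPF_NS}}}{name}" not in present:
            problems.append(f"missing required <{name}> element")
    return problems
```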
The metadata element contains all the metadata information for a particular EPUB file. Three metadata elements are required (though many more are available): title, language, and identifier. title contains the title of the book; language contains the language of the book's contents in RFC 3066 format or its successors, such as the newer RFC 4646; and identifier contains a unique identifier for the book, such as its ISBN or a URL. The identifier's id attribute should equal the unique-identifier attribute of the package element.[14] For a full listing of EPUB metadata, please see Section 2.2 of the specification.
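A minimal Python sketch of the metadata rules follows, assuming the Dublin Core namespace declared in the example OPF file. check_metadata is a hypothetical helper; it treats the identifier link as a hard requirement, which is slightly stricter than the "should" above.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def check_metadata(opf_xml: str) -> list:
    """Check the three required Dublin Core elements and the identifier link."""
    root = ET.fromstring(opf_xml)
    metadata = root.find("opf:metadata", NS)
    if metadata is None:
        return ["missing <metadata>"]
    problems = []
    for name in ("title", "language", "identifier"):
        if metadata.find(f"dc:{name}", NS) is None:
            problems.append(f"missing <dc:{name}>")
    # One <dc:identifier> should carry the id named by package/@unique-identifier.
    uid = root.get("unique-identifier")
    ids = [el.get("id") for el in metadata.findall("dc:identifier", NS)]
    if uid not in ids:
        problems.append(f"no <dc:identifier> has id={uid!r}")
    return problems
```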
The manifest element lists all the files contained in the package. Each file is represented by an item element with the attributes id, href, and media-type. All XHTML content documents, stylesheets, images or other media, embedded fonts, and the NCX file should be listed here; only the .opf file itself, container.xml, and the mimetype file should be left out.[14] Note that in the example below, an arbitrary media-type is given to the included font file, because no official mimetype for fonts existed when the specification was written.
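Building the manifest is mostly a matter of walking the archive's file list. This Python sketch assumes the media type can be inferred from the file extension; the MEDIA_TYPES table, the item1, item2, ... id scheme and the manifest_items function are hypothetical, and the font entry simply reuses the non-standard type from the example OPF file.

```python
import posixpath

# Extension-to-media-type table (an assumption for illustration).
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".html": "application/xhtml+xml",
    ".css": "text/css",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".svg": "image/svg+xml",
    ".ncx": "application/x-dtbncx+xml",
    ".otf": "application/x-font-opentype",  # non-standard, as in the example
}


def manifest_items(zip_paths: list, opf_path: str) -> list:
    """Return (id, href, media-type) triples for files that belong in the
    manifest; hrefs are relative to the directory holding the OPF file."""
    opf_dir = posixpath.dirname(opf_path)
    excluded = {"mimetype", "META-INF/container.xml", opf_path}
    items = []
    for path in zip_paths:
        if path in excluded or path.endswith("/"):  # also skip directories
            continue
        ext = posixpath.splitext(path)[1].lower()
        href = posixpath.relpath(path, opf_dir) if opf_dir else path
        # Hypothetical id scheme: item1, item2, ...
        items.append((f"item{len(items) + 1}", href, MEDIA_TYPES[ext]))
    return items
```

An unknown extension raises KeyError, which is arguably preferable to silently guessing a media type.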
The spine element lists all the XHTML content documents in their linear reading order. Also, any content document that can be reached through linking or the table of contents must be listed as well. The toc attribute of spine must contain the id of the NCX file listed in the manifest. Each itemref element's idref is set to the id of its respective content document.[14]
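The cross-references between spine and manifest can be verified as follows. This is a sketch under the assumption that ids are compared literally; check_spine is a hypothetical helper, not part of epubcheck or any other tool.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"
NCX_TYPE = "application/x-dtbncx+xml"


def check_spine(opf_xml: str) -> list:
    """Check spine/@toc and every itemref/@idref against the manifest."""
    root = ET.fromstring(opf_xml)
    # Map manifest item ids to their media types.
    manifest = {item.get("id"): item.get("media-type")
                for item in root.iter(f"{OPF}item")}
    spine = root.find(f"{OPF}spine")
    if spine is None:
        return ["missing <spine>"]
    problems = []
    toc = spine.get("toc")
    if manifest.get(toc) != NCX_TYPE:
        problems.append(f"spine toc={toc!r} does not name the NCX item")
    for itemref in spine.iter(f"{OPF}itemref"):
        idref = itemref.get("idref")
        if idref not in manifest:
            problems.append(f"itemref {idref!r} not in manifest")
    return problems
```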
The guide element is an optional element that identifies fundamental structural components of the book. Each reference element has the attributes type, title, and href. Files referenced in href must be listed in the manifest and may include an element identifier (e.g. #figures in the example).[14] A list of possible values for type can be found in Section 2.6 of the specification.
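Because a guide href may carry an element identifier, the fragment has to be stripped before looking the file up in the manifest. A minimal Python sketch, assuming hrefs are compared as plain strings (no percent-decoding or path normalization); check_guide is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"


def check_guide(opf_xml: str) -> list:
    """Check that every guide reference points to a file in the manifest."""
    root = ET.fromstring(opf_xml)
    hrefs = {item.get("href") for item in root.iter(f"{OPF}item")}
    problems = []
    for ref in root.iter(f"{OPF}reference"):
        href = ref.get("href", "")
        # Drop an element identifier such as "#figures" before the lookup.
        path = href.split("#", 1)[0]
        if path not in hrefs:
            problems.append(f"reference {href!r} is not in the manifest")
    return problems
```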
An example OPF file:
```xml
<?xml version="1.0"?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Pride and Prejudice</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId" opf:scheme="ISBN">123456789X</dc:identifier>
    <dc:creator opf:file-as="Austen, Jane" opf:role="aut">Jane Austen</dc:creator>
  </metadata>
  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="appendix" href="appendix.html" media-type="application/xhtml+xml"/>
    <item id="stylesheet" href="css/style.css" media-type="text/css"/>
    <item id="ch1-pic" href="ch1-pic.png" media-type="image/png"/>
    <item id="myfont" href="css/myfont.otf" media-type="application/x-font-opentype"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1" />
    <itemref idref="appendix" />
  </spine>
  <guide>
    <reference type="loi" title="List Of Illustrations" href="appendix.html#figures" />
  </guide>
</package>
```
.ncx file
The NCX file (Navigation Control file for XML), traditionally named toc.ncx, contains the hierarchical table of contents for the EPUB file. The specification for NCX was developed for the Digital Talking Book (DTB), is maintained by the DAISY Consortium, and is not a part of the EPUB specification. The NCX file has a mimetype of application/x-dtbncx+xml.
Of note here is that the values of the docTitle, docAuthor, and meta name="dtb:uid" elements should match their analogs in the OPF file. Also, the meta name="dtb:depth" element is set equal to the depth of the navMap element. navPoint elements can be nested to create a hierarchical table of contents. navLabel's content is the text that will appear in the table of contents generated by reading systems that use the .ncx. navPoint's content element points to a content document listed in the manifest and can also include an element identifier (e.g. #section1).[14][16]
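Generating a nested navMap and its dtb:depth value can be sketched in Python with ElementTree. The input shape (label, src, children) tuples, the build_nav_map name and the navpoint1, navpoint2, ... ids are all hypothetical; playOrder is simply numbered in document (depth-first) order.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def build_nav_map(entries):
    """Build a <navMap> from nested (label, src, children) tuples.
    Returns the element and the depth to use for dtb:depth."""
    nav_map = ET.Element(f"{{{NCX_NS}}}navMap")
    play_order = 0
    max_depth = 0

    def add(parent, items, level):
        nonlocal play_order, max_depth
        for label, src, children in items:
            play_order += 1
            max_depth = max(max_depth, level)
            point = ET.SubElement(parent, f"{{{NCX_NS}}}navPoint",
                                  id=f"navpoint{play_order}",
                                  playOrder=str(play_order))
            nav_label = ET.SubElement(point, f"{{{NCX_NS}}}navLabel")
            ET.SubElement(nav_label, f"{{{NCX_NS}}}text").text = label
            ET.SubElement(point, f"{{{NCX_NS}}}content", src=src)
            add(point, children, level + 1)  # nested navPoints

    add(nav_map, entries, 1)
    return nav_map, max_depth
```

A chapter with one section nested inside it, followed by a second chapter, yields a depth of 2 and playOrder values 1, 2 and 3.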
A description of certain exceptions to the NCX specification as used in EPUB can be found in Section 2.4.1 of the specification. The complete specification for NCX can be found in Section 8 of the Specifications for the Digital Talking Book.[16]
An example .ncx file:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
  "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx version="2005-1" xml:lang="en" xmlns="http://www.daisy.org/z3986/2005/ncx/">
  <head>
    <!-- The following four metadata items are required for all NCX documents,
         including those conforming to the relaxed constraints of OPS 2.0 -->
    <meta name="dtb:uid" content="123456789X"/> <!-- same as in .opf -->
    <meta name="dtb:depth" content="1"/> <!-- 1 or higher -->
    <meta name="dtb:totalPageCount" content="0"/> <!-- must be 0 -->
    <meta name="dtb:maxPageNumber" content="0"/> <!-- must be 0 -->
  </head>
  <docTitle>
    <text>Pride and Prejudice</text>
  </docTitle>
  <docAuthor>
    <text>Austen, Jane</text>
  </docAuthor>
  <navMap>
    <navPoint class="chapter" id="chapter1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
```
An EPUB file is a group of files conforming to the OPS/OPF standards that is wrapped in a ZIP file.[3] The OCF specifies how these files should be organized in the ZIP, and defines two additional files that must be included.
The mimetype file must be a text document in ASCII and must contain the string application/epub+zip. It must also be uncompressed, unencrypted, and the first file in the ZIP archive. The purpose of this file is to provide a more reliable way for applications to identify the mimetype of the file than just the .epub extension.[15]
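Python's standard zipfile module is enough to satisfy these constraints. In the sketch below, write_epub and has_valid_mimetype are hypothetical helpers that work on in-memory archives; the encryption test relies on bit 0 of the ZIP general-purpose flag, which marks an encrypted entry.

```python
import io
import zipfile


def write_epub(files: dict) -> bytes:
    """Package {archive name: bytes} into an EPUB (ZIP) archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype file goes first and is stored without compression.
        zf.writestr("mimetype", b"application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            zf.writestr(name, data)  # remaining files use ZIP_DEFLATED
    return buf.getvalue()


def has_valid_mimetype(epub: bytes) -> bool:
    """Check that the first entry is an uncompressed, unencrypted mimetype
    file containing exactly application/epub+zip."""
    with zipfile.ZipFile(io.BytesIO(epub)) as zf:
        first = zf.infolist()[0]
        return (first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED
                and not first.flag_bits & 0x1  # bit 0 = encrypted
                and zf.read(first) == b"application/epub+zip")
```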
Also, there must be a folder named META-INF which contains the required file container.xml. This XML file points to the file defining the contents of the book. This will be the OPF file, though additional alternative rootfile elements are allowed.[15]
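A reading system locates the OPF file by reading container.xml first. A minimal Python sketch, assuming the container namespace shown in the example container.xml in this section; rootfile_paths is a hypothetical helper that keeps only rootfiles with the OPF media type.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def rootfile_paths(container_xml: bytes) -> list:
    """Return the full-path of every OPF rootfile listed in container.xml."""
    root = ET.fromstring(container_xml)
    return [
        rootfile.get("full-path")
        for rootfile in root.iter(f"{{{CONTAINER_NS}}}rootfile")
        if rootfile.get("media-type") == "application/oebps-package+xml"
    ]
```

In practice the bytes would come from the archive itself, for example zipfile.ZipFile(path).read("META-INF/container.xml"), and the first returned path is the book's primary OPF file.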
Apart from mimetype and META-INF/container.xml, the other files (OPF, NCX, XHTML, CSS and image files) are traditionally put in a directory named OEBPS.
An example file structure:
```
--ZIP Container--
mimetype
META-INF/
  container.xml
OEBPS/
  content.opf
  toc.ncx
  chapter1.xhtml
  appendix.html
  ch1-pic.png
  css/
    style.css
    myfont.otf
```
An example container.xml, given the above file structure:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
```
An EPUB file can optionally contain DRM as an additional layer, but it is not required by the specifications.[17] In addition, the specification does not name any particular DRM system to use, so publishers can choose a DRM scheme to their liking. However, future versions of EPUB (specifically OCF) may specify a format for DRM.[15]
When present, DRMed EPUB files must contain a file called rights.xml within the META-INF directory at the root level of the ZIP container.[15]
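Since the rule only states that DRM-protected files must carry META-INF/rights.xml, the check it supports is simple: a conforming file without that entry should not be DRM-protected. The Python sketch below illustrates this with a hypothetical has_rights_file helper; it says nothing about which DRM scheme, if any, is in use.

```python
import io
import zipfile


def has_rights_file(epub: bytes) -> bool:
    """True if META-INF/rights.xml is present at the root of the container."""
    with zipfile.ZipFile(io.BytesIO(epub)) as zf:
        return "META-INF/rights.xml" in zf.namelist()
```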
An open source tool called epubcheck exists for validating and detecting errors in the structural markup (OPS, OPF, OCF) as well as the XHTML and image files. The tool can be run from the command line, or used in webapps and applications as a library. A large part of the original work on the tool was done at Adobe Systems.[18]
The EPUB specification does not enforce or suggest a particular DRM scheme. This could affect the level of support for various DRM systems on devices and the portability of purchased e-books. Consequently, such DRM incompatibility may segment the EPUB format along the lines of DRM systems, undermining the advantages of a single standard format and confusing the consumer.[19][20][21][22][23][24]
The EPUB 3.0 format is intended to address the following criticisms:
Software that reads, and presumably displays, EPUB files is called a reading system. An EPUB reading system is defined as:
“A combination of hardware and/or software that accepts OPS Publications and makes them available to consumers of content. Great variety is possible in the architecture of Reading Systems. A Reading System may be implemented entirely on one device, or it may be split among several computers....”[3]
Software | Platform | DRM formats supported | Notes |
---|---|---|---|
Adobe Digital Editions | Windows, Mac OS X | Adobe Content Server | Requires online activation. |
Aldiko | Android | Adobe Content Server | Supports ePub for Android phones. |
BookGlutton | Web | ? | Free online ePub reader focussing on the social aspects of reading. |
calibre | Windows, Mac OS X, GNU/Linux | None | Primarily for library management, conversion, and transferring to devices, it includes a reader. "Calibre: About". http://calibre-ebook.com/about. |
CoolReader | Windows, GNU/Linux, Android | None | XML/CSS based E-Book reader for desktops and handheld devices. Supported formats: FB2, TXT, RTF, TCR, HTML, EPUB, CHM. Has GUI implementation for E Ink base devices. Most popular SourceForge epub application. |
Dorian | Symbian | ? | Free ePub reader. |
EPUBReader | Firefox add-on | None | Enables reading ePub-files from within Firefox. |
FBReader | Windows, GNU/Linux, PDAs | ? | Incomplete ePub support.[28] |
FBReaderJ | Android | ? | Open source. |
Google Books | Web application, Android, iOS | ? | Supports downloading purchased books as ePub and/or PDF. |
iBooks | iOS | FairPlay[29] | Books not readable directly on computers (Mac or PC) yet. |
Lexcycle Stanza | iOS, Windows, Mac OS X | Yes | Acquired by Amazon in 2009 |
Mobipocket | Windows, BlackBerry, Symbian, Windows Mobile | None | Converts EPUB into .PRC on import. |
NOOK for Mac | Mac OS X | ? | Requires a (free) Barnes & Noble account to read books. |
Okular | KDE Platform | ? | |
readMe | iOS | ? | EPUB, FB2 and PDF support |
Software | Platform | Notes | |
---|---|---|---|
ABBYY FineReader | Windows | Commercial license. Version 11 exports to EPUB format. | |
Adobe InDesign | Windows, Mac OS X | Commercial license. Exports to EPUB format. EPUBs created with versions prior to 5.5 require significant editing to pass ePubCheck or ePubPreFlight; users of InDesign 5.0 may need to rework them in Sigil or follow a guide such as Liz Castro's EPub Straight to the Point. | |
Atlantis Word Processor | Windows, Portable app | Converts any document to EPUB; supports multilevel TOCs, font embedding, and batch conversion. Shareware. | |
calibre | Windows, Mac OS X, GNU/Linux | Conversion software and e-book organizer. Free Software under the GPL license. | |
eLML | Windows, Mac OS X, GNU/Linux | The eLesson Markup Language is a platform-independent XML-based open source framework to create eLearning content. It supports various output formats like SCORM, HTML, PDF and also eBooks based on the ePub format. | |
Feedbooks | Web | Free cloud service for downloading public domain works and for self-publishing | |
iStudio Publisher | Mac OS X | Desktop publishing and page layout application. Commercial license. | |
Lulu.com | Web | Upload and convert .doc, .docx, or PDF manuscripts to an ePub. Then choose a title, create a cover, describe your ePub, and pick a price. It's free to publish and sell. | |
oXygen XML Editor | Mac OS X, Windows, Linux | XML editor with support for creating, transforming and validating the documents that make up an EPUB package. | |
Pages | Mac OS X | Word processor (part of the iWork '09 suite) that can export to EPUB format (Pages '09 only, and only with the iWork 9.0.4 update). | |
QuarkXPress | Mac OS X, Windows | Desktop Publishing Tool, Page Layout Application. Exports also to the ePUB format. Commercial license. | |
Serif PagePlus X6 | Windows | Desktop Publishing Program. Exports also to the ePUB format. Commercial license. | |
Scrivener | Windows, Mac OS X | Commercial program for writers. Includes organization capabilities for fiction writers. Publishes to multiple formats. | |
Sigil | Windows, GNU/Linux, Mac OS X | Free, Open source under GPLv3. Currently the only application that can also open and edit EPUB books, instead of just converting from other formats to EPUB. Does not currently support embedding video or audio in EPUB. | |
Jutoh | Windows, Mac OS X, Linux | WYSIWYG ebook editor-compiler. Exports to ePUB and Mobipocket (Kindle) formats. Commercial license. | |
The boundary between hardware and software is not clear cut. Some of these devices are dedicated to e-book tasks while others are platforms that include e-book readers or can have them added. See Comparison of e-book readers for details of dedicated devices (not all support EPUB).