Comparison of e-book formats

From Wikipedia, the free encyclopedia

The following is a comparison of e-book formats used to create and publish e-books.

A writer or publisher has many options when choosing a format for production. Although the average end user may simply want to read books, every format has its proponents, and debates over "which format is best" can become intense. The multitude of e-book formats is sometimes referred to as the "Tower of eBabel". From the reader's point of view, every format has its advantages and disadvantages. Formats available include, but are by no means limited to, the following:

Plain text files

Plain text e-books contain no markup or formatting, so their files are very small.

Hypertext Markup Language

Commonly known as HTML

HTML is the markup language used for most web pages. E-books using HTML can be read using a standard browser (e.g., Mozilla, Firefox, or Microsoft Internet Explorer), with no need for special equipment. These files can be encoded in ASCII or in a Unicode encoding such as UTF-8.

HTML has the disadvantage of being somewhat complex, which adds a learning hurdle before one can write in it. While it offers far better layout control than plain text, it is significantly more difficult for the lay person to create pages in it. WYSIWYG HTML editors overcome this to a large degree, but not completely. For someone who simply wants to read, this is not a consideration, as the complexity of the file stays behind the scenes.

HTML markup does, however, enlarge the document considerably, requiring more storage space for a given work even if no images are used. With memory becoming ever less expensive, this is not the challenge it once was.
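As a rough illustration of that overhead, the sketch below wraps blank-line-separated paragraphs of plain text in a minimal UTF-8 HTML page and compares the byte counts. The `text_to_html` helper and its template are hypothetical, written only for this example; real e-book tools typically add more markup (styles, navigation, metadata).

```python
import html


def text_to_html(title: str, text: str) -> str:
    """Wrap blank-line-separated paragraphs of plain text in a minimal HTML page.

    Hypothetical helper for illustration only.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # html.escape turns &, <, > and quotes into entities so text is not read as markup.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


plain = "It was a dark and stormy night.\n\nThe rain fell in torrents."
page = text_to_html("Sample", plain)
print(len(plain.encode("utf-8")), "bytes as plain text")
print(len(page.encode("utf-8")), "bytes as HTML")
```

For a sample this short, the fixed template dominates; in a full-length book, most of the added bytes come from the per-paragraph tags instead.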

Open Electronic Book Package Format

OPF is an XML-based e-book format created by E-Book Systems.

TomeRaider

Published as a .tr2 or .tr3

The TomeRaider e-book format is a proprietary format. There are versions of TomeRaider for Windows, Windows Mobile (also known as Pocket PC), Palm, Symbian and other platforms. Several Wikipedias are available as TomeRaider files with all articles unabridged, some even with nearly all images. The capabilities of the TomeRaider3 e-book reader vary considerably by platform: the Windows and Windows Mobile editions support full HTML and CSS, while the Palm edition supports only limited HTML (e.g. no tables, no fonts) and no CSS. For Symbian there is only the older TomeRaider2 format, which does not render images or offer category search facilities. Despite these differences, any TomeRaider e-book can be browsed on all platforms. TomeRaider is popular among readers because of its large base of free documents: according to its records, the TomeRaider website offers over 4,000 free e-books. The IMDb movie database is also available as a regularly updated TomeRaider e-book. The TomeRaider developers have recently released the full English Wikipedia (with data up to December 2007) as a 3.3 GB e-book.

Arghos Diffusion

Published as .arg

The ARG format is an XML-based proprietary format developed by the French firm Arghos Diffusion.

ARG files use a proprietary DRM and encryption method and are readable only in the Arghos Player.

It supports various input formats for text, audio, and video, such as PDF, WMA, MP3, and WMV, and offers interactive functions such as bookmarking, advanced plain-text search, and dynamic text highlighting.

Flip Books

A "Flip Book" is a type of e-book distinguished by virtual pages that actually "flip", much like turning the paper pages of a real book or magazine. The first dynamic Flip Book Reader was developed in 2003/2004 by Interaxive Media for Nishe Media (Canada) and was therefore called "Nishe Pages". The first version was produced in part by Cybaris (Canada) and was first publicly showcased in August 2004. Soon thereafter, many copycat "flip books" started appearing thanks to technological advances in Macromedia Flash, mostly hardcoded using Flash components. The original software remains unique in that it is powered by a complete server-based content management system (CMS) that allows books to be created, published, and viewed remotely from a web server without installing any custom software. Nishe Media went defunct in 2004, leaving the unfinished software to Interaxive Media, which continued its development in Hong Kong. Though not widely used outside of Asia, it is now at version 3.0 and is arguably the most advanced server-based e-book platform. It remains privately held by the original developer, Ryan Sutherland, owner and founder of Interaxive Media.

NISO Z39.86 Format

Commonly known as DAISY

DAISY is an XML-based e-book format created by the DAISY international consortium of libraries for people with print disabilities.

DAISY implementations have focused on two main types: audio e-books and text e-books. A subset of the DAISY format has been adopted by law in the United States as the National Instructional Materials Accessibility Standard (NIMAS), and K-12 textbooks and instructional materials are now required to be provided to students with disabilities.

FictionBook

Published as a .fb2

FictionBook is a popular XML-based e-book format, supported by free readers such as Haali Reader and FBReader. See http://haali.cs.msu.ru/pocketpc/FictionBook_description.html
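Because FictionBook is plain XML, its structure is easy to see: metadata sits under `<description>` and the text lives in `<section>` and `<p>` elements inside `<body>`, all in the `http://www.gribuser.ru/xml/fictionbook/2.0` namespace. The sketch below reads a title and the paragraphs with Python's standard `xml.etree.ElementTree`. The sample document and the `read_fb2` helper are hypothetical; the sketch makes no distinction between the main text and additional bodies (such as notes) and ignores base64-encoded `<binary>` images.

```python
import xml.etree.ElementTree as ET

# XML namespace of FictionBook 2.x documents.
FB2_NS = {"fb": "http://www.gribuser.ru/xml/fictionbook/2.0"}

# A tiny hand-written sample; real files carry much more metadata.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <description>
    <title-info>
      <author><first-name>Jane</first-name><last-name>Doe</last-name></author>
      <book-title>An Example Book</book-title>
    </title-info>
  </description>
  <body>
    <section>
      <p>First paragraph.</p>
      <p>Second <emphasis>paragraph</emphasis>.</p>
    </section>
  </body>
</FictionBook>
"""


def read_fb2(data: bytes):
    """Return (book title, list of paragraph strings) from FB2 XML bytes."""
    root = ET.fromstring(data)
    title = root.findtext(
        "fb:description/fb:title-info/fb:book-title", namespaces=FB2_NS
    )
    # itertext() also collects text inside inline elements such as <emphasis>.
    paragraphs = [
        "".join(p.itertext())
        for p in root.iterfind("fb:body//fb:p", namespaces=FB2_NS)
    ]
    return title, paragraphs


title, paragraphs = read_fb2(SAMPLE.encode("utf-8"))
print(title)
print(paragraphs)
```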

Text Encoding Initiative

TEI Lite is the most popular of the TEI-based (and thus XML-based or SGML-based) electronic text formats.

Plucker

Plucker is a free e-book reader application with its own associated file format and software to automatically generate Plucker files from HTML files, websites, or RSS feeds. The format is a compressed HTML archive, somewhat like Microsoft's CHM.

CHM Format

Also known as Microsoft Compiled HTML Help

CHM is a proprietary format based on HTML. Multiple pages and embedded graphics are distributed, along with proprietary metadata, as a single compressed file. By contrast, an HTML site consists of multiple HTML files and associated image files in standardized formats.

Portable Document Format

Published as a .pdf

PDF is a file format created by Adobe Systems, initially to provide a standard form for storing and editing printable documents. Because PDF documents can easily be viewed and printed on a wide variety of computers and platforms, they are very common on the World Wide Web. However, because they are designed to reproduce fixed page layouts and their text cannot be reflowed to fit the screen width, PDF files designed for printing on standard paper sizes are hard to view on screens with limited size or resolution.

Adobe has addressed the problem of viewing PDF files on the smaller screens found on PDAs (personal digital assistants): Adobe's Acrobat Reader for PDAs now has a reflow facility. However, a PDF document must be created with certain settings for it to be reflowable, so many existing PDF documents cannot benefit from this feature. These settings are described in Adobe's tutorial "Reflow the contents of Adobe PDF documents".

PDF files are created mainly using Adobe Acrobat, but Acrobat Capture and other Adobe products also support their creation, as do third-party products such as PDFCreator, OpenOffice.org, and FOP. Acrobat Reader (now simply called Adobe Reader) is Adobe's PDF viewer. PDF is commonly used for product manuals, brochures, magazine articles, and flyers, because it can embed fonts, images, and other documents. A PDF file contains one or more pages, each of which can be zoomed in or out. The PDF format can include interactive elements such as buttons for form entry and for triggering sound and QuickTime or AVI movies. Acrobat PDF files are optimized for the Web by rendering text before graphic images and hypertext links. Adobe's PDF-like e-book format is incorporated into its reader.

PDF files are supported by the Sony Reader and Bookeen Cybook e-book readers, and indirectly by the Amazon Kindle.

PostScript

Published as .ps

PostScript is a page description language used primarily in the electronic and desktop publishing areas for describing the contents of a printed page at a higher level than the actual output bitmap.
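To show what "higher level than the bitmap" means, the sketch below generates a tiny one-page PostScript program from lines of text using only core operators (`findfont`, `scalefont`, `setfont`, `moveto`, `show`, `showpage`). The helper names, the Times-Roman font, and the layout constants are illustrative assumptions, and the sketch handles ASCII text only.

```python
def escape_ps_string(text: str) -> str:
    """Escape characters that are special inside a PostScript (...) string literal."""
    # Backslash first, so the escapes added for parentheses are not doubled.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def one_page_ps(lines: list[str], font: str = "Times-Roman", size: int = 12) -> str:
    """Return a minimal PostScript program that prints ASCII lines on one page."""
    out = ["%!PS", f"/{font} findfont {size} scalefont setfont"]
    y = 720  # 72 pt (one inch) below the top of a 792 pt tall US Letter page
    for line in lines:
        out.append(f"72 {y} moveto ({escape_ps_string(line)}) show")
        y -= size + 2  # fixed line spacing; no wrapping or pagination
    out.append("showpage")  # emit the finished page
    return "\n".join(out) + "\n"


program = one_page_ps(["Chapter 1", "It was a dark (and stormy) night."])
print(program)
```

The output is text, not pixels: an interpreter such as Ghostscript rasterizes it at whatever resolution the output device needs.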

DjVu

Published as .djvu

DjVu is a file format that long remained obscure but is starting to surface now that free tools for manipulating the files are available.

DjVu particularly excels at storing scanned images. There are even advanced compressors specialized for low-color images, such as text documents. Individual files may contain a single page or a collection of multiple pages.

The images are divided into separate layers (such as a multi-color, low-resolution, lossily compressed background layer and a few-color, high-resolution, tightly compressed foreground layer), each compressed with the best applicable method. The files are also designed to decompress very fast, even faster than vector-based formats.

The advantage of DjVu is that a high-resolution scan (300 to 400 DPI), good enough for both on-screen viewing and printing, can be stored very efficiently: several dozen 300 DPI black-and-white scans can be stored in less than a megabyte.
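A quick calculation shows what that claim implies. An uncompressed black-and-white (1 bit per pixel) scan of a US Letter page at 300 DPI takes about 1 MB, so fitting three dozen such pages into one megabyte requires an overall compression ratio of roughly 38:1. The 8.5 × 11 inch page size, the count of 36 pages, and the 10^6-byte megabyte are assumptions chosen for the example.

```python
def bilevel_scan_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size in bytes of a 1-bit-per-pixel scan, rounded up to whole bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels + 7) // 8


page_bytes = bilevel_scan_bytes(8.5, 11, 300)  # assumed US Letter page at 300 DPI
pages = 36                                     # "several dozen" read as three dozen
raw_total = page_bytes * pages
ratio_needed = raw_total / 1_000_000           # compression needed to fit in 10**6 bytes
print(page_bytes, raw_total, round(ratio_needed, 1))  # 1051875 37867500 37.9
```

This counts raw bitmap data only; it says nothing about how DjVu's own encoders reach such ratios.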

Microsoft LIT

Published as an .lit

LIT files are readable only in the proprietary Microsoft Reader program, because the .lit format, otherwise similar to Microsoft's CHM format, includes digital rights management (DRM) features.

There is, however, a tool, Convert Lit, which can convert .lit files to HTML or OEBPS files.
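At the byte level the two Microsoft containers are easy to tell apart: CHM files begin with the ASCII signature `ITSF` and LIT files with `ITOLITLS`. The sketch below checks only these leading bytes; it says nothing about whether the content is DRM-protected, and the helper name is hypothetical.

```python
from typing import Optional

# ASCII signatures at the very start of each container file.
SIGNATURES = {
    b"ITSF": "CHM (Microsoft HTML Help)",
    b"ITOLITLS": "LIT (Microsoft Reader)",
}


def identify_ms_container(head: bytes) -> Optional[str]:
    """Identify a CHM or LIT file from its first bytes; None if neither matches."""
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


print(identify_ms_container(b"ITOLITLS\x01\x00\x00\x00"))
```

In practice one would pass the first eight bytes of a file, for example `open(path, "rb").read(8)`.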

Microsoft Reader uses patented ClearType display technology to improve the readability of books. Navigation works with a keyboard, mouse, stylus, or through electronic bookmarks. The Catalogue Library records the user's books on a personalized "home page". A user can add annotations and notes to any page, create large-print e-books with a single command, or create free-form drawings on the reader pages. A built-in dictionary allows the user to look up words.

eReader (formerly Palm Digital Media/Peanut Press)

Published as a .pdb

eReader is a program for viewing Palm Digital Media electronic books. Versions are available for Palm OS, Symbian OS, Windows Mobile Pocket PC/Smartphone, desktop Windows, and Macintosh. The reader shows text one page at a time, as paper books do. eReader supports embedded hyperlinks and images. Most eReader-formatted books are encrypted with a key derived from the purchaser's full name and credit card number. This information is not stored in the e-book itself: a one-way hash is used, so there is little or no risk of the user's information being extracted.
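The .pdb extension names a generic Palm OS database container rather than a single book format; PalmDoc and Plucker books use the same container, as do Mobipocket files. Readers tell them apart by the 4-byte type and creator codes stored at offsets 60 and 64 of the 78-byte database header. The sketch below decodes those fields; the code table lists commonly seen values and is not exhaustive, the helper name is hypothetical, and it does not touch the encrypted book content.

```python
import struct

# (type, creator) codes stored at offsets 60-67 of the Palm database header.
# Commonly seen values for book formats; this table is not exhaustive.
PDB_KINDS = {
    b"PNRdPPrs": "eReader",
    b"TEXtREAd": "PalmDoc",
    b"BOOKMOBI": "Mobipocket",
    b"DataPlkr": "Plucker",
}

HEADER_SIZE = 78  # fixed-size Palm database header, in bytes


def read_pdb_header(header: bytes) -> dict:
    """Decode the database name, book kind and record count from a PDB header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("truncated Palm database header")
    # The name is a NUL-padded 32-byte field.
    name = header[:32].split(b"\x00", 1)[0].decode("latin-1")
    kind = PDB_KINDS.get(header[60:68], "unknown")
    # Record count: big-endian unsigned 16-bit integer at offset 76.
    (records,) = struct.unpack(">H", header[76:78])
    return {"name": name, "kind": kind, "records": records}


# A synthetic header for demonstration: name, 28 bytes of attributes/dates/IDs,
# type+creator, 8 bytes of seed/next-list fields, then the record count.
fake = (b"My Book".ljust(32, b"\x00") + bytes(28) + b"PNRdPPrs"
        + bytes(8) + struct.pack(">H", 5))
print(read_pdb_header(fake))
```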

Desktop Author

Published as a .dnl or .exe

Desktop Author is an electronic publishing suite for creating digital web books with virtual turning pages. Digital web books of any publication type can be written in this format, including brochures, e-books, digital photo albums, e-cards, digital diaries, online resumes, quizzes, exams, tests, forms and surveys. Desktop Author packages the e-book as a .dnl or .exe book, either of which can be a single stand-alone file. The .exe version needs no other program to view it, while DNL files can be viewed inside a web browser or stand-alone via the DNL Reader.

DNL Reader

DNL is an e-book format that replicates the experience of a real-life page-turning book. It was developed by DNAML Pty Limited, an Australian company established in 1999. DNL e-books can be produced using Desktop Author or Desktop Communicator.

Newton eBook

Published as a .pkg and more commonly known as an Apple Newton book. A single Newton package file can contain multiple books (for example, the three books of a trilogy might be packaged together).

All systems running the Newton operating system (the most common include the Newton MessagePads, eMates, Siemens Secretary Stations, Motorola Marcos, Digital Ocean Seahorses and Tarpons) have built-in support for viewing Newton books. The Newton package format was released to the public by Newton, Inc. prior to that company's absorption into Apple Computer. The format is thus arguably open and various people have written readers for it (writing a Newton book converter has even been assigned as a university-level class project[1]).

Newton books have no support for DRM or encryption. They do support internal links, potentially multiple tables of contents and indexes, embedded grayscale images, and even some scripting capability (for example, it is possible to make a book in which the reader can influence the outcome).[2]

Newton books utilize Unicode and are thus available in numerous languages.

An individual Newton book may actually contain multiple views representing the same content in different ways (such as for different screen resolutions).

Apabi

Published as .xeb or .ceb

Apabi is a format devised by Founder Electronics. It is a popular format for Chinese e-books. It can be read using the Apabi Reader software and produced using Apabi Publisher. Both .xeb and .ceb files are encoded binary files. The iLiad e-book device includes an Apabi viewer.

iPod Notes

Notes is a feature of the iPod that allows short text notes to be displayed on the iPod screen. Because a single note is limited to 4096 bytes, several tools exist that split a longer plain text file into multiple notes. Basic HTML is allowed, but otherwise the format is plain text only.
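A minimal sketch of such a splitting tool, assuming UTF-8 text: it cuts a long text into pieces of at most 4096 bytes each, preferring to break at whitespace and never splitting a multi-byte character. The function name is hypothetical, and copying the resulting pieces to the iPod is not shown.

```python
def split_into_notes(text: str, limit: int = 4096) -> list[str]:
    """Split UTF-8 text into pieces whose encoded size is at most `limit` bytes.

    Prefers to break at the last newline or space in each piece and never
    cuts a multi-byte character in half.
    """
    notes = []
    rest = text
    while rest:
        encoded = rest.encode("utf-8")
        if len(encoded) <= limit:
            notes.append(rest)
            break
        # Decode the first `limit` bytes; errors="ignore" drops a trailing
        # partial character, so `window` is always a prefix of `rest`.
        window = encoded[:limit].decode("utf-8", errors="ignore")
        if not window:
            raise ValueError("limit is smaller than a single character")
        cut = max(window.rfind("\n"), window.rfind(" "))
        if cut <= 0:
            cut = len(window)  # no usable whitespace: hard split
        notes.append(rest[:cut])
        # Drop the whitespace at the break so the next note starts with text.
        rest = rest[cut:].lstrip("\n ")
    return notes


book = "word " * 3000
notes = split_into_notes(book)
print(len(notes), [len(n.encode("utf-8")) for n in notes])
```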

Libris

Published as .lbr or .bin

Libris is a Java-based e-book reader for mobile devices such as cell phones. Libris will run on most Java-enabled devices that support MIDP. The reader formats books to fit the device screen, and shows one page at a time using high-quality anti-aliased fonts. Books may employ encryption or be unrestricted. Libris content may be produced using the MakeLibris tool. The Libris reader also supports the PalmDoc format.

Mobipocket

Published as a .prc or .mobi.

The Mobipocket e-book format is based on the Open eBook standard, using XHTML, and can include JavaScript and frames. It also supports native SQL queries for use with embedded databases. There is a corresponding e-book reader, Mobipocket Reader. A free e-book of the German Wikipedia has been published in Mobipocket format; see [3].

The Mobipocket Reader has a home page library. Readers can add blank pages anywhere in a book and add free-hand drawings. Annotations (highlights, bookmarks, corrections, notes, and drawings) can be applied, organized, and recalled from a single location. Mobipocket Reader also has electronic bookmarks and a built-in dictionary.

The reader has a full-screen reading mode and supports many PDAs, communicators, and smartphones. Mobipocket products support most Windows, Symbian, BlackBerry and Palm operating systems, but not Linux or Macintosh.

The Amazon Kindle's AZW format is essentially the Mobipocket format with a slightly different serial number scheme (it uses an asterisk instead of a dollar sign).

Mobipocket is working on an .epub to .mobi converter called mobigen. See [4].

IDPF

Published as .epub

The .epub or OEBPS format is an open standard for e-books created by the International Digital Publishing Forum (IDPF). It combines three IDPF open standards:

  • Open Publication Structure (OPS) 2.0, which describes the content markup (either XHTML or DAISY DTBook)
  • Open Packaging Format (OPF) 2.0, which describes the structure of an .epub in XML
  • OEBPS Container Format (OCF) 1.0, which bundles files together (as a renamed ZIP file)
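The OCF layer is simple enough to sketch with Python's standard `zipfile` module: the archive's first entry must be an uncompressed file named `mimetype` containing `application/epub+zip`, and `META-INF/container.xml` names the OPF package file. The sketch below builds only that container. The `build_ocf` name, the `OEBPS/` paths, and the placeholder OPF and chapter are assumptions for illustration, so the output is not a complete, valid .epub.

```python
import io
import zipfile

# META-INF/container.xml points reading systems at the OPF package document.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_ocf(opf: str, content: dict[str, str]) -> bytes:
    """Bundle an OPF document and content files into an OCF (ZIP) container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # OCF requires "mimetype" to be the first entry, stored without compression.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", opf, compress_type=zipfile.ZIP_DEFLATED)
        for name, data in content.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


# Placeholder package and chapter: a real .epub needs a complete OPF 2.0
# document (metadata, manifest, spine) and valid OPS content.
data = build_ocf("<package/>", {"OEBPS/chapter1.xhtml": "<html/>"})
print(zipfile.ZipFile(io.BytesIO(data)).namelist())
```

Storing `mimetype` first and uncompressed puts the string `application/epub+zip` at a fixed position near the start of the file, so tools can recognize an .epub without unpacking it.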

Currently, the format can be read by Adobe Digital Editions, Lexcycle Stanza, and the Mozilla Firefox plugin OpenBerg Lector. Support for the format is being implemented in several other reader programs, such as dotReader, FBReader, Mobipocket and Okular.

SSReader (超星数字图书馆, SuperStar Digital Library)

Published as .pdg

The .pdg format is the digital book format used by 超星数字图书馆 (SuperStar Digital Library)[5], a popular digital library company in China. It is essentially a proprietary raster image compression and binding format, with plug-in modules that perform OCR at reading time. The company scanned a huge number of Chinese books in the National Library of China, and these form the major stock of its service. The detailed format is not published. Some other commercial e-book formats are also used in Chinese digital libraries.

See also

  • e-book device
  • ebookwise-1150 ebook reader device [6]
  • ebook reader articles at Mobile Read Wiki [7]