Ascii85

From Wikipedia, the free encyclopedia

Ascii85 is a form of binary to text encoding developed by Adobe Systems. It is more efficient at encoding binary data as ASCII characters than Base64, resulting in only an approximately 25% increase in data size versus 33% for base64.

It is mainly used by Adobe's PostScript and Portable Document Format file formats.

[edit] Encoding and decoding method

When encoding, the binary data is divided into groups of four bytes, which are treated as a 32-bit number, the first byte being the most significant (i.e., big-endian order). This number is then encoded as five numbers between 0 and 84 by repeatedly dividing it by 85. In fact, it is converted to base 85. Again, the first number is the most significant. Five numbers is enough, because 855 is greater than 232. These numbers are then encoded as printable characters by adding 33 to them, giving the ASCII characters 33 ("!") to 117 ("u").

Decoding uses the reverse procedure.

Further details:

  • Conventionally, the encoded data starts with "<~", but this is not always the case.
  • If a group contains only zero bytes, it is encoded as "z" instead of "!!!!!".
  • At the end of the data, the last group can have fewer than four bytes. Virtual zero bytes are appended to the data, and after encoding, if there was one byte, only two characters are output; if there were two bytes, three characters are output; and if there were three bytes, four characters are output. The "z" case does not apply here. This way, the length of the original data can be determined by a reader of the encoded data.
  • After the encoded data, the two characters "~>" are output, to further mark the end of the data.
  • Whitespace characters, like newlines, can be added at will. Other characters cause a decoding error.
  • Groups of characters that decode to a value greater than 232 − 1 will cause a decoding error, as will "z" characters in the middle of a group, and a final group consisting of only one character.

[edit] Example

The (historic) slogan of Wikipedia:

Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.

encoded in Ascii85 is as follows:

 <~9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,
 O<DJ+*.@<*K0@<6L(Df-\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKY
 i(DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIa
 l(DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G
 >uD.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c~>
Text content M a n
ASCII 77 97 110 32
Bit pattern 0 1 0 0 1 1 0 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1 0 0 0 0 0
32-bit Value 1,298,230,816
Base 85 (+33) 24 (57) 73 (106) 80 (113) 78 (111) 61 (94)
ASCII 9 j q o ^

[edit] Resources

In other languages