Quoted-printable
From Wikipedia, the free encyclopedia
Quoted-printable is an encoding that uses printable ASCII characters (letters, digits, punctuation, and the equals sign "=" as an escape character) to transmit 8-bit data over a 7-bit data path. It is defined as a MIME content transfer encoding for use in Internet e-mail.
Introduction
The basic Internet e-mail transmission protocol, SMTP, supports only 7-bit ASCII characters (see also 8BITMIME). MIME defines mechanisms for sending other kinds of information in e-mail, including text in languages other than English, using character encodings other than ASCII. However, these encodings often use byte values outside the ASCII range, so they need to be encoded further before they are suitable for use in e-mail. Quoted-printable encoding is one method of mapping arbitrary bytes into sequences of ASCII characters. The encoding is reversible: the original bytes, and hence the non-ASCII characters they represent, can be recovered.
Quoted-printable and Base64 are the two basic MIME content transfer encodings. If the input text is mostly ASCII, quoted-printable produces a fairly readable and compact result. If the input is not mostly ASCII, however, quoted-printable becomes both unreadable and inefficient, because every encoded byte takes three characters. Base64 is not readable but has a predictable overhead of about one third for all data, which makes it the more sensible choice for binary formats or text in non-Latin-based languages.
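The size trade-off can be illustrated with a short sketch that uses the quopri and base64 modules from the Python standard library. The helper name encoded_sizes and the choice of UTF-8 as the character encoding are assumptions made for this illustration only.

```python
import base64
import quopri


def encoded_sizes(text, charset="utf-8"):
    """Return (raw, quoted-printable, Base64) sizes in bytes for `text`.

    `encoded_sizes` is a hypothetical helper for this example; the
    charset default is an assumption, not part of the MIME rules.
    """
    raw = text.encode(charset)
    qp = quopri.encodestring(raw)   # quoted-printable form
    b64 = base64.b64encode(raw)     # Base64 form, without line wrapping
    return len(raw), len(qp), len(b64)


# Mostly-ASCII text stays the same size in quoted-printable.
print(encoded_sizes("Hello, world"))
# Text made only of non-ASCII characters triples in size.
print(encoded_sizes("\u03b1\u03b2\u03b3\u03b4"))
```

For short inputs like these, no soft line breaks are needed, so the quoted-printable size is exactly the number of literal characters plus three per escaped byte.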
Quoted-printable encoding
Any 8-bit byte value may be encoded with three characters: an "=" followed by two hexadecimal digits (0–9 or A–F) representing the byte's numeric value. For example, a US-ASCII form feed character (decimal value 12) can be represented by "=0C", and a US-ASCII equals sign (decimal value 61) is represented by "=3D". All bytes other than printable ASCII characters, tabs, spaces, and end-of-line characters must be encoded in this fashion.
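As a minimal sketch of this escape rule, the following function formats one byte value as "=" plus two uppercase hexadecimal digits. The name qp_escape is hypothetical and chosen only for this example.

```python
def qp_escape(byte):
    """Return the three-character quoted-printable escape for one byte value.

    `qp_escape` is a hypothetical name used only in this illustration.
    """
    if not 0 <= byte <= 255:
        raise ValueError("byte value must be in the range 0..255")
    # "%02X" yields exactly two uppercase hexadecimal digits.
    return "=%02X" % byte


print(qp_escape(12))  # form feed
print(qp_escape(61))  # equals sign
```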
Printable ASCII characters other than "=", i.e. those with decimal values between 33 and 126 except decimal value 61 (=), may be represented by themselves.
ASCII tab and space characters, decimal values 9 and 32, may be represented by themselves except if these characters appear at the end of a line. If one of these characters appears at the end of a line it must be encoded as "=09" (tab) or "=20" (space).
If the data being encoded contains meaningful line breaks, they must be encoded as an ASCII CR LF sequence, not as their original byte values. Conversely, if byte values 10 and 13 have meanings other than end of line, they must be encoded as "=0A" and "=0D".
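The character rules above can be sketched as a small encoder. This is an illustration under stated assumptions, not a reference implementation: the names encode_line and encode_text are hypothetical, the input is assumed to be text whose meaningful line breaks are LF or CR LF, and the line-length limit is not handled here.

```python
def encode_line(line):
    """Encode one line of bytes (without its line break) as quoted-printable."""
    out = []
    last = len(line) - 1
    for i, b in enumerate(line):
        if 33 <= b <= 126 and b != 61:
            out.append(chr(b))          # printable ASCII other than "="
        elif b in (9, 32) and i != last:
            out.append(chr(b))          # tab or space not at end of line
        else:
            out.append("=%02X" % b)     # everything else is escaped
    return "".join(out)


def encode_text(data):
    """Encode text bytes, emitting meaningful line breaks as CR LF.

    Assumption: line breaks in `data` are LF or CR LF. A lone CR is
    treated as data and therefore escaped as "=0D".
    """
    lines = data.replace(b"\r\n", b"\n").split(b"\n")
    return "\r\n".join(encode_line(line) for line in lines)


print(encode_line("caf\u00e9".encode("latin-1")))
print(repr(encode_text(b"one \ntwo\t")))
```

Trailing tabs and spaces are escaped because some mail transports strip or add whitespace at line ends, which would otherwise change the decoded data.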
Lines of quoted-printable encoded data must not be longer than 76 characters, not counting the trailing CR LF. To satisfy this requirement without altering the decoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line and does not cause a line break in the decoded text.
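A minimal sketch of soft line breaks follows, together with a matching decoder that removes them. The names soft_wrap and qp_decode are hypothetical, the input to soft_wrap is assumed to be a single already-encoded line, and the decoder assumes well-formed input with CR LF line endings.

```python
def soft_wrap(encoded, limit=76):
    """Insert soft line breaks into one already-encoded line.

    Assumption: `encoded` contains no CR LF and every "=" starts an
    "=XX" escape, as produced by a quoted-printable encoder.
    """
    lines = []
    while len(encoded) > limit:
        cut = limit - 1                  # leave room for the trailing "="
        # Never split an "=XX" escape across two lines.
        if encoded[cut - 1] == "=":
            cut -= 1
        elif encoded[cut - 2] == "=":
            cut -= 2
        lines.append(encoded[:cut] + "=")
        encoded = encoded[cut:]
    lines.append(encoded)
    return "\r\n".join(lines)


def qp_decode(encoded):
    """Decode well-formed quoted-printable text back to bytes."""
    text = encoded.replace("=\r\n", "")  # drop soft line breaks
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "=":
            out.append(int(text[i + 1:i + 3], 16))  # "=XX" escape
            i += 3
        else:
            out.append(ord(text[i]))                # literal character
            i += 1
    return bytes(out)


wrapped = soft_wrap("a" + "=E9" * 40)
print(wrapped)
print(qp_decode(wrapped) == b"a" + b"\xe9" * 40)
```

Hard line breaks (CR LF without a preceding "=") are left in the decoded output as bytes 13 and 10; converting them to a local line-break convention is outside the scope of this sketch.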