Uuencoding
Uuencoding is a form of binary-to-text encoding that originated in the Unix program uuencode, which encoded binary data for transmission over the uucp mail system. The name "uuencoding" is derived from "Unix-to-Unix encoding". Because uucp converted characters between the character sets of different computers, uuencode converted the data to a set of common characters that were unlikely to be "translated", since any such translation would corrupt the file. The program uudecode reverses the effect of uuencode, recreating the original binary file exactly. uuencode and uudecode became popular for sending binary files by e-mail and posting them to Usenet newsgroups. Uuencoding has now been largely replaced by MIME and yEnc. With MIME, files that might once have been uuencoded are transferred with Base64 encoding.
Encoded format
A file in uuencoded format starts with a header line of the form:
begin <mode> <file>
Where <mode> is the file's Unix read/write/execute permissions as three octal digits, and <file> is the name to be used when recreating the binary data. The file ends with two trailer lines:
`
end
The grave accent indicates a line encoding zero characters.
Lines between the header and trailer encode data. Each starts with a character indicating the number of data bytes encoded on that line and ends with a newline character. All lines, except perhaps the last, encode 45 bytes of data. The corresponding encoded length value is 'M', so most lines begin with 'M'. If the count of data bytes is not divisible by three, one or two additional zero bytes are appended to complete the final group. These padding bytes are not included in the count at the beginning of the last line.
The line count is encoded by adding 32. In ASCII, the first thirty-two characters (codes 0 to 31) are unprintable control characters that govern data transmission, so they could be modified or deleted in transit. The next ninety-five characters, codes 32 to 126, are all printable. Since the line count is in the range 0-45, adding 32 converts it into a printable character. The ASCII code for 'M' is exactly 45+32 = 77. For a zero-length line, adding 32 to 0 gives a space character. This character was also problematic for data transmission, so the grave accent (`, code 96) is used instead. Subtracting 32 from 96 gives 64, a value whose lower six bits are 0.
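The length rule above can be sketched in a few lines of Python. The function names `encode_length` and `decode_length` are illustrative, not part of any uuencode implementation; the sketch assumes the grave accent is always used for a zero count, as described above.

```python
def encode_length(n: int) -> str:
    """Return the character that starts a line carrying n data bytes."""
    if not 0 <= n <= 45:
        raise ValueError("a uuencoded line carries 0 to 45 data bytes")
    # Zero would map to a space, so the grave accent (code 96) stands in.
    return "`" if n == 0 else chr(n + 32)


def decode_length(ch: str) -> int:
    """Recover the byte count from the first character of a line."""
    # Masking to six bits maps both the space (32) and the grave accent (96) to 0.
    return (ord(ch) - 32) & 0x3F


print(encode_length(45), encode_length(3), encode_length(0))  # M # `
```

Masking with `0x3F` on the decoding side is what lets a decoder accept either a space or a grave accent for zero.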
As a complete file, the uuencoded output for Cat would be
begin 644 cat.txt
#0V%T
`
end
The begin line is a standard uuencode header; the '#' indicates that its line encodes three characters; the last two lines appear at the end of all uuencoded files.
The add-32 trick is used for encoding data bytes as well. Every three bytes of data are assembled into a 24-bit value. These 24 bits are split into four groups of six, which are treated as numbers between 0 and 63. Decimal 32 is added to each number and the results are output as ASCII characters, which lie in the range 32 (space) to 32+63 = 95 (underscore). ASCII characters greater than 95 may also be used; however, only the six right-most bits are relevant.
Sometimes each data line has extra dummy characters (often the grave accent) added to avoid problems with mailers that strip trailing spaces. These characters are ignored by uudecode. The grave accent (ASCII 96) can also be used in place of a space character.
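Putting the pieces together, a minimal encoder might look like the following sketch. The names `encode_group` and `uuencode` are hypothetical, and the choice to emit a grave accent for every zero-valued group (rather than a space) is an assumption made here to sidestep the trailing-space problem mentioned above; real implementations differ on this point.

```python
def encode_group(chunk: bytes) -> str:
    """Encode exactly three bytes as four printable characters."""
    value = int.from_bytes(chunk, "big")  # assemble the 24-bit value
    sextets = [(value >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Assumption: a zero sextet is written as a grave accent, not a space.
    return "".join("`" if s == 0 else chr(s + 32) for s in sextets)


def uuencode(data: bytes, name: str, mode: int = 0o644) -> str:
    """Return the uuencoded text for data, with header and trailer lines."""
    lines = [f"begin {mode:03o} {name}"]
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]
        # Pad the final group with zero bytes up to a multiple of three.
        padded = chunk + b"\0" * (-len(chunk) % 3)
        body = "".join(encode_group(padded[i:i + 3])
                       for i in range(0, len(padded), 3))
        # The leading count covers the real data bytes only, not the padding.
        lines.append(chr(len(chunk) + 32) + body)
    lines += ["`", "end"]
    return "\n".join(lines) + "\n"


print(uuencode(b"Cat", "cat.txt"), end="")
```

Run on the three bytes of "Cat", this prints the same four lines as the complete file shown earlier.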
Sample uuencoding
The encoding process is demonstrated by this table, which shows the derivation of the above encoding for "Cat".
Original characters | C | a | t |
---|---|---|---|
Original ASCII, decimal | 67 | 97 | 116 |
ASCII, binary | 01000011 | 01100001 | 01110100 |

Six-bit groups | 010000 | 110110 | 000101 | 110100 |
---|---|---|---|---|
New decimal values | 16 | 54 | 5 | 52 |
+32 | 48 | 86 | 37 | 84 |
Uuencoded characters | 0 | V | % | T |
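The rows of the table can be reproduced step by step. The short script below is only an illustration of the arithmetic; the variable names are chosen here and do not come from any uuencode source.

```python
data = b"Cat"

# Concatenate the three bytes into one 24-bit string.
bits = "".join(f"{byte:08b}" for byte in data)

# Split into four groups of six bits and read each as a number.
groups = [bits[i:i + 6] for i in range(0, 24, 6)]
values = [int(group, 2) for group in groups]

# Add 32 to land in the printable ASCII range, then convert to characters.
codes = [value + 32 for value in values]
encoded = "".join(chr(code) for code in codes)

print(bits)     # 010000110110000101110100
print(values)   # [16, 54, 5, 52]
print(codes)    # [48, 86, 37, 84]
print(encoded)  # 0V%T
```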
Uuencode table
The following table lists the subset of ASCII characters used by uuencode and the six-bit value that each one represents, written in octal.
six bits | code char | six bits | code char | six bits | code char | six bits | code char |
---|---|---|---|---|---|---|---|
00 | SP | 20 | 0 | 40 | @ | 60 | P |
01 | ! | 21 | 1 | 41 | A | 61 | Q |
02 | " | 22 | 2 | 42 | B | 62 | R |
03 | # | 23 | 3 | 43 | C | 63 | S |
04 | $ | 24 | 4 | 44 | D | 64 | T |
05 | % | 25 | 5 | 45 | E | 65 | U |
06 | & | 26 | 6 | 46 | F | 66 | V |
07 | ' | 27 | 7 | 47 | G | 67 | W |
10 | ( | 30 | 8 | 50 | H | 70 | X |
11 | ) | 31 | 9 | 51 | I | 71 | Y |
12 | * | 32 | : | 52 | J | 72 | Z |
13 | + | 33 | ; | 53 | K | 73 | [ |
14 | , | 34 | < | 54 | L | 74 | \ |
15 | - | 35 | = | 55 | M | 75 | ] |
16 | . | 36 | > | 56 | N | 76 | ^ |
17 | / | 37 | ? | 57 | O | 77 | _ |
00 | ` | | | | | | |
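Decoding applies the table in reverse: subtract 32 from each character and keep the low six bits. The sketch below decodes a single data line; `decode_line` is a hypothetical helper, and restoring missing trailing characters as zeros is an assumption about how a tolerant decoder might cope with mailers that strip trailing spaces.

```python
def decode_line(line: str) -> bytes:
    """Decode one uuencoded data line back into its original bytes."""
    line = line.rstrip("\r\n")
    if not line:
        return b""
    # The first character carries the number of real data bytes.
    count = (ord(line[0]) - 32) & 0x3F
    # Keeping only six bits makes the space and the grave accent both decode to 0.
    sextets = [(ord(ch) - 32) & 0x3F for ch in line[1:]]
    # Assumption: trailing spaces removed in transit are restored as zeros.
    sextets += [0] * (-len(sextets) % 4)
    out = bytearray()
    for i in range(0, len(sextets), 4):
        a, b, c, d = sextets[i:i + 4]
        value = (a << 18) | (b << 12) | (c << 6) | d
        out += value.to_bytes(3, "big")
    # Drop padding bytes and any extra dummy characters beyond the count.
    return bytes(out[:count])


print(decode_line("#0V%T"))  # b'Cat'
```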
POSIX Base64 coding
Despite its limited range of characters, uuencoded data is sometimes mangled on passage through certain old computers. The worst offenders are computers using non-ASCII character sets such as EBCDIC. One attempt to fix the problem was the Xxencode format, which used only alphanumeric characters and the plus and minus symbols. More common today is the Base64 format; it can also be generated by the uuencode program. The header is changed to
begin-base64 <mode> <file>
the trailer becomes
====
and lines between are encoded with characters chosen from
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
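As a rough illustration, the Base64 variant can be produced with Python's standard `base64` module. The function name `uuencode_base64` is hypothetical, and the choice of 45 data bytes per line is an assumption carried over from the classic format rather than something stated above.

```python
import base64


def uuencode_base64(data: bytes, name: str, mode: int = 0o644) -> str:
    """Wrap data in the begin-base64 header and ==== trailer."""
    lines = [f"begin-base64 {mode:03o} {name}"]
    # Assumption: 45 data bytes per line, mirroring classic uuencode.
    # 45 is a multiple of 3, so only the last line can end in '=' padding.
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]
        lines.append(base64.b64encode(chunk).decode("ascii"))
    lines.append("====")
    return "\n".join(lines) + "\n"


print(uuencode_base64(b"Cat", "cat.txt"), end="")
```

For "Cat" the single data line is `Q2F0`: the same four six-bit values (16, 54, 5, 52) as in the classic encoding, looked up in the Base64 alphabet instead of being offset by 32.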
Trivia
Microsoft's e-mail program Outlook Express once erroneously accepted "begin <filename>" as the start of a uuencoded attachment, without requiring the octal Unix-style permissions. Especially on Usenet, where MIME is seldom used[citation needed] and plain text is preferred, some people would embed "begin" followed by two spaces in their messages in order to maliciously hide the rest of the message from Outlook Express users (for example, by configuring their news client to introduce quotations with the line "begin quote from xxx")[1].
References
This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.
External links
- Online UUencoder / UUdecoder
- GNU sharutils - The Free Software Foundation's sharutils bundle includes uuencode, uudecode, and others.
- UUDeview - open-source program to encode/decode Base64, BinHex, uuencode, xxencode, etc. for Unix/Windows/DOS
- UUENCODE-UUDECODE - open-source program to encode/decode created by Clem "Grandad" Dye
- StUU - Open Source fast UUDecoder for Macintosh by Stuart Cheshire