Uuencoding

From Wikipedia, the free encyclopedia

Uuencoding is a form of binary-to-text encoding that originated in the Unix program uuencode, which encoded binary data for transmission over the uucp mail system. The name "uuencoding" is derived from "Unix-to-Unix encoding". Because uucp converted characters between different computers' character sets, uuencode was used to convert the data into common characters that were unlikely to be "translated" in transit, which would have corrupted the file. The program uudecode reverses the effect of uuencode, recreating the original binary file exactly. uuencode and uudecode became popular for sending binary files by e-mail and for posting them to Usenet newsgroups. Uuencoding has now been largely replaced by MIME and yEnc. With MIME, files that might once have been uuencoded are transferred with base64 encoding.

Encoded format

A file in uuencoded format starts with a header line of the form:

begin <mode> <file>

where <mode> gives the file's Unix read/write/execute permissions as three octal digits, and <file> is the name to be used when recreating the binary data. The file ends with two trailer lines:

`
end

The grave accent indicates a line that encodes zero characters.
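The header format described above can be read back with a few lines of Python. This is a minimal sketch, not the logic of any particular uudecode implementation; the function name parse_header is hypothetical, and it assumes the fields are separated by single spaces.

```python
def parse_header(line: str) -> tuple[int, str]:
    """Split a 'begin <mode> <file>' line into (mode, file name)."""
    # Split at most twice so that a file name containing spaces stays intact.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) != 3 or parts[0] != "begin":
        raise ValueError("not a uuencode header line")
    _, mode_text, name = parts
    # The mode is written as octal digits, e.g. "644".
    return int(mode_text, 8), name


mode, name = parse_header("begin 644 cat.txt")
print(oct(mode), name)  # 0o644 cat.txt
```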

Lines between the header and trailer encode data. Each starts with a character indicating the number of data bytes encoded on that line and ends with a newline character. All lines, except perhaps the last, encode 45 bytes of data. The corresponding encoded length value is 'M', so most lines begin with 'M'. If the count of data bytes is not divisible by three, one or two additional zero bytes are appended to complete the final group. These padding bytes are not included in the count at the beginning of the last line.

The line count is encoded by adding 32. In ASCII, the first thirty-two characters (codes 0 to 31) are unprintable control characters, which could be modified or deleted in transmission. The next ninety-five characters, from code 32 through 126, are all printable. Since the line count is in the range 0-45, adding 32 converts it into a printable character. The ASCII code for 'M' is exactly 45+32 = 77. For a zero-length line, adding 32 to 0 gives a space character. This character was also problematic for data transmission, so the grave accent (`, code 96) is used instead. Subtracting 32 from 96 gives 64, a value whose lower six bits are 0.
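The length rule can be sketched in Python as a pair of small helpers. The names length_char and line_length are illustrative only; the masking with 0x3F reflects the "lower six bits" remark above.

```python
def length_char(n: int) -> str:
    """Return the character that starts a line carrying n data bytes (0-45)."""
    if not 0 <= n <= 45:
        raise ValueError("a uuencoded line carries between 0 and 45 bytes")
    # Zero would map to a space (code 32), so the grave accent (code 96) stands in.
    return "`" if n == 0 else chr(n + 32)


def line_length(ch: str) -> int:
    """Recover the byte count from the first character of a line."""
    # Keeping only the low six bits maps both space (32) and "`" (96) to zero.
    return (ord(ch) - 32) & 0x3F


print(length_char(45), length_char(3), length_char(0))  # M # `
```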

As a complete file, the uuencoded output for a file cat.txt containing the three characters "Cat" would be:

begin 644 cat.txt
#0V%T
`
end

The begin line is a standard uuencode header; the '#' indicates that its line encodes three characters (35 - 32 = 3); the last two lines appear at the end of all uuencoded files.
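As a cross-check, the sample file can be reproduced with Python's standard binascii module, whose b2a_uu function encodes up to 45 bytes into one uuencoded line. The wrapper below is only a sketch: the name uuencode_text and its parameters are hypothetical, and the header and trailer are assembled by hand.

```python
import binascii


def uuencode_text(data: bytes, name: str, mode: int = 0o644) -> str:
    """Build a complete uuencoded file as text (illustrative helper)."""
    lines = [f"begin {mode:o} {name}"]
    for i in range(0, len(data), 45):
        # b2a_uu returns one newline-terminated line for at most 45 bytes.
        # By default it writes zero-valued groups as spaces, not grave accents.
        encoded = binascii.b2a_uu(data[i:i + 45]).decode("ascii")
        lines.append(encoded.rstrip("\n"))
    # Trailer: a zero-length line followed by "end".
    lines += ["`", "end"]
    return "\n".join(lines) + "\n"


print(uuencode_text(b"Cat", "cat.txt"), end="")
```

Run as written, this should print the four-line file shown above.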

The add-32 trick is used for encoding data bytes as well. Each group of three data bytes is assembled into a 24-bit value. These 24 bits are split into four groups of six bits, each treated as a number between 0 and 63. Decimal 32 is added to each number, and the results are output as ASCII characters in the range 32 (space) to 32+63 = 95 (underscore). ASCII characters greater than 95 may also be used; however, only the six right-most bits of the value left after subtracting 32 are relevant.
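The three-bytes-to-four-characters step might look like this in Python. It is a sketch under the rules stated above; encode_group is a hypothetical name, and short final groups are padded with zero bytes as described earlier.

```python
def encode_group(group: bytes) -> str:
    """Encode one to three data bytes as four uuencoded characters."""
    if not 1 <= len(group) <= 3:
        raise ValueError("a group holds one to three bytes")
    # Pad a short final group with zero bytes, then form the 24-bit value.
    value = int.from_bytes(group.ljust(3, b"\0"), "big")
    # Take the four six-bit fields from most to least significant.
    sixes = [(value >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Add 32 so every field lands in the printable range 32-95.
    return "".join(chr(s + 32) for s in sixes)


print(encode_group(b"Cat"))  # 0V%T
```

This version emits a space for a zero field; an encoder that prefers the grave accent would substitute it at the last step.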

Sometimes each data line has extra dummy characters (often the grave accent) appended to avoid problems with mailers that strip trailing spaces. These characters are ignored by uudecode. The grave accent (ASCII 96) can also be used in place of a space character within the data.
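A decoder that tolerates these variations could be sketched as follows. The name decode_line is hypothetical, and re-padding a line whose trailing spaces were stripped is an assumption of this sketch rather than documented uudecode behaviour.

```python
def decode_line(line: str) -> bytes:
    """Decode one uuencoded data line back into its data bytes."""
    line = line.rstrip("\r\n")
    if not line:
        return b""
    # The first character carries the byte count (space and "`" both mean 0).
    count = (ord(line[0]) - 32) & 0x3F
    needed = 4 * ((count + 2) // 3)  # encoded characters that carry data
    # Characters beyond `needed` are dummy padding and are ignored; a line
    # shortened by a mailer is re-padded with spaces (assumption).
    body = line[1:1 + needed].ljust(needed)
    out = bytearray()
    for i in range(0, needed, 4):
        value = 0
        for ch in body[i:i + 4]:
            # Subtract 32 and keep six bits, so "`" (96) decodes like a space.
            value = (value << 6) | ((ord(ch) - 32) & 0x3F)
        out += value.to_bytes(3, "big")
    # Drop the zero bytes that padded the final group.
    return bytes(out[:count])


print(decode_line("#0V%T"))  # b'Cat'
```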

Sample uuencoding

The encoding process is demonstrated by this table, which shows the derivation of the above encoding for "Cat".

Original characters      C            a            t
Original ASCII, decimal  67           97           116
ASCII, binary            01000011     01100001     01110100
Six-bit groups           010000   110110   000101   110100
New decimal values       16       54       5        52
+32                      48       86       37       84
Uuencoded characters     0        V        %        T
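The rows of this table can be recomputed with a short script that works on the bit string directly, which mirrors how the table is laid out. The function name trace is illustrative, and the sketch handles exactly three input bytes.

```python
def trace(group: bytes):
    """Return the intermediate rows of the table for three input bytes."""
    if len(group) != 3:
        raise ValueError("this sketch handles exactly three bytes")
    # Write each byte as eight binary digits and join them into 24 bits.
    bits = "".join(f"{b:08b}" for b in group)
    # Re-cut the same 24 bits into four six-bit numbers.
    values = [int(bits[i:i + 6], 2) for i in range(0, 24, 6)]
    shifted = [v + 32 for v in values]
    chars = "".join(chr(c) for c in shifted)
    return bits, values, shifted, chars


bits, values, shifted, chars = trace(b"Cat")
print(bits)     # 010000110110000101110100
print(values)   # [16, 54, 5, 52]
print(shifted)  # [48, 86, 37, 84]
print(chars)    # 0V%T
```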

Uuencode table

The following table lists the subset of ASCII characters used by uuencode and the six-bit value, written in octal, that each one represents.

six bits, code char (repeated in four column pairs):
00 SP   20 0   40 @   60 P
01 !   21 1   41 A   61 Q
02 "   22 2   42 B   62 R
03 #   23 3   43 C   63 S
04 $   24 4   44 D   64 T
05 %   25 5   45 E   65 U
06 &   26 6   46 F   66 V
07 '   27 7   47 G   67 W
10 (   30 8   50 H   70 X
11 )   31 9   51 I   71 Y
12 *   32 :   52 J   72 Z
13 +   33 ;   53 K   73 [
14 ,   34 <   54 L   74 \
15 -   35 =   55 M   75 ]
16 .   36 >   56 N   76 ^
17 /   37 ?   57 O   77 _
                  00 `
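Because every entry is simply the six-bit value plus 32, the table can be regenerated rather than stored. The sketch below rebuilds the sixteen main rows; uu_table_rows is a hypothetical name, and the extra grave-accent entry for 00 is left out.

```python
def uu_table_rows() -> list[str]:
    """Rebuild the sixteen rows of the table as text."""
    rows = []
    for row in range(16):
        cells = []
        # Each row shows four values that are 16 (octal 20) apart.
        for value in (row, row + 16, row + 32, row + 48):
            char = "SP" if value == 0 else chr(value + 32)
            cells.append(f"{value:02o} {char}")
        rows.append("   ".join(cells))
    return rows


for line in uu_table_rows():
    print(line)
```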

POSIX Base64 coding

Despite its limited range of characters, uuencoded data is sometimes mangled on passage through certain old computers. The worst offenders are computers using non-ASCII character sets such as EBCDIC. One attempt to fix the problem was the Xxencode format, which used only alphanumeric characters and the plus and minus symbols. More common today is the Base64 format; it can also be generated by the uuencode program. The header is changed to

begin-base64 <mode> <file>

the trailer becomes

====

and lines between are encoded with characters chosen from

ABCDEFGHIJKLMNOP
QRSTUVWXYZabcdef
ghijklmnopqrstuv
wxyz0123456789+/
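The Base64 variant can be sketched with Python's standard base64 module. The function name uuencode_base64 is hypothetical, and encoding 45 bytes per line is an assumption carried over from the traditional format, not something stated above.

```python
import base64


def uuencode_base64(data: bytes, name: str, mode: int = 0o644) -> str:
    """Build the begin-base64 form of a file as text (illustrative helper)."""
    lines = [f"begin-base64 {mode:o} {name}"]
    for i in range(0, len(data), 45):
        # 45 bytes per line is an assumption; each such chunk yields 60 characters.
        lines.append(base64.b64encode(data[i:i + 45]).decode("ascii"))
    # The trailer is a line of four equals signs.
    lines.append("====")
    return "\n".join(lines) + "\n"


print(uuencode_base64(b"Cat", "cat.txt"), end="")
```

For the three bytes of "Cat" the data line is Q2F0, the same four six-bit values (16, 54, 5, 52) looked up in the Base64 alphabet instead of being offset by 32.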

Trivia

Microsoft's e-mail program Outlook Express once erroneously accepted "begin <filename>" as the start of a uuencoded attachment (i.e., it did not require the octal-encoded Unix-style permissions). Especially on Usenet, where MIME is seldom used[citation needed] and plain text is preferred, some people would embed the word "begin" followed by two spaces in their messages in order to maliciously hide the rest of the message from Outlook Express users (e.g., by configuring their news client to start quotations with the line "begin quote from xxx")[1].

References

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.