Talk:Base64
The PGP Mimic page (http://www.spammimic.com/encodepgp.shtml) shows one crazy use of base64 encoding.
[edit] Split
The page currently is about two completely different things:
- the numeral system in base 64
- the base64 encoding
Actually, I think that the numeral system should be "Base 64" with a space, but I am not sure. Anyway, I think that a split with a disambiguation page is in order. Paolo Liberatore (Talk) 15:05, 30 November 2005 (UTC)
- I'm not sure I'd call them completely different; after all, a binary file is really just a very big number if you think about it. The page describes a numeral system and then goes on to describe its uses. I see nothing wrong with this structure. Plugwash 15:47, 30 November 2005 (UTC)
- I rather agree, particularly since the numeral system isn't particularly notable except for its use in the encoding. —Ilmari Karonen (talk) 15:52, 30 November 2005 (UTC)
Sorry folks, I made a mistake here: in spite of what I remembered, the numeral system used by the Babylonians was base 60, not 64 (we also divide time into sixtieths for this reason). Obviously, there is not much to say about the numeral system, except that it is the basis of the base64 encoding. I will remove the split tag. Paolo Liberatore (Talk) 17:00, 30 November 2005 (UTC)
[edit] Base64 in freeware applications
I recently updated this article to cover the wide range of uses of Base64, including in freeware applications like Mozilla and Thunderbird.
The simplest of examples:
C:\Documents and Settings\<UNAME>\Application Data\Thunderbird\Profiles\<PRNUMBER>.default type signons.txt mailbox://henrique@venus \=username=\ ~ *\=password=\ TW9ua2V5
<UNAME> stands for your username in a Windows XP distribution, for example.
The password can be easily decoded, and is: Monkey.
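For reference, the decoding step claimed above takes a single standard-library call. This is a minimal sketch in Python; the variable names are illustrative, and the only input is the stored value quoted from signons.txt.

```python
import base64

# The stored value quoted from signons.txt above.
stored = "TW9ua2V5"

# Base64 is an encoding, not encryption: no key is needed to reverse it.
password = base64.b64decode(stored).decode("ascii")
print(password)  # Monkey
```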
This neither detracts from nor diminishes the great software provided by Mozilla (a subjective opinion, I know, which is why I posted it on this discussion page). The majority of users will not notice these security flaws, or even care that their personal data is exposed to trojans on their desktops, which can decode these passwords quite easily and deliver them worldwide.
Of course both Mozilla and Thunderbird offer an option to protect all managed passwords with symmetric ciphers (considerably more difficult to decode).
I know this is not the right place for software considerations, but I found it remarkably interesting that even MUAs use the basic concepts of mail encoding (Base64's main use!) for obscuring plain-text passwords.
- Unless the user is asked to enter a key, the ONLY purpose that encrypting the password serves is to prevent someone from accidentally remembering a password they shouldn't when poking around in a config file. If someone has access to the encrypted password then they almost certainly have access to the key as well! Plugwash 01:26, 19 December 2005 (UTC)
[edit] mIRC trojans
mIRC trojans often use Base 64 because mIRC has built-in functions for it: $encode(text,m) and $decode(text,m). The trojans are spread via /amsg (message to all channels) or private messages and rely on the naive trust of users. They try to make users run commands encoded in Base 64 by claiming, for example, that doing so will get them the latest Matrix movie or operator (administrator) status in a certain channel. Some of them come in the form '//write somename $decode(Base 64 encoded script,m) || .load -rs somename' and install a script that keeps spreading this code, sometimes along with a backdoor. Other trojans hide the whole code by making use of brackets, '//[ $decode(Base 64 encoded commands,m) ]', and can run any commands. Then there are the ones that use $findfile to execute commands while appearing to be a harmless /echo: '//echo $($decode(Base 64 encoded $findfile mostly executing /amsg $cb,m),2)', where $cb is the clipboard content, which is usually the command itself, and $(...,2) evaluates the decoded $findfile.
Perhaps someone could add a note on this to the article. I have never written in a WP article and feel a bit lost.
[edit] UUU becomes VVVV
maybe mention $ echo -n UUU|base64-encode ;echo VVVV and say why, just for the fun of it.
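The "why" can be read off the bit patterns. The sketch below uses Python rather than the shell command quoted above; the names `data`, `bits` and `sextets` are only illustrative.

```python
import base64

data = b"UUU"

# Each 'U' is 0x55 = 01010101, so three of them give 24 alternating bits.
bits = "".join(f"{byte:08b}" for byte in data)

# Regroup the 24 bits into four 6-bit values.
sextets = [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]

print(bits)     # 010101010101010101010101
print(sextets)  # [21, 21, 21, 21]

# Index 21 in the standard alphabet (A=0 ... Z=25) is 'V'.
encoded = base64.b64encode(data).decode("ascii")
print(encoded)  # VVVV
```

Because the alternating pattern looks the same whether it is cut every 8 bits or every 6 bits, every output character is identical.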
[edit] UTF, really?
"This data encoding scheme is used to encode the UTF-16" Is it really doing this? I doubt it. It's encoding Unicode code points, just like UTF-8, UTF-16 and UCS-2 do.
- The RFC for UTF-7 seems to date back to the days before supplementary characters, so it's no help. Using UTF-16 surrogates would be the only sane way to support those planes in UTF-7 without massive changes, but I do not know if current implementations do so. Plugwash 18:24, 17 June 2006 (UTC)
- UTF-7 is generally deprecated these days. Rootless 13:44, 18 June 2006 (UTC)
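One way to look at this question is to compare what an existing codec produces with a hand-built "Base64 of the UTF-16 bytes". The sketch below uses Python's built-in `utf-7` codec as one example implementation; `utf7_shifted` is a hypothetical helper written for this comparison, not part of any library, and the result says nothing about other implementations.

```python
import base64

def utf7_shifted(text):
    """Hypothetical helper: unpadded Base64 of the UTF-16BE bytes, wrapped in '+' and '-'."""
    payload = base64.b64encode(text.encode("utf-16-be")).rstrip(b"=")
    return b"+" + payload + b"-"

euro = "\u20ac"  # a character from the Basic Multilingual Plane
print(euro.encode("utf-7"))  # b'+IKw-'
print(utf7_shifted(euro))    # b'+IKw-'

# A supplementary-plane character: check whether the codec output matches
# the surrogate-pair form, and whether it survives a round trip.
emoji = "\U0001F600"
print(emoji.encode("utf-7") == utf7_shifted(emoji))
print(emoji.encode("utf-7").decode("utf-7") == emoji)
```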
[edit] MIME Line breaks are <CR><LF>
From the article:
As newlines are inserted in the encoded data every 76 characters, the actual length of the encoded data is approximately 135.1% of the original.
To the best of my knowledge, MIME defines a line break as the character pair <CR><LF> (in that order). Therefore, every 57 bytes of source data are expanded to 76 Base64 characters + <CR> + <LF>, or 78 characters. This gives an expansion of approximately 136.8%.
Thiadmer Riemersma (thiadmer at compuphase dot com)
- Googling and reading the article newline seem to verify this, so I modified the article accordingly. –Mysid(t) 18:21, 8 August 2006 (UTC)
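The two figures in this thread can be checked with a few lines of arithmetic. A small sketch, assuming full 76-character lines throughout; the constant and variable names are illustrative.

```python
import base64

BYTES_PER_LINE = 57  # 57 * 4 / 3 = 76 Base64 characters per line

line = base64.b64encode(bytes(BYTES_PER_LINE))
chars_per_line = len(line)
print(chars_per_line)  # 76

# Expansion with a one-character newline versus the two-character <CR><LF>.
lf_ratio = (chars_per_line + 1) / BYTES_PER_LINE
crlf_ratio = (chars_per_line + 2) / BYTES_PER_LINE

print(f"{lf_ratio:.1%}")    # 135.1%
print(f"{crlf_ratio:.1%}")  # 136.8%
```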
[edit] modified Base64 for URL
The section URL Applications contains a short paragraph about "modified Base64 for URL". However, according to the referenced page http://tools.ietf.org/html/rfc3548#page-6, it is wrong.
RFC 3548 says that URL and file name encodings use '-' and '_' instead of '+' and '/', not '*' and '-'.
And unless I am missing something, they should also keep the '=' padding. But as far as I know '=' is reserved in URLs, which would indicate that the current wiki text is more correct on that point.
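As one data point, Python's standard library implements the '-' and '_' variant and keeps the '=' padding. A short sketch; the input bytes are chosen only so that the last two alphabet entries show up in the output.

```python
import base64

raw = b"\xfb\xff"  # bit pattern that maps to alphabet indices 62 and 63

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```

Whether the trailing '=' should be stripped before putting the value in a URL is left to the caller here.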
[edit] Example
I felt the example wasn't quite as intuitive as it could be, so I created the table version. (Sometime reader, new to editing.) aes
[edit] Material added by Ultimater
I have reverted the addition of the following material by user:Ultimater. I think there may be some merit in it, but I also think some more attention should be paid to style and formatting before it is added to the article proper. E.g., use of whole-word capitalization, rhetorical questions, sentences starting with "Notice...", "Remember..." and the like should be avoided or limited.--Niels Ø 11:31, 23 August 2006 (UTC)
[edit] Added before heading "An example"
Also notice that the length of each of the outputs is a multiple of 4. Not only MUST every base64-encoded string consist of an even number of characters, the total number of characters MUST be evenly divisible by 4. The reason is that base64 is used to represent an exact binary sequence of data in groups of 8 bits.
[edit] Added before heading "UTF-7"
Remember: the text doesn't need to be exactly 3 characters in length. Notice the usage of the padding character.
Text content | M | a | (empty)
ASCII | 77 | 97 | (empty)
Bit pattern | 010011 | 010110 | 000100
Index | 19 | 22 | 4
Base64-Encoded | T | W | E | =
Notice that the equals character (the padding character) is appended to the generated base64-encoded string ONLY when there is an empty slot in the Text content. The padding character will NEVER appear in the middle or beginning of a base64-encoded string. The padding character can be totally OMITTED from your base64-encoded string and it will not harm the string's contents. The reason is that the number of unused bits can be recalculated. However, it's always a good idea to include the padding character in your strings.
It's possible to have two padding characters but NEVER three:
Text content | M | (empty) | (empty)
ASCII | 77 | (empty) | (empty)
Bit pattern | 010011 | 010000
Index | 19 | 16
Base64-Encoded | T | Q | = | =
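The two tables can be reproduced mechanically. The following is a sketch of the textbook procedure in Python, not the material that was reverted; `ALPHABET` and `encode_with_indices` are names made up for this illustration.

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_with_indices(text):
    """Return (list of 6-bit indices, padded Base64 string) for ASCII text."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("ascii"))
    # Fill the last group with zero bits up to a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    indices = [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]
    encoded = "".join(ALPHABET[i] for i in indices)
    # Pad the output with '=' up to a multiple of 4 characters.
    encoded += "=" * (-len(encoded) % 4)
    return indices, encoded

print(encode_with_indices("Ma"))   # ([19, 22, 4], 'TWE=')
print(encode_with_indices("M"))    # ([19, 16], 'TQ==')
print(encode_with_indices("Man"))  # ([19, 22, 5, 46], 'TWFu')
```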
Why won't you encounter 3 padding characters? Because the string is read 3 characters at a time and 3 padding characters would translate as 000000 000000 000000 000000 which is "AAAA" and can be totally ignored -- however, feel free to add as many extra A's or padding characters to the end of your base64-encoded string as you wish.
Let's have a second look at our previous example base64-encoded string again:
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
Tell me, if you were to decode that string back into its original ASCII form, how many characters would it consist of? How long would it take you to turn that into a sequence of 0's and 1's, count the number of bits, divide it by 4 and then calculate the remainder so you know the number of unused bits? Who needs to count it!? Just count the number of padding characters at the end of the string (in this case one) and you will know the number of unused bits (one padding character for every two unused bits). Hence in this case, the length of the original string was 1 character short (the padding character is a blank slot) of being a multiple of three.
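The counting shortcut described in the reverted text amounts to a one-line formula: every 4 encoded characters stand for 3 bytes, minus one byte per trailing '='. A minimal sketch, assuming well-formed padded input; `decoded_length` is a hypothetical helper name.

```python
import base64

def decoded_length(encoded):
    """Length in bytes of the decoded data, computed without decoding."""
    stripped = "".join(encoded.split())  # ignore line breaks and spaces
    padding = len(stripped) - len(stripped.rstrip("="))
    return len(stripped) // 4 * 3 - padding

for sample in ("TWFu", "TWE=", "TQ=="):
    # Compare the formula with the length of an actual decode.
    print(sample, decoded_length(sample), len(base64.b64decode(sample)))
```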
[edit] I don't understand the need for padding
I don't understand the need for padding. You can always tell the number of decoded bytes from the length of the encoded text.
- 1 byte clear-> 2 bytes encoded
- 2 bytes clear -> 3 bytes encoded
- 3 bytes clear -> 4 bytes encoded
Why did they think padding was needed? -- Chris Q 13:25, 13 December 2006 (UTC)
- Good question. As far as I can tell the specs are silent on the matter too. The only reasons I can think of would be either poor understanding on the part of the creators or possibly an intention to allow encoded data to be concatenated without decoding. Plugwash 13:48, 13 December 2006 (UTC)
- It is a decoding optimisation. For decoding, the input is always a multiple of 4 when you take padding into account. This means you can "read" the input as an int32_t in C. It also allows you to do a minor consistency check using the length of the input. -- James Antill 18:10, 14 December 2006 (UTC)
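To illustrate the point that the padding is recoverable from the length alone, here is a sketch in Python that restores it before decoding. `decode_unpadded` is a hypothetical helper, and it assumes the input contains no whitespace or line breaks.

```python
import base64

def decode_unpadded(encoded):
    """Decode Base64 text whose trailing '=' characters were stripped."""
    # Length mod 4: 0 needs no padding, 2 needs "==", 3 needs "=".
    # A remainder of 1 cannot occur, since one character carries only 6 bits.
    if len(encoded) % 4 == 1:
        raise ValueError("invalid unpadded Base64 length")
    return base64.b64decode(encoded + "=" * (-len(encoded) % 4))

print(decode_unpadded("TQ"))    # b'M'
print(decode_unpadded("TWE"))   # b'Ma'
print(decode_unpadded("TWFu"))  # b'Man'
```

The length check in the helper is the same kind of minor consistency check mentioned in the last reply, only applied to the unpadded form.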