Talk:MD5

From Wikipedia, the free encyclopedia

This article is part of WikiProject Cryptography, an attempt to build a comprehensive and detailed guide to cryptography in the Wikipedia. If you would like to participate, you can choose to edit the article attached to this page, or visit the project page, where you can join the project and see a list of open tasks.

It is intended that this article be included in WikiReader Cryptography, a WikiReader on the topic of cryptography. Help and comments for improving this article would be especially welcome. A tool for coordinating the editing and review of these articles is the daily article box.

To-do list for MD5:
Summarise results of Berson
Complete the (non-pseudocode) description of the MD5 algorithm
Add information about the md5 collisions http://cryptography.hyperlink.cz/MD5_collisions.html

1 License
2 RSA's MD5 disclaimer
3 Sfv format article
4 Input
5 Round nonsense
6 What an effort - Unrealistic
7 Diagram
8 Implementations section
9 Disputed
10 Link to IBM p690 is broken
11 Move Down Photo
12 Example?
13 Pseudocode
14 key strengthening, wtf!?
15 MD5 with SHA-1?
16 Infinite collisions
17 Colliding executable files.

License

What kind of licence is MD5 under? Can it be used in proprietary software?

I don't believe MD5 is patented, so you wouldn't need a license to use it. You might need a license to use Rivest's source code (in the RFC), though, since it's copyrighted. Some pieces of proprietary software (such as mIRC) use various prewritten libraries to perform MD5 hashing, so you might be able to use one of those libraries. -- Olathe November 17, 2003

"md5-announcement.txt" is the announcement from RSA Data Security that MD5 is being placed in the public domain for free general use. Anyone may write a program implementing the MD5 algorithm for any purpose.

RSA has written a reference implementation which is the source code in this directory. This source code is copyrighted by RSA. Here are the few copyright restrictions *with using this source code*. There is no restriction on any code which implements MD5 that you write yourself.

RSA's MD5 disclaimer

License to copy and use this software is granted provided that it is identified as the "RSA Data Security, Inc. MD5 Message-Digest Algorithm" in all material mentioning or referencing this software or this function.

License is also granted to make and use derivative works provided that such works are identified as "derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm" in all material mentioning or referencing the derived work.

RSA Data Security, Inc. makes no representations concerning either the merchantability of this software or the suitability of this software for any particular purpose. It is provided "as is" without express or implied warranty of any kind.

These notices must be retained in any copies of any part of this documentation and/or software.

Sfv format article

Hello, can someone help clean up Sfv checksum format article, as well as perhaps have a list of checksum formats as well? crc32 is a format, which sfv uses, there must be others. --ShaunMacPherson 04:33, 14 Apr 2004 (UTC)

Input

We need to reword this better:

Wikipedia --> 20ee8f504f73e6894f328d1194280bcb

WIKIPEDIA --> b2f4895c3df311be0e3b07edc0974534

Firstly, we should probably avoid self-references, so we might be better off changing Wikipedia to something else; secondly, we need to be explicit in how "Wikipedia" is interpreted into the input bitstring used by MD5. Is this string represented as ASCII? — Matt

Feel free to have at it. I'll tell you what I was trying to do. I'd like to show somehow to the casual reader that hashes carry no easily observable characteristics of the inputs, so that something like ABC and ABCD or ABD will likely have dissimilar looking hashes. Probably belongs in cryptographic hash function or something like that, but might be a nice exercise here. Or if not, we can scrap it entirely :) Jewbacca 22:32, Aug 18, 2004 (UTC)

P.S. Found this in cryptographic hash function which expresses it well:

"Broadly speaking, the security properties are required to ensure that the digest is 'random' to prospective attackers, and does not leak any information about the message itself, and that other messages cannot be found that produce the same digest. Any change to the message, even a single bit, should result in a dramatically different message digest when re-generated from the received message."

Jewbacca 22:35, Aug 18, 2004 (UTC)
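A minimal sketch of the point being discussed, assuming the strings are encoded as ASCII bytes before hashing (the encoding is exactly the open question raised above, so it is an assumption here). It uses Python's standard hashlib; the helper names md5_hex and differing_hex_digits are made up for illustration.

```python
import hashlib

def md5_hex(text: str, encoding: str = "ascii") -> str:
    """Encode text to bytes (ASCII by assumption) and return the MD5 hex digest."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    # Similar inputs, to show that the digests do not look similar.
    for word in ("ABC", "ABCD", "ABD"):
        print(f"{word:5} --> {md5_hex(word)}")
    d1, d2 = md5_hex("ABC"), md5_hex("ABD")
    print("hex digits that differ:", differing_hex_digits(d1, d2), "of", len(d1))
```

Stating the encoding explicitly matters because the same characters in a different encoding (UTF-16, say) are different bytes and so give a different digest.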

Round nonsense

Has anyone ever noticed that the concept of breaking X rounds out of a particular hash function's total is a little silly? For example, if you make a hash function out of just one group of 16 rounds in MD5, a chosen-hash attack can be done with Windows Calculator. The concept of "round" really ought to be taken to mean the number of times each input bit is reused. It is the simultaneous congruences that give hash functions their security, and you make those by using each input bit more than once. -- Myria 07:21, 19 Oct 2004 (UTC)

What? --Ihope127 21:05, 11 July 2005 (UTC)

What an effort - Unrealistic

I understand that people are busy cracking these algorithms, but to me the effort required is just impossible and unrealistic. You really have to work hard, perhaps for the rest of your life, to get anything meaningful, and by the way, there is no system that is foolproof!

Simba

Hey Simba; what makes you think that the people that are busy cracking these algorithms are the ones writing encyclopedia articles, or that they will read this talk page? I suggest you duplicate your comments on sci.crypt for a better interaction with your target audience. — Matt 12:17, 19 Oct 2004 (UTC)

Does this mean that it is possible to get the reverse of an MD5 hash within minutes now?

What do you mean by a reverse? mic 01:33, 29 May 2006 (UTC)

Sorry, I meant a collision. Basically, if I gave you an MD5 hash, could you tell me a collision for it?

That's not a collision. A collision is where you create two texts which have the same hash; this has been done for MD5 many times and can be done within minutes. You're talking about a preimage, or a second preimage; no practical algorithm is currently known for this, except for generic algorithms based on guessing the original string.

As Matt Crypto says, sci.crypt is a better place for these questions; this page is for discussion of the article, not for discussion of MD5. — ciphergoth 07:26, 10 June 2006 (UTC)
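A toy illustration of the distinction drawn above, assuming a deliberately weakened hash: MD5 truncated to 16 bits. The names toy_digest, find_collision and find_second_preimage are made up for this sketch. With the full 128-bit MD5 neither generic search is feasible; the real collision attacks rely on cryptanalysis rather than guessing.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size (assumption for the demo); real MD5 has 128 bits

def toy_digest(data: bytes) -> int:
    """Top BITS bits of the MD5 digest, as an integer (a deliberately weak hash)."""
    full = int.from_bytes(hashlib.md5(data).digest(), "big")
    return full >> (128 - BITS)

def find_collision():
    """Birthday search: ANY two distinct inputs with the same toy digest."""
    seen = {}
    for n in count():
        msg = f"c{n}".encode("ascii")
        d = toy_digest(msg)
        if d in seen:
            return seen[d], msg, n + 1  # the pair and the number of trials
        seen[d] = msg

def find_second_preimage(original: bytes):
    """Generic guessing: a different input that matches one GIVEN digest."""
    target = toy_digest(original)
    for n in count():
        msg = f"p{n}".encode("ascii")
        if msg != original and toy_digest(msg) == target:
            return msg, n + 1  # the match and the number of trials

if __name__ == "__main__":
    a, b, trials = find_collision()
    print("collision:", a, b, "after", trials, "trials")
    m, trials = find_second_preimage(b"hello")
    print("second preimage of b'hello':", m, "after", trials, "trials")
```

On a 16-bit digest the birthday search typically needs a few hundred trials, while the guessing search typically needs tens of thousands; the gap grows exponentially with the digest size.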

Diagram

The picture does not correspond with the description of the algorithm:

In the picture, the long expression is assigned to b; in the algorithm, it is assigned to a.

You're right... I fixed it. Sorry about that. -- Myria 18:19, 28 Dec 2004 (UTC)

Implementations section

I'm worried about this new section becoming a spam trap; Wikipedia isn't really meant to be a link farm. If people start adding lots of things to external links, then they can be easily removed under the argument that we only select a few high-quality links. However, if they are in an "Implementations" section, then that argument is weakened; it's hard to argue that someone's VB script MD5 school project (or whatever!) is not a valid addition to such a section. My suggestion is that we keep it as before. There's likely hundreds of MD5 implementations, and we don't really want to be listing them all. — Matt Crypto 17:12, 24 October 2005 (UTC)

We cannot keep it as it was before, because some of the links have become internal and cannot be listed under external links anymore. I didn't name the section "All implementations" however, and I'm all for culling and keeping only links that really add value. Personally I think one good implementation in C++, Java and VB is enough. People shouldn't use {VB|Java}Script for crypto in any case. Shinobu 16:41, 25 October 2005 (UTC)

Just get rid of it. It's just a spamtrap that will need constant surveillance, and Wikipedia is as always not a linkfarm. Haakon 20:07, 26 March 2006 (UTC)

I think I agree with you. We did this on SHA hash functions, and it's better for it. People can't seem to resist the temptation to advertise on our hash function pages, for some reason. It's been happening for months, if not years. Ideally, we should link instead to a page that lists MD5 implementations, and indeed we do, the "Unofficial MD5 homepage". — Matt Crypto 22:39, 26 March 2006 (UTC)

If you like you can direct these people to my wiki LiteratePrograms.org where I would be happy to accept their implementations as contributions. Sometimes they just need an outlet. Deco 10:13, 10 June 2006 (UTC)

Disputed

"... because the current collision-finding techniques allow the preceding hash state to be specified arbitrarily, a collision can be found for any desired prefix."

This seems to be claiming that preimage attacks exist for MD5, whereas I thought only collision attacks had been demonstrated. If it's not saying that, I think it needs to be rephrased for clarity. -- Antaeus Feldspar 22:33, 16 November 2005 (UTC)

It's not saying that preimage attacks exist, but that you can specify an arbitrary prefix in your collision. I believe that's accurate, or at least if the length of the prefix is a multiple of the message block size. That is, if you have a prefix X, you can find a collision of the form X || Y₁ and X || Y₂ such that H(X || Y₁) = H(X || Y₂) (where || means concatenation, H is the hash function). — Matt Crypto 23:01, 16 November 2005 (UTC)

I think that's called length-extension, and it is a feature of most digest functions, so they can operate on unbounded streams of data without requiring unbounded memory. 193.230.245.6 13:28, 17 November 2005 (UTC)

OK, I think I see what you're saying -- it's not saying that for any prefix, you can find another prefix which collides with it (which equates to a preimage attack); rather, it's saying that you can start with any one prefix and create two colliding files which share that prefix. Can we rephrase it to make the distinction more clear? -- Antaeus Feldspar 23:05, 17 November 2005 (UTC)
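A toy sketch of the chosen-prefix statement above, under loud assumptions: it does not attack MD5 itself, but builds a tiny iterated hash with a 16-bit state and 4-byte blocks (all names and sizes here are invented for the demo), so that a collision after any block-aligned prefix X can be found by brute force. It then shows that H(X || Y₁) = H(X || Y₂) and that appending a common suffix keeps the two hashes equal.

```python
import hashlib
from itertools import count

BLOCK = 4          # toy block size in bytes (MD5 uses 64-byte blocks)
IV = b"\x00\x00"   # toy 16-bit initial state (MD5's state is 128 bits)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated MD5 of state || block."""
    return hashlib.md5(state + block).digest()[:2]

def state_after(msg: bytes) -> bytes:
    """Chaining state after absorbing a block-aligned message."""
    if len(msg) % BLOCK:
        raise ValueError("toy hash only accepts block-aligned input")
    state = IV
    for i in range(0, len(msg), BLOCK):
        state = compress(state, msg[i:i + BLOCK])
    return state

def toy_hash(msg: bytes) -> bytes:
    """H(msg): absorb the blocks, then a final block holding the length."""
    return compress(state_after(msg), len(msg).to_bytes(BLOCK, "big"))

def collide_after(prefix: bytes):
    """Find two distinct blocks Y1, Y2 that collide when placed after prefix."""
    state = state_after(prefix)
    seen = {}
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block

if __name__ == "__main__":
    X = b"any prefix!!"  # 12 bytes, a multiple of BLOCK
    y1, y2 = collide_after(X)
    print(y1.hex(), y2.hex(), toy_hash(X + y1) == toy_hash(X + y2))
    print(toy_hash(X + y1 + b"tail") == toy_hash(X + y2 + b"tail"))
```

The search starts from the state reached after X, which is why the prefix can be chosen freely; this is different from being handed a digest and asked for a matching input.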

MD5 is no longer safe. Please look at: MD5 Collision Generation http://www.stachliu.com.nyud.net:8090/collisions.html

Well, we're aware that MD5 is vulnerable. The problem is that many people wrongly interpret what they've heard about MD5's vulnerability. Many of the security applications which use MD5 which people think are now broken are not, because in order to exploit them, you would have to find a way to derive, for a given MD5 hash, a file which has that hash. This is called a "preimage attack"; no preimage attacks against MD5 are known. What can be done against MD5 now, which compromises some security applications, is a "collision attack"; which means being able to create two files which will have the same hash -- even though it is not (currently) possible to control which hash that will be. Therefore, while some attacks are now possible with MD5, it's an exaggeration to declare with no clarification that it is "no longer safe". -- Antaeus Feldspar 18:59, 20 November 2005 (UTC)

Link to IBM p690 is broken

I am just trying to report a broken link IBM p690. According to IBM (http://www-03.ibm.com/servers/eserver/pseries/hardware/highend/p690.html) the p690 series is no longer on the market.

Move Down Photo

uhh motion to move down that photo to the first paragraph under heading 1... at first glance, it looked like i was looking at a biography of some dude named MD5...--Htmlism 00:11, 13 February 2006 (UTC)

Agreed and done. Deco 01:09, 13 February 2006 (UTC)

Example?

I've added (or am just about to add) an external link to a site that gives someone an MD5 hash. That is one thing that this page is missing: http://www.instantmd5.com/

Moonrat506

I removed your link. There are countless such sites online, and they don't have much value for this article. Also, instantmd5.com is your own site, and you should generally not use Wikipedia as a vehicle for promoting your own sites. Please refer to WP:EL if you want to learn more about Wikipedia's policy for external linking. Thanks. Haakon 18:57, 11 April 2006 (UTC)

Agreed: please don't add your own site. — Matt Crypto 23:24, 11 April 2006 (UTC)

Thought it would be handy. Moonrat506 14:19, 12 April 2006 (UTC)

How come the external links section currently has a link to one site that can be used to look up MD5 hashes, whereas all the other links to similar sites have been removed? Any particular reason for doing so?

It's useful to have a link to one, but not particularly useful to have a dozen. — Matt Crypto 05:49, 9 May 2006 (UTC)

Pseudocode

That isn't pseudocode.

I second that! Could someone with an understanding of the algorithm rewrite it, please?

To the above: I don't know if you're getting unexpected results or not, but at first I did. Then I noticed it said "LITTLE-ENDIAN"; after converting my big-endian values to little-endian, everything worked. The pseudocode is good, just be aware of endianness.
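For anyone hitting the same problem, a short sketch of what the byte order means in practice. It assumes the conventions of RFC 1321 (message blocks read as 32-bit little-endian words, and the bit length appended as a 64-bit little-endian value); the helper names words_le and length_field are made up for illustration.

```python
import struct

raw = bytes([0x01, 0x02, 0x03, 0x04])

# The same four bytes read in the two byte orders.
little = int.from_bytes(raw, "little")  # 0x04030201, the order MD5 expects
big = int.from_bytes(raw, "big")        # 0x01020304, gives wrong results in MD5

def words_le(block: bytes):
    """Split one 64-byte block into sixteen 32-bit little-endian words."""
    return list(struct.unpack("<16I", block))

def length_field(message_len_bytes: int) -> bytes:
    """Message length in bits, as a 64-bit little-endian field (mod 2**64)."""
    return struct.pack("<Q", (message_len_bytes * 8) % 2 ** 64)

if __name__ == "__main__":
    print(hex(little), hex(big))
    print([hex(w) for w in words_le(bytes(range(64)))[:2]])
    print(length_field(3).hex())
```

An implementation that reads the words as big-endian still runs, but it produces digests that do not match other MD5 implementations.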

key strengthening, wtf!?

"Also, it is a good idea to apply the hashing function (MD5 in this case) more than once—see key strengthening. It increases the time needed to encode a password and discourages dictionary attacks."

That is stupid. Double MD5 makes dictionary attacks and rainbow crack easier, because it gives the content a known, fixed size and a limited character set. It turns input with (theoretically) infinite complexity into a simple string with 2^128 known combinations.

You are mistaken. 2^128 combinations is far too many for any table lookup approach to be practical today. However, applying a hash function multiple times increases the cost of a dictionary attack proportionately and can be an effective security measure. The term you are looking for is key stretching, not key strengthening.

Of course MD5 is no longer recommended for any application, but the point stands.

Also, please sign your messages by appending ~~~~ when you write them. Thanks! — ciphergoth 08:19, 13 June 2006 (UTC)
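A minimal sketch of the iterated-hashing idea discussed above, not a recommendation. It assumes a salt (not mentioned in the thread) and a made-up function name, stretch. MD5 appears only because it is the topic here; as noted above it is no longer recommended, and Python's standard library provides hashlib.pbkdf2_hmac for this purpose.

```python
import hashlib

def stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Hash `rounds` times, so each password guess costs `rounds` hash calls."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    digest = hashlib.md5(salt + password).digest()
    for _ in range(rounds - 1):
        # Feed the previous digest back in, together with the salt.
        digest = hashlib.md5(salt + digest).digest()
    return digest

if __name__ == "__main__":
    # Toy construction from the discussion (illustration only).
    print(stretch(b"password", b"salt", 1000).hex())
    # Standard-library key stretching with a modern hash.
    print(hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 100_000).hex())
```

The defender pays the extra rounds once per login, while an attacker running a dictionary pays them once per guess, which is the proportional cost increase described above.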

MD5 encryption? Should it not be MD5 coding?

MD5 with SHA-1?

This article says SHA-1 is now preferred over MD5. Naively, it seems that an MD5 sum and a SHA-1 sum, together, would be stronger than either one, even if MD5 is suspect on its own. Is that right in theory? In practice? —Ben FrantzDale 15:24, 15 June 2006 (UTC)

By "together", do you mean the new hash function being the concatenation of the two sums, or function composition? — Matt Crypto 19:21, 15 June 2006 (UTC)

Yes. As in, the MD5+SHA-1 of the empty file would be

   d41d8cd98f00b204e9800998ecf8427e da39a3ee5e6b4b0d3255bfef95601890afd80709

—Ben FrantzDale 20:35, 15 June 2006 (UTC)

I have been wondering about almost the same thing - see Talk:SHA_hash_functions#Combining_SHA1_.2F_MD5 - this would be function composition. I assume that there is an answer to this question around somewhere, as it appears to not be a new idea - any ideas where?
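To make the two readings of "together" concrete, here is a small sketch using Python's standard hashlib. The function names concat_digest and composed_digest are made up; the sketch only shows what each construction computes and makes no claim about how strong either one is.

```python
import hashlib

def concat_digest(data: bytes) -> str:
    """Concatenation: MD5(data) followed by SHA-1(data), 128 + 160 bits."""
    return hashlib.md5(data).hexdigest() + " " + hashlib.sha1(data).hexdigest()

def composed_digest(data: bytes) -> str:
    """Composition: SHA-1 applied to the MD5 digest, 160 bits of output."""
    return hashlib.sha1(hashlib.md5(data).digest()).hexdigest()

if __name__ == "__main__":
    # The empty file, as in the example above.
    print(concat_digest(b""))
    print(composed_digest(b""))
```

One difference follows directly from the definitions: with composition, any two inputs that collide under MD5 also collide in the final output, because SHA-1 only ever sees the MD5 digest.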

Infinite collisions

Since there are an infinite number of inputs but only a finite number of outputs, does that mean MD5 technically has an infinite number of colliding inputs? If not, is there a way to calculate exactly how many collisions there are? --Tim1988 ^talk 17:39, 21 August 2006 (UTC)

Yep. It must have at least one collision by the pigeonhole principle. Further, assume you have a complete finite list of any inputs that are part of at least one colliding pair. Then consider the remaining inputs not on this list, of which there are a still-infinite number. Using the pigeonhole principle, you can again prove there's another collision amongst those remaining, contradicting the assumption that the finite list is complete. Hence there must be an infinite number of colliding inputs. Of course, proving existence is easy; finding them is, or rather was, the tricky bit. — Matt Crypto 19:05, 21 August 2006 (UTC)

Thank you for the explanation :) --Tim1988 ^talk 21:42, 22 August 2006 (UTC)
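The pigeonhole argument can be watched in miniature. This sketch assumes a toy hash made from the first byte of MD5 (256 possible outputs), so any 257 distinct inputs must contain a collision; the names one_byte_digest and pigeonhole_collision are invented for the demo.

```python
import hashlib

def one_byte_digest(data: bytes) -> int:
    """Toy hash with only 256 possible outputs: the first byte of MD5."""
    return hashlib.md5(data).digest()[0]

def pigeonhole_collision(n_inputs: int = 257):
    """Hash n_inputs distinct strings; more than 256 of them guarantees a clash."""
    seen = {}
    for n in range(n_inputs):
        msg = str(n).encode("ascii")
        d = one_byte_digest(msg)
        if d in seen:
            return seen[d], msg
        seen[d] = msg
    return None  # only reachable when n_inputs <= 256

if __name__ == "__main__":
    print(pigeonhole_collision())
```

The same counting applies to full MD5 with 2^128 + 1 inputs, which proves that collisions exist but says nothing about how to find one.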

Colliding executable files.

User:Superm401 added a [citation needed] tag to this text in section "Applications": "Now that it is easy to generate MD5 collisions, though, it is possible for the person who creates the file to create a second file with the same checksum, so this technique cannot protect against some forms of malicious tampering."

I removed the tag since it is a true statement and we already have a reference for it. There is a link to a detailed description of how to do it in the "External links" section. That is the link Two colliding executable files.

If you know a little about how executables work, how the 2005 MD5 attack works, and a little about hacking, it is a pretty easy trick to do. Here is an explanation of how to do that attack, which I came up with even before seeing that link:

The 2005 MD5 attack works by manipulating a small number of bytes in a special fixed position near the beginning of a file. So only some bytes are changed when doing collisions with that attack. The rest of the file will keep the exact same content. So the attacker codes up a nice executable that contains a bunch of functions that do good work. But he also designs the functions so they can do evil work if called in the right order or with the right parameters. And in the special position in the executable that he will have to change to create a colliding executable, he puts a constant value in the source code. That is, he stores some constant in that position. When he manipulates that position to find collisions, that constant will change, but no other part of the program. Then at runtime the program can check that constant in an if-statement. If the constant is the original one, the executable behaves nicely. If the constant is the changed one, the executable chooses to do its evil work. So the attacker first publishes the nice version of the file, and people will test it, say it is nice, and publish its MD5 sum. Later he will publish the evil version of the file instead. That version has the same MD5 sum but does evil work.

There are many other ways to use the (random) change of bytes in that position. This was just one simple example.

--David Göthberg 23:53, 25 October 2006 (UTC)
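A harmless sketch of the branching trick described above, with everything in it hypothetical: the offset, the marker bytes, the toy file layout and the function names are invented, and the two toy files below do NOT share an MD5 sum. Producing differing bytes that really collide requires a collision-finding tool and much larger blocks; the sketch only shows how one program can behave differently depending on the bytes at a fixed position.

```python
MARKER_OFFSET = 8               # hypothetical fixed position of the constant
ORIGINAL_MARKER = b"\x00" * 4   # hypothetical constant stored in the "nice" file

def make_file(marker: bytes) -> bytes:
    """Build a toy file: 8-byte header, the marker, then a shared body."""
    header = b"TOYEXE\x00\x00"
    body = b"...shared code for both behaviours..."
    return header + marker + body

def behaviour(file_bytes: bytes) -> str:
    """Mimic the runtime check: branch on the bytes at the fixed position."""
    end = MARKER_OFFSET + len(ORIGINAL_MARKER)
    marker = file_bytes[MARKER_OFFSET:end]
    return "nice" if marker == ORIGINAL_MARKER else "evil"

if __name__ == "__main__":
    nice_file = make_file(ORIGINAL_MARKER)
    other_file = make_file(b"\x01\x02\x03\x04")
    print(behaviour(nice_file), behaviour(other_file))
```

Everything outside the marker is byte-for-byte identical in the two files, which is the property the explanation above relies on.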
