Talk:Cryptographic hash function

From Wikipedia, the free encyclopedia

	*On 8 February 2007, Cryptographic hash function* was linked from Wired News, a high-traffic website.** All prior and subsequent edits to the article are noted in its revision history.

This article is part of WikiProject Cryptography, an attempt to build a comprehensive and detailed guide to cryptography on Wikipedia. If you would like to participate, you can choose to edit the article attached to this page, or visit the project page, where you can join the project and see a list of open tasks.

It is intended that this article be included in WikiReader Cryptography, a WikiReader on the topic of cryptography. Help and comments for improving this article would be especially welcome. A tool for coordinating the editing and review of these articles is the daily article box.

To-do list for Cryptographic hash function:	edit · history · watch · refresh
Terminology; a lot of alternative names for the crypto properties and functions which hold them; distinction between Keyed and unkeyed hash functions (stick to unkeyed here); ~~Discussion of the "Merkle-Damgård structure" that MD4/5, SHA etc follow; a diagram would be appropriate.~~ ~~Hash functions constructed from block ciphers - Davies-Meyer etc. (Applied Cryptography goes into detail on these)~~ ~~Hash functions used to construct other primitives; e.g. block ciphers from hash functions (e.g. SHACAL, BEAR and LION), stream ciphers (SEAL), MACs from hash functions (HMAC) and PRNGs.~~ Discuss recommended sizes for hash functions; quantify "hard", MD5CRK. Perhaps mention the birthday paradox? Provide a little detail about specific, popular hash functions Give an example of Yuval's collision attack on signing hashed messages. History? regarding this statement in the article, " Therefore, Alice writes down her solution, appends a random nonce, computes its hash and tells Bob the hash value (whilst keeping the solution secret)." Please clarify if Alice gives Bob the nonce in addition to the hash. Discuss reverse lookup tables (such as http://md5.crysm.net/)

To-do list for Cryptographic hash function:

edit · history · watch · refresh

Terminology; a lot of alternative names for the crypto properties and functions which hold them; distinction between Keyed and unkeyed hash functions (stick to unkeyed here);
~~Discussion of the "Merkle-Damgård structure" that MD4/5, SHA etc follow; a diagram would be appropriate.~~
~~Hash functions constructed from block ciphers - Davies-Meyer etc. (Applied Cryptography goes into detail on these)~~
~~Hash functions used to construct other primitives; e.g. block ciphers from hash functions (e.g. SHACAL, BEAR and LION), stream ciphers (SEAL), MACs from hash functions (HMAC) and PRNGs.~~
Discuss recommended sizes for hash functions; quantify "hard", MD5CRK. Perhaps mention the birthday paradox?
Provide a little detail about specific, popular hash functions
Give an example of Yuval's collision attack on signing hashed messages.
History?
regarding this statement in the article, " Therefore, Alice writes down her solution, appends a random nonce, computes its hash and tells Bob the hash value (whilst keeping the solution secret)." Please clarify if Alice gives Bob the nonce in addition to the hash.
Discuss reverse lookup tables (such as http://md5.crysm.net/)

1 Revision of article
2 Online references for editing
3 nor should an attacker learn anything useful about a message given only its digest besides the digest itself
4 Preimage resistance
5 CRCs
6 Hash functions based on block ciphers
7 Jacksum
8 Effective size?
9 Revealing properties?
10 Are hash methods theoretically viable?
11 NIST has been having some workshops on cryptographic hashes
12 One-way encryption
13 Definition of collision resistance
14 Image choise
15 Clarification required

[edit] Revision of article

I agree that the account given here currently is accurate. It is however, quite abstract and so less than useful to a large class of readers. Since the topic is a fundamental one in modern cryptographic practice, I suggest that this article be revised to include some text giving more context. In particular, problems in using such function should be noted. The article message digest errs in almost the inverse way. Perhaps a combination of the two with redirect pointers?

I will consider taking it on, as time allows, if no one else does.

I've merged these (though it needs reorganising, I think). — Matt 22:18, 19 Jul 2004 (UTC)

— Matt 01:14, 21 Jul 2004 (UTC)

Crikey! This article needs some serious editing! I will also be coming back from time to time to touch it up. Ruptor 04:03, 21 February 2007 (UTC)

Feel free to put these observations on the todo list as you wish. The article needs to discuss at least qualitatively what "hard" is meant in the properties section. Examples or just order of magnitude of the computation involved would be fine. A link to and/or a quick summary of what computational infeasibility means are needed too. Also of course, a summary of the different CHF's and their features/similarities. - Taxman 20:23, Jul 29, 2004 (UTC)

[edit] Online references for editing

[2] [3] [4]

I feel the intro is too long (i.e a section heading should be inserted after the first paragraph) but I can't think of anything suitable. Arvindn 07:33, 30 Jul 2004 (UTC)

I'd say the puzzle solution example is about commitment more than timestamping. Lunkwill 00:27, 5 Aug 2004 (UTC)

On "dramatically different" - this is a requirement of hash functions, it's the Strong Avalanche Criterion.

On "no information about" - an attacker can trivially determine one piece of information about M from H(M), namely the value of H(M). I understand the spirit of what you're trying to get across but no-one's ever found a way to express it formally in a way that's not impossible to satisfy.

ciphergoth 11:24, 2005 Jan 10 (UTC)

"Some of the following algorithms are known to be insecure"

please note the ones that are and how significant each is. - Omegatron 16:44, Apr 14, 2005 (UTC)

There is some discussion of the security elsewhere in the article, but here I would suggest we just remove the warning altogether. Just because we list a hash function design doesn't carry any implication that it's somehow endorsed (any more than a "list of prisons" article would carry the disclaimer that "some of the following prisons are known to have had inmates escape"). — Matt Crypto 19:10, 14 Apr 2005 (UTC)

Why not? That's the information I was looking for in this case. Several of the articles have something like "this version is no longer considered secure" in the first description paragraph. - Omegatron 19:18, Apr 14, 2005 (UTC)

A list of which algorithms are how insecure is useful in choosing which one to apply for a particular purpose. A list of prisons, however, would not be quite so useful in this way in that the location of a prison can be more important than the security: a prisoner shipped off to the moon is quite unlikely to escape, however putting the prisoner there in the first place is unfeasible. --Ihope127 19:05, 10 August 2005 (UTC)

[edit] nor should an attacker learn anything useful about a message given only its digest besides the digest itself

Two editors (one anon) have argued that the last four words should be removed. I'm really against this; to my eyes, it's a vitally important caveat. Without that caveat, we would be able to define precisely what it meant for the attacker to learn nothing about the message. As it is, they definitely learn at least one thing: the digest of the message is x. This seemingly trivial fact completely frustrates any effort to come to a more precise definition of what it means for the attacker not to learn anything about the message. It's an absolutely vital difference.

I won't revert it a third time if it's removed again, but I'd really like to discuss it here first if possible. Thanks. — ciphergoth 08:30, 24 October 2005 (UTC)

I agree with your revert and edits ciphergoth. It is a vital difference since in some cases knowing the digest itself can make a difference. (I have worked professionally with crypto protocol design and teached it at universities.) --David Göthberg 11:13, 24 October 2005 (UTC)

[edit] Preimage resistance

User:C4chandu removed the discussion of preimage resistance from the statement of what the hash function is supposed to be able to do. Why? If anything, we should be extending that section to add second preimage resistance as well. — ciphergoth 08:30, 24 October 2005 (UTC)

[edit] CRCs

CRCs are not just used because of speed and convenience. They have the property that any error of less than a certain number of bits, or spanning less than a certain portion of the message, is guaranteed to change the CRC. In such circumstances a pseudorandom function may by chance give the same value.

I'm not sure how much that advantage matters (more if the checksum is short), but that's always how they've been sold as I understand it. — ciphergoth 15:04, 7 February 2006 (UTC)

Well, first of all if we should mention why CRCs are used in network protocols instead of cryptographic hashes then we should first mention the primary reasons: CRCs are easier to implement and much faster then cryptographic hash sums. This I am sure of and would not revert if added to the article. Perhaps we should add that?

And yes, I think CRCs to some extent are supposed to guarantee that short bursts of damaged bits (shorter then the length of the CRC output) should not cause the same CRC. But hashes as you point out are more random in nature and thus might cause the same hash in that case. But I am not sure if that is true so I would not add that to the article. And I also think it might be too much details on CRCs.

Thirdly: Yes, there might be some properties in CRCs that makes them more suitable for trying to figure out which bits got damaged and try to repair them. Since CRCs are sort of more predictable. Or it is just that they are faster and thus it is faster to try different fixes and see which one that matches the CRC. I don't know which is true and you don't seem to be certain either. And I think this also is the most unusual reason to use CRCs. So I don't think we should have it in the article.

--David Göthberg 02:34, 8 February 2006 (UTC)

CRCs can't directly be used for error correction, because there are typically many equally likely corrections that would fix the CRC. However, error correcting codes are usually based on the same kind of mathematics (ie Galois fields): see for example convolutional codes.

I am sure about this assertion. I am also sure that with a CRC, you can assert whole useful classes of error that are guaranteed to cause a change in the CRC, while with an n-bit cryptographically strong hash (or MAC), any error has a 2^-n chance of going undetected.

CRCs are not that fast or easy to implement. Something like Adler32 is easier to implement and much faster than any CRC. CRCs are used because there's a mathematical reason to think they'll make particularly good error detecting codes in a channel that introduces very few one-bit errors. I am pretty sure about this too.

FTR, I'm not advocating the use of CRCs - I'd like to see real MACs used on most channels. Note that CRC32 takes 4.8 cycles/byte; Poly1305 runs at 3.8 cycles/byte; Adler32 1.3 cycles/byte. — ciphergoth 09:53, 8 February 2006 (UTC)

I can't comment on Adler32, but CRC is far simpler and faster than a cryptographic hash function, and much easier to implement in hardware. It's also far more likely to let an error slip through than a cryptographic hash, for reasonable sizes of each (32 bits for CRC, 128-256 for a hash). There are standard ways of creating errors which will pass CRC; find /any/ error which passes HMAC and you can become instantly famous. You certainly won't find it by random guessing; 2^-128 is far too large of a large number. Here are benchmarks for CRC-32 vs. SHA-1 on a 100MB file on my machine:

[jason@erg] ~/tmp$ cfv -t crc -C foo

tmp.crc: 1 files, 1 OK. 0.427 seconds, 228625.5K/s

[jason@erg] ~/tmp$ cfv -t sha1 -C foo

tmp.sha1: 1 files, 1 OK. 2.241 seconds, 43585.7K/s

As to using truncated (to say, 32 bits) cryptographic hashes as a CRC replacement, I'm not even going to go there because it confuses the purpose of both CRC and hash functions. Use hash functions if you want serious integrity against both glitches and malicious adversaries. Use CRCs if you want a cheap way to just avoid common glitches. Lunkwill 20:15, 8 February 2006 (UTC)

I agree with everything you say here; I'm just trying to set out why CRCs are popular for certain kinds of protocol. Against an active attacker a CRC is no good as you rightly point out, but against a channel that introduces occasional one-bit errors a CRC might be the right check to choose for reasons other than speed or convenience of implementation. TBH, I think the time when they might be useful is past; you should always be using a MAC to protect against deliberate attack (and you will then also be protected against natural errors), and if you have a problem with frequent bit errors then you need an error correcting code, not a CRC. But it's a misrepresentation to say that CRCs are used solely for reasons of speed and convenience; they have certain advantages when detecting natural sources of error and that's why they've historically been used. That's all my point is.

Oh, and given the MAC key, it's now not too hard to construct an error which passes HMAC-MD5, and HMAC-SHA1 can't be far off. Of course things are rather different if you don't have the key :-) — ciphergoth 22:27, 8 February 2006 (UTC)

[edit] Hash functions based on block ciphers

I have a slight discomfort with this section that I don't know how to resolve. The fact is that most modern hash functions (MDx, SHA-x, Tiger, Whirlpool at least) are based on block ciphers in either Davis-Meyer or Miyaguchi-Preneel mode; it's just that they use custom block ciphers designed for the application rather than, say, AES. I don't know how to change the article to express this. — ciphergoth 09:06, 6 March 2006 (UTC)

We do mention that fact in the Merkle-Damgård article: "The compression function can either be specially designed for hashing or be built from a block cipher. In many cases, including the SHA-1 and SHA-0 ciphers, the compression function is based on a block cipher that is specially designed for use in a hash function."

But saying "a block cipher specially designed for the hash function together with Miyaguchi-Preneel" at least to me means about the same thing as "the compression function can be specially designed for hashing". (Of course with a little less detail.) And actually, the compression function can either be specially designed, or an existing block cipher, or even be made from a stream cipher.

Personally I think we should keep it simple here in the hash article (as we do now) since otherwise it becomes far to long. Instead we should leave the indepth details to the Merkle-Damgård article.

--David Göthberg 10:45, 6 March 2006 (UTC)

[edit] Jacksum

On March 27 I removed about a dozen of links from Wikipedia like this one from WHIRLPOOL:

Jacksum — by Dipl.-Inf. (FH) Johann N. Löfflmann in Java. Various message verification functions, including all three revisions of WHIRLPOOL. Released under the GPL.

Jonelo (talk · contribs), the author of Jacksum has in most cases restored these links; see his contributions. I would be interested to know what other Wikipedia editors feel about this dispute. There is some discussion of this on his Talk page My own feelings follow.

Omit the links - I do not think it is appropriate to pepper Wikipedia with links to your software so even when it is open source. — ciphergoth 19:47, 30 March 2006 (UTC)
The only place where I would be comfortable with a link to this software is in List of cryptographic software. Arvindn 00:13, 31 March 2006 (UTC)
Jonelo should not be reverting links to his site. We can be a little heavy-handed with that, if necessary, because our policy on external links is very clear, saying that editors should avoid adding: "3. Links that are added to promote a site" and "9. A website that you own or maintain (unless it is the official site of the subject of the article). If it is relevant and informative, mention it as a possible link on the talk page and wait for someone else to include it, or include the information directly in the article." It goes on to say that, "NOTE relating to items #3 and #9: Because of neutrality & point-of-view concerns, a primary policy of wikipedia is that no one from a particular site/organization should post links to that organization/site etc. Because neutrality is such an important -- and difficult -- objective at wikipedia, this takes precedence over other policies defining what should be linked. The accepted procedure is to post the proposed links in the Talk section of the article, and let other - neutral - wikipedia editors decide whether or not it should be included." — Matt Crypto 06:52, 31 March 2006 (UTC)

As the author of Jacksum I would like to add a few comments. It is true, that I have added the Jacksum links in October 2004. See also the discussion at my Talk page. I have also changed the description from time to time to keep correctness. However, the need to restore a link occur just only twice since 2004. See also my contributions. And also according to the feedback that I have received, it seems that users have found those links quite useful. Well, all Wikipedia links to Jacksum are removed for now, and I'm not going to add the links again, because I really don't want to violate the policy of Wikipedia ("editors should be avoid adding a website that you own or maintain"). However I think that an open source, free, and platform independent checksum software is something which the Wikipedia readers could help. Therefore I hope that there are at least a few Wikipedia users who agree with me, so that the Jacksum links can be restored some time again. The link to Jacksum is http://www.jonelo.de/java/jacksum . Maybe, it is wise to shorten the description a little bit. Thanks for your support. Jonelo 19:13, 31 March 2006 (UTC)

Personally, I prefer such links, so long as they are useful. I didn't see a link to "list of cryptographic software" anywhere on the page, and hence didn't find any software for Whirlpool. However, with people posting Whirlpool software, I get my information and applications from a single page. It's one of the things I love about Wikipedia: I get the information, then I get the websites or apps that put it into practice. I do see good reason to regulate these things, but I think that at least the best software should be on the page. If anyone disagrees, then put a link to that list of software on the Wikipedia article of every cryptographic algorithm or topic, that way the casual Wikipedia user won't miss out. I am glad Crypto++ was on the AES article, or I would have never known it existed. —Preceding unsigned comment added by 76.107.167.109 (talk) 05:56, 13 March 2008 (UTC)

[edit] Effective size?

We need a column for effective size to give an indication of each algorithm's quality. For instance, a perfect 128-bit hash would take 2^64 operations to find a collision. Because e.g. SHA-1 takes 2^63 operations to find a collision, its effective size would be 126 bits. Otherwise, the reader has to visit each article one-by-one in order to get an idea of how cryptographically broken each algorithm is. --Damian Yerrick (☎) 02:40, 3 June 2006 (UTC)

I think such a summary would be more misleading than useful. My experience with eSTREAM has been that it's best to discuss security status on the page for the relevant primitive, where you have the space to do it properly. — ciphergoth 08:56, 3 June 2006 (UTC)

[edit] Revealing properties?

Wouldn't a good property of a hash function be the ability to show, given a string X, that only a string of its length (or a string that's an anagram of that one, or which starts with the same character of as that one, or something) could result in the hash it results in without revealing the string itself (like revealing that the MD5 hash 6cd3556deb0da54bca060b4c39479839 must belong to a string 13 characters long)? --Ihope127 00:57, 4 July 2006 (UTC)

[edit] Are hash methods theoretically viable?

There is no underlying theoretical discussion included in the article. Many claim it is not possible at all to design a perfectly collision-free and one-way hash method , so all hash algorythms are futile for security use. I think this is related to P <-->NP problem. 195.70.48.242 20:36, 5 December 2006 (UTC)

Actually, I'd say the article does a reasonable job of expressing the theoretical basics. If you want more, you can read up on random oracles or go to the Handbook of Applied Cryptography. "Perfectly collision-free" is meaningless with respect to hash functions (consider a hash with 128-bit output. Feed it all 256-bit strings, and each will have an average of 2^128 collisions, even if it's a true random oracle.) P<->NP is irrelevant. And "futile for security use" doesn't really make sense either -- almost all cryptography is based on well-tested assumptions about what's hard and what's not. That said, there is lots of work to be done in strengthening our definition and implementation of cryptographic hashes, and that work is being aggressively undertaken now that sha1 and md5 are creaking ominously. Lunkwill 19:46, 7 December 2006 (UTC)

[edit] NIST has been having some workshops on cryptographic hashes

NIST has been having some workshops on cryptographic hashes. There have been a few ideas for new hash functions submitted; three have web pages and reference code: LASH, RadioGatún, and EDON-C/EDON-R (scroll down and look for "Two infinite classes of cryptographic hash functions"). Samboy 05:29, 21 December 2006 (UTC)

[edit] One-way encryption

Many people in the industry refer to cryptographic hash functions as "one-way encryption". I added this and also added references to a couple reputable books that talk about this. My addition was removed. This removal was unwarranted. The fact that they are called this is not incorrect (obviously because I referenced it). The notion that it is an incorrect definition is arguable and valid. The person who removed my edit should add a reputable source that defines it otherwise and state that it is disputed and/or controversial. Javidjamae 04:33, 18 June 2007 (UTC)

I see what you mean. However, there is some considerable confusion on how the term is used. In some cases one-way encryption refers to an encryption scheme that simply is one-way as opposed to stronger security notions such as security against chosen ciphertext attacks. As far as I can see, the most frequent use of "one-way encryption" is for applying a one-way function to passwords. Cryptographic hash functions are frequently (but not always) used for this process. An example where not a hash function is used is the DES-based password storage in UNIX (I.e. encrypt an all zero block with the password as a key). This function is reasonably one-way, but not collision resistant and hence not a cryptographic hash function. Hence what many people mean when they talk about "one-way encryption" is something slightly different than a cryptographic hash function. There are people that do not distinguish between the two concepts, but that does not mean we have to copy that into wikipedia. 85.2.88.39 12:21, 18 June 2007 (UTC)

Oh, this was interesting. I often think of and use hashes as a kind of "one-way encryption" functions. And I use them in several ways as one-way encryption, not only for password hashing. And I think many crypto users/programmers think of hashes as one-way encryption.

Currently we don't have a one-way encryption article here at Wikipedia. I think a solution for the moment could be to make a separate section about "one-way encryption" where the different views on this can be expressed. And if the "one-way encryption" section grows large we can make it a separate article. For now I made a redirect from one-way encryption to this article. I also added a mentioning of "one-way encryption" in the section "Applications of hash functions". Both to show a usage of the expression and make any one searching for the expression find this hash article.

--David Göthberg 14:44, 18 June 2007 (UTC)

Encryption, one-way function and cryptographic hash functions are distinct concepts that should not be mixed with each other. In order to design a protocol it is important to recognize what the (assumed) properties of the primitives used in the protocol are, what properties are not present and what properties are needed for the protocol to be secure. For example, I think of cryptographic hash functions simply of functions having the property that it is difficult to find two inputs that hash to the same result. Some hash functions such as the SHA-family may have further properties that make them suitable for applications such as HMAC or password storage. Other hashes don't. An example is VSH [5]. VSH comes with a strong argument for its collision resistance. But it has an undesirable property allowing some short messages to be inverted. (This is pointed out in the paper together with a warning not to blindly use the function for applications it was not defined for.) Hence not every hash function is a good choice for password storage. I still have problems understanding what people actually mean when they talk about one-way encryption, but I get the impression that most people using the term don't know either. Rather than a redirect it might be better to have some sort of disambiguation page. 85.2.67.208 18:22, 19 June 2007 (UTC)

[edit] Definition of collision resistance

I reverted the 2007 March 18 modification to collision resistance. The attacker is free to choose both m1 and m2. He need not be confined to an m1 fixed by others .

NormHardy 00:21, 20 July 2007 (UTC)

[edit] Image choise

A hash function at work. (DAVID'S IMAGE)

Results of the Whirlpool hash function on some inputs, in hexadecimal. Note the large difference in the last two hash values created by simply changing one letter. (JAFET'S IMAGE)

Some days ago User:Jafet.vixle changed the image used in the article. I disagree with the change.

I think Jafet's image is much harder to understand for beginners. Among other things it does not show generic terms such as "hash function" and "hash sum", instead it uses the name of an arbitrarily chosen hash function "Whirlpool". And it doesn't show the process, that is that input is fed to the hash function that then produces the hash sum.

--David Göthberg 16:26, 21 August 2007 (UTC)

I agree. The older picture is preferable. 169.231.5.121 07:22, 22 August 2007 (UTC)

Since no one else said anything I switched back to my image in the article. --David Göthberg 15:15, 26 August 2007 (UTC)

[edit] Clarification required

Can someone knowledgeable please clarify the units used in the "List of cryptographic hash functions" section? For instance, does SHA-1 output 160 bits or 160 bytes? —Preceding unsigned comment added by 196.7.19.250 (talk) 15:44, 26 October 2007 (UTC)

It's supposed to be bits. (anonymous)