Talk:SHA hash functions

From Wikipedia, the free encyclopedia

This article is part of WikiProject Cryptography, an attempt to build a comprehensive and detailed guide to cryptography in the Wikipedia. If you would like to participate, you can choose to edit the article attached to this page, or visit the project page, where you can join the project and see a list of open tasks.

It is intended that this article be included in WikiReader Cryptography, a WikiReader on the topic of cryptography. Help and comments for improving this article would be especially welcome. A tool for coordinating the editing and review of these articles is the daily article box.

To-do list for SHA hash functions:

edit · history · watch · refresh

Here are some tasks you can do:

[edit] 80 000 CPU hours?!

(80 000/24)/7 = 476.190476190476.. Slightly more than a year.

Has such itanium2-based supercomputers even existed for that long???

The computer had 256 processors running in parallel, so you'll have to divide the time by that number. Fredrik | talk 23:38, 7 Apr 2005 (UTC)

[edit] Patent status?

But primary question is, is this method unencumbered by patents or other protections? ~ender 2003-04-19 01:50 MST

It is generally considered to be patent-free (although that doesn't mean it is). In particular, see this letter from NIST. --Zundark 20:53, 2 Jun 2004 (UTC)

What's the deal with the patent referred to in the article? The article does appear to relate to the SHA variants, yet I can't find anybody talking about it on the web, and the IETF lists no IP disclosures relating to the SHA-224 RFC. What gives? Lunkwill 00:33, 27 September 2005 (UTC)

As I understand - they are free to use. Some implemintations can be covered with patent, not algorithm itself. Check this out: http://grouper.ieee.org/groups/1363/P1363/letters/NIST.txt--213.208.190.162 15:45, 30 October 2005 (UTC)

The letter linked above is from NIST and refers to SHA and SHA-1, not SHA-256 and SHA-512, which were patented by NSA and after NIST's letter was written (patent #6,829,355).

Patent, yes, but there's no copyright in the USA, since works of the United States Government inherently are public domain.

[edit] Naming

This article is about five or six different variants of SHA, which is fine, but we could probably do with a slightly more general name. Suggestions include:

— Matt 07:50, 4 Aug 2004 (UTC)

Matt, One problem will be that people will have only heard of (or read about) SHA or SHA-1 or something. Perhaps a forest of redirects? ww 20:48, 4 Aug 2004 (UTC)

Oh certainly (moving, of course, automatically creates a redirect from the old location anyway). I think SHA might be best, as it's short. — Matt 12:15, 5 Aug 2004 (UTC)

The problem with SHA is that it will very likely end up being a disambiguation page, as there are other things it can stand for (e.g., the Society for Historical Archaeology, which is currently the first Google hit for SHA). So Secure Hash Algorithm seems like a better option. --Zundark 13:02, 5 Aug 2004 (UTC)

Good point; any nay-sayers to Secure Hash Algorithm? — Matt 13:34, 5 Aug 2004 (UTC)

I suppose the best choice would be SHA family as there are several, including the insecure SHA-0. Or maybe Secure Hash Algorithm Family, maybe. ww 18:32, 6 Aug 2004 (UTC)

OK, I've gone with SHA family, and made a few redirects from the other suggestions. — Matt 13:26, 13 Aug 2004 (UTC)

[edit] pseudocode error

There seems to be an error in the Pseudocode: The assignments:

a = h0

b = h1

c = h2

d = h3

e = h4

must be done for every 512 bit chunk, not only at initial startup of the algorithm. At least that's what I had to change when implementing this algorithm before it produced the same results as the SHA-1 checksum tool I already had. 11:25, 10 Aug 2004 80.129.56.5

You're exactly right... I fixed it. You can fix it too, you know =^_^= -- Myria 05:38, 2 Nov 2004 (UTC)

[edit] SHA-2 hashes

Hi folks, do you find it useful to have additional testvectors for the SHA-2 variants on the page? I don't want to add them until I have heard some opinions. Jonelo 08:04, 6 Oct 2004 (UTC)

[edit] Further F3 optimization by Colin Plumb

Colin Plumb discovered that F3,

(40 ≤ i ≤ 59): f(b,c,d) = ((b and c) or (d and (b or c))

can also be expressed as

(40 ≤ i ≤ 59): f(b,c,d) = ((b and c) + (d and (b xor c))

This allows the compiler to optimize to make use of the associativity of addition, as well as enables the compiler to use the x86 lea instruction to do 3-operand addition (x = y + z).

Someone had changed the range to 60 ≤ i ≤ 79, changed that back. 200.175.25.158 18:32, 24 July 2005 (UTC)
The expressions given above have one more right than left pren in each. So at the very least they've been typo'd. No source is given, so I can check neither the original form, nor whether Colin Plumb authored them. (comment by Nahaj though not originally marked)
- Not anymore =) Because the expression is usually a #define, the expression is traditionally written with parentheses around it. Somehow I forgot the one on the left. I wrote this section and put this F3 function into the article, but forgot to tag it here. Many versions of SHA-1 on the Internet have these comments before the F-functions:
  - The SHA f()-functions. The f1 and f3 functions can be optimized to save one boolean operation each - thanks to Rich Schroeppel, rcs@cs.arizona.edu for discovering this.
  - The f3 function can be modified to use an addition to combine the two halves rather than OR, allowing more opportunity for using associativity in optimization. (Colin Plumb)
  - One such version: [1]

That answers my request for a citation quite well. Nahaj 17:37, 28 October 2005 (UTC)

- ... I would like to cite them but I don't know how and I don't know what we would use as the source, since all I've seen in the source is this. By the way, you can prove that the functions are identical by combining induction with a truth table. -- Myria 06:43, 28 October 2005 (UTC)

[edit] Explanation

The article really doesn't explain why there are so many SHA versions, is SHA-512 better (but slower to do) then SHA-1? If someone knows, please add it in. --ShaunMacPherson 12:52, 6 Nov 2004 (UTC)

(I'll try and check this then add it to the article): As I understand it, I believe that the different SHA versions are used to correspond to different block cipher key lengths, so that all the components in a cryptosystem provide the same level of security. Say you have a block cipher with an n-bit key size, then to match that security with a hash function that you use in the same system, you need a hash size of 2n, because even a perfect hash function is subject to birthday attacks with a complexity corresponding to the square root of the hash size — i.e. n if the hash size is 2n. So SHA-1 has 160-bits, which matches Skipjack (cipher) (keysize 80 bits). SHA-256, SHA-384 and SHA-512 match AES with its possible key sizes of 128, 192 and 256 bits. SHA-224 was introduced to match the security of two-key Triple DES (2×56=112-bit key). — Matt 17:58, 6 Nov 2004 (UTC)

[edit] Design questions

Does anyone have any insight into why one of the arguments of the rotations in the compression function is a factor of the other, and why the 30-bit left rotation (taken as a 2-bit right rotation) divides the 32-bit word size? I don't have time to analyze it in depth at the moment, but it seems under casual inspection that you would get better mixing if the arguments were coprime... User:Inkling 10:51, 17 Jan 2005 (UTC)

[edit] SHA-1 Broken?

Breaking news - [2]. Bovlb 02:35, 2005 Feb 16 (UTC)

[edit] Extra SHA-pseudocode

An anonymous user has kindly contributed pseudocode for SHA-256, in addition to the pseudocode for SHA-1. This is great to have, but I'm concerned that the article is a little over-burdened with pseudo-code at present. We don't want to lose this information, so perhaps we could move the SHA-256 description elsewhere: maybe a SHA-256 pseudocode page, or onto WikiSource...what do people think? — Matt Crypto 19:18, 14 Mar 2005 (UTC)

[edit] External links

It doesn't matter much, but I think a good rule of thumb is to have only one good link for a certain type of thing -- "Wikipedia is not a link repository", and all that. If we already have one link to a Javascript SHA calculator, why do we need another? Eventually, the External Links section starts to clutter up as developers inevitably add more links to their particular SHA implementations. To me, it seems easier just to say, "we only need one of those, thanks" at the outset. — Matt Crypto 01:07, 18 Mar 2005 (UTC)

Yes, right now we have three different Javascript tools, one should be enough also in my humble opinion, because all of them do the same: they calculate a SHA1 from a user input. However, the Javascript tools don't really take the character encoding into account and they are not able to calculate the message digest from a file, but the Java tool can. IMHO a Wikipedia user should have the freedom to choose the tool she/he needs, because usually there is a good reason why she/he went to Wikipedia. Therefore I think we should keep not only one Javascript link, but also the C and the Java link (the Java link has been added by myself ;-) Jonelo 18:57, 18 Mar 2005 (UTC).

I've removed a few links that provide bad hashes. Here are the correct SHA-1 hashes for some characters online calculators usually get tripped up on:

' = bb589d0621e5472f470fa3425a234c74b1e202e8
" = 2ace62c1befa19e3ea37dd52be9f6d508c5163e6
# = d08f88df745fa7950b104e4a707a31cfce7b5841
\ = 08534f33c201a45017b502e90a800f1b708ebcb3
& = 7c4d33785daa5c2370201ffa236b427aa37c9996
? = 5bab61eb53176449e25c2c82f172b82cb13ffb9d
'"#\&? = 699b7d43c2ceb01791d98817222b96c35714d1f6

I'll get around to cleaning up that section, eventually. ~MDD 46 96 23:06, 8 February 2006 (UTC)

at least http://serversniff.net/hash.php should provide correct hashes now - thanks for the hint, Mdd4696! Thomas Springer 12:26, 24 February 2006 (UTC)

[edit] QUESTION

I'd like to know more about how these functions are used and why they are important. Is it true that you run one of these algorithms over some text and have a code that cannot be converted back to the text? How is that useful? What are some common applications?

If this is is explained elsewhere perhaps you could link me in the right direction.

How about the cryptographic hash function article, and the hash function article linked from there? Lunkwill 07:38, 8 September 2005 (UTC)

[edit] Pseudocode clarification.

This line in the code (both versions) is unclear.

"append length of message, in bits as 64-bit big-endian integer to message"

Does it refer to the original message length, or the message length after it has been padded and had the length added? I assume it is probably the original length. If so can someone change it to "appeng length of original message, in bits...."?

I thought it was pretty obvious when I wrote it. Yes, it's the length of the original message. What do others think about this request? -- Myria 04:30, 26 September 2005 (UTC)

I think "...append original length..." is better than "...append length...". Nahaj 12:24, 28 October 2005 (UTC)

[edit] Numbers, cant figure them out.

Taken from the article 25/09/05: "SHA-0 and SHA-1 produce a 160-bit digest"

If it's 160 bit, what does that mean ? does it mean that a input like "Red juice" has 160 bit diffrent outputs?

Taken from the article 25/09/05: "from a message with a maximum size of 2^64 bits" so i cant digest a message above 2^64 bit (2,3 exabytes) ?

Taken from the article 25/09/05: "At CRYPTO 98, two French researchers presented an attack on SHA-0 (Chabaud and Joux, 1998): collisions can be found with complexity 2^61; less than the 2^80 for an ideal hash function of the same size."

Where does 2^80 come from ? SHA0 is 160 bit, that would be 2^160 right ?

160 bit output means that the function mashes, stretches and squishes any input string into a 160-bit (20 byte) long output string. Correct, the hash functions won't work on messages longer than 2^64 bits (which, I might add, are rather hard to come by...). And 2^80 comes from the birthday paradox. Lunkwill 20:58, 25 September 2005 (UTC)

[edit] Optimizations

There are machines where a rotate or shift by N eats N cycles. Obviously the rotate left 30 could be replaced by a rotate right 2, speeding it up on such machines, and not slowing it on any machines I'm aware of. (: And there is a Validated implementation that does the rotate right 2 :) How much optimization is worth doing on pseudo code? (comment by Nahaj, tagged by Myria)

I personally draw the line at the point at which it ought to be the compiler's responsibility. The optimizations to the F-functions are universal - they improve the calculation on all systems, or at least grant a theoretical advantage (such as the associativity of addition). No reasonable compiler could look at the original F-functions and immediately deduce the rather odd changes needed to remove a boolean operation. However, using the symmetry of the rotate commands is very obvious and is easy to program a compiler to do. -- Myria 07:16, 28 October 2005 (UTC)
- On another note, I wonder if we should mention that the expression (x << y) | (x >> (N - y)), where N == sizeof(x) * 8, automatically gets optimized to a rotate instruction on many compilers for platforms that have it like x86-GCC. -- Myria 07:16, 28 October 2005 (UTC)

[edit] How much does adding salt increase security

I'm assuming the SHA-1 collisions were done without adding a salt to the original message. Does anyone know how much a salt of a given size would add to the difficulty of creating a collision? e.g. would adding a 16 bit salt to the original message increase the best collision algorithm iterations from 2 to the 63 power to maybe 2 to the 66 power? -- Jweiler 19:24, 10 January 2006

These answers are based on the attack they found in 2005. But note that I don't know how the new optimised attack works, but I heard it is similar. Also note that I am not a cryptanalyst so I might be totally wrong about this.

Short answer: No, in some usage cases adding a salt does not make the attack harder. But under some circumstances a salt can stop the attack entirely. It depends on when and where the salt is added.

Long answer: As I understood it the attack they found in 2005 on SHA-1 is similar to the MD5 attack from the same year. That means they are changing some bytes in a special position close to the beginning of the message. That is, the attack method changes the same position, no matter what message you are working with at the moment. That means if the attacker creates the message, then he can change bytes in that position to create two messages that collide.

Due to the nature of hashes if you append something to the end of two colliding messages you will get a new hash, but the same hash for both messages. So still a collision. Thus appending the same salt to both messages will keep the collision and thus not protect you.

If you prepend anything to the two messages the position of the changed bytes will move. That means the attack won't work at all. So the two messages will not collide any more. So if you insert a salt or a message header or whatever to the very beginning of the message then you are sort of ok. At least against the 2005 attack.

However, if the attacker at the time he creates the two messages knows what header you are going to prepend to the message he can take that into account when he makes the message pair. That is, he will add the same header and then manipulate bytes in the proper position to find colliding messages. (And then of course remove the header before he gives you the messages.)

So this means that you must create an unpredictable random salt AFTER the messages are given/uploaded/sent to you by the attacker. And prepend it to the message, not append it.

Another approach is to prepend a fixed header that is so long that it runs over the bytes that the attacker needs to manipulate to do the attack. Thus he can not manipulate those bytes. And the header can be well known and public.

But note, now that they found two attacks it is likely that they will find more attacks. For instance, I think there are many more positions further into the message where you can manipulate bytes. So these "fixes" are not even worth the trouble. You better use something better instead. I think the wise thing is to do like the TLS standard does: Use two different hashes and combine them. That approach is probably still secure even if there are some attacks against both the used hashes.

Oh, but of course salts are a good thing for many other reasons.

--David Göthberg 19:48, 30 October 2006 (UTC)

[edit] Removed "implementations" section

There are 370 implementations of SHA-1 according to the article. We can't list them all, and the most notable ones are the ones we don't need to link to because they are bundled with Perl, Python, Java and so forth. The only reasonable way to avoid this section growing without bound is to remove it altogether.

I think the same goes for "web-based tools" which I also removed, but perhaps I'm wrong? — ciphergoth 21:40, 19 February 2006 (UTC)

It would be nice if someone could find an online directory that lists the web-based tools, and link to that instead. While the section was getting quite cluttered, I did find the links useful. ~MDD 46 96 01:38, 20 February 2006 (UTC)

I don't see any reason why we can't arbitrarily pick one or two good implementations and web based tools. It's a natural next step for people reading the article. Lunkwill 08:07, 20 February 2006 (UTC)

I think that would have some advantages, but it's very hard to be arbitrary here. When someone wants to add their pet implementation to the list, what reason can we give for reverting it? — ciphergoth 09:28, 20 February 2006 (UTC)

See WP:EL. We can't link arbitrarily, and there are too many good online hashers available to list them all. ~MDD 46 96 23:44, 20 February 2006 (UTC)

We do seem to have a lot of software authors adding their particular hashing site/program to Wikipedia; a comparable situation is MD5. I think it would be useful to include a link to one or two implementations, but no more. The question is, how do we decide? I believe we can be arbitrary, but the easiest route is to pick a few links which seem to be amongst of the best of its type (as Jason suggests), and then be very strict with new additions. If the authors are upset or think it's unfair, then stuff 'em; there is no obligation for us to advertise their project at all. Most people shouldn't be adding their site anyway as per the WP:EL guideline. The ideal would be to have a link to a web site which lists implementations, but I don't think one exists for SHA. — Matt Crypto 17:07, 21 February 2006 (UTC)

I don't know how someone would know there there are 370 implimentations if there wasn't list somewhere.

I guess I am still trying to understand why this is a problem. I think that the implimentations section and the web-based tools are very helpful to programmers and others who are seeking to do a bit more hands-on research. Removing those sections would greatly reduce the value that this resource provides. Johnmaguire 19:04, 24 February 2006 (UTC)

The thing is, Wikipedia isn't a repository for any data that might be useful to someone; it's first and foremost an encyclopedia. An infinitely better place to list and maintain web links is the Open Directory Project; it's simply outside our scope. — Matt Crypto 19:13, 24 February 2006 (UTC)

I suspect that the 370 number comes from NIST's list of validated implementation at the time (Up to 450 now... although the removed implimentations list leaned to unvalidated versions for some reason. There are at least two open source implementations in NIST's list, so I'd expect that if you were going to list one or two they would be good candidates. (Disclaimer, I'm the author of one.) (My bias towards validated ones are because a number of flat out missimplementations exists on the net.) My personal opinion is that the article shouldn't have an implementations section, it should just link to NIST's list, since they have a legal mandate to keep it up to date... any list in the article is going to be rapidly out of date.Nahaj 02:24, 5 April 2006 (UTC)

I agree if we talk about very popular pages like SHA and MD5, because those algorihms are well known to the public already and too many links (both free and nonfree, both trivial and complex) have been posted in the past to those topics. Furthermore it is easy to find SHA and MD5 software by a seach engine easily today rather than using Wikipedia. However, for not so famous pages, a link where Wikipedia users can find source and/or software, make absolutely sense in my opinion. BTW, I absolutely disagree with your argument called "we don't need to link to because they are bundled with Perl, Python, Java", because actually in most cases those software is open source, platform independent and run also on entirely free platforms rather than just only on one particular OS. Jonelo 18:02, 29 March 2006 (UTC)

[edit] Licensing of the psuedo code

Can I derive proprietary code from the psuedo code on this page? I am concerned that the Free Document License won't allow it, but what is psuedo code for if not to be derived from? 168.215.177.66 21:37, 6 July 2006 (UTC)

[edit] Combining SHA1 / MD5

I am a cryptography novice, but wonder given the recent break of SHA1 and break of MD5 if the security of both can be improved by combining them?

For example, if the function SHA1(data+MD5(data)) were used (i.e. append the MD5 of the data to the data, and take the SHA1 of the combination), would it be a lot harder to find collisions? My (rudimentary) understanding of the collision finding process is that to break the algorithm a data stream to fill the internal state with specific data is desired - and that the complexity of finding this data stream depends on the size of the internal state. Thus by effectively adding MD5 internal state onto SHA1 this problem would become much harder.

This would be practical in the following (common) situation : you have to interface with a legacy application that does not support newer hash functions (e.g. SHA-2 family), and yet you are worried about the potential for a SHA-1 break (you don't use MD5, because breaking that is far easier according to the above quoted articles). A good example would be MySQL, which has internal functions for MD5 and SHA1, while stating in the manual "Exploits for the MD5 and SHA-1 algorithms have become known. You may wish to consider using one of the other encryption functions described in this section instead" - without providing any other cryptographic hash functions in version 5.0.

I assume (possibly naively) that the combination will be at least as strong as the outer function. Is this a reasonable assumption? Second, I intuitively think (which probably means it is wrong) that the combination will be stronger than the outer function. Any thoughts?

Shamus Husheer - s dot husheer at cantab dot net - 22:46, 30 August 2006

I've asked the gurus at sci.crypt under the thread "Probably naive question - SHA1 + MD5 combination" on 31 Aug 2006, and had the following answer:

(1) See http://ai.stanford.edu/~xb/crypto06b/index.html Boneh and Boyen's paper `On the impossiblity of efficiently combining collision resistant hash functions', which shows that if both functions are good hash functions, then combining them in this way is not useful,

(2) The only "at least as strong as" construction I [Kristian Gjøsteen] know of is "g(m) = h_1(m) || ... || h_n(m)" [appending one to the other]. If you find a collision in this hash function, you've also found a collision for all the constituent hash functions. Kristian also suggested a way to more practically break the SHA1(data+MD5(data)) function in the discussion.

Hopefully this helps anyone who has come searching for the same information.

Shamus - 06:59, 4 September 2006

Hi Shamus. I am not a cryptanalyst or so but have some knowledge. To me the method you suggest to combine two hashes seems pretty good: SHA1(data + MD5(data) ). But there is a better variant of it, see below. Another well known method is SHA1(data) XOR MD5(data). Of course, the most secure one is to concatenate the two hashes: SHA1(data) + MD5(data). However that gives a rather big hash and some systems don't use the whole result but instead just chop off enough bytes they want. Thus it is nice if all bits in the result depends on both hashes. So the two first methods here seems better for normal use. For the XOR method there is a slight problem that MD5 gives a 16 byte hash while SHA1 gives a 20 byte hash. If a program then chops of and uses the wrong end of the result then not all bits depend on both hashes. But that can be handled by reusing 4 bytes of the MD5 to XOR over the other part of the SHA1. I have seen many other suggested methods too. Here I list some of them including the three above:

hash = SHA1( data + MD5(data) )

hash = SHA1(data) XOR MD5(data)

longhash = SHA1(data) + MD5(data)

tmp = MD5(data), hash = SHA1(tmp + data + tmp)

hash = MD5(SHA1(data)) XOR SHA1(MD5(data))

hash = SHA1(MD5(data) + SHA1(data))

"+" here means concatenation.

Oh, and shamus and every one else, if you like to talk more about crypto come to the IRC chat #crypto on irc.freenode.net.

--David Göthberg 22:27, 21 September 2006 (UTC)

[edit] Great job!

I just wanted to thank everyone worked on this page because it's probably one of the bests I've found on the wiki. In particular, I needed to implement a checking routine for file integrity (non critical performance) and obviously started from CRC and Md5. I was very surprised SHA-512 to be "so easy" to implement (I'm not in cryptography at all) so I guess I'll give it a try.

As another note, maybe it would be good to note other methods to be "somewhat surpassed"... i bounced on many links before getting this one. I would do this myself, but I'm not really sure what's the best way to do it.
81.211.227.153 20:11, 21 September 2006 (UTC)

[edit] Are SHA hashes keyed?

Hi, I've got what is probably a basic question: Can you key SHA hashes? That is, using a secret key of some sort in the hash algorithm so that no one can calculate the hash in question without knowing the key used in the hashing? (That might be what the initializations are for in the pseudocode.) -- Joebeone (Talk) 16:15, 14 October 2006 (UTC)

I've poked around some more and it appears that hash functions are generally unkeyed... sorry for the clutter. -- Joebeone (Talk) 18:06, 14 October 2006 (UTC)

Don't be sorry, it is a good question. Yes, you can key any cryptographic hash function and that turns it into a message authentication code (MAC). The most popular (and also very secure) method to do that is the HMAC method. Usually when we send or store messages we do MAC them and thus make it "impossible" for an attacker to tamper with (change) our messages without us detecting it. --David Göthberg 05:32, 16 October 2006 (UTC)

[edit] ccv?

Hi,

in the pseudocode for SHA-2, someone changed "for i from 16 to 63" to "for i from 16 to 63ccv" again (I had removed "ccv"); before reverting it, does anyone think that it has a meaning? —Gennaro Prota ^•Talk 23:23, 26 November 2006 (UTC)

[edit] Date linking

Hi again,

would anyone object to delinking all year-only dates in the article? I don't think the links add anything relevant in this context (see Partial dates). —Gennaro Prota ^•Talk 06:27, 2 December 2006 (UTC)

Retrieved from "http://en.wikipedia.org../../../s/h/a/Talk%7ESHA_hash_functions_2d36.html"

Categories: To do | To do, priority undefined