Talk:SHA hash functions

From Wikipedia, the free encyclopedia

This article is part of WikiProject Cryptography, an attempt to build a comprehensive and detailed guide to cryptography on Wikipedia. If you would like to participate, you can choose to edit the article attached to this page, or visit the project page, where you can join the project and see a list of open tasks.

It is intended that this article be included in WikiReader Cryptography, a WikiReader on the topic of cryptography. Help and comments for improving this article would be especially welcome. A tool for coordinating the editing and review of these articles is the daily article box.

To-do list for SHA hash functions:

edit · history · watch · refresh

Here are some tasks you can do:

Update: SHA-1, in light of new vulnerabilities

Priority 3

http://en.wikipedia.org/skins-1.5/common/images/button_nowiki.png Ignore wiki formatting

[edit] Operations

Is "rotfl" a binary operation? I'd suggest to revise the alg/operation table :( —Preceding unsigned comment added by 81.182.160.76 (talk) 07:08, 7 April 2008 (UTC)

[edit] 80 000 CPU hours?!

(80 000/24)/7 = 476.190476190476.. Slightly more than a year.

Has such itanium2-based supercomputers even existed for that long???

The computer had 256 processors running in parallel, so you'll have to divide the time by that number. Fredrik | talk 23:38, 7 Apr 2005 (UTC)

[edit] Patent status?

But primary question is, is this method unencumbered by patents or other protections? ~ender 2003-04-19 01:50 MST

It is generally considered to be patent-free (although that doesn't mean it is). In particular, see this letter from NIST. --Zundark 20:53, 2 Jun 2004 (UTC)

What's the deal with the patent referred to in the article? The article does appear to relate to the SHA variants, yet I can't find anybody talking about it on the web, and the IETF lists no IP disclosures relating to the SHA-224 RFC. What gives? Lunkwill 00:33, 27 September 2005 (UTC)

As I understand - they are free to use. Some implemintations can be covered with patent, not algorithm itself. Check this out: http://grouper.ieee.org/groups/1363/P1363/letters/NIST.txt--213.208.190.162 15:45, 30 October 2005 (UTC)

The letter linked above is from NIST and refers to SHA and SHA-1, not SHA-256 and SHA-512, which were patented by NSA and after NIST's letter was written (patent #6,829,355).

Patent, yes, but there's no copyright in the USA, since works of the United States Government inherently are public domain.

[edit] Naming

This article is about five or six different variants of SHA, which is fine, but we could probably do with a slightly more general name. Suggestions include:

— Matt 07:50, 4 Aug 2004 (UTC)

Matt, One problem will be that people will have only heard of (or read about) SHA or SHA-1 or something. Perhaps a forest of redirects? ww 20:48, 4 Aug 2004 (UTC)

Oh certainly (moving, of course, automatically creates a redirect from the old location anyway). I think SHA might be best, as it's short. — Matt 12:15, 5 Aug 2004 (UTC)

The problem with SHA is that it will very likely end up being a disambiguation page, as there are other things it can stand for (e.g., the Society for Historical Archaeology, which is currently the first Google hit for SHA). So Secure Hash Algorithm seems like a better option. --Zundark 13:02, 5 Aug 2004 (UTC)

Good point; any nay-sayers to Secure Hash Algorithm? — Matt 13:34, 5 Aug 2004 (UTC)

I suppose the best choice would be SHA family as there are several, including the insecure SHA-0. Or maybe Secure Hash Algorithm Family, maybe. ww 18:32, 6 Aug 2004 (UTC)

OK, I've gone with SHA family, and made a few redirects from the other suggestions. — Matt 13:26, 13 Aug 2004 (UTC)

[edit] pseudocode error

There seems to be an error in the Pseudocode: The assignments:

a = h0

b = h1

c = h2

d = h3

e = h4

must be done for every 512 bit chunk, not only at initial startup of the algorithm. At least that's what I had to change when implementing this algorithm before it produced the same results as the SHA-1 checksum tool I already had. 11:25, 10 Aug 2004 80.129.56.5

You're exactly right... I fixed it. You can fix it too, you know =^_^= -- Myria 05:38, 2 Nov 2004 (UTC)

[edit] SHA-2 hashes

Hi folks, do you find it useful to have additional testvectors for the SHA-2 variants on the page? I don't want to add them until I have heard some opinions. Jonelo 08:04, 6 Oct 2004 (UTC)

[edit] Further F3 optimizations

Colin Plumb discovered that F3,

(40 ≤ i ≤ 59): f(b,c,d) = ((b and c) or (d and (b or c))

can also be expressed as

(40 ≤ i ≤ 59): f(b,c,d) = ((b and c) + (d and (b xor c))

This allows the compiler to optimize to make use of the associativity of addition, as well as enables the compiler to use the x86 lea instruction to do 3-operand addition (x = y + z).

Someone had changed the range to 60 ≤ i ≤ 79, changed that back. 200.175.25.158 18:32, 24 July 2005 (UTC)
The expressions given above have one more right than left pren in each. So at the very least they've been typo'd. No source is given, so I can check neither the original form, nor whether Colin Plumb authored them. (comment by Nahaj though not originally marked)
- Not anymore =) Because the expression is usually a #define, the expression is traditionally written with parentheses around it. Somehow I forgot the one on the left. I wrote this section and put this F3 function into the article, but forgot to tag it here. Many versions of SHA-1 on the Internet have these comments before the F-functions:
  - The SHA f()-functions. The f1 and f3 functions can be optimized to save one boolean operation each - thanks to Rich Schroeppel, rcs@cs.arizona.edu for discovering this.
  - The f3 function can be modified to use an addition to combine the two halves rather than OR, allowing more opportunity for using associativity in optimization. (Colin Plumb)
  - One such version: [1]

That answers my request for a citation quite well. Nahaj 17:37, 28 October 2005 (UTC)

- ... I would like to cite them but I don't know how and I don't know what we would use as the source, since all I've seen in the source is this. By the way, you can prove that the functions are identical by combining induction with a truth table. -- Myria 06:43, 28 October 2005 (UTC)

Hi guys. JulesH slabbed a "citation needed" tag on these f computing "equivalent expressions" and I too wasn't so sure they were equivalent. So I spent some minutes making truth tables with them. And the result is that all of them are correct. Even the "+" one worked right! Here are the tables:

(0 <= i <= 19): 

f := (b and c) or ((not b) and d)

B C D 

0 0 0    (0 and 0) or (1 and 0)  =  0 or 0  =  0
0 0 1    (0 and 0) or (1 and 1)  =  0 or 1  =  1
0 1 0    (0 and 1) or (1 and 0)  =  0 or 0  =  0
0 1 1    (0 and 1) or (1 and 1)  =  0 or 1  =  1
1 0 0    (1 and 0) or (0 and 0)  =  0 or 0  =  0
1 0 1    (1 and 0) or (0 and 1)  =  0 or 0  =  0
1 1 0    (1 and 1) or (0 and 0)  =  1 or 0  =  1
1 1 1    (1 and 1) or (0 and 1)  =  1 or 0  =  1


f := d xor (b and (c xor d))

B C D 

0 0 0     0 xor (0 and (0 xor 0))  =  0 
0 0 1     1 xor (0 and (0 xor 1))  =  1 
0 1 0     0 xor (0 and (1 xor 0))  =  0
0 1 1     1 xor (0 and (1 xor 1))  =  1
1 0 0     0 xor (1 and (0 xor 0))  =  0
1 0 1     1 xor (1 and (0 xor 1))  =  0
1 1 0     0 xor (1 and (1 xor 0))  =  1
1 1 1     1 xor (1 and (1 xor 1))  =  1


Both functions do give  0 1 0 1 0 0 1 1  thus all is ok.

(40 <= i <= 59):

f := (b and c) or (b and d) or (c and d)

B C D 

0 0 0     (0 and 0) or (0 and 0) or (0 and 0)  =  0
0 0 1     (0 and 0) or (0 and 1) or (0 and 1)  =  0
0 1 0     (0 and 1) or (0 and 0) or (1 and 0)  =  0
0 1 1     (0 and 1) or (0 and 1) or (1 and 1)  =  1
1 0 0     (1 and 0) or (1 and 0) or (0 and 0)  =  0
1 0 1     (1 and 0) or (1 and 1) or (0 and 1)  =  1
1 1 0     (1 and 1) or (1 and 0) or (1 and 0)  =  1
1 1 1     (1 and 1) or (1 and 1) or (1 and 1)  =  1


f := (b and c) or (d and (b or c))

B C D 

0 0 0     (0 and 0) or (0 and (0 or 0))  =  0
0 0 1     (0 and 0) or (1 and (0 or 0))  =  0
0 1 0     (0 and 1) or (0 and (0 or 1))  =  0
0 1 1     (0 and 1) or (1 and (0 or 1))  =  1
1 0 0     (1 and 0) or (0 and (1 or 0))  =  0
1 0 1     (1 and 0) or (1 and (1 or 0))  =  1
1 1 0     (1 and 1) or (0 and (1 or 1))  =  1
1 1 1     (1 and 1) or (1 and (1 or 1))  =  1


f := (b and c) or (d and (b xor c))

B C D 

0 0 0     (0 and 0) or (0 and (0 xor 0))  =  0
0 0 1     (0 and 0) or (1 and (0 xor 0))  =  0
0 1 0     (0 and 1) or (0 and (0 xor 1))  =  0
0 1 1     (0 and 1) or (1 and (0 xor 1))  =  1
1 0 0     (1 and 0) or (0 and (1 xor 0))  =  0
1 0 1     (1 and 0) or (1 and (1 xor 0))  =  1
1 1 0     (1 and 1) or (0 and (1 xor 1))  =  1
1 1 1     (1 and 1) or (1 and (1 xor 1))  =  1


f := (b and c) + (d and (b xor c))

B C D 

0 0 0     (0 and 0) + (0 and (0 xor 0))  =  0 + 0  =  0
0 0 1     (0 and 0) + (1 and (0 xor 0))  =  0 + 0  =  0
0 1 0     (0 and 1) + (0 and (0 xor 1))  =  0 + 0  =  0
0 1 1     (0 and 1) + (1 and (0 xor 1))  =  0 + 1  =  1
1 0 0     (1 and 0) + (0 and (1 xor 0))  =  0 + 0  =  0
1 0 1     (1 and 0) + (1 and (1 xor 0))  =  0 + 1  =  1
1 1 0     (1 and 1) + (0 and (1 xor 1))  =  1 + 0  =  1
1 1 1     (1 and 1) + (1 and (1 xor 1))  =  1 + 0  =  1

All four functions do give  0 0 0 1 0 1 1 1  thus all is ok.

Note, we never did get 1+1 thus no bit did overflow
to the next bit in the word. So the "+" here sort of 
just works as a bitwise "or".

As you can see, they are correct. Funny. --David Göthberg 07:09, 27 January 2007 (UTC)

This is original research, you missed the point of the {{fact}} tag. -- intgr 13:18, 27 January 2007 (UTC)

Intgr: Which part do you mean is original research? That people over time added these extra variants to the article? Or that I spent some minutes doing a trivial check if the variants are correct? (That is, trivial for me as a programmer.) If some one states in an article that 1234 * 2345 = 2893730 then it would be silly to ask for a reference, instead we would just check it on our calculators. This case is really the same thing.

But yeah, since those variants do look a bit weird some kind of link to an explanation would probably be good but I don't find it that necessary. By the way, I just checked several very old SHA1 source codes I have lying around and I found all but one of the variants in those source codes. So apparently these optimisations have been known for years.

--David Göthberg 14:08, 27 January 2007 (UTC)

Well, strictly, both are original research as long as they don't come from a source. But I didn't really give them a thought earlier, and they do appear quite straightforward boolean function optimizations. So I agree that they probably don't need to be sourced.

And by the way, why did you omit the 7th combination (b=1, c=1, d=0) from your proof? Although all these alternatives also produce consistent results in this case. I was first confused by this, as I assumed their equivalence was somehow data-dependent. -- intgr 15:43, 27 January 2007 (UTC)

Oops! Slightly embarrassing for me... Ah well, just goes to show that "proofs" are way too susceptible to the human factor to really be proofs. But thankfully this is Wikipedia where many eyeballs make all bugs shallow. Now I updated the truth table with the 1 1 0 case. And yes, thankfully the expressions are still equivalent.

And yeah, if those expressions didn't come from a source then they would in the strict sense be original research. But I have found all but one of them in different SHA1 libs that I have laying around. Thus at least most of them are well known and there are sources. Question is if we should bother to state which crypto libs use which variant? That is, to state some sources? Seems unnecessary to me since this is as you said "straight forward boolean function optimizations". And any one implementing SHA1 should anyway test their implementation carefully by comparing with test vectors and other crypto libs and should then discover any bugs in their implementation.

--David Göthberg 05:37, 28 January 2007 (UTC)

[edit] Explanation

The article really doesn't explain why there are so many SHA versions, is SHA-512 better (but slower to do) then SHA-1? If someone knows, please add it in. --ShaunMacPherson 12:52, 6 Nov 2004 (UTC)

(I'll try and check this then add it to the article): As I understand it, I believe that the different SHA versions are used to correspond to different block cipher key lengths, so that all the components in a cryptosystem provide the same level of security. Say you have a block cipher with an n-bit key size, then to match that security with a hash function that you use in the same system, you need a hash size of 2n, because even a perfect hash function is subject to birthday attacks with a complexity corresponding to the square root of the hash size — i.e. n if the hash size is 2n. So SHA-1 has 160-bits, which matches Skipjack (cipher) (keysize 80 bits). SHA-256, SHA-384 and SHA-512 match AES with its possible key sizes of 128, 192 and 256 bits. SHA-224 was introduced to match the security of two-key Triple DES (2×56=112-bit key). — Matt 17:58, 6 Nov 2004 (UTC)

[edit] Design questions

Does anyone have any insight into why one of the arguments of the rotations in the compression function is a factor of the other, and why the 30-bit left rotation (taken as a 2-bit right rotation) divides the 32-bit word size? I don't have time to analyze it in depth at the moment, but it seems under casual inspection that you would get better mixing if the arguments were coprime... User:Inkling 10:51, 17 Jan 2005 (UTC)

[edit] Extra SHA-pseudocode

An anonymous user has kindly contributed pseudocode for SHA-256, in addition to the pseudocode for SHA-1. This is great to have, but I'm concerned that the article is a little over-burdened with pseudo-code at present. We don't want to lose this information, so perhaps we could move the SHA-256 description elsewhere: maybe a SHA-256 pseudocode page, or onto WikiSource...what do people think? — Matt Crypto 19:18, 14 Mar 2005 (UTC)

[edit] External links

It doesn't matter much, but I think a good rule of thumb is to have only one good link for a certain type of thing -- "Wikipedia is not a link repository", and all that. If we already have one link to a Javascript SHA calculator, why do we need another? Eventually, the External Links section starts to clutter up as developers inevitably add more links to their particular SHA implementations. To me, it seems easier just to say, "we only need one of those, thanks" at the outset. — Matt Crypto 01:07, 18 Mar 2005 (UTC)

Yes, right now we have three different Javascript tools, one should be enough also in my humble opinion, because all of them do the same: they calculate a SHA1 from a user input. However, the Javascript tools don't really take the character encoding into account and they are not able to calculate the message digest from a file, but the Java tool can. IMHO a Wikipedia user should have the freedom to choose the tool she/he needs, because usually there is a good reason why she/he went to Wikipedia. Therefore I think we should keep not only one Javascript link, but also the C and the Java link (the Java link has been added by myself ;-) Jonelo 18:57, 18 Mar 2005 (UTC).

I've removed a few links that provide bad hashes. Here are the correct SHA-1 hashes for some characters online calculators usually get tripped up on:

' = bb589d0621e5472f470fa3425a234c74b1e202e8
" = 2ace62c1befa19e3ea37dd52be9f6d508c5163e6
# = d08f88df745fa7950b104e4a707a31cfce7b5841
\ = 08534f33c201a45017b502e90a800f1b708ebcb3
& = 7c4d33785daa5c2370201ffa236b427aa37c9996
? = 5bab61eb53176449e25c2c82f172b82cb13ffb9d
'"#\&? = 699b7d43c2ceb01791d98817222b96c35714d1f6

I'll get around to cleaning up that section, eventually. ~MDD 46 96 23:06, 8 February 2006 (UTC)

at least http://serversniff.net/hash.php should provide correct hashes now - thanks for the hint, Mdd4696! Thomas Springer 12:26, 24 February 2006 (UTC)

[edit] QUESTION

I'd like to know more about how these functions are used and why they are important. Is it true that you run one of these algorithms over some text and have a code that cannot be converted back to the text? How is that useful? What are some common applications?

If this is is explained elsewhere perhaps you could link me in the right direction.

How about the cryptographic hash function article, and the hash function article linked from there? Lunkwill 07:38, 8 September 2005 (UTC)

[edit] Pseudocode clarification.

This line in the code (both versions) is unclear.

"append length of message, in bits as 64-bit big-endian integer to message"

Does it refer to the original message length, or the message length after it has been padded and had the length added? I assume it is probably the original length. If so can someone change it to "appeng length of original message, in bits...."?

I thought it was pretty obvious when I wrote it. Yes, it's the length of the original message. What do others think about this request? -- Myria 04:30, 26 September 2005 (UTC)

I think "...append original length..." is better than "...append length...". Nahaj 12:24, 28 October 2005 (UTC)

[edit] Numbers, cant figure them out.

Taken from the article 25/09/05: "SHA-0 and SHA-1 produce a 160-bit digest"

If it's 160 bit, what does that mean ? does it mean that a input like "Red juice" has 160 bit diffrent outputs?

Taken from the article 25/09/05: "from a message with a maximum size of 2^64 bits" so i cant digest a message above 2^64 bit (2,3 exabytes) ?

Taken from the article 25/09/05: "At CRYPTO 98, two French researchers presented an attack on SHA-0 (Chabaud and Joux, 1998): collisions can be found with complexity 2^61; less than the 2^80 for an ideal hash function of the same size."

Where does 2^80 come from ? SHA0 is 160 bit, that would be 2^160 right ?

160 bit output means that the function mashes, stretches and squishes any input string into a 160-bit (20 byte) long output string. Correct, the hash functions won't work on messages longer than 2^64 bits (which, I might add, are rather hard to come by...). And 2^80 comes from the birthday paradox. Lunkwill 20:58, 25 September 2005 (UTC)

[edit] Optimizations

There are machines where a rotate or shift by N eats N cycles. Obviously the rotate left 30 could be replaced by a rotate right 2, speeding it up on such machines, and not slowing it on any machines I'm aware of. (: And there is a Validated implementation that does the rotate right 2 :) How much optimization is worth doing on pseudo code? (comment by Nahaj, tagged by Myria)

I personally draw the line at the point at which it ought to be the compiler's responsibility. The optimizations to the F-functions are universal - they improve the calculation on all systems, or at least grant a theoretical advantage (such as the associativity of addition). No reasonable compiler could look at the original F-functions and immediately deduce the rather odd changes needed to remove a boolean operation. However, using the symmetry of the rotate commands is very obvious and is easy to program a compiler to do. -- Myria 07:16, 28 October 2005 (UTC)
- On another note, I wonder if we should mention that the expression (x << y) | (x >> (N - y)), where N == sizeof(x) * 8, automatically gets optimized to a rotate instruction on many compilers for platforms that have it like x86-GCC. -- Myria 07:16, 28 October 2005 (UTC)

[edit] How much does adding salt increase security

I'm assuming the SHA-1 collisions were done without adding a salt to the original message. Does anyone know how much a salt of a given size would add to the difficulty of creating a collision? e.g. would adding a 16 bit salt to the original message increase the best collision algorithm iterations from 2 to the 63 power to maybe 2 to the 66 power? -- Jweiler 19:24, 10 January 2006

These answers are based on the attack they found in 2005. But note that I don't know how the new optimised attack works, but I heard it is similar. Also note that I am not a cryptanalyst so I might be totally wrong about this.

Short answer: No, in some usage cases adding a salt does not make the attack harder. But under some circumstances a salt can stop the attack entirely. It depends on when and where the salt is added.

Long answer: As I understood it the attack they found in 2005 on SHA-1 is similar to the MD5 attack from the same year. That means they are changing some bytes in a special position close to the beginning of the message. That is, the attack method changes the same position, no matter what message you are working with at the moment. That means if the attacker creates the message, then he can change bytes in that position to create two messages that collide.

Due to the nature of hashes if you append something to the end of two colliding messages you will get a new hash, but the same hash for both messages. So still a collision. Thus appending the same salt to both messages will keep the collision and thus not protect you.

If you prepend anything to the two messages the position of the changed bytes will move. That means the attack won't work at all. So the two messages will not collide any more. So if you insert a salt or a message header or whatever to the very beginning of the message then you are sort of ok. At least against the 2005 attack.

However, if the attacker at the time he creates the two messages knows what header you are going to prepend to the message he can take that into account when he makes the message pair. That is, he will add the same header and then manipulate bytes in the proper position to find colliding messages. (And then of course remove the header before he gives you the messages.)

So this means that you must create an unpredictable random salt AFTER the messages are given/uploaded/sent to you by the attacker. And prepend it to the message, not append it.

Another approach is to prepend a fixed header that is so long that it runs over the bytes that the attacker needs to manipulate to do the attack. Thus he can not manipulate those bytes. And the header can be well known and public.

But note, now that they found two attacks it is likely that they will find more attacks. For instance, I think there are many more positions further into the message where you can manipulate bytes. So these "fixes" are not even worth the trouble. You better use something better instead. I think the wise thing is to do like the TLS standard does: Use two different hashes and combine them. That approach is probably still secure even if there are some attacks against both the used hashes.

Oh, but of course salts are a good thing for many other reasons.

--David Göthberg 19:48, 30 October 2006 (UTC)

[edit] Removed "implementations" section

There are 370 implementations of SHA-1 according to the article. We can't list them all, and the most notable ones are the ones we don't need to link to because they are bundled with Perl, Python, Java and so forth. The only reasonable way to avoid this section growing without bound is to remove it altogether.

I think the same goes for "web-based tools" which I also removed, but perhaps I'm wrong? — ciphergoth 21:40, 19 February 2006 (UTC)

It would be nice if someone could find an online directory that lists the web-based tools, and link to that instead. While the section was getting quite cluttered, I did find the links useful. ~MDD 46 96 01:38, 20 February 2006 (UTC)

I don't see any reason why we can't arbitrarily pick one or two good implementations and web based tools. It's a natural next step for people reading the article. Lunkwill 08:07, 20 February 2006 (UTC)

I think that would have some advantages, but it's very hard to be arbitrary here. When someone wants to add their pet implementation to the list, what reason can we give for reverting it? — ciphergoth 09:28, 20 February 2006 (UTC)

See WP:EL. We can't link arbitrarily, and there are too many good online hashers available to list them all. ~MDD 46 96 23:44, 20 February 2006 (UTC)

We do seem to have a lot of software authors adding their particular hashing site/program to Wikipedia; a comparable situation is MD5. I think it would be useful to include a link to one or two implementations, but no more. The question is, how do we decide? I believe we can be arbitrary, but the easiest route is to pick a few links which seem to be amongst of the best of its type (as Jason suggests), and then be very strict with new additions. If the authors are upset or think it's unfair, then stuff 'em; there is no obligation for us to advertise their project at all. Most people shouldn't be adding their site anyway as per the WP:EL guideline. The ideal would be to have a link to a web site which lists implementations, but I don't think one exists for SHA. — Matt Crypto 17:07, 21 February 2006 (UTC)

I don't know how someone would know there there are 370 implimentations if there wasn't list somewhere.

I guess I am still trying to understand why this is a problem. I think that the implimentations section and the web-based tools are very helpful to programmers and others who are seeking to do a bit more hands-on research. Removing those sections would greatly reduce the value that this resource provides. Johnmaguire 19:04, 24 February 2006 (UTC)

The thing is, Wikipedia isn't a repository for any data that might be useful to someone; it's first and foremost an encyclopedia. An infinitely better place to list and maintain web links is the Open Directory Project; it's simply outside our scope. — Matt Crypto 19:13, 24 February 2006 (UTC)

I suspect that the 370 number comes from NIST's list of validated implementation at the time (Up to 450 now... although the removed implimentations list leaned to unvalidated versions for some reason. There are at least two open source implementations in NIST's list, so I'd expect that if you were going to list one or two they would be good candidates. (Disclaimer, I'm the author of one.) (My bias towards validated ones are because a number of flat out missimplementations exists on the net.) My personal opinion is that the article shouldn't have an implementations section, it should just link to NIST's list, since they have a legal mandate to keep it up to date... any list in the article is going to be rapidly out of date.Nahaj 02:24, 5 April 2006 (UTC)

I agree if we talk about very popular pages like SHA and MD5, because those algorihms are well known to the public already and too many links (both free and nonfree, both trivial and complex) have been posted in the past to those topics. Furthermore it is easy to find SHA and MD5 software by a seach engine easily today rather than using Wikipedia. However, for not so famous pages, a link where Wikipedia users can find source and/or software, make absolutely sense in my opinion. BTW, I absolutely disagree with your argument called "we don't need to link to because they are bundled with Perl, Python, Java", because actually in most cases those software is open source, platform independent and run also on entirely free platforms rather than just only on one particular OS. Jonelo 18:02, 29 March 2006 (UTC)

[edit] Licensing of the psuedo code

Can I derive proprietary code from the psuedo code on this page? I am concerned that the Free Document License won't allow it, but what is psuedo code for if not to be derived from? 168.215.177.66 21:37, 6 July 2006 (UTC)

[edit] Combining SHA1 / MD5

I am a cryptography novice, but wonder given the recent break of SHA1 and break of MD5 if the security of both can be improved by combining them?

For example, if the function SHA1(data+MD5(data)) were used (i.e. append the MD5 of the data to the data, and take the SHA1 of the combination), would it be a lot harder to find collisions? My (rudimentary) understanding of the collision finding process is that to break the algorithm a data stream to fill the internal state with specific data is desired - and that the complexity of finding this data stream depends on the size of the internal state. Thus by effectively adding MD5 internal state onto SHA1 this problem would become much harder.

This would be practical in the following (common) situation : you have to interface with a legacy application that does not support newer hash functions (e.g. SHA-2 family), and yet you are worried about the potential for a SHA-1 break (you don't use MD5, because breaking that is far easier according to the above quoted articles). A good example would be MySQL, which has internal functions for MD5 and SHA1, while stating in the manual "Exploits for the MD5 and SHA-1 algorithms have become known. You may wish to consider using one of the other encryption functions described in this section instead" - without providing any other cryptographic hash functions in version 5.0.

I assume (possibly naively) that the combination will be at least as strong as the outer function. Is this a reasonable assumption? Second, I intuitively think (which probably means it is wrong) that the combination will be stronger than the outer function. Any thoughts?

Shamus Husheer - s dot husheer at cantab dot net - 22:46, 30 August 2006

I've asked the gurus at sci.crypt under the thread "Probably naive question - SHA1 + MD5 combination" on 31 Aug 2006, and had the following answer:

(1) See http://ai.stanford.edu/~xb/crypto06b/index.html Boneh and Boyen's paper `On the impossiblity of efficiently combining collision resistant hash functions', which shows that if both functions are good hash functions, then combining them in this way is not useful,

(2) The only "at least as strong as" construction I [Kristian Gjøsteen] know of is "g(m) = h_1(m) || ... || h_n(m)" [appending one to the other]. If you find a collision in this hash function, you've also found a collision for all the constituent hash functions. Kristian also suggested a way to more practically break the SHA1(data+MD5(data)) function in the discussion.

Hopefully this helps anyone who has come searching for the same information.

Shamus - 06:59, 4 September 2006

Hi Shamus. I am not a cryptanalyst or so but have some knowledge. To me the method you suggest to combine two hashes seems pretty good: SHA1(data + MD5(data) ). But there is a better variant of it, see below. Another well known method is SHA1(data) XOR MD5(data). Of course, the most secure one is to concatenate the two hashes: SHA1(data) + MD5(data). However that gives a rather big hash and some systems don't use the whole result but instead just chop off enough bytes they want. Thus it is nice if all bits in the result depends on both hashes. So the two first methods here seems better for normal use. For the XOR method there is a slight problem that MD5 gives a 16 byte hash while SHA1 gives a 20 byte hash. If a program then chops of and uses the wrong end of the result then not all bits depend on both hashes. But that can be handled by reusing 4 bytes of the MD5 to XOR over the other part of the SHA1. I have seen many other suggested methods too. Here I list some of them including the three above:

hash = SHA1( data + MD5(data) )

hash = SHA1(data) XOR MD5(data)

longhash = SHA1(data) + MD5(data)

tmp = MD5(data), hash = SHA1(tmp + data + tmp)

hash = MD5(SHA1(data)) XOR SHA1(MD5(data))

hash = SHA1(MD5(data) + SHA1(data))

"+" here means concatenation.

Oh, and shamus and every one else, if you like to talk more about crypto come to the IRC chat #crypto on irc.freenode.net.

--David Göthberg 22:27, 21 September 2006 (UTC)

[edit] Great job!

I just wanted to thank everyone worked on this page because it's probably one of the bests I've found on the wiki. In particular, I needed to implement a checking routine for file integrity (non critical performance) and obviously started from CRC and Md5. I was very surprised SHA-512 to be "so easy" to implement (I'm not in cryptography at all) so I guess I'll give it a try.

As another note, maybe it would be good to note other methods to be "somewhat surpassed"... i bounced on many links before getting this one. I would do this myself, but I'm not really sure what's the best way to do it.
81.211.227.153 20:11, 21 September 2006 (UTC)

[edit] Are SHA hashes keyed?

Hi, I've got what is probably a basic question: Can you key SHA hashes? That is, using a secret key of some sort in the hash algorithm so that no one can calculate the hash in question without knowing the key used in the hashing? (That might be what the initializations are for in the pseudocode.) -- Joebeone (Talk) 16:15, 14 October 2006 (UTC)

I've poked around some more and it appears that hash functions are generally unkeyed... sorry for the clutter. -- Joebeone (Talk) 18:06, 14 October 2006 (UTC)

Don't be sorry, it is a good question. Yes, you can key any cryptographic hash function and that turns it into a message authentication code (MAC). The most popular (and also very secure) method to do that is the HMAC method. Usually when we send or store messages we do MAC them and thus make it "impossible" for an attacker to tamper with (change) our messages without us detecting it. --David Göthberg 05:32, 16 October 2006 (UTC)

[edit] ccv?

Hi,

in the pseudocode for SHA-2, someone changed "for i from 16 to 63" to "for i from 16 to 63ccv" again (I had removed "ccv"); before reverting it, does anyone think that it has a meaning? —Gennaro Prota ^•Talk 23:23, 26 November 2006 (UTC)

[edit] Date linking

Hi again,

would anyone object to delinking all year-only dates in the article? I don't think the links add anything relevant in this context (see Partial dates). —Gennaro Prota ^•Talk 06:27, 2 December 2006 (UTC)

[edit] "Applications" section

The "applications" section looks like it's starting to get out of hand – it already mentions some specific products where it's used, such as Xbox and git. If we were going to link all the products that SHA functions are used then the list would be incomprehensible. Thus, I'm removing these for now. -- intgr 04:00, 17 December 2006 (UTC)

I support the idea of not listing all the products supporting a specific feature. Being SHA a very common algo, the list would be huge. I believe this is not a problem right now and I think the actual few examples provide useful information, especially to the casual reader. I would consider adding back the notes: a few real world examples are important. The inclusion of another one should be considered on a case by case basis.
MaxDZ8 talk 10:18, 17 December 2006 (UTC)

I feel the same. The section didn't seem out of hand yet, though we should consider the long term: what about creating a list page? —Gennaro Prota ^•Talk 10:50, 17 December 2006 (UTC)

I agree, the list wasn't out of hand. It's true that listing every place where SHA is used would result in a huge list, but some real-world usage examples are needed. rst 14:37, 17 December 2006 (UTC)

I added a link to cryptographic hash function#Applications of hash functions, do you think that is not enough for background, for what hash functions are generally used for? There are a lot more notable uses of SHA algorithms than Xbox and git, which the article used to reflect. (While the git example was indeed useful, the Xbox example did not even mention where or how it was used)

As for a "list of" article: there used to be a category of programs that used the AES cipher, but it was deleted as it was considered useless. -- intgr 16:01, 17 December 2006 (UTC)

All good points, IMHO. I think we need further discussion. What I can notice is: a) the "Applications of hash functions" section gives (as you say, too) a general overview of hash functions applications. The section in this article instead was specifically about "notable usages" of SHA. So they had different intents. In practice a mention like the Xbox one is pretty much useless but a mention of Git like the one we had may well "open a whole new world" to the reader (that's something that has happened often to me: I reached some article only because of a -strictly non necessary- link from some other page and become passioned about the new subject... it's one of the strength of Wikipedias over more traditional encyclopedias) b) linking to a specific section of another article (or even of the same article) is a bit dangerous in that there's no effective way to detect if the link breaks (if the target article's section is renamed, for instance). —Gennaro Prota ^•Talk 16:32, 17 December 2006 (UTC)

[edit] Proununciation

For all these various tech acryonyms, providing a clue as to the common pronounciation would be helpful. Eg, for SHA-1, it's /shah-one/ or /shaw-one/ or the like. Perhaps providing the IPA or whatever the standard is for pronunciation. (I'll defer this to those more knowledgeable of wikipedia standards on the issue.) - Michael (talk|contrib) 02:08, 19 January 2007 (UTC)

Good question. How do you guys really pronounce SHA-1 etc in English? I know how we pronounce it in Swedish but not really what the most common way is in English. (In Swedish we spell it out so translated to English that would mean this pronunciation: Ess Eich Ei One. But it is much harder to say that in English.) --David Göthberg 12:40, 28 January 2007 (UTC)

[edit] New SHA-1 vulnerabilities

Professor Xiaoyun Wang, the same person who found the SHA-1 vulnerability in August 2005, has recently published a paper [2] that appears to make it feasible to find SHA-1 collisions in practice. I don't have the technical understanding to make sense of the paper myself, but perhaps someone else here can read it and update the article appropriately. 71.224.239.10 23:50, 20 January 2007 (UTC)

There is an (English) article about the subject. --JVersteeg 10:43, 21 January 2007 (UTC)

This article is useless and misleading, it doesn't mention a single relevant detail about the attack itself. -- intgr 15:23, 21 January 2007 (UTC)

The paper that you refer to, does not even talk about SHA-1. Rather it shows attaks against MD5, RIPEMD and SHA-0. How is this supposed to show a new SHA-1 vulnerability? The article in epochtimes is so badly written that it is not possible to determine whether they are talking about a new attack or one that has already been published. If there are no other reports of a new attack, then it is probably safe to assume that epochtime is just recycling an old story. 85.2.45.127 21:59, 27 January 2007 (UTC)

I've updated the page with my paper explaining the details of Wang's 2^63 attack on SHA-1. After over 2 years of lack of details about the methods, I decided to figure it out myself from her slides and type it up. Martin.cochran (talk) 23:49, 6 February 2008 (UTC)

[edit] VB implementation

I have trawled through John Taylor's VB implementation quite a bit, and I think it's unnecessarily complicated. I'm currently adapting it for my own purposes, and it quickly gets shorter, more efficient, and more readable as well. Shinobu 07:24, 19 April 2007 (UTC)

I'd agree that this code is too complicated and not efficient. Especially, the rotates implemented using loops must be killing performance. Even tutorial code should take advantage of existing language features and not neglect them. Real code is very often more educational than tutorial code, because the authors have spend some time optimizing the code. 85.2.38.56 10:00, 19 April 2007 (UTC)

That's not all :-) I've got it working, and I changed a lot. Of course I added my own bit of iffyness: since only a few rotates are used, I used three or so special functions like U32RotateLeft30 etc. The generic rotate needs a 2^n function, and although I did write a reasonably fast implementation for that, I decided in the end that it's a waste if not really used. The addition bit was shortened and I changed datatypes and such around. Partly because it's conventional to use Byte arrays for data, not Strings, and partly for efficiency and versatility (didn't tie myself to strings containing hex numbers). Perhaps I should post it somewhere. Thing is I'm half afraid I changed so much, that it might go wonky in some obscure situation. Anyway, it works for me, and in any case, people shouldn't trust random code grabbed from the web - and I would like the feedback. I'll think about it. Shinobu 12:03, 19 April 2007 (UTC)

I'm not sure if I understood you. A rotate left by n bits of a Uint32 variable x is usually implemented as (x<<n)|(x>>(32-n)). Decent compilers recognize this construction, optimize it and generate just one assembly instruction: a rotate. I'd expect to learn such implementation details from a good tutorial code. 85.2.114.9 13:05, 19 April 2007 (UTC)

You probably understood me... but what you perhaps don't know is that VB doesn't have native rotates and shifts, or unsigned operations. I'm not sure, but it probably was a design decision, like in Java. VB numbers are all signed (except bytes, which are unsigned, unlike in Java). It's a bit of a dilemma, really. Leave them out and you get people like us complaining and working around it, but leave them in and you get the rest of the world doing utterly wrong things with it. *sigh* Shinobu 22:32, 19 April 2007 (UTC)

Are you sure? I did check the online documentation before making my comments. VB does seem to support both unsigned int types and shifts since Visual Studio 2005 .net 2.0. I can't say how I'd implement these functions without unsigned support, because the VB documentation lacks details. But as you say, the sample code by Taylor can be improved in a number of places. Assuming, there are many people that use VB versions without unsigned support, it might indeed be interesting to see what the most efficient work arounds look like. 85.2.127.155 09:29, 20 April 2007 (UTC)

I'm pretty sure unsigneds were added in .NET, which is incompatible with old-style VB in a number of ways. For various reasons people sometimes stick with older versions of software. Reasons like compatibility, size, memory usage, etc. As for "the best"... how do you ascertain what the best is? I mean

Function U32Add(ByVal A As Long, ByVal B As Long) As Long
If (A Xor B) < 0 Then
    U32Add = A + B
Else
    U32Add = (A Xor &H80000000) + B Xor &H80000000
End If
End Function

looks okay, and certainly a lot better than

Public Function i32add(operanda As Long, operandb As Long) As Long
'//////////////////////////////////////////////////////////////////////////////////////
'does 32 bit add of two 32 bit numbers as if they were unsigned int32's
'this has to be done this way because of quirk in VB where an add overflow
'into the sign bit is kicked out as an error.  This would not be a problem if
'an unsigned int32 were allowed in VB!
'
'result=operanda + operandb
'
'use this function for (a+b)
'
'Not, And, Or, Xor all work the same for signed and unsigned int32 because there are
'no carries or borrows for VB to deal with
'//////////////////////////////////////////////////////////////////////////////////////
Dim operand_ax As Long
Dim operand_bx As Long
Dim upper_a As Integer
Dim upper_b As Integer
Dim result As Long
Dim topbits As Integer
'//////////////////////////////////////////////////////////////////////////////////////
'trim off offending bits
operand_ax = operanda And &H3FFFFFFF
operand_bx = operandb And &H3FFFFFFF
upper_a = ((operanda And &HC0000000) / &H40000000) And 3
upper_b = ((operandb And &HC0000000) / &H40000000) And 3

'do math on lower order bits
result = operand_ax + operand_bx

'do math on upper order bits
topbits = upper_a + upper_b

'if there was an overflow into upper 2 bits, increment the accumulator
If result And &H40000000 Then
  topbits = topbits + 1
End If
  
'get rid of an overflow into upper 2 bits in lieu of separate math below
result = result And &H3FFFFFFF

'now adjust the upper bits for the side calculation results
If topbits And 1 Then
  result = result Or &H40000000
End If
If topbits And 2 Then
  result = result Or &H80000000
End If

i32add = result

End Function

but how can you tell it's best? Shinobu 03:34, 21 April 2007 (UTC)

At some point it becomes necessary to know the constructs that can be compiled into efficient code by the compiler. E.g., your proposal, which btw is a really nice trick, could also be written as

Function U32Add(ByVal A As Long, ByVal B As Long) As Long
Dim HiBit as Long
HiBit = (A Xor B Xor &H8000000) And &H80000000
U32Add = ((A Xor HiBit) + B) Xor HiBit
End Function

Depending on how good the compiler is this should take 1-2 operations more on average, but it avoids an unpredictable jump and hence might be faster in some cases. However, I don't think all code has to be optimized completely. I'm still looking for some guidelines on what kind of code should be linked to in wikipedia and what should better be avoided. I'd think that if a reader generally could find improvements by comparing his code with some sample code then this sample code would be worth to be listed here. Otherwise there is no point of having a link. Taylor's code doesn't contain any non-obvious tricks, your code does. Hence your code would seem to be better qualified to be listed here. 85.2.69.234 18:36, 22 April 2007 (UTC)

I'd put it in my user space, but there might be a policy against linking from article space to user space. I'll ask around about what to do. Shinobu 10:18, 23 April 2007 (UTC)

Okay, please have a look here: http://vb.wikia.com/wiki/SHA-1.bas

Apparently they don't mind it being there. Before you add a link here, please read it through a few times. It works for me, but I'd rather be sure it works for others as well before the link is added here. Try a few test vectors too. You know what it's like... a little cut 'n paste error can cause horrible bugs. And sorry for the lack of comments. I don't like excessive commenting and most of the code speaks for itself. Shinobu 14:31, 9 May 2007 (UTC)

Your code looks definitively better to me than Taylor's code. I'd support to replace Taylor's code with this implementation. 84.73.231.90 21:26, 23 May 2007 (UTC)

[edit] SHA-1 Expansion

The crypto analysis is not exactly a cogent explanation of how the algorithm works. This article says nothing about the actual working of the SHA-1. I think this needs to be included. -Weedrat 09:40, 21 May 2007 (UTC)

Did you notice the pseudocode close to the end of the article? It looks good enough to me to actually implement SHA-1. 84.73.231.90 10:25, 21 May 2007 (UTC)

[edit] basic language question

The introduction describes SHA-1 as "a condensed digital representation...that is, to a high degree of probability, unique for a given input data sequence." Shouldn't it say "unique *to* a given input..." (i.e. there is likely only a single input that goes with that output) rather than "unique for" (i.e. there likely is only a single output for a given input)? The current use of "for" says simply that there's just one output for each input, which is true of many mathematical functions... MacScoop 18:27, 23 May 2007 (UTC)

Agreed, sounds better to me, though I am not a native English speaker. -- intgr ^#%@! 21:11, 23 May 2007 (UTC)

Sounds incorrect to me. Since there are many more possible inputs than outputs it is very likely that for every possible output there exists many inputs mapping to that output. The whole first sentence is incomprehensible. Hash functions are defined as functions for which it is computationally infeasible to invert them and to find collisions (as described later). 84.73.231.90 21:34, 23 May 2007 (UTC)

I copy-edited the intro to make things clearer, I hope.--agr 14:06, 30 July 2007 (UTC)

[edit] FIPS 180-3 Draft

The draft version of FIPS 180-3 is available at http://csrc.nist.gov/publications/drafts/fips_180-3/draft_fips-180-3_June-08-2007.pdf. he comment period closes on September 10, 2007. Reid Hochstedler 19:11, 2 July 2007 (UTC)

[edit] Question

How do commitment schemes work? They seem handy, but complicated. How do I get a committed identity? ionas68224|talk|contribs|email 11:03, 21 July 2007 (UTC)

Alice takes a random input x, and computes h(x) where h() represents a cryptographic hash function. Alice then asks Bob to guess the first bit of x. Because h(x) is computationally too hard to invert, Bob has only 50% chance of guessing this bit correctly. After Bob gives Alice his guess, Alice reveals x. If Bob guessed the bit right, he wins this "coin flip", otherwise he loses. Note that if Alice can find a collision in h, Alice can always win by giving Bob messages x or x' depending on his guess. If the cryptographic hash function compresses the input and represents a random mapping, Bob cannot gain any information on the first bit of x due to information theory.

—Preceding unsigned comment added by 193.190.253.144 (talk • contribs)

- I wanted to know how to make h(x) to protect my identity. That was helpful, but that is how these are used. I want to know how to make one. ionas68224|talk|contribs|email 07:10, 2 August 2007 (UTC)

- - Maybe you're thinking of zero-knowledge proofs instead of bit commitment schemes? The latter are for commitment to values, but zero-knowledge proofs can be used to prove your identity. Or are you thinking about the use of hash functions in digital signatures? Your question is unclear to me, but I think I've given you enough links now to find the answer.

I think the questioner is referring to the technique described at Template:User committed identity.--agr 11:41, 2 August 2007 (UTC)

[edit] wehre are the links of the different SHA websites?

Hi. There are probably different links for SHA-1, SHA-2, SHA-512, etc. The problem is, what exactly are those links? The Template:User committed identity gives the link to SHA-1, and the article gives the links to a few, but where are the links to all the SHA websites, say for example where is the link for calculating SHA-512? Can someone list me the links of those websites, and where you got it from? I don't want links to sites other than those used to directly calculate a string into a hash code, so I don't want any sites with info about the SHA websites, I want the actual links themselves. I will also ask this question on the refernce desk. Thanks. ~A H 1^(TCU) 20:41, 4 October 2007 (UTC)

[edit] TOC on the right?

Is there a specific reason why the TOC is on the right?

It could be considered a minor issue, though when I opened the article, for a second I was like "errrrm... Where's the TOC?" (spatial memory anyone?), then I saw it was on the right, and it just seemed odd.

It's also not underneath the intro, but rather beside it. It seems inconsistent with about every other article I ever saw in WP (and I've been here for years) --W2bh 20:40, 15 November 2007 (UTC)

You're right—It should be on the left: default. The manual of style suggests right TOC placement for articles where a photo appears at the left (suggested for biographical articles with a right facing headshot). But there is no such consideration in this article. —EncMstr 20:47, 15 November 2007 (UTC)

I agree. --Morten LJ 09:31, 16 November 2007 (UTC)

[edit] This might be it...

The sentence "Gilbert and Handschuh (2003) have studied the newer variants and found no weaknesses" has a fact tag. It looks like it's referring to a paper, I did a quick search, and I think this might be it:

Henri Gilbert, Helena Handschuh: Security Analysis of SHA-256 and Sisters. Selected Areas in Cryptography 2003: 175-193.

Anyway, I did a quick google search and didn't find any other 2003 papers by them. Can someone with more knowledge on the topic who maybe is able to access the paper take a look and see? It'd be nice to get rid of that fact tag. Thanks, 4.21.209.231 (talk) 04:19, 1 January 2008 (UTC)

[edit] Merger proposal

I suggest that sha1sum be merged into this article, under an "implementations" section, or similar.
--Jerome Potts (talk) 19:01, 22 March 2008 (UTC)

I'm not opposed to this in theory, but the content should be seriously revised upon the merge. most of the content is meaningless uninformative or redundant, especially in the context of a subsection of this article. The revised content should also make an effort to establish that this is one of the most common tools used for calculating sha1 hashes at the execution level (a presumption on my part). Similar merges could be possibly proposed for md5sum and chksum. Those are my thoughts anyway. -Verdatum (talk) 20:37, 23 April 2008 (UTC)

[edit] Unclear/incorrect verbage

I'm only vaguely familiar with the subject matter so I hesitate to edit, but I think the first sentence under Cryptanalysis, "For a hash function which violates the first criterion listed above..." should have read, "For a hash function that does NOT violate the first criterion above..."

Here is where I'm on shakier ground, but does that statement not apply to all hashes, whether they violate the first criterion or not? I think 2^L is an upper bound, isn't it? A hash function that violates the first criterion (do they have names? if so we should use them instead of "first criterion" etc...) allows you to find a message for a given hash value in less than 2^L.

If that's correct, then the first sentence should read, "For a hash function which meets the first criterion listed above, finding a message that corresponds to a given message digest requires a brute force search in 2^L..."

Eshafto (talk) 15:34, 23 May 2008 (UTC)

Categories: To do