Merkle-Damgård hash function
From Wikipedia, the free encyclopedia
In cryptography, the Merkle-Damgård hash function is a generic construction of a cryptographic hash function. Many widely used hash functions, including MD5, SHA-1 and SHA-2, follow this generic construction.
A cryptographic hash function must be able to process an arbitrary-length message into a fixed-length output. This can be achieved by breaking the input up into a series of equal-sized blocks, and operating on them in sequence using a compression function that processes a fixed-length input into a shorter, fixed-length output. The compression function can either be specially designed for hashing or be built from a block cipher. In many cases, including the SHA-1 and SHA-0 hash functions, the compression function is based on a block cipher that is specially designed for use in a hash function. The Merkle-Damgård construction breaks the input into blocks and processes them one at a time with the compression function, each time combining a block of the input with the output of the previous round.
The Merkle-Damgård construction was described in Merkle's Ph.D. thesis.[1] Ralph Merkle and Ivan Damgård independently proved that the structure is sound: if the compression function is collision-resistant, then the hash function will be also.[2][3] In order to prove that the construction is secure, Merkle and Damgård proposed that messages be padded in a way that encodes the length of the original message. This is called Merkle-Damgård strengthening.
In the diagram, the compression function is denoted by f. It takes two fixed-length inputs, the result so far and a message block, and transforms them into an output of the same size as the result so far. The algorithm starts with an initial value, the initialization vector (IV). The IV is a fixed value (algorithm or implementation specific). For each message block, the compression (or compacting) function f takes the result so far, combines it with the message block, and produces an intermediate result. The last block is padded with zeros as needed, and bits representing the length of the entire message are appended. (This is called length padding; see the length padding example below.)
To harden the hash further, the last result is then sometimes fed through a finalisation function. The finalisation function can serve several purposes, such as compressing a bigger internal state (the last result) into a smaller output hash size, or guaranteeing better mixing and a stronger avalanche effect on the bits in the hash sum. The finalisation function is often built by using the compression function.
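The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block size, the state size, the all-zero IV and the stand-in compression function (a truncated SHA-256 call) are hypothetical choices made for the sketch and do not correspond to any standardised hash function. No finalisation function is applied.

```python
import hashlib

BLOCK_SIZE = 16          # bytes per message block (toy value, an assumption)
STATE_SIZE = 8           # bytes of chaining value (toy value, an assumption)
IV = bytes(STATE_SIZE)   # hypothetical all-zero initialization vector


def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in for the compression function f: (state, block) -> new state."""
    # Assumption: truncated SHA-256 plays the role of f, for illustration only.
    return hashlib.sha256(state + block).digest()[:STATE_SIZE]


def pad(message: bytes) -> bytes:
    """Length padding: a marker byte, zero bytes, then the 64-bit message bit length."""
    bit_length = (len(message) * 8).to_bytes(8, "big")
    padded = message + b"\x80"
    # Add zeros so that marker + zeros + length field end on a block boundary.
    padded += bytes(-(len(padded) + len(bit_length)) % BLOCK_SIZE)
    return padded + bit_length


def md_hash(message: bytes) -> bytes:
    """Feed the padded message through f one block at a time, starting from the IV."""
    state = IV
    data = pad(message)
    for i in range(0, len(data), BLOCK_SIZE):
        state = compress(state, data[i:i + BLOCK_SIZE])
    return state


digest = md_hash(b"Wikipedia")
```

The output is simply the last chaining value, so its size equals STATE_SIZE regardless of the message length.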
Security characteristics
The popularity of this construction is due to the fact, proven by Merkle and Damgård, that if the compression function f is collision resistant, then so is the hash function constructed using it. Unfortunately, this construction also has several undesirable properties:
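- Length extension: given the hash of an unknown message and the length of that message, an attacker can compute the hash of the padded message followed by a suffix of his choice. As a consequence, once an attacker has one collision, he can find more very cheaply by appending the same suffix to both colliding messages.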
- Second preimage attacks against long messages are always much more efficient than brute force.
- Multicollisions (many messages with the same hash) can be found with only a little more work than collisions.
- "Herding attacks" (first committing to an output h, then mapping messages with arbitrary starting values to h) are possible for more work than finding a collision, but much less than would be expected to do this for a random oracle.
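The length extension property from the list above can be demonstrated on a toy construction. The following Python sketch makes several assumptions: the parameters, the all-zero IV, the truncated SHA-256 stand-in for f and the helper names (padding, iterate, extend) are all hypothetical, and the hash outputs its full final state with no finalisation function. Under those assumptions, the attacker resumes the iteration from a known digest without knowing the original message.

```python
import hashlib

BLOCK_SIZE = 16          # toy block size in bytes (assumption)
STATE_SIZE = 8           # toy state size in bytes (assumption)
IV = bytes(STATE_SIZE)   # hypothetical all-zero IV


def compress(state: bytes, block: bytes) -> bytes:
    # Assumption: truncated SHA-256 stands in for the compression function f.
    return hashlib.sha256(state + block).digest()[:STATE_SIZE]


def padding(total_len: int) -> bytes:
    """Length padding for a message of total_len bytes."""
    zeros = -(total_len + 1 + 8) % BLOCK_SIZE
    return b"\x80" + bytes(zeros) + (total_len * 8).to_bytes(8, "big")


def iterate(state: bytes, data: bytes) -> bytes:
    """Run f over block-aligned data, starting from the given chaining value."""
    for i in range(0, len(data), BLOCK_SIZE):
        state = compress(state, data[i:i + BLOCK_SIZE])
    return state


def md_hash(message: bytes) -> bytes:
    return iterate(IV, message + padding(len(message)))


def extend(digest: bytes, original_len: int, suffix: bytes):
    """Forge the hash of m + glue + suffix from md_hash(m) and len(m) alone."""
    glue = padding(original_len)             # padding the victim's hash already absorbed
    forged_len = original_len + len(glue) + len(suffix)
    # The digest is the chaining value after m + glue, so hashing resumes from it.
    forged = iterate(digest, suffix + padding(forged_len))
    return forged, glue


secret = b"message unknown to the attacker"
forged, glue = extend(md_hash(secret), len(secret), b"extra data")
assert forged == md_hash(secret + glue + b"extra data")
```

The final assertion holds because the digest of the original message is exactly the chaining value reached after its last padded block.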
Length padding example
Let's say the message to be hashed is "Wikipedia" and the block size of the compression function is 8 bytes (64 bits).
To be able to feed the message to the compression function the last block needs to be zero padded to a full block. So we get two blocks looking like this:
Wikipedi a0000000
But this is not enough, since it would mean that, for instance, the message "Wikipedia00" would get the same hash sum. To prevent this, a single "1" can be appended before the zeros, like this:
Wikipedi a1000000
To harden the hash even further, the length of the message is also added in an extra block. So we get three blocks like this:
Wikipedi a1000000 00000009
That is a bit wasteful, since it means hashing one extra block. Most hash algorithms therefore use a slight speed optimisation: if there is enough space among the zeros padded to the last block, the length value can be placed there instead, like this:
Wikipedi a1000009
Note that, to avoid ambiguity, the hash algorithm must use a fixed bit-size for the length value, say 40 bits. So the length value padded at the end really is "00009", not just "9".
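The padding steps of this example can be reproduced with a short Python sketch. It mirrors the character-level depiction used above, with one character per byte, a literal "1" as the marker and a five-character decimal length field. These are simplifications of the example, and the function name pad_text is hypothetical. Real hash functions pad at the bit level and encode the length in binary.

```python
BLOCK_SIZE = 8       # characters per block, as in the example
LENGTH_DIGITS = 5    # fixed-width length field, e.g. "00009"


def pad_text(message: str) -> list:
    """Append "1", zeros and a fixed-width length, then split into blocks."""
    length_field = str(len(message)).zfill(LENGTH_DIGITS)
    padded = message + "1"
    # Use as few zeros as needed for the length field to end on a block boundary.
    zeros = -(len(padded) + LENGTH_DIGITS) % BLOCK_SIZE
    padded += "0" * zeros + length_field
    return [padded[i:i + BLOCK_SIZE] for i in range(0, len(padded), BLOCK_SIZE)]


print(pad_text("Wikipedia"))   # ['Wikipedi', 'a1000009']
```

With this scheme the message "Wikipedia00" no longer pads to the same blocks as "Wikipedia", because both the position of the marker and the length field differ.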
See also
- Cryptographic hash function
- Hash functions based on block ciphers
- Ralph Merkle - One of the two inventors of the Merkle-Damgård structure.
- Ivan Damgård - The other inventor of the Merkle-Damgård structure.
References
- ^ R.C. Merkle. Secrecy, authentication, and public key systems. Stanford Ph.D. thesis 1979, pages 13-15.
- ^ R.C. Merkle. A Certified Digital Signature. In Advances in Cryptology - CRYPTO '89 Proceedings, Lecture Notes in Computer Science Vol. 435, G. Brassard, ed, Springer-Verlag, 1989, pp. 218-238.
- ^ I. Damgård. A Design Principle for Hash Functions. In Advances in Cryptology - CRYPTO '89 Proceedings, Lecture Notes in Computer Science Vol. 435, G. Brassard, ed, Springer-Verlag, 1989, pp. 416-427.