Hash collision

From Wikipedia, the free encyclopedia

In computer science, a hash collision or hash clash is a situation that occurs when two distinct inputs into a hash function produce identical outputs.

All hash functions have potential collisions, though with a well-designed hash function, collisions should occur less often (compared with a poorly designed function) or be more difficult to find. In certain specialized applications where a relatively small number of possible inputs are all known ahead of time it is possible to construct a perfect hash function which maps all inputs to different outputs. However, many hash functions, including most cryptographic hash functions, produce a fixed size output from an arbitrarily long message. In such a design, there will always be collisions, because any given hash has to correspond to a very large number of possible inputs.

Contents

[edit] In searching

Main article: Hash table

An efficient method of searching can be to process a lookup key using a hash function, then take the resulting hash value and then use it as an index into an array of data. The resulting data structure is called a hash table. As long as different keys map to different indices, lookup can be performed in constant time. When multiple lookup keys are mapped to identical indices, however, a hash collision occurs. The most popular ways of dealing with this are chaining (building a linked list of values for each array index), and open addressing (searching other array indices nearby for an empty space). Both of these, however, degrade the worst-case lookup complexity to linear time of the number of elements.

[edit] Collision resistance

Given: A hash function H, two passwords x and y.

Weak collision resistance: for a given x, it is hard to find a y \neq x such that H(x) = H(y). A user inputs a value, in this example a password, called initial value (x). If the hash function H is weakly collision resistant, the probability of finding a second password with the same hash value as the initial one is negligible in the output length of the hash function.


Strong collision resistance: it is hard to find any x and y such that H(x) = H(y). If the hash function H is strongly collision resistant, the probability of finding any two passwords with the same hash value is negligible in the output length of the hash function.

[edit] In cryptography

One desirable property of cryptographic hash functions is that it is computationally infeasible to find a collision. The value of a hash function can be used to certify an input is unchanged by publishing the signed value of the hash if it is not feasible to produce a collision. Feasible in this context refers to any algorithm with an asymptotic running time polynomial in the output length of the hash function, which is usually much faster than a brute-force birthday attack.

The process of finding two arbitrary values whose hashes collide is called a collision attack; the process of finding one arbitrary value whose hash collides with another, given hash is called a preimage attack. A successful preimage attack is a much more serious break than a successful collision attack.

[edit] See also

[edit] References

[edit] External links