Double hashing

From Wikipedia, the free encyclopedia

Double hashing is a computer programming technique used in hashing to resolve hash collisions, cases when two different values to be searched for produce the same hash key. It is a popular collision-resolution technique in open-addressed hash tables. Like linear probing, it uses one hash value as a starting point and then repeatedly steps forward an interval (this interval is decided using another hash function; hence the name double hashing) to another address until the desired value is located, an empty location is reached, or the entire table has been searched. Double hashing avoids the problem of clustering, an unfortunately high probability of repeated collisions, by using varied step sizes.

This comes at the price of not being able to delete entries properly after a collision occurred for a certain slot. Instead, a delete marker known as a tombstone will be put in its place, filling up a slot and slightly degrading search performance. While tombstones can be reused on a later insertion, it may be prudent to do a little garbage collection on tables which have many tombstones especially if the table is being used to perform frequent lookup. Like all other forms of open addressing, double hashing becomes linear as the hash table approaches maximum capacity. The only solution to this is to rehash to a larger size.

Example : Given $h 1 (k)$ and $h 2 (k)$ are auxiliary hash functions $h(k,j)=(h_1(k) + j \cdot h_2(k))\mod m$ .

Double hashing can also be employed in Bloom filters to compute the requisite hash values associated with each element, but enhanced double hashing is superior in terms of accuracy (false positive probability).^{[citation needed]}