Linear probing
Linear probing is a scheme in computer programming for resolving hash collisions of values of hash functions by sequentially searching the hash table for a free location.[1] This is accomplished using two values - one as a starting value and one as an interval between successive values in modular arithmetic. The second value, which is the same for all keys and known as the stepsize, is repeatedly added to the starting value until a free space is found, or the entire table is traversed. (In order to traverse the entire table the stepsize should be relatively prime to the arraysize, which is why the array size is often chosen to be a prime number.)
- newLocation = (startingValue + stepSize) % arraySize
This algorithm, which is used in open-addressed hash tables, provides good memory caching (if stepsize is equal to one), through good locality of reference, but also results in clustering, an unfortunately high probability that where there has been one collision there will be more. The performance of linear probing is also more sensitive to input distribution when compared to double hashing, where the stepsize is determined by another hash function applied to the value instead of a fixed stepsize as in linear probing.
Given an ordinary hash function H(x), a linear probing function (H(x, i)) would be:
Here H(x) is the starting value, n the size of the hash table, and the stepsize is i in this case.
Dictionary operation in constant time
Using linear probing, dictionary operation can be implemented in constant time. In other words, insert, remove and find operations can be implemented in O(1), as long as the load factor of the hash table is a constant strictly less than one.[2] This analysis makes the (unrealistic) assumption that the hash function is completely random, but can be extended also to 5-independent hash functions.[3] Weaker properties, such as universal hashing, are not strong enough to ensure the constant-time operation of linear probing,[4] but one practical method of hash function generation, tabulation hashing, again leads to a guaranteed constant expected time performance despite not being 5-independent.[5]
See also
- Quadratic probing
- Collision resolution
References
- ↑ Dale, Nell (2003). C++ Plus Data Structures. Sudbury, MA: Jones and Bartlett Computer Science. ISBN 0-7637-0481-4.
- ↑ Knuth, Donald (1963), Notes on "Open" Addressing
- ↑ Pagh, Anna; Pagh, Rasmus; Ružić, Milan (2009), "Linear probing with constant independence", SIAM Journal on Computing 39 (3): 1107–1120, doi:10.1137/070702278, MR 2538852
- ↑ Pătraşcu, Mihai; Thorup, Mikkel (2010), "On the k-independence required by linear probing and minwise independence", Automata, Languages and Programming, 37th International Colloquium, ICALP 2010, Bordeaux, France, July 6-10, 2010, Proceedings, Part I, Lecture Notes in Computer Science 6198, Springer, pp. 715–726, doi:10.1007/978-3-642-14165-2_60
- ↑ Pătraşcu, Mihai; Thorup, Mikkel (2011), "The power of simple tabulation hashing", Proceedings of the 43rd annual ACM Symposium on Theory of Computing (STOC '11), pp. 1–10, arXiv:1011.5200, doi:10.1145/1993636.1993638
External links
- How Caching Affects Hashing by Gregory L. Heileman and Wenbin Luo 2005.
- Open Data Structures - Section 5.2 - LinearHashTable: Linear Probing