Primary clustering
From Wikipedia, the free encyclopedia
Primary clustering is the tendency for certain open-addressing hash tables collision resolution schemes to create long sequences of filled slots. It is most commonly referred to in the context of problems with linear probing.
These long chains degrade the hash-table performance closer to O(n) performance instead of closer to O(1).
Consider the linear probing hash function h(k, i) : (h`(k) + i)mod N. With k being the key, i the probing-iteration, N being the number of slots in the hash-table and h`(k) being the secondary-hash function.
Thus the probing sequence for key k is: {h`(k), h`(k) + 1, h`(k) + 2, ..., h`(k) + n}. It is easy to see that the probing sequences for two different keys may overlap and create clustering.
Also since the probability of each slot being already full is equal to the load-factor of the hash-table, as the factor approaches 1.0 so does the chance that the next slot in the probing sequence will be full, this degrades performance of the hash-table to linear time.
Note: clusters of filled slots can grow exponentially in some cases. For example, imagine two large clusters of filled slots separated by one empty slot. Placing an entry into this slot will double the cluster size.