Rolling hash
From Wikipedia, the free encyclopedia
A rolling hash is a hash function where the input is hashed in a window that moves through the input.
A few hash functions allow a rolling hash to be computed very quickly -- the new hash value is rapidly calculated given only the old hash value, the old value removed from the window, and the new value added to the window -- similar to the way a moving average function can be computed much more quickly than other low-pass filters.
One of the main applications is the Rabin-Karp string search algorithm, which uses the rolling hash described below.
Another popular application is rsync program which uses Mark Adler's adler-32 checksum as its rolling hash.
[edit] Example of a rolling hash
Rabin-Karp use a very simple rolling hash function that only uses multiplications and additions. a is a randomly chosen large number.
H = c1 * ak-1 + c2 * ak-2 + c3 * ak-3 ... + ck * a0
In order to avoid manipulating huge H values, all math is done modulo n. The choice of a and n is critical to get good hashing.
Removing and adding chars simply involves adding or subtracting the first or last term. Shifting all chars by one position to the left requires multiplying the entire sum H by a. Shifting all chars by one position to the right requires dividing the entire sum H by a. Note that in modulo arithmetic, a has a multiplicative inverse a* by which H can be multiplied to get the result of the division without actually performing a division.
Good values for a/a* and n can be found in a paper by P. L'Ecuyer (see external links below).
[edit] External links
- P. L'Ecuyer, "Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure", Mathematics of Computation 68:225, 249–260 (1999). Available online at Citeseer.