Pearson hashing
From Wikipedia, the free encyclopedia
Pearson hashing[1] is a hash function designed for fast execution on processors with 8-bit registers. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent[1] on every byte of the input. Its implementation requires only a few instructions, plus a 256-byte lookup table containing a permutation of the values 0 through 255.
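The lookup table can be any permutation of the values 0 through 255. As a minimal sketch, assuming a seeded shuffle is an acceptable way to pick one (the helper name make_permutation_table and the default seed are illustrative and do not come from Pearson's paper), such a table could be built like this:

```python
import random

def make_permutation_table(seed=0):
    """Return a list holding each value 0..255 exactly once, in shuffled order."""
    table = list(range(256))
    # A seeded generator keeps the table reproducible between runs (assumption:
    # reproducibility is wanted; any permutation would serve).
    random.Random(seed).shuffle(table)
    return table

permutation_table = make_permutation_table()
```

Because every value appears exactly once, the table maps the 256 possible byte values one-to-one onto themselves.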
This hash function is a CBC-MAC that uses an 8-bit random block cipher implemented via the permutation table. An 8-bit block cipher has negligible cryptographic security, so the Pearson hash function is not cryptographically strong. It does, however, offer these benefits:
- It is extremely simple.
- It executes quickly on resource-limited processors.
- There is no simple class of inputs for which collisions (identical outputs) are especially likely.
- Given a small, privileged set of inputs (e.g., reserved words for a compiler), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a perfect hash function.
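The last point can be illustrated with a brute-force sketch: reshuffle the table until a small set of keys hashes without collisions. This is only one possible way to adjust the table and is not necessarily the procedure from Pearson's paper; the names pearson_hash and find_perfect_table, the keyword list, and the retry limit are all assumptions made for the example.

```python
import random

def pearson_hash(data, table):
    """Pearson hash of a bytes object using the given 256-entry permutation."""
    h = 0
    for byte in data:
        h = table[h ^ byte]
    return h

def find_perfect_table(keys, seed=0, max_tries=10000):
    """Shuffle until every key gets a distinct hash (keys must be distinct)."""
    rng = random.Random(seed)
    table = list(range(256))
    for _ in range(max_tries):
        rng.shuffle(table)
        hashes = {pearson_hash(key, table) for key in keys}
        if len(hashes) == len(keys):  # no two keys share a hash value
            return list(table)
    raise RuntimeError("no collision-free table found")

# Hypothetical reserved words, standing in for a compiler's keyword set.
keywords = [b"if", b"else", b"while", b"for", b"return", b"break"]
table = find_perfect_table(keywords)
```

For a handful of keys, a random table is already collision-free most of the time, so the search typically ends after very few shuffles. It cannot succeed for more than 256 keys, since only 256 hash values exist.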
The algorithm was originally described by the following pseudocode, which computes the hash of an n-byte message C (bytes C[1] through C[n]) using the permutation table T and the auxiliary array h:
    h[0] := 0
    for i in 1..n loop
        index := h[i-1] xor C[i]
        h[i] := T[index]
    end loop
    return h[n]
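In the Python programming language, the hash algorithm can be implemented as follows (assuming that permutation_table is defined externally and that the input is a bytes object, so that each element is an integer from 0 to 255):

    def pearson_hash(data):
        h = 0  # running hash, always in the range 0..255
        for byte in data:
            # mix the next input byte into the hash via the table
            h = permutation_table[h ^ byte]
        return h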
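As a usage sketch, the following self-contained example hashes two anagrams and compares the result with a plain XOR checksum, which is one simple class of function for which reordered inputs always collide. The table (a shuffle seeded with 1990), the function names, and the test strings are assumptions chosen for illustration.

```python
import random

# Assumed table: any permutation of 0..255 works; this one is a seeded shuffle.
permutation_table = list(range(256))
random.Random(1990).shuffle(permutation_table)

def pearson_hash(data):
    """Pearson hash of a bytes object; returns an integer in 0..255."""
    h = 0
    for byte in data:
        h = permutation_table[h ^ byte]
    return h

def xor_checksum(data):
    """Order-insensitive checksum, shown only for comparison."""
    c = 0
    for byte in data:
        c ^= byte
    return c

a, b = b"listen", b"silent"
print(xor_checksum(a) == xor_checksum(b))  # True: XOR ignores byte order
print(pearson_hash(a), pearson_hash(b))    # values depend on the chosen table
```

The two Pearson values will usually differ because each step depends on the previous hash as well as the current byte, although with only 256 possible outputs any given pair of inputs can still collide for a particular table.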
References
- [1] Peter K. Pearson, "Fast Hashing of Variable-Length Text Strings", Communications of the ACM 33(6), 677 (1990).