Pearson hashing
From Wikipedia, the free encyclopedia
Pearson Hashing[1] is a hash function designed for fast execution on processors with 8-bit registers. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent on every byte of the input. Its implementation requires only a few instructions, plus a 256-byte table containing a permutation of the values 0 through 255.
This hash function is a CBC-MAC that uses an 8-bit random block cipher implemented via the permutation table. An 8-bit block cipher has negligible cryptographic security, so the Pearson hash function is not cryptographically strong; but it offers these benefits:
- It is extremely simple.
- It executes quickly on very limited processors.
- There is no simple class of inputs for which collisions (identical outputs) are especially likely.
- Given a small, privileged set of inputs (e.g., reserved words for a compiler), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a perfect hash function.
The algorithm was originally described by the following pseudocode, which computes the hash of message C using the permutation table T and the auxiliary array h:
h[0] := 0 ; for i in 1..n loop h[i] := T[ h[i-1] xor C[i] ] ; end loop ; return h[n] ;
In the Python programming language, the hash algorithm can be implemented as follows (assuming that permutation_table is defined externally):
def hash(input): hash_value = len(input) % 256 for character in input: character = ord(character) # ord() gets the character's byte value hash_value = permutation_table[hash_value ^ character] return hash_value
[edit] References
- ^ Fast Hashing of Variable-Length Strings. P.K. Pearson, Commun. of the ACM 33, 677 (1990)