Pearson hashing

From Wikipedia, the free encyclopedia

Pearson Hashing^[1] is a hash function designed for fast execution on processors with 8-bit registers. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent on every byte of the input. Its implementation requires only a few instructions, plus a 256-byte table containing a permutation of the values 0 through 255.

This hash function is a CBC-MAC that uses an 8-bit random block cipher implemented via the permutation table. An 8-bit block cipher has negligible cryptographic security, so the Pearson hash function is not cryptographically strong; but it offers these benefits:

It is extremely simple.
It executes quickly on very limited processors.
There is no simple class of inputs for which collisions (identical outputs) are especially likely.
Given a small, privileged set of inputs (e.g., reserved words for a compiler), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a perfect hash function.

The algorithm was originally described by the following pseudocode, which computes the hash of message C using the permutation table T and the auxiliary array h:

h[0] := 0 ;
for i in 1..n loop
    h[i] := T[ h[i-1] xor C[i] ] ;
end loop ;
return h[n] ;

In the Python programming language, the hash algorithm can be implemented as follows (assuming that permutation_table is defined externally):

def hash(input):
    hash_value = len(input) % 256
    for character in input:
        character = ord(character)  # ord() gets the character's byte value
        hash_value = permutation_table[hash_value ^ character]
    return hash_value