Pearson hashing

From Wikipedia, the free encyclopedia

Pearson hashing[1][2] is a hash function designed for fast execution on processors with 8-bit registers. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent[1] on every byte of the input. Its implementation requires only a few instructions, plus a 256-byte lookup table containing a permutation of the values 0 through 255.

This hash function is a CBC-MAC that uses an 8-bit random block cipher implemented via the permutation table. An 8-bit block cipher has negligible cryptographic security, so the Pearson hash function is not cryptographically strong; but it offers these benefits:

  • It is extremely simple.
  • It executes quickly on resource-limited processors.
  • There is no simple class of inputs for which collisions (identical outputs) are especially likely.
  • Given a small, privileged set of inputs (e.g., reserved words for a compiler), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a perfect hash function.

One of its drawbacks when compared with other hashing algorithms designed for 8-bit processors is the suggested 256 byte lookup table, which can be prohibitively large for a small microcontroller with a program memory size on the order of hundreds of bytes. A workaround to this is to use a simple permutation function instead of a table stored in program memory. However, using a too simple function, such as T[i] = 255-i partly defeats the usability as a hash function as anagrams will result in the same hash value; using a too complex function, on the other hand, will affect speed negatively.

The algorithm can be described by the following pseudocode, which computes the hash of message C using the permutation table T:

h := 0
for each c in C loop
  index := h xor c
  h := T[index]
end loop
return h

C implementation to generate 64-bit (16 hex byte) hash

  1.    unsigned char xPear16(unsigned char *x, int len) {
    
  2.       int h, i, j;
    
  3.       unsigned char ch, hex[20]="";
    
  4.  
    
  5.       struct { // to store h values
    
  6.          int a;
    
  7.       } hh[8];
    
  8.       struct { // 256 values 0-255 in any (random) order suffices
    
  9.          int a;
    
  10.       } T[256] = {
    
  11.        98,  6, 85,150, 36, 23,112,164,135,207,169,  5, 26, 64,165,219, //  1
    
  12.        61, 20, 68, 89,130, 63, 52,102, 24,229,132,245, 80,216,195,115, //  2
    
  13.        90,168,156,203,177,120,  2,190,188,  7,100,185,174,243,162, 10, //  3
    
  14.       237, 18,253,225,  8,208,172,244,255,126,101, 79,145,235,228,121, //  4
    
  15.       123,251, 67,250,161,  0,107, 97,241,111,181, 82,249, 33, 69, 55, //  5
    
  16.        59,153, 29,  9,213,167, 84, 93, 30, 46, 94, 75,151,114, 73,222, //  6
    
  17.       197, 96,210, 45, 16,227,248,202, 51,152,252,125, 81,206,215,186, //  7
    
  18.        39,158,178,187,131,136,  1, 49, 50, 17,141, 91, 47,129, 60, 99, //  8
    
  19.       154, 35, 86,171,105, 34, 38,200,147, 58, 77,118,173,246, 76,254, //  9
    
  20.       133,232,196,144,198,124, 53,  4,108, 74,223,234,134,230,157,139, // 10
    
  21.       189,205,199,128,176, 19,211,236,127,192,231, 70,233, 88,146, 44, // 11
    
  22.       183,201, 22, 83, 13,214,116,109,159, 32, 95,226,140,220, 57, 12, // 12
    
  23.       221, 31,209,182,143, 92,149,184,148, 62,113, 65, 37, 27,106,166, // 13
    
  24.         3, 14,204, 72, 21, 41, 56, 66, 28,193, 40,217, 25, 54,179,117, // 14
    
  25.       238, 87,240,155,180,170,242,212,191,163, 78,218,137,194,175,110, // 15
    
  26.        43,119,224, 71,122,142, 42,160,104, 48,247,103, 15, 11,138,239  // 16
    
  27.       };
    
  28.  
    
  29.       ch=x[0]; // save first byte
    
  30.       for (j=0; j<8; j++) {
    
  31.          // standard Pearson hash (output is h)
    
  32.          h=0;
    
  33.          for (i=0; i<len; i++) {
    
  34.             h=T[h ^ x[i]].a;
    
  35.          }
    
  36.          hh[j].a=h; // store result
    
  37.          x[0]=x[0]+1; // increment first data byte by 1
    
  38.       }
    
  39.       x[0]=ch; // restore first byte
    
  40.       // concatenate the 8 stored values of h
    
  41.       wsprintf(hex,"%02X%02X%02X%02X%02X%02X%02X%02X",
    
  42.          hh[0].a, hh[1].a,
    
  43.          hh[2].a, hh[3].a,
    
  44.          hh[4].a, hh[5].a,
    
  45.          hh[6].a, hh[7].a);
    
  46.       return hex; // output 64-bit 16 hex bytes hash
    
  47.    }
    

For a given string or chunk of data, Pearson's original algorithm produces only an 8 bit byte or integer, 0-255. But the algorithm makes it extremely easy to generate whatever length of hash is desired. The scheme used above is a very straightforward implementation of the algorithm. As Pearson noted: a change to any bit in the string causes his algorithm to create a completely different hash (0-255). In the code above, following every completion of the inner loop, the first byte of the string is incremented by one. x[0]=x[0]+1

Every time that simple change to the first byte of the data is made, a different Pearson hash, h, is generated. xPear16 builds a 16 hex byte hash by concatenating a series of 8-bit Pearson (h) hashes. Instead of producing a value from 0 to 255, it generates a value from 0 to 18,446,744,073,709,551,615.

Pearson's algorithm can be made to generate hashes of any desired length, simply by adding 1 to the first byte of the string, re-computing h for the string, and concatenating the results. Thus the same core logic can be made to generate 32-bit or 128-bit hashes.

References

  1. 1.0 1.1 Pearson, Peter K. (June 1990), "Fast Hashing of Variable-Length Text Strings", Communications of the ACM 33 (6): 677, doi:10.1145/78973.78978 
  2. Online PDF file of the CACM paper.
This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.