MurmurHash

From Wikipedia, the free encyclopedia

MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup.[1][2][3] It was created by Austin Appleby in 2008,[4][5] and exists in a number of variants,[6] all of which have been released into the public domain. When compared to other popular hash functions, MurmurHash performed well in a random distribution of regular keys.[7]

Variants

The current version is MurmurHash3,[8][9] which yields a 32-bit or 128-bit hash value.

The older MurmurHash2[10] yields a 32-bit or 64-bit value. Slower versions of MurmurHash2 are available for big-endian and aligned-only machines. The MurmurHash2A variant adds the Merkle–Damgård construction so that it can be called incrementally. There are two variants which generate 64-bit values; MurmurHash64A, which is optimized for 64-bit processors, and MurmurHash64B, for 32-bit ones. MurmurHash2-160 generates the 160-bit hash, and MurmurHash1 is obsolete.

Implementations

The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python,[11] C,[12] C#,[9][13] Perl,[14] Ruby,[15] PHP,[16] Haskell,[17] Scala,[18] Java,[19][20] Erlang,[21] and JavaScript.[22][23]

It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), Perl,[24] nginx (ver 1.0.1),[25] Rubinius,[26] libmemcached (the C driver for Memcached),[27] maatkit,[28] Hadoop,[1] Kyoto Cabinet,[29] RaptorDB,[30] OlegDB,[31] and Cassandra.[32]

Algorithm

Murmur3_32(key, len, seed)
    // Note: In this version, all integer arithmetic is performed with unsigned 32 bit integers.
    //       In the case of overflow, the result is constrained by the application of modulo 2^{{32}} arithmetic.
    
    c1 \gets  0xcc9e2d51
    c2 \gets  0x1b873593
    r1 \gets  15
    r2 \gets  13
    m \gets  5
    n \gets  0xe6546b64
 
    hash \gets  seed

    for each fourByteChunk of key
        k \gets  fourByteChunk

        k \gets  k * c1
        k \gets  (k << r1) OR (k >> (32-r1))
        k \gets  k * c2

        hash \gets  hash XOR k
        hash \gets  (hash << r2) OR (hash >> (32-r2))
        hash \gets  hash * m + n

    with any remainingBytesInKey
        remainingBytes \gets  SwapEndianOrderOf(remainingBytesInKey)
        // Note: Endian swapping is only necessary on big-endian machines.
        //       The purpose is to place the meaningful digits towards the low end of the value,
        //       so that these digits have the greatest potential to affect the low range digits
        //       in the subsequent multiplication.  Consider that locating the meaningful digits
        //       in the high range would produce a greater effect upon the high digits of the
        //       multiplication, and notably, that such high digits are likely to be discarded
        //       by the modulo arithmetic under overflow.  We don't want that.
        
        remainingBytes \gets  remainingBytes * c1
        remainingBytes \gets  (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
        remainingBytes \gets  remainingBytes * c2

        hash \gets  hash XOR remainingBytes
 
    hash \gets  hash XOR len

    hash \gets  hash XOR (hash >> 16)
    hash \gets  hash * 0x85ebca6b
    hash \gets  hash XOR (hash >> 13)
    hash \gets  hash * 0xc2b2ae35
    hash \gets  hash XOR (hash >> 16)

References

  1. 1.0 1.1 "Hadoop in Java". Hbase.apache.org. 24 July 2011. Retrieved 13 January 2012. 
  2. Chouza et al.
  3. "Couceiro et al." (PDF) (in (Portuguese)). Retrieved 13 January 2012. 
  4. "MurmurHash on GooglePages". Murmurhash.googlepages.com. Retrieved 13 January 2012. 
  5. Tanjent (tanjent) wrote,3 March 2008 13:31:00. "MurmurHash first announcement". Tanjent.livejournal.com. Retrieved 13 January 2012. 
  6. "MurmurHash2-160". Simonhf.wordpress.com. 25 September 2010. Retrieved 13 January 2012. 
  7. "Which hashing algorithm is best for uniqueness and speed". stackexchange.com. 
  8. "MurmurHash3 on smhasher". 
  9. 9.0 9.1 Horvath, Adam (Aug 10, 2012). "MurMurHash3, an ultra fast hash algorithm for C# / .NET". 
  10. "MurmurHash2 on smhasher". 
  11. "pyfasthash in Python". Google. Retrieved 13 January 2012. 
  12. "C implementation in qLibc by Seungyoung Kim". 
  13. Landman, Davy. "Davy Landman in C#". Landman-code.blogspot.com. Retrieved 13 January 2012. 
  14. "Toru Maesaka in Perl". metacpan.org. Retrieved 13 January 2012. 
  15. Bruce Williams <http://codefluency.com>, for Ruby Central <http://rubycentral.org> (3 May 2009). "Ruby". Rubyforge.org. Retrieved 13 January 2012. 
  16. "Murmurhash3 PHP extension". Murmur.vaizard.org. Retrieved 13 January 2012. 
  17. "Haskell". Hackage.haskell.org. Retrieved 13 January 2012. 
  18. "Scala standard library implementation". 14 December 2012. 
  19. MurmurHash3 in Java, part of Guava
  20. Derek Young in Java, public domain
  21. MurmurHash3 in Erlang
  22. raycmorgan (owner). "Javascript implementation by Ray Morgan". Gist.github.com. Retrieved 13 January 2012. 
  23. garycourt. "MurmurHash.js by Gary Court". Github.com. Retrieved 13 January 2012. 
  24. "perl5176delta". Retrieved 31 December 2012. 
  25. "nginx". Retrieved 13 January 2012. 
  26. "Rubinius". Retrieved 29 February 2012. 
  27. libmemcached
  28. "maatkit". Google. 24 March 2009. Retrieved 13 January 2012. 
  29. "Kyoto Cabinet specification". Fallabs.com. 4 March 2011. Retrieved 13 January 2012. 
  30. Gholam, Mehdi (13 November 2011). "RaptorDB CodeProject page". Codeproject.com. Retrieved 13 January 2012. 
  31. "OlegDB Documentation". Retrieved 24 January 2013. 
  32. "Partitioners". apache.org. 2013-11-15. Retrieved 2013-12-19. 

See also

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.