MurmurHash
MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup.[1][2][3] It was created by Austin Appleby in 2008,[4][5] and exists in a number of variants,[6] all of which have been released into the public domain. When compared to other popular hash functions, MurmurHash performed well in a random distribution of regular keys.[7]
Variants
The current version is MurmurHash3,[8][9] which yields a 32-bit or 128-bit hash value.
The older MurmurHash2[10] yields a 32-bit or 64-bit value. Slower versions of MurmurHash2 are available for big-endian and aligned-only machines. The MurmurHash2A variant adds the Merkle–Damgård construction so that it can be called incrementally. There are two variants which generate 64-bit values; MurmurHash64A, which is optimized for 64-bit processors, and MurmurHash64B, for 32-bit ones. MurmurHash2-160 generates the 160-bit hash, and MurmurHash1 is obsolete.
Implementations
The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python,[11] C,[12] C#,[9][13] Perl,[14] Ruby,[15] PHP,[16] Haskell,[17] Scala,[18] Java,[19][20] Erlang,[21] and JavaScript.[22][23]
It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), Perl,[24] nginx (ver 1.0.1),[25] Rubinius,[26] libmemcached (the C driver for Memcached),[27] maatkit,[28] Hadoop,[1] Kyoto Cabinet,[29] RaptorDB,[30] OlegDB,[31] and Cassandra.[32]
Algorithm
Murmur3_32(key, len, seed) // Note: In this version, all integer arithmetic is performed with unsigned 32 bit integers. // In the case of overflow, the result is constrained by the application of modulo arithmetic. c1 0xcc9e2d51 c2 0x1b873593 r1 15 r2 13 m 5 n 0xe6546b64 hash seed for each fourByteChunk of key k fourByteChunk k k * c1 k (k << r1) OR (k >> (32-r1)) k k * c2 hash hash XOR k hash (hash << r2) OR (hash >> (32-r2)) hash hash * m + n with any remainingBytesInKey remainingBytes SwapEndianOrderOf(remainingBytesInKey) // Note: Endian swapping is only necessary on big-endian machines. // The purpose is to place the meaningful digits towards the low end of the value, // so that these digits have the greatest potential to affect the low range digits // in the subsequent multiplication. Consider that locating the meaningful digits // in the high range would produce a greater effect upon the high digits of the // multiplication, and notably, that such high digits are likely to be discarded // by the modulo arithmetic under overflow. We don't want that. remainingBytes remainingBytes * c1 remainingBytes (remainingBytes << r1) OR (remainingBytes >> (32 - r1)) remainingBytes remainingBytes * c2 hash hash XOR remainingBytes hash hash XOR len hash hash XOR (hash >> 16) hash hash * 0x85ebca6b hash hash XOR (hash >> 13) hash hash * 0xc2b2ae35 hash hash XOR (hash >> 16)
References
- ↑ 1.0 1.1 "Hadoop in Java". Hbase.apache.org. 24 July 2011. Retrieved 13 January 2012.
- ↑ Chouza et al.
- ↑ "Couceiro et al." (PDF) (in (Portuguese)). Retrieved 13 January 2012.
- ↑ "MurmurHash on GooglePages". Murmurhash.googlepages.com. Retrieved 13 January 2012.
- ↑ Tanjent (tanjent) wrote,3 March 2008 13:31:00. "MurmurHash first announcement". Tanjent.livejournal.com. Retrieved 13 January 2012.
- ↑ "MurmurHash2-160". Simonhf.wordpress.com. 25 September 2010. Retrieved 13 January 2012.
- ↑ "Which hashing algorithm is best for uniqueness and speed". stackexchange.com.
- ↑ "MurmurHash3 on smhasher".
- ↑ 9.0 9.1 Horvath, Adam (Aug 10, 2012). "MurMurHash3, an ultra fast hash algorithm for C# / .NET".
- ↑ "MurmurHash2 on smhasher".
- ↑ "pyfasthash in Python". Google. Retrieved 13 January 2012.
- ↑ "C implementation in qLibc by Seungyoung Kim".
- ↑ Landman, Davy. "Davy Landman in C#". Landman-code.blogspot.com. Retrieved 13 January 2012.
- ↑ "Toru Maesaka in Perl". metacpan.org. Retrieved 13 January 2012.
- ↑ Bruce Williams <http://codefluency.com>, for Ruby Central <http://rubycentral.org> (3 May 2009). "Ruby". Rubyforge.org. Retrieved 13 January 2012.
- ↑ "Murmurhash3 PHP extension". Murmur.vaizard.org. Retrieved 13 January 2012.
- ↑ "Haskell". Hackage.haskell.org. Retrieved 13 January 2012.
- ↑ "Scala standard library implementation". 14 December 2012.
- ↑ MurmurHash3 in Java, part of Guava
- ↑ Derek Young in Java, public domain
- ↑ MurmurHash3 in Erlang
- ↑ raycmorgan (owner). "Javascript implementation by Ray Morgan". Gist.github.com. Retrieved 13 January 2012.
- ↑ garycourt. "MurmurHash.js by Gary Court". Github.com. Retrieved 13 January 2012.
- ↑ "perl5176delta". Retrieved 31 December 2012.
- ↑ "nginx". Retrieved 13 January 2012.
- ↑ "Rubinius". Retrieved 29 February 2012.
- ↑ libmemcached
- ↑ "maatkit". Google. 24 March 2009. Retrieved 13 January 2012.
- ↑ "Kyoto Cabinet specification". Fallabs.com. 4 March 2011. Retrieved 13 January 2012.
- ↑ Gholam, Mehdi (13 November 2011). "RaptorDB CodeProject page". Codeproject.com. Retrieved 13 January 2012.
- ↑ "OlegDB Documentation". Retrieved 24 January 2013.
- ↑ "Partitioners". apache.org. 2013-11-15. Retrieved 2013-12-19.