Kademlia
From Wikipedia, the free encyclopedia
Kademlia is a distributed hash table designed by Petar Maymounkov and David Mazières, for decentralized peer to peer computer networks. It specifies the structure of the network, regulates communication between nodes and how the exchange of information has to take place. Kademlia nodes communicate among themselves using the transport protocol UDP (see OSI model). Kademlia nodes store data by implementing a distributed hash table. Over an existing LAN/WAN (like the Internet), a new virtual or overlay network is created in which each node is identified by a number ("Node ID"). This number serves not only as its identification, but the Kademlia algorithm uses it for further purposes.
A node that would like to join the net must first go through a bootstrap process. In this phase, the node needs to know the IP address of another node (obtained from the user, or from a stored list) that is already participating in the Kademlia network. If the bootstrapping node has not yet participated in the network, it computes a random ID number that is not already assigned to any other node. It uses this ID until leaving the network.
The Kademlia algorithm is based on the calculation of the "distance" between two nodes. This distance is computed as the exclusive or of the two node IDs, taking the result as an integer number.
This "distance" does not have anything to do with geographical conditions, but designates the distance within the ID range. Thus it can and does happen that, for example, a node from Germany and one from Australia are "neighbours".
Information within Kademlia is stored in so called "values", every value being attached to a "key".
When searching for some key, the algorithm explores the network in several steps, each step approaching closer to the searched-for key, until the contacted node returns the value, or no more closer nodes are found. The number of nodes contacted during the search is only marginally dependent on the size of the network: If the number of participants in the net doubles in number, then a user's node must query only one more node per search, not twice as many.
Further advantages are found particularly in the decentralized structure, which clearly increases the resistance against a denial of service attack. Even if a whole set of nodes are flooded, this will have limited effect on network availability, which will recover itself by knitting the network around these "holes".
[edit] Use in file sharing networks
Kademlia has been used for file sharing. By making Kademlia keyword searches, one can find information in the file-sharing network so it can be downloaded. Since there is no central instance to store an index of existing files, this task is divided evenly among all clients: If a node wants to share a file, it processes the contents of the file, calculating from it a number (hash) that will identify this file within the file-sharing network. The hashes and the node IDs must be of the same length. It then searches for several nodes whose ID is close to the hash, and has his own IP address stored at those nodes. A searching client will use Kademlia to search the network for the node whose ID has the smallest distance to the filehash, then will retrieve the contacts list that is stored in that node. The contacts stored in the network constantly change as nodes connect and disconnect. Due to built-in redundancy, the contacts are replicated in several nodes.
The filehash is usually obtained from a specially formed internet link found elsewhere, or included within an indexing file obtained from other sources.
Filename searches are implemented using keywords. The filename is divided into its constituent words. Each of these keywords is hashed and stored in the network, together with the corresponding filename and filehash. A search involves choosing one of the keywords, contacting the node with an ID closest to that keyword hash, and retrieving the list of filenames that contain the keyword. Since every filename in the list has its hash attached, the chosen file can then be obtained in the normal way.
Public clients using the Kademlia algorithm (these networks are incompatible with one another):
- Overnet network: Overnet (integrated in eDonkey (no longer available) and MLDonkey)
- Kad Network: eMule (starting from v0.40), MLDonkey (starting from 2.5-28) and aMule (starting from 2.1.0).
- RevConnect - see also [1] (starting with version 0.403)
- KadC: [2] C library to publish and retrieve information to/from the Overnet network
- BitTorrent Azureus DHT: Azureus (2.3.0.0+): uses a modified Kademlia implementation for decentralized tracking and various other features like the "Comments and Ratings" plugin.[3]
- BitTorrent Mainline DHT: BitTorrent client (4.1.0+), µTorrent (1.2+), BitSpirit (3.0+) and BitComet (0.59+): They all share a DHT based on an implementation of the Kademlia algorithm, for trackerless torrents.
[edit] External links
- "Kademlia: A Peer to peer information system Based on the XOR Metric", http://www.cs.rice.edu/Conferences/IPTPS02/109.pdf
- "Kademlia: The optimized Peer network", http://kademlia.scs.cs.nyu.edu/ (possibly dead, mirror)
- Petar Maymounkov's Academic Home Page
- Petar Maymounkov - Home Page
- Yi Qiao and Fabian E. Bustamante USENIX 2006 paper with an interesting characterization of Overnet and Gnutella.
- Khashmir: Python implementation of Kademlia