Talk:Kademlia

From Wikipedia, the free encyclopedia

Contents

[edit] Overlays

Here is a list of P2P Overlays

Overlays.

Is overlay the correct word? --ShaunMacPherson 19:24, 24 Sep 2004 (UTC)

Overlay is the correct word used by academics. The link given is of DHT (or structured) P2P overlays. Other unstructured P2P overlays exist, such as Gnutella. Non-P2P overlays exist such as the Skype network, etc. See also Structured versus Unstructured. Bpringlemeir 19:37, 18 September 2007 (UTC)

[edit] How does it work?

I read the white paper and parts of it are very difficult to understand. Can some one create an in depth description of how this network works, but in plain English. (or at least put a hyper link to such a description?)

I agree. Currently this article doesn't make any sense. It's just a bunch of hand waving.

It's a rather popular design for peer-to-peer networked distributed hash tables (see also: hash table). In short, hash tables provide the functionality of efficiently resolving abstract keys into values, and the insertion of these key-value pairs. Distributed hash tables just provide this functionality over a network of nodes. For example, a key could be a search keyword and its value would be a single article/file it matched. The same key can be used in multiple key-value pairs, so a single search keyword can return several results. This "primitive operation" by itself might sound pretty trivial, but it allows one to build very complex functionality on top. So, Kademlia is just a design for implementing these kinds of distributed hash tables. Does this make sense? -- intgr 19:45, 29 September 2006 (UTC)
I have inserted a section called "System Details". This tries to explain how Kademlia works. I will use just a binary mode in the description. I think the XOR notation confuses people. At the end, in the "accelerated lookup", it can be shown that the XOR metric allow the binary tree to expand to higher degree trees. There are a lot of nit-picky details to handle pathological cases in Kademlia. I am not sure if I have too much detail now, or not enough. Also, I think the introduction is too long. Parts of the "file sharing" could be moved to the details. Bpringlemeir 15:50, 14 April 2007 (UTC)
I took the original article (a mess) and rewrote it by creating a separation between Kademlia itself and it uses (File Sharing). Did not attempt to go into much detail because it is too complicated to explain and I was afraid of destroying intelligibility. Just wanted to give readers a hint of what is going on. Now I read your (excellent) work and I doubt if this separation is to be mantained, since you went into File Sharing specifics in your description of the algorithm. I agree that its time for some further reorganization. Should the general description have comments here and there about the different implementations?
Some notes:
  • Kademlia is a distributed database that can be used to store any kind of information. Storing pointers to files is a specific implementation.
  • The concept of "distance" is basic to understand Kademlia. I find the entry "Routing Tables" very difficult to grasp without having the distance concept. A simple description of distance should be taken out of "Accelerated lookups" and placed at the beggining, as a part of the list.
  • FIND_VALUE Doesn't return a pointer, but the actual value. Or many values, depending on implementations.
  • You never lookup for a node, but for the closest nodes to some ID.
  • You mention the caching mechamism (storing values futher away from original k nodes). You do not mention the replication mechanism. In some implementations (Overnet, eMule) the lack of replication is compensated by periodical STORE from the storer node and by not ending a FIND_VALUE when the first node returns one value, but exhaustively querying all nodes close to the desired ID. These compensations are not part of the Kademlia paper. These two P2P networks do not use caching either.
  • While every node ID should be different (according to the paper), implementations do not care and create lots of duplicates. Kademlia is resilient enough to work well even then. It even may compensate a bit for the lack of caching.
  • Node-splitting is just implementation, a good idea if binary trees are used, but useless if you use a table (memory is cheap). Shouldn't this be explained in more general terms? (I know it is like this in the original papers) Again, no hope of understanding its use without understanding distance first.
  • Prefixes. Your description is absolutely correct. But the two last sentences confused me:
  • "These are maximum values and the average value will be far less due to redundancy, increasing the chance of finding a node in a k-bucket that share more than just the prefix with the target". Not due to redundancy but to sheer luck. Statistically you cannot expect to go always along the longest path, but this is not redundancy.
  • "Nodes can use mixtures of prefixes in their routing table, such as the Kad Network used by eMule. The Kademlia network could even be heterogeneous in routing table implementations. This would just complicate the analysis of lookups". Kad uses single bit buckets. Am I correct? Kad goes complicated at the Value side, but buckets are very simple. Are there any implementations with heterogeneous prefixes? If not, I think this sentence just complicates something that is not very simple already.
Now I'll try to improve the article a bit based on your words. Of course, feel free to revert anything incorrect! If you disagree with anything I said, please comment under it. [Apparently by 83.41.82.122? Bpringlemeir 04:23, 20 June 2007 (UTC)]
Sorry, I didn't see your talk entry before I edit some things. You have some good points. I think that what you wrote about RTT might be correct for some implementation. However, general Kademlia doesn't really say anything about it. You have good information, but paired with what I wrote, i think it is too much. However, I have lost objectivity. As I know more about Kadmelia, I might not think some things need explaining now. I did find the parts added to accelerated lookups was wrong (or at least highly confusing). I agree that some re-organization is still needed. I found the concept of distance confused the issue for me when I started. Note, that when you are using accelerated lookups, it is not so much counting the bits. The "XOR metric" is abstract sounding and would sound rather "alien" to many people. However, the concept of bits is easier to understand. The "XOR metric" is only really necessary to describe accelerated lookups. It partitions from a binary tree to a quad-tree, octo-tree, etc. I will have to look at the rest of your comments later. Bpringlemeir 04:32, 20 June 2007 (UTC)
I gave a cursory read at your modifications and they look good. XOR must be mentioned in the article and given the proper importance. There is no need to understand XOR to understand Kademlia, but what differentiates Kademlia from other distributed hash tables is, precisely, XOR usage. MUST be mentioned. A section about "algorith details" at the bottom might be its place.
Regarding prefixes, please look at the paper by Daniel Stutzbach at the bottom of the article. Redundant meant the minimum 'k' to provide a full map of the network. Ie, k=1. Anyways, the first k-bucket covers 1/2 the network, but you might have an entry in the k bucket that is closer to the target. It is discussed in the paper. Bpringlemeir 04:40, 20 June 2007 (UTC)
Kademlia is highly redundant in the sense that there is not one single path to find a Key, but many many possible ones. Agreed. If mentioned in this sense, this should be clear enough so the reader understands what is being ment by redundance. This is my point. "redundantly interwined" sounds good?

[edit] Other uses

  • A non-p2p use found it's way into I2P (they mentioned Kademlia anyway). This is one of the scalability projects i am currently studying. If i get more information on such usage i will certainly contribute it to this article.
--Sarah (Kuro) 22:43, 27 September 2005 (UTC);

[edit] Protocol

In my understanding, Kademlia is not a network protocol. It is an algorithm (or "system", as the authors of the original article call it) that can be implemented as a network protocol. I have edited the article to reflect this belief. If I'm mistaken, please let me know. Strait 04:39, 9 April 2006 (UTC)

The paper refers to it as a system. There are protocol details, algorithms and data structures. Some parts are a bit sketchy, like how the unbalanced binary tree is handled and splitting k-buckets that aren't in the nodes range. Bpringlemeir 15:57, 14 April 2007 (UTC)

[edit] Name

What is the derivation of the name? Roberthoff82 15:24, 3 February 2007 (UTC)

Yeah, what on earth does it mean? Maikel 09:27, 21 May 2007 (UTC)
I believe it was named after the mountain in Bulgaria. --PacoBell (talk) 21:28, 13 May 2008 (UTC)

[edit] How Kademlia start a node

kadc need some working peers stored in a file to start a node, but what about utorrent ? —The preceding unsigned comment was added by 217.44.131.209 (talk) 23:46, 18 March 2007 (UTC).

For trackerless torrents, there are several peers included in the torrent. At least this is what some torrents downloaded from Gnutella show when dumped with "torrentinfo" in the Python Mainline code. I think you are referring to bittorrent and not utorrent. In this case, the bootstrap nodes are in the torrent file (afaik). Bpringlemeir 15:54, 14 April 2007 (UTC)

[edit] Intro

I think the intro is overly long. There is the section above by "Intgr" under "How does this work". I also have an analogy in comments in the main article under "System details". The 1st, 2nd, 3rd generation comment might also be more suitable for an intro. Ie, How does Kademlia differ from other P2p networks? The main thing is that information pointers are distributed throughout the network. Bpringlemeir 16:28, 14 April 2007 (UTC)


I have reworded the introduction to not include details like the OSI layer and some redundant wording. For example, repeating Distributed hash table. Distance information was moved to "System Details". Bpringlemeir 19:04, 28 April 2007 (UTC)