Talk:Kd-tree
From Wikipedia, the free encyclopedia
[edit] Kd-tree should be named kd-trie and vice versa
- If the term kd-trie is accepted, then the current kd-tree article should be about a kd-trie and vice versa. Here's the rationale (this a copy from kd-trie's discussion page). The idea of a trie is to store strings (or finite sequences of any ordered set) such that symbols are shared between words with the same prefix. This delivers O(m) time complexity for insertion, deletion and searching, where m is the length of the word. Now consider the data structure in the kd-tree article. The first inserted point should be taken as an "empty word". Every point has an associated axis-aligned plane which contains it. The subsequent points will be classified by this plane as being on the "negative" or "positive" side (for any direction). Now if you insert another point, then it implicitly has an associated string with it that is given by traversal to its insertion place in the tree. If in the traversal you choose the negative sub-tree, then append "n" to the string, otherwise append "p" to the string. This way you can see that the points are associated with strings of the form "nppnpppnnnpnpnp" etc. Thus you can see that these associated strings form a trie. But they are special kinds of tries: every point is part of the set, and so every associated substring is contained in the word-set —Preceding unsigned comment added by Kaba3 (talk • contribs) 21:46, 16 January 2008 (UTC)
- I would estimate that many people know both implementations as KD-Trees, the distinction between a trie and a tree is minor and can be pointed out in text. The discussion in kd-trie is not extensive by any means, and the first comment there "kd-trie is my own invention..." is somewhat concerningUser A1 (talk) 01:52, 17 January 2008 (UTC)
- I agree. It would be better to embed the discussion in the kd-trie page to the kd-tree page. I hope someone does something, because currently the kd-trie term is misleading. I don't have the guts because I am a wikipedia-newbie and don't want to step on anyone's shoes:) --Kaba3 (talk) 19:52, 18 January 2008 (UTC)
[edit] Contradiction in text?
Doesn't the statement "In addition, every node of a kd-tree, from the root to the leaves, stores a point" contradict the end of the first paragraph "kd-tries are a variant that store data only in leaf nodes" ? I'd change it, but I know almost nothing about the topic. Also, while you are at it, "point is infite" should be "point is infinite." Nice article. It helped me a lot.
- There are at least two different definitions of kd-tree. The most popular one calls for storing points in the intermediate nodes. Preparata/Shamos book gives that definition. However, de Berg/van Kreveld/Overmars/Schwarzkopf book gives an alternative definiton, where points are stored in the leaves only. In the latter variant the splitting planes are still chosen as medians and point (or points) lying on the plane are simply forced into one of the two partitions. —The preceding unsigned comment was added by 198.182.56.5 (talk) 01:31, 28 April 2007 (UTC).
- Yes, I think there is a contradiction too. I left a message at the kd-trie article's discussion page with a rationale, but haven't got a response yet. In short, my opinion is that the current kd-tree should be called a kd-trie, and the current kd-trie a kd-tree. --Kaba3 (talk) 09:43, 24 November 2007 (UTC)
- definitions are contradictory. But both are valid
[edit] Picture Included
The picture included really isn't the best choice for demonstrating a kd-tree to someone who isn't familiar with it. has a much better picture, for instance
The problem I have with the picture (we're talking about the pic at the top, yes?) is that the vertices of the splitting planes appear, visually speaking, to be tree nodes, when in fact that is not correct. And the tree nodes themselves are not represented. wwheeler 18 January 2007
- Go for it. I made the existing one but agree that it's not too explanatory. Btyner 05:17, 20 November 2005 (UTC)
- I also made an SVG version of this new one, but for some reason, commons didn't recognize it as such. Btyner 22:21, 21 February 2006 (UTC)
Also, some psuedo code for construction, insertion, deletion, nearest neighbor, etc. would be very useful. --Numsgil 13:07, 13 October 2005 (UTC)
[edit] Python?
Does the Python version of the pseudocode really add anything? It seems like it is basically a restatement of the pseudocode with a slightly different (less clear) syntax. Do we really need both? Neilc 22:37, 15 April 2006 (UTC)
- I personally found it useful to have an executable version to try things out on and improve my understanding. That's also why I wrote the code that produced that 2D results picture. Granted, this page may not be the best place for an implementation, though I don't know a better place for such a tiny snippet. KiwiSunset 04:51, 19 April 2006 (UTC)
[edit] Content
Ok, so I can understand that balancing a kd tree takes care, but how do I actually do it? This article really needs someone with that knowledge to add that information in. ACiD2
- I've been working on tracking down the existing work done by Overmars on this topic, which is very similar to AVL trees. I might end up just re-deriving it, getting some review, and giving the citation as the source of the idea. If anyone has access to the pertinent journal article, it would be very helpful. There was also a technical report published. M. H. Overmars and J. van Leeuwen, Dynamic multi-dimensional data structures based on quad- and k-d trees, Acta Informatica 17, 3(1982), 267-285.
[edit] Search complexity?
The articles contains no information about the complexity of searching a kd-tree for an exact nearest neighbor?
- The algorithm analyzed in the following paper is fairly generic (it uses region queries), but does apply to kd-trees. I don't believe there's an existing article about the algorithm itself, but it would be nice to have one. Cleary, J. G. 1979. Analysis of an Algorithm for Finding Nearest Neighbors in Euclidean Space. ACM Trans. Math. Softw. 5, 2 (Jun. 1979), 183-192. DOI= http://doi.acm.org/10.1145/355826.355832
[edit] Adding elements?
Doesnt adding elements in the method described lead to an unbalanced tree? If this is so then i think it should be made explicitly clear, and a rebalancing algorithm (or a way to preserve balance) should be given User A1 07:21, 1 December 2006 (UTC)
- kd-Tree is a static data structure, so yeah adding an element will introduce disorder. Rebalancing it is perhaps outside the realms of the basic description. IlyaTheMuromets 22:45, 8 Jan 2007 (EST)
[edit] Nearest neighbour
Hello,
I added the NN algorithm, but i have hidden it in comments as it was a little off the cuff. Feel free to fix it, i will go over it again later.
Thanks User A1 12:54, 2 December 2006 (UTC)
- I don't suppose you can clarify this section, I find the way it is explained higly confusing (Not that confusing me is particularly hard). For example 'the tree is searched in a depth-first fashion, refining the nearest distance' this should be put in a more understandable way, such what do you mean by depth-first, refine the nearest distance how, and what do you mean by nearest difference. We need to assume the readers are not programmers, as they most likely aren't (though however, I am). --Chase-san (talk) 10:34, 11 January 2008 (UTC)
[edit] Missing title in title bar
The title of this article does not seem to be showing up in my browser title bar. Is this just me, and if not any suggested soultions?
MadScientistVX 05:09, 23 February 2007 (UTC)
- Odd. I second that, firefox 1.5.0.7 linux x86, despite the presence of
<title>Kd-tree - Wikipedia, the free encyclopedia</title>
User A1 15:23, 23 February 2007 (UTC)
[edit] Removing elements
This piece is not really clear to me, is someone could clarify it a bit, that would be excellent! Laurens 11:07, 27 April 2007 (UTC)
- Yes, "removing the invariant" is quite unclear. The point that they wish to make is that when deleting a point from the tree, one has to ensure that sub-points are merged back into the original tree in a sensible fashion, ie one that doesn't break the spatial nature of the tree itself.
- There are several possible techniques for doing this. Firstly one may simply mark the element that one wishes to delete with some form of ignore flag, such that this node no longer contributes to the algorithms that are run over the tree. This has the nice effect that it is fast to perform on any given node, but the downside is that the tree still has that node in it, so it is still taking up computational space. Secondly one could unlink the node, which we can call node "A" then generate a tree using the subnodes of "A" then link its children back into the tree, this has a higher up front computational cost, but removes "A" from the tree. Also this method may result in unbalanced trees, such that search algorithms run past it may perform in a non-optimal fashion. Finally one could completely destroy the tree and rebuild it, but without using "A". This costs the most computationally, but will produce an optimal (in some sense) tree.
- I haven't included this into the article as the bit about linking and unlinking is intentionally vague, as i never actually implemented this when i wrote a KD tree algorithm for NN searching. User A1 13:58, 27 April 2007 (UTC)
- Thanks, that helps a bit :) Laurens 15:07, 27 April 2007 (UTC)
[edit] Bug in query function?
It seems to me that the first d levels in the tree won't be checked against the bounding box in all dimensions. So querying for points in ([6,8], [3,4]) in the example tree would return (7, 2) even though 2 is not in [3,4]. Or have I missed something? 206.248.181.57 (talk) 23:07, 17 November 2007 (UTC)
No you havent, that algorithm should work if tree definition would be that nodes can't store values - only leafs. You have to check explicitly for boudaries - or you'll get false positives. —Preceding unsigned comment added by 149.156.124.14 (talk) 23:28, 15 January 2008 (UTC)
[edit] Sample code in Ruby
- class for generating trees
class KDTree
attr_reader :root # root node attr_reader :points # search results for ortagonal search range
# creates new tree from list of points - with 'dim' dimensions def initialize(points, dim) @dim = dim @root = KDNode.new(dim).parase(points) end
# adds node to tree - creates unbalance def add_node(point) @root.add(point) end
def find(*range) @points = [] query(range, @root) @points end
def query(range, node) axis = node.axis median = node.location[axis]
if node.left && (median >= range[axis].begin) query(range, node.left); #/* Skip left if max of left tree (median) is out of range */ end if node.right && (median <= range[axis].end) query(range, node.right); #/* Skip right if min of right tree (median) is out of range */ end if (0..@dim-1).all?{|ax| range[ax].include? node.location[ax]} # check if all constrains are fullfilled @points << node.location; end end
#print tree def print @root.print end
end
class KDNode
attr_reader :left, :right attr_reader :location
attr_reader :axis
def initialize(dim, location=nil, left=nil, right=nil) @dim = dim @location = location @left = left @right = right end
# node parasing def parase(points, depth = 0) @axis = depth % @dim
points = points.sort_by{|point| point[@axis]} # that's why this algorithm isn't o(nlog(n)) half = points.length / 2 # simplest median choosing
@location = points[half] @left = KDNode.new(@dim).parase(points[0..half-1], depth+1) unless half < 1 @right = KDNode.new(@dim).parase(points[half+1..-1], depth+1) unless half+1 >= points.length self end
def add(point) if @location[@axis] < point[@axis] @left ? @left.add(point) : @left = KDNode.new(point) else @right ? @right.add(point) : @right = KDNode.new(point) end end
def remove self.parse(@left.to_a + @right.to_a, @axis) end
def to_a @left.to_a + [@location] + @right.to_a end
def print(l=0) @left.print(l+1) if @left puts(" "*l + @location.inspect) @right.print(l+1) if @right end
end
a = []
10.times do |x|
10.times do |y| 10.times do |z| a << [x, y, z] end end
end
tree = KDTree.new(a, 3) tree.print
puts " -------------- "
tree.find([0,4], [1,4], [5,7])
pp tree.points.sort_by{|po| po[0]} —Preceding unsigned comment added by 149.156.124.14 (talk) 23:37, 15 January 2008 (UTC)
[edit] A Reference to a Generic Java Implementation of the KD Tree Needed
Recently I posted a link to a Java implementation of the KD tree which uses the new Java 5 construct of generics (similar to templates in C++). The link was as follows:
However the addition was removed with the comment that the contributor (Savarese, not myself) was "just an individual" and that although the library was open source that apparently it was somehow not open enough to merit mentioning on the wikipedia page.
Having done a fair bit of research on the web I can attest that Generic Java implementations of KDTrees are not well represented and that, all other concerns aside, the Savarese implementation is one of unusually high quality (modern implementation, full JUnit tests, consistent code quality). I, as a Java programmer, would find a reference to a implementation in my language of choice most useful, as I imagine others would.
So my question is this, how is it possible to get a reference on the Wikipedia KD Tree page to a high quality Java KD Tree implementation?
Would it be necessary for me to do a clean room implementation of a KD Tree using Generics then I suppose get some sanctioning body to accept it so that it won't just be "some guy" implementing a KD Tree? If so it seems rather a high bar and not really in the spirit of individual contributors as I thought Wikipedia to be.
As a side note, and hardly fully definitive, the crowd seems to give a nod to the Savarese implementation as it is the first google hit if you search "KD Tree Java". —Preceding unsigned comment added by Ltburch (talk • contribs) 17:35, 21 April 2008 (UTC)