Talk:Binary search tree

From Wikipedia, the free encyclopedia

	This article is within the scope of WikiProject Computer science, which aims to create a comprehensive computer science reference for Wikipedia. Visit the project page for more information and to join in on related discussions.
B	rated as B-Class on the assessment scale
High	rated as high-importance on the assessment scale

1 bst
2 Copyright violation?
3 Why are the examples in Python?
4 Merge with binary search algorithm
5 Errors
6 Does it necessarily need a total ordering?
7 Optimal binary search trees
8 Insertion
9 Deletion
10 omega n squared worst case for insertion?
11 Clarity Rewrite

[edit] bst

Shouldn't the definition of BST be that every node's right subtree has values greater than or equal to the node's value? The current definition says it's strictly greater, which isn't the case.

You can put equal nodes either on the left or the right, but not both. It's an arbitrary decision. I'll say this somewhere. Deco 01:17, 12 May 2005 (UTC)

I disagree. Here is an excerpt from the lecture I once took at UC Berkeley: "A binary search tree satisfies the _binary_search_tree_invariant_: for any node X, every key in the left subtree of X is less than or equal to X's key, and every key in the right subtree of X is greater than or equal to X's key.". The full lecture can be found here (http://www.cs.berkeley.edu/~jrs/61b/lec/26). I've also verified this with my reference book on algorithms.

I'm sorry, it can be done this way (although there are authoritative references which don't do it this way). It makes some operations more difficult, such as finding all the nodes with a given value, but it really makes almost no difference which way it is defined. In the article I've used your definition, then clarified that in some definitions equal values can be excluded from the left or right side. Deco 01:02, 14 May 2005 (UTC)

By this definition self-balancing binary search trees are not binary search tree. Rotations do not keep this _binary_search_tree_invariant_. To be concrete: Consider an empty AVL tree and insert three equal values. After the rotation there are equal values to both sides of the root node. So you either need to exclude equal values from binary search trees or allow equal values to both sides of a node. E.g: "The left subtree of a node contains only values less or equal to the node's value. The right subtree of a node contains only values greater than or equal to the node's value." 141.84.26.134 01:04, 27 February 2007 (UTC)

It is confusing that the article starts by claiming that equal nodes should always be in the right subtree, while the example image puts equal nodes (such as the seven, which equals the root) in the left subtree. 128.84.153.196 13:29, 29 September 2006 (UTC)

[edit] Copyright violation?

Why does the blurb about Optimal BSTs have a reference to "section 15.5 of Introduction to Algorithms" ? Is this text copied and pasted out of a book? Is this a (c) violation?

Copyright does not cover ideas but an explicit realization of those ideas (see idea-expression divide). The text here, written by me, was based on my understanding of the material in CLRS and not copied verbatim from that book. Deco 00:41, 15 December 2005 (UTC)

[edit] Why are the examples in Python?

I know that the example algorithms are merely for show - but wouldn't examples in a more mainstream language like C/C++ or Java be better for the purpose of understanding? The Phython deletion algorithm, for instance, uses return structures that are foreign to C/C++ or similar languages.

If you think a mainstream programmer might find them difficult to understand that's a legitimate concern. I agree that "find_remove_max" is quite obscure in its use of tuples of two return values. On the other hand, I think the use of tuples or records combined with pattern matching really abbreviates things like the insertion and search algorithms, while still being pretty clear, which is a pretty good reason to use them instead of C/C++/Java. I'm going to look at rewriting parts of the deletion code in a more intuitive way... er, as soon as I learn a little Python. :-) Deco 22:27, 15 February 2006 (UTC)

I decided to write "destructive" versions of insert and delete in C++, which lends itself quite well to this purpose, as you can see from the relative simplicity of the resulting code. I left in the Python insertion to demonstrate the concept of persistent trees, but took out the excessively complex Python deletion example. I hope this makes things clearer for you! Deco 03:48, 16 February 2006 (UTC)

Yes - thank you very much. The C++ example is much more clear to me - and you are right, it was the returning tuples that was causing my head to spin. --208.186.134.102 06:00, 16 February 2006 (UTC) Wouldn't the code in the new C++ delete example cause memory leaks when you set "node=node->right" and don't delete node->right? (or node->left)... (thanks for adding it... it's a good, simple example. very clean) ++Ryangardner

Good point. I've gotten too used to garbage collection. :-) Thanks for your positive feedback. Deco 06:27, 16 February 2006 (UTC)

I see a few other (minor) flaws in the algorithm on the page. If you are deleting a leaf node, you must remove the link from the parent of that node to the node itself. I also think that the delete method as it is implemented would cause an error, because you are deleting the node and then trying to get data from it... Ryangardner

Slightly off topic, but could the python examples be made clearer by the use of exceptions instead of returning numbers (eg. -1, -2)? --RohanMitchell (talk) 00:10, 11 June 2008 (UTC)

[edit] Merge with binary search algorithm

It seems to me this article details the same concept as Binary search algorithm. I suggest a merger.He Who Is 01:59, 25 May 2006 (UTC)

No, they are very different. The binary search algorithm is an algorithm that works on sorted arrays. To use it, you need an array and you need to sort it beforehand (which is expensive). The binary search tree is a data structure. Unlike arrays, binary search trees do not preserve order of insertion. Binary search trees are used because they have the advantage that the tree structure encodes the relative ordering of the elements (i.e. it is always "sorted"), without needing to sort. --Spoon! 00:04, 24 August 2006 (UTC)

[edit] Errors

The deletion algorithm should check for the case of both left and right being NULL before checking either side.

No, the code works fine - I tested it. If both sides are NULL, then instead of assigning the other side it will assign NULL (after deleting the node) - which is correct behavior. —Preceding unsigned comment added by 68.207.121.5 (talk) 05:35, 9 November 2007 (UTC)

Indeed, the deletion doesn't seem to work. Could anybody fix this? Jogers (talk) 15:14, 6 September 2006 (UTC)

Yup, it does seem to be broken. Here is some code that should work. It compiles, but I don't feel up to verifying that it works 100% by actually testing it. It will be better than what's there, though...

 void DeleteNode(struct node* node) {
     struct node **parentLink;
 
     // If there is no parent, we are a root node
     if (parent == NULL) { 
         free(node);
         return;
     }
       
     // Figure out which child of the parent we are
     if (node == node->parent->left)
         parentLink = &(node->parent->left);
     else if (node == node->parent->right) 
         parentLink = &(node->parent->right);
 
     // This will handle cases where both children are null.
     if (node->left == NULL) { 
         *parentLink = node->right;
         free(node); // Free any memory allocated for the node
     } else if (node->right == NULL) {
         *parentLink = node->left;
         free(node); // In C++, delete node could be used here
     } else {
         // Follow the right side of the tree until there is an empty space
         struct node* temp = node->left;
         while (temp->right != NULL) {
             temp = temp->right;
         }
         node->value = temp->value;
         DeleteNode(temp);
     }
 }

Wolever 02:50, 31 March 2007 (UTC)

[edit] Does it necessarily need a total ordering?

C++ STL's "Sorted Associative Containers" (which includes "set", "multiset", "map", and "multimap" classes, all of which are binary search trees) only requires its key types to be a strict weak ordering: http://www.sgi.com/tech/stl/SortedAssociativeContainer.html --Spoon! 00:08, 24 August 2006 (UTC)

You're right, a strict weak ordering suffices to order the tree and enable useful operations. The details behind this are rather complicated though and I think would be best relegated to their own section. Deco 04:09, 24 August 2006 (UTC)

[edit] Optimal binary search trees

In the section about Optimal binary search trees, it is mentioned that:

We can then use a dynamic programming solution, detailed in section 15.5 of Introduction to Algorithms by Thomas H. Cormen Sec Edition, to construct the tree with the least possible expected search cost.

I wish that someone who has the book (or has a nearby library from which he can borrow it) could take the time to write about it.

Thanks. --Hosam Aly 20:26, 7 January 2007 (UTC)

[edit] Insertion

I think the insertion code is not very clear, so I suggest this one, written in pseudocode:

x is current node, y always points to x's parent.

Tree-Insert(T, z)
 y = null                        //x will be set to root, root's parent is null
 x = root(T)
 while x != null do              //we will traverse the tree until we find the leaf
    y = x                        //we are going to change x, so store x as it will became next parent
    if key(z) < key(x)           //if the value of inserted node is lesser then actual node value, move left
      then x = left(x)
      else x = right(x)
                                 //now we should have y pointing to the leaf, and x to null (notice when the while loop terminates)
 parent(z) = y                   //this leaf will become our parent
 if y = null                     //this is a special case, when we are inserting into an empty tree
   then root(T) = z              //so the inserted node will be the root node
   else
     if key(z) < key(y)          //tree is non-empty, so insert the node
       then left(y) = z
       else right(y) = z

Kremso 17:15, 11 January 2007 (UTC)

Heck, why not just re-write all the code in psudo code? It will keep people from directly copy+pasting it in to their code (which generally won't work so well) and it is quite a bit easier to understand. Wolever 02:50, 31 March 2007 (UTC)

I agree. Pseudocode would be much better than the current two-language mess (but it would be even worse if we had Python, C and pseudocode). LittleDan^talk 04:39, 15 April 2007 (UTC)

I disagree. Using a specific language like C or C++ makes it clear how the memory is being modified. If you use pseudo code it might not be clear the operations, because generally much of the traversal code deals with double pointers (or references to pointers). You want to be able to modify not only the children of the node, but also may want to NULL out the node itself (after freeing it), so any other nodes that point to the node will point to a NULL reference. That might not be clear if using psuedo code. —Preceding unsigned comment added by 68.207.121.5 (talk) 15:18, 10 November 2007 (UTC)

[edit] Deletion

"Once we find either the in-order successor or predecessor, swap it with N, and then delete it. Since both the successor and the predecessor must have fewer than two children, either one can be deleted using the previous two cases. In a good implementation, it is generally recommended[citation needed] to avoid consistently using one of these nodes, because this can unbalance the tree."

I disagree with this. Depending on which side of a node the equal values go to (left or right), you can only base this deletion schema in predecessor OR sucessor (not arbitrary). There are cases where taking the wrong choice will make equal values appear to the side they shouldn't be in. 13:11, 5 June 2007

I think the point is mute, as well. If you were truely concerned with balanced trees, you would use red-black or AVL trees. Otherwise, it is pretty hard to ensure balanced trees unless you carefully insert the whole dataset in a specific order —Preceding unsigned comment added by 68.207.121.5 (talk) 15:21, 10 November 2007 (UTC)

I'm pretty sure you're right if you allow equal nodes in the tree, you have to be consistent in your insertion and deletion policy. If you disallow them, then it's pretty obvious that you'll get an unbalanced tree if you consistently choose to fiddle with the left or right subtree.

Aside: Am I the only one that thinks Wikipedia is getting a bit funny with the citations for the bloody obvious? It really does feel like http://bash.org/?757724 sometimes. (`Here is my impression of Wikipedia. "There are five fingers on the human hand [citation needed]"') 129.11.126.179 20:05, 12 June 2007 (UTC)

I'm just wondering whether the example code for Deletion is correct... couldn't a tree be built such that it would lose additional nodes given the example code (e.g. if your nodes are integers and you add them via the above code in the order 6 3 5 4 and then delete 5 would you not then lose Node 4 as well?

[edit] omega n squared worst case for insertion?

The current text says:

In either version, this operation requires time proportional to the height of the tree in the worst case, which is O(log n) time in the average case over all trees, but Ω(n²) time in the worst case.

Isn't Omega(n) the worst case?

-- nyenyec ☎ 23:15, 10 November 2007 (UTC)

Yes, assuming n is the number of items already in the tree. It's Ω(n²) in the worst case to insert n items, and that's probably where the mix-up came from. Dcoetzee 23:24, 10 November 2007 (UTC)

[edit] Clarity Rewrite

I'm a student at Cal Poly State University San Luis Obispo, and as part of a class on teaching technical subjects we were asked to find an article that was unclear or otherwise inaccessible for the wider audience and clarify it, with an emphasis on improving understanding from a teaching point of view.

Most of the changes I am making are directed towards improving the _clarity_ of the subject matter, and NOT an attempt to make them more factually accurate.

One large change I made is to remove the section on BST's being sets/multisets from the introduction. I'm sure most people reading this article aren't interested in graph/set theory, so that information is more pertinent in other articles. Feel free to reintegrate that information if you'd like, but in my opinion it doesn't belong in this article, or if it does it belongs in it's own section.

My wiki markup skills are not very good, so feel free to correct any incorrect wiki-links I may inadvertently create. I've done my best to make them at least link somewhere correct, but I can't get the hang of linking to sections in other articles. Mgius (talk) 23:14, 26 February 2008 (UTC)