Talk:Red-black tree
From Wikipedia, the free encyclopedia
I would like to say thank you for this great article. Also, it might be good to link to http://webpages.ull.es/users/jriera/Docencia/AVL/AVL%20tree%20applet.htm. It's a java applet that demonstrates a red-black tree, as well as AVL and splay trees. --Anonymous CS Student
I think the tree in the image is actually invalid, since the left-left-left path and the left-right-right path only passes through two black nodes. Κσυπ Cyp 01:46, 30 Nov 2003 (UTC)
- Now right-right-right-... passes through 3 black nodes, the rest through 2. Swapping the colour of 15 and 17 should work. Nice tree, though... Κσυπ Cyp 01:41, 4 Dec 2003 (UTC)
-
- Oops, I'm a complete idiot. I was trying not to copy an existing image, but I guess I didn't understand red-black trees as well as I thought. Sorry for the problems.
- Derrick Coetzee 14:58, 4 Dec 2003 (UTC)
-
- I fixed the image, but I'm wondering if it could be misleading now, since no node now has a red and a black child (not counting nil children). I don't want to give any misimpressions.
- Derrick Coetzee 15:05, 4 Dec 2003 (UTC)
-
-
- The tree seems correct now. To have a node with both a red and a black child, it would be possible to remove node 6, and invert the colour of node 1, 8 and 11. And then if the tree looks too small, maybe add a red child for 15. Nice tree, in either case. Κσυπ Cyp 16:58, 4 Dec 2003 (UTC)
-
About that whole nil-leaves discussion: Why do you need to remember that there are nil leaves? What does this actually do to change the definition? The example tree without the nil leaves would not violate the definition, and I cannot think of any others that would. Tjdw 19:43, 30 May 2004 (UTC)
- It would violate the requirement that all leaves are black. It may be a good idea to add this to the article. I think nil leaves aren't strictly necessary, but red-black trees are typically presented this way. Derrick Coetzee 22:18, 30 May 2004 (UTC)
This article leaves me confused as to how the behaviour and performance of a red-black tree differs from other binary trees. Could someone who understands the subject please add a note explaining why one might choose this structure over e.g. AVL trees? Alternatively, might a new topic like "binary tree implementations: advantages and disadvantages" be helpful?
- A comparison is a great idea; of course, we know their theoretical (asympototic worst-case) performances are all the same. I don't really know how they differ in practice. Maybe someone does? Derrick Coetzee 14:09, 27 Aug 2004 (UTC)
On another note, this page needs well-organized, detailed explanations of the operations and their performance. I'll do this eventually if no one else does. Derrick Coetzee 16:22, 27 Aug 2004 (UTC)
- And sure enough, I did, complete with pretty diagrams. It'd be really helpful if someone could fact-check me, though — this stuff is pretty complicated. Derrick Coetzee 07:41, 18 Sep 2004 (UTC)
I'm not sure if the description of the insertion operation is complete. Consider the following tree(. is a placeholder, ignore it):
0(B)
.\
.45(R)
After inserting the value 207, the algorithm yields:
207(B)
./
45(B)
.\
..0(R)
Which isn't a valid Red-Black tree(it violates property 4). --Ryan Stone
- You may find it easier to write your trees using s-expression syntax, like this:
- (0 black (45 red nil nil) nil)
- (0 black (45 red nil nil) (207 black nil nil))
- This insertion does fall into Case 2, but this tree does not actually occur, because newly inserted nodes are always initially coloured red. The actual tree produced would look like this:
- (0 black (45 red nil nil) (207 red nil nil))
- This satisfies property 4. If you think the text is unclear about this, please edit. Perhaps a diagram for Case 2 would help. Derrick Coetzee 22:29, 20 Sep 2004 (UTC)
-
- That tree isn't a binary search tree, is it? You have 0 at the root and 47 in the left subtree, which can't happen.
-
-
- No, it isn't; just the example they gave. Good catch. I'm sure they meant to put a negative number, though, just as you meant to put 45. Derrick Coetzee 07:34, 21 Sep 2004 (UTC)
-
-
-
-
- Ok, maybe I'm just applying the algorithm wrong, but here's what I get:
- Step 1: Start with: (0 black nil (45 red nil nil))
- Step 2: Insert value 207 into tree as in normal binary search tree: (0 black nil(45 red nil (207 red nil nil)))
- Step 3: Parent is red, uncle is black; apply case 4(rotate around parent): (0 black nil(207 red (45 red nil nil) nil))
- Step 4: Apply case 5(rotate around grandparent): (207 black (0 red nil (45 black nil nil)) nil)
- Ok, maybe I'm just applying the algorithm wrong, but here's what I get:
-
-
-
-
-
- --Ryan Stone
-
-
Ok, I think that I've found the algorithm that will handle the above case(and similar ones correctly):
Cases 1-3 are unchanged
Cases 4 and 5 only apply when the parent(P in the diagram) is a left child of the grandparent
Case 6: The parent(P) is red, the uncle(U) is black, the new node(N) is the left child of P and P is a right child of its parent(G). Perform a right rotation on P, and then apply case 7 to P.
Case 7: The parent(P) is red, the uncle(U) is black, the new node(N) is the right child of P and P is a right child of its parent(G). Perform a left rotation on G, paint G red and paint P black.
I've tried implementing this and it seems to work, but I'm hesitant to make any changes to the article until somebody who actual knows this stuff confirms that this should work.
--Ryan Stone
- It's true that when P is the right child of its parent, we're in a different, symmetrical situation, where N must be the opposite child of its parent, and rotations are performed in the opposite direction. I admit this isn't obvious. One way of dealing with this is to add the steps you have described. Another way is to say that when P is on the right side, right and left are reversed throughout steps 4 and 5; we are effectively acting on the mirrored tree. I've added a note to this effect — do you think this is a good solution? Derrick Coetzee 15:56, 21 Sep 2004 (UTC)
- A similar note was added for deletion, where left and right must be reversed if N is the right child instead of the left. Although I didn't put it in the article, reversing left and right throughout is always okay, because you're effectively acting on the mirrored red-black tree, which is a binary search tree when considered with the opposite order. Derrick Coetzee 16:03, 21 Sep 2004 (UTC)
I don't understand a couple of things about the deletion algorithm. The cases are supposed to handle the case deleting a black node with two black children, at least one of which is leaf. However, wouldn't property 4 be violated if a black node had one child who was a leaf and the other child was black but was not a leaf node?(in other words, wouldn't be simpler just to say the cases apply to a black node with two leaf nodes as children?)
My second question is about case 3 of the deletion algorithm. I can see that the algorithm will ensure that all paths through P will have one fewer black node. However, won't any path that does not pass through P be unaffected, and have one more black node than any path through P?
--Ryan Stone
- Ok, I see the problem. If you look closely at case 2a of the San Diego State University: CS 660, after colouring the sibling red, you colour the parent double black and then perform the same procedure on it. I think that this also answers my first question, as well.
- --Ryan Stone
- I've changed the article to note that the parent node becomes double-black after case 3 is applied. Am I correct when I say that a double-black node has one fewer black node along any paths running through it? Also, is my wording clear enough?--Ryan Stone 19:03, 22 Sep 2004 (UTC)
Case 3 still mentions double-black nodes. I'm not sure how to reword that to avoid using the term double-black; anyone else have any ideas?--Ryan Stone 16:13, 24 Sep 2004 (UTC)
I've removed the reference to double-black nodes in case 3 of deletion. I've also removed the statement in case 1 that it applies when the root node was deleted and replaced. Because the rebalancing algorithm is recursive, it's possible to fall into case 1 even if the root node was not deleted.--Ryan Stone 21:24, 29 Sep 2004 (UTC)
I've slightly reworded deletion case 3 to make it clearer why property 4 is still violated. I've made a change to the operations paragraph. Case 3 of the insertion rebalancing procedure and case 3 of the deletion rebalancing procedure are the only two recursive cases. Both only involve colour changes; not rotations, so we can say that the insertion and deletion operations require O(log n) colour changes and no more than 2 rotations. Because colour changes are so quick(in practice, merely changing the value of a variable), it gives a better picture of the speed of the insertion and deletion algorithms IMO.
Now, the article states that insertion and deletion require O(log n) space. I assume that's because of the stack space needed when we hit a recursive case. However, aren't both algorithms tail-recursive?(ie wouldn't it be possible to convert them to iterative implementations?) The algorithms are complex enough that I'm not sure if I'd like to make that change, but I think that it would be quite possible to do so.--Ryan Stone 16:03, 30 Sep 2004 (UTC)
- Ok, I understand now. Persistent implementations for Red-Black Trees require O(log n) space, but not the normal version.--Ryan Stone 14:50, 1 Oct 2004 (UTC)
- Hey, thanks for your changes. I believe they're all correct. Sorry that the statement about the persistent implementations was unclear. I was confused why the deletion algorithm didn't seem to be recursive before, so that helps. Thanks again. Derrick Coetzee 22:03, 1 Oct 2004 (UTC)
I don't understand the purpose of property #2, "All leaves are black", if you're going to use "nil" nodes for all leaves, and then say you don't even need to draw them. If you just got rid of property #2, and "nil" nodes, would it not work just as well?
(Yes, the nil nodes confused me, too. You started out by saying nodes hold data, and leaf nodes are black, and then show a diagram where the leaves appear to be red -- and then 3 paragraphs later say that the boxes that are shaped differently from the other nodes and don't hold any data are actually the leaves.)
- The terminology section does conflict with the use of nil leaves. Leaf nodes do not store data in this presentation. It is possible to eliminate nil leaves by removing rule 2 and changing rule 3 to "red nodes only have black children". Unfortunately, this introduces a large number of special cases in the later proofs, since we have to deal with each internal node having 1 or 2 children. By introducing nil leaves and colouring them black, we can now say there is a black child where before we'd have to say "black or no child". I would like to admit the simplest presentation possible of the main invariants, but not if it sacrifices the overall presentation too much. Thanks for your feedback though; I realise nil leaves are often confusing and will think about how to ameliorate this problem. Deco 12:15, 28 Dec 2004 (UTC)
Contents |
[edit] Leaf nodes again
-
- Note that the use of the leaf-nodes in the articles conflict with the link in the terminology section.--Loading 12:11, 11 May 2005 (UTC)
Perhaps these "Nil" nodes code be done away with altogether? I think it'd be easier just to say "black or missing" (or "non-red") than to introduce a funny construct to make certain special cases disappear. It's also not that hard in actual code to do without them.--ghakko
- Well, it's easy to say "black or missing" but it's not so easy to draw black or missing in the diagrams. I copied the nil leaf presentation from an online source, but it could be presented either way, it would just be a lot of work to change over and I don't see a particularly compelling reason not to do it this way. Deco 19:42, 14 May 2005 (UTC)
- A very important information is missing: Red Black Trees are isomorphics to 2-3-4 Trees (B-Trees of order 4) (http://pedia.nodeworks.com/2/2-/2-3/2-3-4_trees/). 22:15, 28 May 2005 (UTC)
I think, we should add an explanation to the Delete case about the possibility that N (the deleted node's child) may be a "Nil" node. In order that the tree remains well defined, N must remain a leaf node even after all transformations. You can verify (for example from the images) that it will, but I think it is not obvious. Should I add the explanation? Another thing is that maybe the text would be simplified if we used consistently "P" instead of "N's parent" throughout the text. What do you think? Drak 22:58, 6 January 2006 (UTC)
- Go for it. I'll review your changes. Thanks for looking over this. Deco 00:39, 7 January 2006 (UTC)
[edit] Deleting a red node
In the early paragraphs of the Deletion section, you state "If we are deleting a red node, we can simply replace it with its black child, if any." This is indeed the case if there is at most one child. If this red node has two children, however, the algorithm becomes a lot more complicated, unless you simply replace the red node by one of the children and reinsert every element of the other child's subtree. Schellhammer
- You can assume that the node has at most one child. The reason for this is explained in the previous paragraph. I'm sorry that it wasn't clear. The idea is that you delete the node exactly as you would delete a node in a BST. To do this, you must find a node that immediately precedes or follows the node to be deleted in inorder traversal order. We need to copy its value into the node to be deleted, then delete the node it was copied from. Such a node will always have zero or one children, which simplifies the remaining cases. Deco 23:52, 28 July 2005 (UTC)
- Okay, got you now. Thanks!
Something else. I was just wondering: in deleting a node with two children, is there's a simple way to determine if replacing the value by the largest preceeding or the smallest following value is the faster alternative? Schellhammer- The simplest way I can think of is to add two values to each node indicating the number of steps needed to reach the smallest and largest node in that subtree. These are easy to maintain in log time and allow you to choose optimally. This requires quite a bit of space and time to maintain though; the most effective strategy is probably just to choose one at random. Deco 02:10, 3 August 2005 (UTC)
- Okay, got you now. Thanks!
I think I encountered a problem in the deletion algorithm yet again connected with the leaf-nodes. You state: "If the node has no non-leaf children, we simply remove it." However, if the node is black and has only leaf children (which are also black), we reduce the black height in this branch if we simply remove the node. Consider the tree built by inserting the integers 1 to 4 and then delete the integer 1. Here's a nice applet which can be used to see the restructuring following the deletion. (It is German, but if you consider the translations Anfang=start Zurück=back Einfügen=insert Weiter=proceed and Löschen=delete (mark node to delete after hitting the button) it should be easy to handle.) Schellhammer
- Thanks for catching this. I'm afraid I've forgotten far too much about red-black trees. I believe we proceed through all the cases whether or not the child is a leaf node or non-leaf node, and they all work out if the child is simply a leaf node since they don't assume that N has children. Deco 02:10, 3 August 2005 (UTC)
- For what it's worth, to avoid further correctness issues I'm going to try to come up with some ML code that precisely mirrors the discussion. I should be able to test it and inject it into the article then to provide some concreteness. Deco 02:52, 3 August 2005 (UTC)
- I've just finished implementing the red-black tree as Visual-Basic classes (don't ask why!) and I took the deletion algorithm from this wiki-page. Empiric evidence (i.e. deleting any one of 1000 nodes in the tree) shows that the algorithm now is correct. Schellhammer 13:16, 3 August 2005 (GMT+1)
- Thanks for your feedback, that's reassuring. I've got some C code for insertion and as soon as I figure out how the article formatting will work, I'll add a bit of code to each case. Then I'll do the same for deletion. This should help clarify things further. Out of curiosity, does your code use parent pointers or do you get around this somehow? The description in this article seems to almost require it. Deco 19:19, 3 August 2005 (UTC)
- Yes, my code uses parent pointers as well. Actually, I don't know how you could do without, seeing that Insertion Case 3 has a recursive call upwards. By the way, the note in the diagram for this case should read "May violate property 2". Schellhammer 11:13, 4 August 2005 (UTC)
- It's possible to do without parent pointers by keeping a linked list during downward traversal; push a pointer to the current node onto the list just before descending into a child. The list elements can allocated on the stack. My C implementation was like this because I was being silly and trying to save space.
- The only other way I can think of handling Insertion Case 3 without parent pointers by having all the insertion functions return boolean values; this would let the caller check if the rebalancing has propagated upwards and continue it if necessary. Deletion is like this, but much messier (because of the need to swap two elements just before starting). I had to do it this way in Haskell (which has no pointers), and it was very painful.
- —Ghakko 14:30, 4 August 2005 (UTC)
- Yes, my code uses parent pointers as well. Actually, I don't know how you could do without, seeing that Insertion Case 3 has a recursive call upwards. By the way, the note in the diagram for this case should read "May violate property 2". Schellhammer 11:13, 4 August 2005 (UTC)
- Thanks for your feedback, that's reassuring. I've got some C code for insertion and as soon as I figure out how the article formatting will work, I'll add a bit of code to each case. Then I'll do the same for deletion. This should help clarify things further. Out of curiosity, does your code use parent pointers or do you get around this somehow? The description in this article seems to almost require it. Deco 19:19, 3 August 2005 (UTC)
- I've just finished implementing the red-black tree as Visual-Basic classes (don't ask why!) and I took the deletion algorithm from this wiki-page. Empiric evidence (i.e. deleting any one of 1000 nodes in the tree) shows that the algorithm now is correct. Schellhammer 13:16, 3 August 2005 (GMT+1)
[edit] The code
I've now added some C code to this page. All this code has been compiled and thoroughly tested, not just against crashes but that the red-black properties are preserved after every single insert/delete operation. I tried to keep the code as simple and readable as possible, and in line with what the text and diagrams describe to the letter. I hope it adds some concreteness to the article. I chose C because it's good for this presentation, since it deals well with trees with parent pointers (cyclic data structures are kinda annoying in functional languages) and is familiar to most programmers. Deco 07:32, 4 August 2005 (UTC)
- Nice job! That'll help a lot in future implementations. However, in the Deletion part, you state "the insertion code works with either representation", i.e. designing leaf-nodes as node objects or NULL pointers. I think you rather meant the insertion algorithm, since you use constructions like
child->color
in thedelete_one_child
function andchild
may be a leaf if the originaln
(the one we deleted) had no non-leaf children. Similarly, calling any of thedelete_caseX
functions with a NULL pointer as parameter will cause problems. The code probably still works if you'd add the (admittedly irritating)if
-statements everywhere to catch this case. Schellhammer 11:32, 4 August 2005 (UTC)- Sorry, I meant the code in the insertion section. There was only one place in which a leaf could be encountered during insertion, but deletion turned out to be much trickier, because it may be that the parameter n is a leaf, which prevents me from accessing any of the surrounding nodes if it's represented with NULL. I would have to additionally pass in at least its parent, which requires updating the parent as we perform rotations, and it's just a mess. Deco 17:30, 4 August 2005 (UTC)
[edit] Errors in algorithm
(copied from an e-mail sent to Deco by Pavel)
case 3. if G was a root, you violate rule 2.
- Also, if its parent is red, it violates rule 4. This is mentioned in the text and the code, and used to be in the image, but I wanted to keep the image easy to update. I fixed the text to mention that it might violate rule 2 also, and to be clearer that we always recursively process G.
case 5. path from leaf 4 and 5 contains 2 black nodes, path from other leafs contains 1 black nodes, that violates rule 5.
- The triangles are not meant to represent leaves but subtrees. Some subtrees have black circles at the top to indicate that their root must be black. The rotation does not alter the number of black nodes along any path (it is the same before and after), and so does not threaten rule 5. The reason rule 5 is not violated initially is that subtrees 4 and 5 are not as deep as subtrees 1, 2, and 3; in fact, the uncle may even be a leaf itself.
- If there's any way I can clarify this further please ask. Thanks again. Deco 14:16, 18 August 2005 (UTC)
[edit] Top-Down Insertion
Can someone change this to the more effecient single pass top-down insertion method?
- It sounds interesting. Can you give a reference? However, I am not completely sure if it should replace the algorithm explained in the article. One of the objectives of this article is to explain the well-established algorithm for red-black trees, and I think the current article does it very well (thank you, authors!). Of course a note about a more efficient algorithm would be helpful.
-
- Here's a reference on the top-down insertion: [1]. The notes paraphrase Data Structures & Problem Solving Using Java by Mark Allen Weiss.
-
- The top-down approach to insertion works by preventing insertion case 3. During the tree descent, any black node with two red children is recolored to red and its children recolored to black (like in case 3). This preserves property 5 but may violate property 4. When property 4 is violated, it is guaranteed that the uncle of the new red node is black, since the grandparent node would have been recolored during the descent if both of its children were black. (See the reference if this is confusing.) As a result, one or two rotations (like in case 5) can be applied to restore property 4. The search then continues.
-
- Once the leaf is ready to be inserted, insertion case 3 is impossible. This prevents the need to recursively ascend back up the tree. This makes the entire operation top-down. Rotations may still be required to preserve properties 4 and 5 after inserting the leaf.
-
- Top-down insertion seems to be less efficient than bottom-up insertion. More rotations are required since each insertion can require multiple sets of rotations. Rather than letting the leaf node "push up" on the tree to find the proper rotation point, the top-down approach rotates at every such rotation point along the path on the way down. I will try to make a diagram to explain this better.
-
- I wrote a C program to test my hypothesis that the top-down insertion is less efficient, and it appears to be correct. A top-down insertion requires more rotations for the same data set. However the top-down algorithm does work and it may make sense to publish it anyway. Cintrom 22:39, 5 May 2006 (UTC)
- Thank you, Cintrom! I see that even if the top-down algorithm is less efficient than the bottom-up one, it is sometimes useful for educational purposes because of its simplicity. In addition, it is probably good to keep in mind that there is also a top-down algorithm for red-black trees, because it may come in handy when you derive a new algorithm from red-black trees.
[edit] Error in Insertion Case 4?
According to the rules of a R-B tree, "both children of a red node are black". In the Case 4 example, N and P are both red after insertion.
- Case 4 does not complete the insertion, as you can see from the sample code - the node P is then processed using Case 5. The text says, "then, we deal with the former parent node P using Case 5". Deco 17:33, 17 October 2006 (UTC)
In case 4 the code that calls insert_case5 is indented as if there were a line with an else missing, but from the narrative, I think the code is right, but the indention is misleading. Also, is it too obvious to say in the last sentence that property 4 is still violated? I found the parenthesized comment about the sub-tree labelled "1" confusing, so I tried to touch it up. The other two things I left alone. Bill Smith 23:35, 24 November 2006 (UTC)
- I implemented the insertion algorithm in OCaml and verified that the indention was wrong, so I corrected it.Bill Smith 17:11, 26 November 2006 (UTC)
-
- The misleading indention was probably a result of tabs. There is no else. Your clarifications are welcome. Deco 14:42, 27 November 2006 (UTC)
[edit] The code
Is the C code correct? For example:
node grandparent(node n) { return n->parent->parent; }
I don't think that's correct: shouldn't dots (.) instead of arrows (->) for accessing members in a struct that isn't referenced by a pointer? And isn't the parent member a pointer to the parent struct? If it is, then the function definition should be:
node* grandparent(node n)
Tell me if I'm wrong.
--wj32 talk | contribs 07:22, 8 November 2006 (UTC)
- You are correct that the arrow operator is used with pointers. I just assumed that somewhere in the code that isn't shown there was "
typedef struct rbtreenode * node;
". This explains the lack of thestruct
keyword, as well. Since the article is about the data structure rather than the C language, I don't think that noting C language syntax is strictly necessary. If we were going to present a play-by-play on how to implement a red/black tree in C, we'd have to note that code is required to keep track of which node is the root node, maybe reccomend the definition of a structure to hold the root and a comparator function pointer, etc. - Perhaps the wording should be changed to something like "We'll demonstrate each case with example C-like code"? Conor H. 06:54, 23 November 2006 (UTC)
- This is actual C code copied from a real working C program. That was how I verified its correctness. Yes, node is a pointer type. I suppose my stylistic conventions are a bit unusual. Deco 05:53, 28 November 2006 (UTC)
I think that the assertion that deletions are O(1) complexity must be incorrect. If the deleted node has two subtrees, then it must find the rightmost node of the left subtree or leftmost node of the right subtree. This find is admittedly quick since it is just following pointers, but nevertheless could be the height of the tree, so it is O(log(n)). Any comments? Caviare 01:08, 25 January 2007 (UTC)