Talk:Binary heap

From Wikipedia, the free encyclopedia

This article is within the scope of WikiProject Computer science, which aims to create a comprehensive computer science reference for Wikipedia. Visit the project page for more information and to join in on related discussions.
Start rated as start-Class on the assessment scale
Top rated as top-importance on the assessment scale

Contents

[edit] heap

So is it O(n log n) or O(n) after all ? Sorting can't be O(n), but we aren't really doing full sorting here. Taw 00:35 Dec 12, 2002 (UTC)

Was:

It appears offhand to me that this should make this process take O(n lg n) time, but all the references I can find say it's O(n), although I can't understand their proofs. Once you have a heap, you can quickly find the node with the greatest value in it; it will be the node at the root of the tree. If you want to remove this node (as is necessary for a priority queue), in the array representation, you can swap that node's value with the value of the last node in the array (node 11 in the above example), shorten the array by 1 (removing that last node from the heap), and then restore the heap property, which still holds everywhere but at the top of the heap. The value at the top of the tree is now less than at least one of its children; swap it with the greater of its two children, and the heap property will be true at the top, but now possibly not where the top node was just swapped to; so you have to repeat the process until it is greater than either of its two children, which probably means until it reaches the bottom of the heap --- up to lg n swaps.

So removing the largest node from a heap takes O(lg n) steps.



Swapping elements to adjust the heap is inefficient. Each element moves towards the top of the heap at most one place, in each sift-down operation. The difference is the same as the difference between bubble sort and insertion sort; instead of adjusting K ~= lg(N) elements with K-1 swaps (and 3*K-3 moves in all), K+1 moves will suffice. The Heapsort example in Knuth "Art of Programming" Vol II uses insertion to do the sift-down, rather than exchanging.

In the exercises after the discussion of Heapsort, Knuth vol. II also mentions "bottom-up" heap insertion (doubtless much better than I can, but here goes...):

While it is common to implement a binary heap as outlined in the article, reinsertion into a heap (replacing the value at the top of the heap with another, while preserving the heap property) can be done with rather fewer comparisons, on average. Instead of comparing the replacement value with elements in the array "on the way down", we assume all the elements will have to be moved (saving a comparison at each level of the heap), insert the element at the bottom of the heap, and then search "back up the way we came down" for the place it belongs ("spending", on average, about two additional comparisons and one additional move). Instead of

slightly less than 2 * lg(N) comparisons and slightly less than lg(N) moves,

bottom-up reinsertion requires

slightly more than lg(N) comparisons and slightly more than lg(N) moves. Using inserts rather than exchanges to adjust the heap, and implementing bottom-up reinsertion, Heapsort (see the Heapsort article) can compete with Quicksort (unless locality of reference is a major factor, or if parallel processing hardware is available) particularly if the compiler or interpreter is poor. If the compiler is good Quicksort still wins comfortably.

james_barbetti@yahoo.com

Are you talking about insertions or deletions? You seem to be describing a deletion algorithm, but you keep talking about "reinsertions", and I'm really not sure what you mean by that. If you are describing an improved deletion algorithm, I don't think that you'd see much improvement -- it looks to me that you'll only save yourself one swap when acting on a heap in an array. I suppose what you describe might help if you used a simple binary tree to represent the heap, but it seems to me that you could easily end up with an unbalanced tree with your algorithm. And, of course, a heap implemented with binary trees isn't of any help when writing heapsort. Could you try to explain the algorithm better, or provide a link?--67.71.21.145 17:21, 18 Nov 2004 (UTC)
Yes, I'm having trouble understanding this method as well. I think I might have a handle on how it works; let me see if I have it right (I'm using "node" loosely, not necessarily designating a separate data object linked by pointers/references):
  1. We remove the node at the top of the heap, as in the traditional heap deletion procedure.
  2. We remove the last node of the heap, as in the traditional heap deletion procedure.
  3. Instead of moving this node into the place vacated by the just-removed node, and calling heap-down (the traditional implementation), we trade the last node of the heap for the node that was, until just now, that node's parent, and call heap-down on it.
  4. We still have a node that was removed from the heap, but now it is a different node. Again, we trade this node for the node that was, until just now, its parent, and call heap-down on it. If the "parent" is in fact the empty space where the top value was just removed, we place the node in that top space, restoring the shape property, and call heap-down one last time.
Is this essentially correct? -- Antaeus Feldspar 30 June 2005 00:56 (UTC)

"It can be seen as a binary tree with the additional constraint that each node is larger than all its children (for a max-heap) or smaller than all its children (for a min-heap)" - I believe heaps must also satisfy the property that they are complete trees. That is the lowest level of the tree must be filled from left to right -ToasterStrudel@comcast.net

[edit] k-ary heap

Can you guys make some pages for "k-ary heap"? I like this "binary heap" page, but I also wish to see some information for "k-ary heap". Thanks, everyone..

Don't want to be offensive, but the ASCII art sketch kinda sucks :) --84.57.224.246 13:12, 8 November 2005 (UTC)

[edit] a faster method of deleting the root

I wanted to edit the page to add this, but I don't have an svg editor.

A faster method of deleting the root is to remove it and promote its larger (max-heap) or smaller (min-heap) child. The space occupied by that child is now free, so promote one of its children, and continue recursively until the free space is pushed down to the bottom level. Move the last element of the heap into this new free position, and then percolate it up.

Here is an example (requiring fixed width font): initial heap:

        15
   11         8
 3    4     5

remove root:

        __
   11         8
 3    4     5

11 > 8 so move 11 to the root:

        11
   __         8
 3    4     5

4 > 3 so move 4 up:

        11
   4          8
 3   __     5

We have reached the bottom level. Move the last element (5) into the blank space.

        11
   4          8
 3   5     

5 > 4 so swap 5 with its parent:

        11
   5          8
 3   4     

5 < 11 so stop.

Analysis: The algorithm which replaces the root with the last leaf requires two comparisons per level, to an average depth of log_2(n). Because the element is a leaf, it will be pushed all the way back down to the bottom on average. Total cost: 2*log_2(n) comparisons and log_2(n) swaps.

The algorithm above does one comparison per level to a guaranteed depth of log_2(n), followed by one comparison per level for an average of k~=1.6 levels. Because the last element was a leaf, it does not move up the heap very far on average. Total cost: log_2(n) + k comparisons and log_2(n)+k swaps, where k~=1.6. It is twice as fast.

Also, above 90% of the time with heaps is spent deleting the root (insertion is O(!) on average) so the better algorithm really will make your heaps almost twice as fast.

Another improvement: Perhaps some budding computer science student would be willing to write this up. For array based algorithms, it's really better to think in terms of copies instead of swaps. A swap requires 3 copies, but all of the algorithms above can be implemented using the same number of copies as is quoted for swaps.

Here is the example above, written as an array, with copies not swaps.

initial heap: 15 11 8 3 4 5

We have copied the root (15) and now we want to clobber it. Copy the larger child into the root: 11 11 8 3 4 5

The second 11 is really empty space. Clobber it with its larger child: 11 4 8 3 4 5

The second 4 is really empty space. We need to insert 5, but we want to delay copying it until we know its final position. To determine that, compare 5 with the parent of the "free" spot (the second 4). The parent is the first 4. 5 > 4, so clobber the free space with 4: 11 4 8 3 4 5

Compare 5 with the parent of the free space again. 11 > 5 so we stop and copy 5 into the free space: 11 5 8 3 4 5

The last 5 is really free space, but there's no point in overwriting it. It will be clobbered the next time something is added to the heap.

I know it looks inefficient (or at least, not better) on this small example, but recall the argument above. This method does one comparison per level plus one or two extra comparisons on average, and the same number of copies. On heaps with more than 3 levels it will be faster.

I hereby release this post to the public domain. I would be delighted if someone were to incorporate any of it into the article. Thank you Wikipedia people for an invaluable resource. —The preceding unsigned comment was added by Roman pearce (talk • contribs) 04:07, 19 December 2006 (UTC).

[edit] can use array-like structures when number of items is not known in advance

quote:---
This approach is particularly useful in the heapsort algorithm, where it allows the space in the input array to be reused to store the heap. However it requires allocating the array before filling it, which makes this method not that useful in priority queues implementation, where the number of tasks (heap elements) is not necessarily known in advance.
---:endquote
True for true arrays, but it's easy to have a 'growing' array simply by adding a redirection layer when reading and writing, and a redirection + test for more-space-required (plus, if more space required, actually alloc'ing that space).
Granted it's not an array any more, exactly, but it still resembles an array programmatically (even if it has to be addressed through functions/methods rather than [] syntax) and is very fast. My point is not to elaborate on how to get around the problem of fixed-size arrays, just that the problem quoted isn't a problem and the mention of a 'limitation' should be removed or amended.
Thoughts? —Preceding unsigned comment added by Water pepper (talk • contribs) 09:00, 7 January 2008 (UTC)

Of course you can use a dynamic pseudo-array, simulated by an additional software layer. But in this case you must either re-allocate space for growing array and copy its contents, each time you need to extend it (possibly many times), OR you need a dynamic structure (some lists, trees, etc.) and search for items to be compared or swapped, every time you simulate an index addressing. Both ways impose a not acceptable overhead to the sorting routine. In fact there are dynamic structures (like lists or trees) and algorithms tailored to them, which perform as good as heapsort with arrays. So, if you need dynamic data size, use dynamic structures and appropriate algorithms. Of course it is always (well, almost always...) possible to force some algorithm for inappropriate data structure — but that is not a good programming practice. Don't force a specific algorithm just because you know it! --CiaPan (talk) 14:19, 7 January 2008 (UTC), some typos corrected 06:34, 8 January 2008 (UTC)