Soft heap

For the Canterbury scene band, see Soft Heap (band).

In computer science, the soft heap, designed by Bernard Chazelle in 2000, is a variant of the simple heap data structure. By carefully "corrupting" (increasing) the keys of at most a certain fixed percentage of the values in the heap, it achieves constant amortized time for four of its five operations (all except insert), as sketched below:

  • create(S): Create a new soft heap
  • insert(S, x): Insert an element into a soft heap
  • meld(S, S' ): Combine the contents of two soft heaps into one, destroying both
  • delete(S, x): Delete an element from a soft heap
  • findmin(S): Get the element with minimum key in the soft heap
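
A minimal sketch of this interface in Python is shown below. It is an illustration only, assuming a trivial list-backed stand-in that never actually corrupts keys; the class name, the epsilon parameter, and the method bodies are choices made for the example, not anything fixed by Chazelle's paper.

 class SoftHeap:
     """Illustrative stand-in only: a list-backed structure with the soft heap
     interface that never corrupts keys.  A real soft heap trades corruption
     of up to epsilon*n keys for its amortized time bounds."""

     def __init__(self, epsilon=0.25):      # create(S)
         self.epsilon = epsilon
         self.items = []

     def insert(self, x):                   # insert(S, x)
         self.items.append(x)

     def meld(self, other):                 # meld(S, S'): empties the other heap
         self.items.extend(other.items)
         other.items = []

     def findmin(self):                     # findmin(S): minimum (possibly corrupted) key
         return min(self.items)

     def delete(self, x):                   # delete(S, x)
         self.items.remove(x)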

The term "corruption" here indicates that a corrupted key's value has been changed by the data structure from the correct value which was initially inserted to a larger, incorrect value. It is unpredictable which keys will be corrupted in this manner; it is only known that at most a fixed percentage will be changed. This is what makes soft heaps "soft"; you can't be sure whether or not any particular value you put into it will be modified. The purpose of these corruptions is effectively to lower the information entropy of the data, enabling the data structure to break through information-theoretic barriers regarding heaps.

Other heaps such as Fibonacci heaps achieve most of these bounds without any corruption, but cannot provide a constant-time bound on the critical delete operation. The percentage of values which are corrupted can be chosen freely, but the lower it is set, the more time insertions require: O(log 1/ε) for an error rate of ε.

Applications

Surprisingly, despite their unpredictable nature, soft heaps are useful in the design of deterministic algorithms. They were key to obtaining the best algorithm known to date for finding a minimum spanning tree. They can also be used to easily build an optimal selection algorithm, as well as near-sorting algorithms, which are algorithms that place every element near its final position, a situation in which insertion sort is fast.

One of the simplest examples is the selection algorithm. Say we want to find the kth smallest of a group of n numbers. First, we choose an error rate of 1/4; that is, at most 25% of the keys we insert will be corrupted. Now, we insert all n elements into the heap — at this point, at most n/4 keys are corrupted. Next, we delete the minimum element from the heap n/4 times. Because the corruption guarantee is relative to the number of insertions made so far, which is still n, there are still at most n/4 corrupted keys in the heap.

Thus, at least 3n/4 − n/4 = n/2 of the keys remaining in the heap are not corrupted. Each deletion removes the smallest current key in the heap, and corruption only ever increases keys, so every uncorrupted remaining element must be at least as large as the original key of every element we removed. Let L be the removed element with the largest original key: the original keys of the n/4 elements we removed are at most L, while the n/2 or more uncorrupted elements still in the heap are at least L. In other words, L ranks somewhere between the 25th and the 50th percentile of the input. We then partition the set about L using the partition algorithm from quicksort and apply the same algorithm again to either the set of numbers less than L or the set of numbers greater than L, neither of which can exceed (3/4)n elements. Since each insertion and deletion requires O(1) amortized time (ε is a fixed constant here), the total deterministic time is bounded by a multiple of:

T(n) = (5/4)n + (5/4)(3/4)n + (5/4)(3/4)²n + ... = 5n.

The final algorithm looks like this:

 function softHeapSelect(a[1..n], k)
     if k = 1 then return minimum(a[1..n])
     create(S)
     for i from 1 to n
         insert(S, a[i])
     for i from 1 to n/4
         x := findmin(S)
         delete(S, x)
         if i = 1 or x > L then L := x   // L: removed element with the largest original key
     LIndex := partition(a, L)           // Returns new index of pivot L
     if k < LIndex
         return softHeapSelect(a[1..LIndex-1], k)
     else
         return softHeapSelect(a[LIndex..n], k-LIndex+1)
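
For comparison, here is a runnable Python transcription of the same recursion. As an assumption for illustration, Python's exact binary heap (heapq) stands in for the soft heap, so no keys are ever corrupted and the pivot guarantee holds trivially; with a real soft heap the control flow would be identical.

 import heapq

 def soft_heap_select(a, k):
     """Return the k-th smallest element of a (1-indexed k), assuming distinct
     values, following the recursion above with an exact heap as a stand-in."""
     if k == 1:
         return min(a)
     heap = list(a)
     heapq.heapify(heap)                                   # create + n inserts
     removed = [heapq.heappop(heap) for _ in range(max(1, len(a) // 4))]
     pivot = max(removed)                                  # L: largest key removed
     smaller = [x for x in a if x < pivot]
     larger = [x for x in a if x > pivot]
     pivot_rank = len(smaller) + 1                         # plays the role of LIndex
     if k < pivot_rank:
         return soft_heap_select(smaller, k)
     if k == pivot_rank:
         return pivot
     return soft_heap_select(larger, k - pivot_rank)

 print(soft_heap_select([9, 1, 8, 2, 7, 3, 6, 4, 5], 4))  # prints 4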

External link

  • Chazelle's original paper (ps) (pdf) directly from the author's website, including C code.