Heapsort

Heapsort
A run of the heapsort algorithm sorting an array of randomly permuted values. In the first stage of the algorithm  array elements are reordered to satisfy the heap property. Before the actual sorting takes place, the heap tree structure is shown briefly for illustration.
A run of the heapsort algorithm sorting an array of randomly permuted values. In the first stage of the algorithm the array elements are reordered to satisfy the heap property. Before the actual sorting takes place, the heap tree structure is shown briefly for illustration.
Class Sorting algorithm
Data structure Array
Worst case performance \Theta(n\log n)
Best case performance \Theta(n\log n)[1]
Average case performance \Theta(n\log n)
Worst case space complexity \Theta(n) total, \Theta(1) auxiliary

Heapsort is a comparison-based sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quicksort, it has the advantage of a more favorable worst-case Θ(n log n) runtime. Heapsort is an in-place algorithm, but is not a stable sort.

Contents

Overview

Heapsort works as its name suggests. It begins by building a heap out of the data set, and then removing the largest item and placing it at the end of the partially sorted array. After removing the largest item, it reconstructs the heap, removes the largest remaining item, and places it in the next open position from the end of the partially sorted array. This is repeated until there are no items left in the heap and the sorted array is full. Elementary implementations require two arrays - one to hold the heap and the other to hold the sorted elements. [2]

Heapsort inserts the input list elements into a heap data structure. The largest value (in a max-heap) or the smallest value (in a min-heap) are extracted until none remain, the values having been extracted in sorted order. The heap's invariant is preserved after each extraction, so the only cost is that of extraction.

During extraction, the only space required is that needed to store the heap. To achieve constant space overhead, the heap is stored in the part of the input array not yet sorted. (The structure of this heap is described at Binary heap: Heap implementation.)

Heapsort uses two heap operations: insertion and root deletion. Each extraction places an element in the last empty location of the array. The remaining prefix of the array stores the unsorted elements.

Variations

Comparison with other sorts

Heapsort primarily competes with quicksort, another very efficient general purpose nearly-in-place comparison-based sort algorithm.

Quicksort is typically somewhat faster, due to better cache behavior and other factors, but the worst-case running time for quicksort is O(n2), which is unacceptable for large data sets and can be deliberately triggered given enough knowledge of the implementation, creating a security risk. See quicksort for a detailed discussion of this problem, and possible solutions.

Thus, because of the O(n log n) upper bound on heapsort's running time and constant upper bound on its auxiliary storage, embedded systems with real-time constraints or systems concerned with security often use heapsort.

Heapsort also competes with merge sort, which has the same time bounds, but requires Ω(n) auxiliary space, whereas heapsort requires only a constant amount. Heapsort also typically runs more quickly in practice on machines with small or slow data caches. On the other hand, merge sort has several advantages over heapsort:

Introsort is an interesting alternative to heapsort that combines quicksort and heapsort to retain advantages of both: worst case speed of heapsort and average speed of quicksort.

Pseudocode

The following is the "simple" way to implement the algorithm in pseudocode. Arrays are zero based and swap is used to exchange two elements of the array. Movement 'down' means from the root towards the leaves, or from lower indices to higher. Note that during the sort, the largest elements are at the root of the heap at a[0], while at the end of the sort, the largest element is in a[end].

 function heapSort(a, count) is
     input:  an unordered array a of length count
 
     (first place a in max-heap order)
     heapify(a, count)
 
     end := count-1 //in languages with zero-based arrays the children are 2*i+1 and 2*i+2
     while end > 0 do
         (swap the root(maximum value) of the heap with the last element of the heap)
         swap(a[end], a[0])
         (put the heap back in max-heap order)
         siftDown(a, 0, end-1)
         (decrease the size of the heap by one so that the previous max value will
         stay in its proper placement)
         end := end - 1
 
 function heapify(a, count) is
     (start is assigned the index in a of the last parent node)
     start := (count - 2) / 2
     
     while start ≥ 0 do
         (sift down the node at index start to the proper place such that all nodes below
          the start index are in heap order)
         siftDown(a, start, count-1)
         start := start - 1
     (after sifting down the root all nodes/elements are in heap order)
 
 function siftDown(a, start, end) is
     input:  end represents the limit of how far down the heap
                   to sift.
     root := start

     while root * 2 + 1 ≤ end do          (While the root has at least one child)
         child := root * 2 + 1           (root*2+1 points to the left child)
         (If the child has a sibling and the child's value is less than its sibling's...)
         if child < end and a[child] < a[child + 1] then
             child := child + 1           (... then point to the right child instead)
         if a[root] < a[child] then       (out of max-heap order)
             swap(a[root], a[child])
             root := child                (repeat to continue sifting down the child now)
         else
             return

The heapify function can be thought of as building a heap from the bottom up, successively shifting downward to establish the heap property. An alternative version (shown below) that builds the heap top-down and shifts upward is conceptually simpler to grasp. This "siftUp" version can be visualized as starting with an empty heap and successively inserting elements. However, it is asymptotically slower: the "siftDown" version is O(n), and the "siftUp" version is O(n log n) in the worst case. The heapsort algorithm is O(n log n) overall using either version of heapify.

 function heapify(a,count) is
     (end is assigned the index of the first (left) child of the root)
     end := 1
     
     while end < count
         (sift up the node at index end to the proper place such that all nodes above
          the end index are in heap order)
         siftUp(a, 0, end)
         end := end + 1
     (after sifting up the last node all nodes are in heap order)
 
 function siftUp(a, start, end) is
     input:  start represents the limit of how far up the heap to sift.
                   end is the node to sift up.
     child := end 
     while child > start
         parent := floor((child - 1) ÷ 2)
         if a[parent] < a[child] then (out of max-heap order)
             swap(a[parent], a[child])
             child := parent (repeat to continue sifting up the parent now)
         else
             return

Notes

  1. http://dx.doi.org/10.1006/jagm.1993.1031
  2. http://linux.wku.edu/~lamonml/algor/sort/heap.html
  3. "Data Structures Using Pascal", Tenenbaum & Augenstein, 1991, page 405, gives a ternary heapsort as a student exercise. "Write a sorting routine similar to the heapsort except that it uses a ternary heap."
  4. http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD796a.PDF
  5. http://www.cs.utexas.edu/~EWD/transcriptions/EWD07xx/EWD796a.html
  6. Levcopoulos, Christos; Petersson, Ola (1989), "Heapsort - Adapted for Presorted Files", WADS '89: Proceedings of the Workshop on Algorithms and Data Structures, Lecture Notes in Computer Science, 382, London, UK: Springer-Verlag, pp. 499–509, doi:10.1007/3-540-51542-9_41 .
  7. Wegener, Ingo (1993), BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small), "BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small)", Theoerical computer science (Elsevier) 118 (118): 81–98, doi:10.1016/0304-3975(93)90364-Y 
  8. Merge sort Wikipedia page

References

External links