Insertion sort

From Wikipedia, the free encyclopedia

Example of insertion sort sorting a list of random numbers.
Example of insertion sort sorting a list of random numbers.

Insertion sort is a simple sorting algorithm, a comparison sort in which the sorted array (or list) is built one entry at a time. It is much less efficient on large lists than the more advanced algorithms such as quicksort, heapsort, or merge sort, but it has various advantages:

  • Simple to implement
  • Efficient on (quite) small data sets
  • Efficient on data sets which are already substantially sorted: it runs in O(n + d) time, where d is the number of inversions
  • More efficient in practice than most other simple O(n2) algorithms such as selection sort or bubble sort: the average time is n2/4 and it is linear in the best case
  • Stable (does not change the relative order of elements with equal keys)
  • In-place (only requires a constant amount O(1) of extra memory space)
  • It is an online algorithm, in that it can sort a list as it receives it.

In abstract terms, every iteration of an insertion sort removes an element from the input data, inserting it at the correct position in the already sorted list, until no elements are left in the input. The choice of which element to remove from the input is arbitrary and can be made using almost any choice algorithm.

Sorting is typically done in-place. The resulting array after k iterations contains the first k entries of the input array and is sorted. In each step, the first remaining entry of the input is removed, inserted into the result at the right position, thus extending the result:

The array right before insertion of x

becomes:

The array right after insertion of x

with each element > x copied to the right as it is compared against x.

The most common variant, which operates on arrays, can be described as:

  1. Suppose we have a method called insert designed to insert a value into a sorted sequence at the beginning of an array. It operates by starting at the end of the sequence and shifting each element one place to the right until a suitable position is found for the new element. It has the side effect of overwriting the value stored immediately after the sorted sequence in the array.
  2. To perform insertion sort, start at the left end of the array and invoke insert to insert each element encountered into its correct position. The ordered sequence into which we insert it is stored at the beginning of the array in the set of indexes already examined. Each insertion overwrites a single value, but this is okay because it's the value we're inserting.

A simple pseudocode version of the complete algorithm follows, where the arrays are zero-based:

 insert(array a, int length, value) {
     int i := length - 1;
     while (i ≥ 0 and a[i] > value) {
         a[i + 1] := a[i];
         i := i - 1;
     }
     a[i + 1] := value;
 }
 
 insertionSort(array a, int length) {
     int i := 1;
     while (i < length) {
         insert(a, i, a[i]);
         i := i + 1;
     }
 }


[edit] Good and bad input cases

In the best case of an already sorted array, this implementation of insertion sort takes O(n) time: in each iteration, the first remaining element of the input is only compared with the last element of the result. It takes O(n2) time in the average and worst cases, which makes it impractical for sorting large numbers of elements. However, insertion sort's inner loop is very fast, which often makes it one of the fastest algorithms for sorting small numbers of elements, typically less than 10 or so.

[edit] Variants

D.L. Shell made substantial improvements to the algorithm, and the modified version is called Shell sort. It compares elements separated by a distance that decreases on each pass. Shell sort has distinctly improved running times in practical work, with two simple variants requiring O(n3/2) and O(n4/3) time.

If comparisons are very costly compared to swaps, as is the case for example with string keys stored by reference or with human interaction (such as choosing one of a pair displayed side-by-side), then using binary insertion sort can be a good strategy. Binary insertion sort employs binary search to find the right place to insert new elements, and therefore performs \lceil log_2(n!) \rceil comparisons in the worst case, which is Θ(n log n). The algorithm as a whole still takes Θ(n2) time on average due to the series of swaps required for each insertion, and since it always uses binary search, the best case is no longer Ω(n) but Ω(n log n).

To avoid having to make a series of swaps for each insertion, we could instead store the input in a linked list, which allows us to insert and delete elements in constant time. Unfortunately, binary search on a linked list is impossible, so we still spend O(n2) time searching. If we instead replace it by a more sophisticated data structure such as a heap or binary tree, we can significantly decrease both search and insert time. This is the essence of heap sort and binary tree sort.

In 2004, Bender, Farach-Colton, and Mosteiro published a new variant of insertion sort called library sort or gapped insertion sort that leaves a small number of unused spaces ("gaps") spread throughout the array. The benefit is that insertions need only shift elements over until a gap is reached. Surprising in its simplicity, they show that this sorting algorithm runs with high probability in O(n log n) time. [1]

[edit] Comparisons to other sorts

Insertion sort is very similar to bubble sort. In bubble sort, after k passes through the array, the k largest elements have bubbled to the top. (Or the k smallest elements have bubbled to the bottom, depending on which way you do it.) In insertion sort, after k passes through the array, you have a run of k sorted elements at the bottom of the array. Each pass inserts another element into the sorted run. So with bubble sort, each pass takes less time than the previous one, but with insertion sort, each pass may take more time than the previous one.

Some divide-and-conquer algorithms such as quicksort and mergesort sort by recursively dividing the list into smaller sublists which are then sorted. A useful optimization in practice for these algorithms is to switch to insertion sort for "small enough" sublists on which insertion sort outperforms the more complex algorithms. The size of list for which insertion sort has the advantage varies by environment and implementation, but is typically around 8 to 20 elements.

[edit] External links

[edit] References