Introsort

From Wikipedia, the free encyclopedia

Introsort or introspective sort is a sorting algorithm designed by David Musser in 1997. It begins with quicksort, switching to heapsort once the recursion depth exceeds a preset value. The resulting sort combines the best of both, with a worst-case O(n log n) runtime and practical performance comparable to quicksort on typical data sets. Because both algorithms it uses are comparison sorts, it is also one.

In quicksort, one of the critical operations is choosing the pivot, the element around which the list is partitioned. The simplest pivot selection algorithm is to simply take the first element of the list as the pivot. However, this causes poor behavior for the common case of sorted or nearly-sorted input. Similarly, taking the last element causes bad behavior for reverse-sorted input. A popular improvement is the median-of-3 pivot selection algorithm, which takes the median of the first, middle, and last elements of the list; however, even though this performs well on many real-world inputs, it is still possible to create a carefully-chosen median-of-3 killer list that will cause dramatic slowdown of a quicksort based on this pivot selection technique. Such inputs could potentially be exploited by an aggressor, for example by sending such a list to an Internet server for sorting as a denial of service attack.

Musser reported that on a median-of-3 killer sequence of 100,000 elements, introsort's running time was 1/200th that of median-of-3 quicksort. Musser also considered the effect on caches of Sedgewick's delayed small sorting, where small ranges are sorted at the end in a single pass of insertion sort. He reported that it could double the number of cache misses, but that its performance with double-ended queues was significantly better and should be retained for template libraries, in part because the gain in other cases from doing the sorts immediately was not great.

The June 2000 SGI C++ Standard Template Library stl_algo.h implementation of unstable sort uses the Musser introsort approach with the recursion depth to switch to heapsort passed as a parameter, median-of-3 pivot selection and the Sedgewick final insertion sort pass. The element threshold for switching to the simple insertion sort was 16.

The C++ STL implementations generally significantly (several times as fast) outperform the C implementation because they are implemented to allow inlining, while the generic C equivalent must use function calls for comparisons. This advantage could be compensated for by using custom versions of the sort function, at the cost of losing the advantage of a totally generic library function.

[edit] References

In other languages