Counting sort

From Wikipedia, the free encyclopedia

Counting sort is a sorting algorithm that (like bucket sort) takes advantage of knowing the range of the numbers in the array to be sorted (array A). It uses this range to create a counting array C with one entry per possible value. Each index i in C is then used to count how many elements of A have the value i. The counts stored in C can then be used to place each element of A at its correct position in the sorted output.
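As a small illustration of how the counts in C determine positions, the sketch below assumes non-negative integer values smaller than a known bound k, so no shifting is needed. The function name start_positions is hypothetical. It counts each value and then converts the counts into the index at which the first copy of each value lands in the sorted output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for values in [0, k), return C where C[v] is the index
// in the sorted output at which the first copy of v is placed.
std::vector<std::size_t> start_positions(const std::vector<int>& a, int k)
{
    std::vector<std::size_t> c(static_cast<std::size_t>(k), 0);

    // Count: c[v] = number of elements of a equal to v.
    for (int v : a) {
        assert(v >= 0 && v < k);  // assumption: every value lies in [0, k)
        ++c[static_cast<std::size_t>(v)];
    }

    // Exclusive prefix sum: each value starts right after all smaller values.
    std::size_t next = 0;
    for (std::size_t& slot : c) {
        std::size_t count = slot;
        slot = next;
        next += count;
    }
    return c;
}
```

For A = {3, 1, 3, 0} and k = 4 the counts are {1, 1, 0, 2} and the returned start positions are {0, 1, 2, 2}: the 0 goes to index 0, the 1 to index 1, and the two 3s to indices 2 and 3.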


Characteristics of counting sort

Counting sort is a stable sort and has a running time of Θ(n+k), where n and k are the lengths of the arrays A (the input array) and C (the counting array), respectively. In order for this algorithm to be efficient, k must not be much larger than n.
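To make the stability claim concrete, here is a minimal sketch that sorts records by an integer key assumed to lie in [0, k). The Record type and the function name counting_sort_by_key are illustrative, not taken from the article. Scanning the input from the back and decrementing the accumulated count keeps records with equal keys in their original relative order; the two passes over the n records plus the pass over the k counts give the Θ(n+k) running time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative record type: sorted by key, carrying a payload.
struct Record {
    int key;           // assumed to lie in [0, k)
    std::string name;  // payload, used to observe stability
};

// Stable counting sort of records by key, keys in [0, k).
std::vector<Record> counting_sort_by_key(const std::vector<Record>& in, int k)
{
    std::vector<std::size_t> count(static_cast<std::size_t>(k), 0);
    for (const Record& r : in) {
        assert(r.key >= 0 && r.key < k);
        ++count[static_cast<std::size_t>(r.key)];
    }

    // Inclusive prefix sum: count[v] = number of records with key <= v,
    // i.e. one past the last output slot for key v.
    for (std::size_t v = 1; v < count.size(); ++v)
        count[v] += count[v - 1];

    std::vector<Record> out(in.size());
    // Walk the input backwards so the last record with a given key takes
    // the last slot for that key; equal keys keep their input order.
    for (std::size_t i = in.size(); i-- > 0;) {
        std::size_t& slot = count[static_cast<std::size_t>(in[i].key)];
        out[--slot] = in[i];
    }
    return out;
}
```

Sorting {(2, a), (0, b), (2, c), (1, d), (0, e)} with k = 3 yields the names in the order b, e, d, a, c: b stays ahead of e, and a stays ahead of c.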

The indices of C must run from the minimum to the maximum value in A to be able to index C directly with the values of A. Otherwise, the values of A need to be translated (shifted) so that the minimum value of A matches the smallest index of C. (Translating each element by subtracting the minimum value of A to obtain an index into C still gives a counting sort; if a more complex function is used to map values in A to indices into C, the result is a bucket sort.) If the minimum and maximum values of A are not known, an initial pass over the data is needed to find them; this pass takes Θ(n) time (see selection algorithm).

The length of the counting array C must be at least equal to the range of the numbers to be sorted (that is, the maximum value minus the minimum value plus 1). This makes counting sort impractical for large ranges, in terms of both time and memory. Counting sort may, for example, be the best algorithm for sorting numbers whose range is between 0 and 100, but it is probably unsuitable for sorting a list of names alphabetically. However, counting sort can be used as a subroutine in radix sort to sort a list of numbers whose range is too large for counting sort alone.
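The following is a minimal sketch of that idea, assuming 32-bit unsigned keys: a least-significant-digit radix sort that runs a stable counting sort on one 8-bit digit at a time, so each pass needs a counting array of only 256 entries regardless of how large the key range is. The function name radix_sort_u32 and the choice of base 256 are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort of 32-bit unsigned values, one byte (base 256) per pass.
// Each pass is a stable counting sort keyed on that byte.
void radix_sort_u32(std::vector<std::uint32_t>& a)
{
    std::vector<std::uint32_t> buf(a.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 256> count{};  // zero-initialized
        for (std::uint32_t v : a)
            ++count[(v >> shift) & 0xFFu];

        // Exclusive prefix sum: count[d] = first output slot for digit d.
        std::size_t next = 0;
        for (std::size_t& c : count) {
            std::size_t n = c;
            c = next;
            next += n;
        }

        // A forward scan with exclusive sums is stable, which the later
        // (more significant) passes rely on.
        for (std::uint32_t v : a)
            buf[count[(v >> shift) & 0xFFu]++] = v;
        a.swap(buf);  // after an even number of passes, a holds the result
    }
}
```

This version fills each pass front to back using exclusive prefix sums rather than back to front using inclusive sums; both orders keep equal digits in their input order, which is the property radix sort needs.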

Because counting sort uses key values as indices into an array, it is not a comparison sort, and the Ω(n log n) lower bound for comparison sorting does not apply.

Tally sort

A well-known variant of counting sort is tally sort, where the input is known to contain no duplicate elements, or where we wish to eliminate duplicates during sorting. In this case the count array can be represented as a bit array; a bit is set if that key value was observed in the input array. Tally sort is widely familiar because of its use in the book Programming Pearls as an example of an unconventional solution to a particular set of limitations.[1]
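A minimal tally-sort sketch, assuming the keys lie in [0, k) and that duplicates may be dropped. It uses std::vector<bool> as the bit array; the standard library provides it as a space-optimized specialization that implementations typically pack one bit per element. The function name tally_sort is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tally sort: keys in [0, k); duplicates collapse to a single output value.
std::vector<int> tally_sort(const std::vector<int>& in, int k)
{
    std::vector<bool> seen(static_cast<std::size_t>(k), false);  // the bit array
    for (int v : in) {
        assert(v >= 0 && v < k);  // assumption: every key lies in [0, k)
        seen[static_cast<std::size_t>(v)] = true;
    }

    // Emit every key whose bit is set, in ascending order.
    std::vector<int> out;
    for (int v = 0; v < k; ++v)
        if (seen[static_cast<std::size_t>(v)])
            out.push_back(v);
    return out;
}
```

For the input {7, 3, 9, 3, 0} with k = 10 this returns {0, 3, 7, 9}; the repeated 3 appears only once.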

The algorithm

Informal

  1. Find the highest and lowest elements of the array.
  2. Count how many times each distinct value occurs in the array. (E.g. [4,4,4,1,1] gives three 4's and two 1's.)
  3. Accumulate the counts: starting from the count for the lowest value, replace each count with the sum of itself and all the counts before it, so that each value's count becomes the number of elements less than or equal to that value.
  4. Fill the destination array from the back: take the input elements from last to first, put each one at the position given by its accumulated count (counting positions from 1), then decrease that count by one.
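The four steps can be sketched in C++ as follows. This is one possible rendering, assuming int values whose range (max - min + 1) fits in memory and does not overflow; the function name counting_sort_steps is illustrative. Unlike the C++ implementation in the next section, it moves the input elements into a separate output array instead of regenerating them from the counts, which is what keeps the sort stable when elements carry more than their key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<int> counting_sort_steps(const std::vector<int>& a)
{
    if (a.empty())
        return {};

    // Step 1: find the lowest and highest elements.
    auto mm = std::minmax_element(a.begin(), a.end());
    const int lo = *mm.first, hi = *mm.second;

    // Step 2: count each value; index v - lo holds the count of v.
    std::vector<std::size_t> count(static_cast<std::size_t>(hi - lo) + 1, 0);
    for (int v : a)
        ++count[static_cast<std::size_t>(v - lo)];

    // Step 3: accumulate, so count[i] = number of elements <= i + lo.
    for (std::size_t i = 1; i < count.size(); ++i)
        count[i] += count[i - 1];

    // Step 4: fill from the back. An element's accumulated count is its
    // 1-based position; decrementing first yields the 0-based slot.
    std::vector<int> out(a.size());
    for (std::size_t i = a.size(); i-- > 0;)
        out[--count[static_cast<std::size_t>(a[i] - lo)]] = a[i];
    return out;
}
```

For the input {4, 4, 4, 1, 1, -3, 0} the result is {-3, 0, 1, 1, 4, 4, 4}; subtracting lo in steps 2 and 4 is the translation described earlier, so negative values need no special handling.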

C++ implementation

/// counting_sort - sort an array of integers in ascending order, in place.
///
/// For best results the range of values to be sorted
/// should not be significantly larger than the number of
/// elements in the array.
///
/// This simple version rebuilds the array from the counts instead of
/// moving elements, which works because the values are their own keys.
///
/// param nums - input/output - array of values to be sorted
/// param size - input - number of elements in the array
///
void counting_sort(int *nums, int size)
{
        // an empty array is already sorted (and has no nums[0] to read)
        if (size <= 0)
                return;

        // search for the minimum and maximum values in the input
        int i, min = nums[0], max = min;
        for(i = 1; i < size; ++i)
        {
                if (nums[i] < min)
                        min = nums[i];
                else if (nums[i] > max)
                        max = nums[i];
        }

        // create a counting array, counts, with a member for
        // each possible discrete value in the input
        // (assumes max - min + 1 does not overflow an int).
        // initialize all counts to 0.
        int distinct_element_count = max - min + 1;
        int *counts = new int[distinct_element_count];
        for(i=0; i<distinct_element_count; ++i)
                counts[i] = 0;

        // count the occurrences: counts[v - min] holds the number
        // of times the value v appears in nums
        for(i=0; i<size; ++i)
                ++counts[ nums[i] - min ];

        // write each value back into nums as many times as it was
        // counted, in ascending order
        int j=0;
        for(i=0; i<distinct_element_count; ++i)
                for(int z=0; z<counts[i]; ++z)
                        nums[j++] = i + min;

        delete[] counts;
}

References

  1. Jon Bentley, Programming Pearls, Chapter 1. ISBN 0-201-10331-1.
