Bucket sort

From Wikipedia, the free encyclopedia

Elements are distributed among bins
Elements are distributed among bins
Then, elements are sorted within each bin
Then, elements are sorted within each bin

Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. Bucket sort is a generalization of pigeonhole sort. Bucket sort may run in linear time (Θ(n)). Each bucket must contain only a single element or it incurs a cost for additional sorts on the buckets themselves. Since bucket sort is not a comparison sort, it is not subject to the Ω(n log n) lower bound.

Bucket sort works as follows:

  1. Set up an array of initially empty "buckets" the size of the range.
  2. Go over the original array, putting each object in its bucket.
  3. Sort each non-empty bucket.
  4. Put elements from non-empty buckets back into the original array

Contents

[edit] Pseudocode

function bucket-sort(array, n) is
  buckets ← new array of n empty lists
  for i = 0 to (length(array)-1) do
    insert array[i] into buckets[msbits(array[i], k)]
  for i = 0 to n - 1 do
    next-sort(buckets[i])
  return the concatenation of buckets[0], ..., buckets[n-1]

Here array is the array to be sorted and n is the number of buckets to use. The function msbits(x,k) returns the k most significant bits of x (floor(x/2^(size(x)-k))); different functions can be used to translate the range of elements in array to n buckets, such as translating the letters A-Z to 0-25 or returning the first character (0-255) for sorting strings. The function next-sort is a sorting function; using bucket-sort itself as next-sort produces a relative of radix sort; in particular, the case n = 2 corresponds to quicksort (although potentially with poor pivot choices).

[edit] Postman's Sort

The Postman's sort is a sorting algorithm, a variant of a bucket sort where attributes of the key are described so the algorithm can allocate buckets efficiently. This is the algorithm used by letter-sorting machines in the post office: first states, then post offices, then routes, etc. Since keys are not compared against each other, sorting time is O(cn), where c depends on the size of the key and number of buckets. This is similar to a radix sort that works, "top down," or, "most significant digit first."

[edit] External links

[edit] References