Talk:Bucket sort

From Wikipedia, the free encyclopedia

	This article is within the scope of WikiProject Computer science, which aims to create a comprehensive computer science reference for Wikipedia. Visit the project page for more information and to join in on related discussions.
B	rated as B-Class on the assessment scale
High	rated as high-importance on the assessment scale

1 The algorithm given is not linear-time
2 Rewrite
3 Source Code
4 Wrong algorithmic order?
5 More Abstractly
6 Mess
7 Memory Requirements
8 Implementation link
9 implementation doubts
10 proof of linearity

[edit] The algorithm given is not linear-time

The article currently says:

Bucket sort runs in linear time ... It works as follows:

Set up an array of initially empty "buckets" the size of the range.
Go over the original array, putting each object in its bucket.
Sort each non-empty bucket.
Put elements from non-empty buckets back into the original array.

Correct me if I'm wrong, but because of the recursion in step 3, this algorithm is not O(n) as given. It is O(kn) where k is the recursion depth. --P3d0 07:12, Apr 30, 2005 (UTC)

If k is determined at implementation, then it is a constant at run time and is ignored, leaving O(n).

If k is determined at runtime as a function of n, then we have O(f(n)*n) which, if f(n) is linear, reduces back to O(n).

If k is determined at runtime by a property of the data (not the amount of data), then things get tricky. For example, if you were doing a base-10 radix sort, then you would have O(f(n)*n) where f(n) is log-base10(max(n)).

That's how I understand it.

--Flatline 15:52, 2005 Apr 30 (UTC)

Your treatment of the second case is wrong. If f(n) is linear in n, then O(f(n)*n) = O(n²). Aragorn2 13:28, 21 March 2006 (UTC)

Linear Time is both Possible and Expected from Bucket Sort:

I believe the condition required to cause bucket sort to be maximally bounded (see big-O notation) by O(n) is the uniform distribution of the inputs across the range of possible inputs. Each bucket is "expected" (see statistics: expected value) to contain [range]/[number of buckets] elements. Thus, if a reasonable sort (such as counting sort or an embedded bucket sort) is applied to each bucket, the total sorting of all buckets will, in an average case, be maximally bounded by O(n). However, as has been pointed out, the worst-case bucket-sort can explode into just as poor of a running time as any other sorting algorithm. Bucket sorting is considered linear, because the average-case is linear given certain conditions.

--Dinre (talk) 17:57, 14 February 2008 (UTC)

I think you are right in this, but isn't it so that O(kn) is also O(n) when k is not n? When k is of siginicant size it will change the lineairity (don't know if that is proper English) of the algorithm.

Another flaw I think in the article is step 3: Sort each non-empty bucket. Isn't this what this algorithm is supposed to do? As I see it:

Set up an array with initially empty buckets called B. You need to have a size of this bucket, which is usually represented as an array. To find a size you can walk through the set S of items that need to be sorted and seek the maximum sort-key k. This will take n operations, where n is the number of items in S;
Go through S and put every item in the bucket that corresponds with the sort-key k.
B now contains buckets that are either empty as non-empty, filled with an item;
One can now put all the items from B back into S, being sorted.

I think this is a more correct version, but that's how I understand this algorithm. --wackmaniac 00:47, 2005 Nov 30 (CET)

okay after reading the above comments here is how bucket sort works

1. Let A be the i/p array and let B be the o/p array having same size as i/p array

   Note:array B is more like a setinel. ie imagine each element of B to point to a bucket, that is a link list.

2. Based on a sort key,eg let this be the MSD of digits the i/p array A , place elements of A in B.

  eg A{57,15,32,50,11,99,45,30,62,91} , then B{X,[15->11],X,[32->30],X,[57->50],[62],X,X,[99->91]}

3. Now sort the buckets individually in B based on the fav algo of your choice.

  eg B{X,[11->15],X,[32->30],X,[50->57],[62],X,X,[91->99]}

4. Concat the results of B

  eg [11->15]->[32->30]->[50->57]->[91->99]

Note: -> indicates a pointer by suneel, iit chicago

[edit] Rewrite

"Bucket sort runs in linear time (Θ(n)) when input is drawn from a uniform distribution."

I have no idea, what the author wants to tell me, but I get the distinct impression that I should rewrite this article.

"partitioning an array into a finite number of buckets and then sorting each bucket"

Oh my god. Rewrite from scratch I think...

Ah, I see now. There is no consesus which algorithm is called bucket sort. I am still searching for the original source of the name bucket sort to clear it up, but for now I have not found anything.

[edit] Source Code

While an example of an algorithm in source form is not a bad thing, in general pseudocode is preferable for a number of reasons. Wikipedia is not s source code repository, and once sample implementations start getting added people feel inclined to add more in their favourite language of choice. This ends up swamping articles on algorithms with an inordinate amount of source code that usually adds nothing not already provided by the pseudocode example. The best place for example implementations is something like Literate Programs which has a wiki specifically designed to hold documented source code in a wide variety of languages. Please note there are potential licensing issues involved as Wikipedia is GFDL while LP is not (though still an open license). Leland McInnes 06:35, 6 April 2006 (UTC)

[edit] Wrong algorithmic order?

Bucket sort is worst case O(n lg m) where m is the number of distinct possible elements. If m > number of buckets, then (as the article says) it must be recursively applied to the buckets. The recursion depth is O(lg m). COnsider sorting 16 bit ints. You can bucket sort with 256 buckets, applied twice. 24 bit ints will require 3 applications, 32 bit, 4 and so on.

Esentially, the number of passes (recursion depth) is linear in the key length, or logarithmic in the number of distinct keys. Serviscope Minor 01:56, 21 November 2006 (UTC)

[edit] More Abstractly

My understanding is that Bucket or Bin sort is a class of sorting algorithms that is identified by the use of buckets or bins in the initial pass over the elements to be sorted. If your elements are evenly distributed and you know enough about the keys you can reduce the asymptotic running time by selecting an appropriate number of bins. This is how O(n log n) can be reduced to O(n). Beware! If you do not have an even distribution a large number of elements in a single bin will erase any benefits and you will have the added cost of the first run as well as any space and copy costs you accrued in your implementation. —The preceding unsigned comment was added by 24.198.83.109 (talk) 02:19, 26 April 2007 (UTC).

[edit] Mess

The articles bucket sort, pigeonhole sort, Counting sort, Tally sort and to some extent, radix sort constitute a mess due to the similarity (if not sameness in terms of the masic smart trick) of the algorithms. I learned Knuth & stuff like 30 years ago, so I am not of fresh memory, but IMO "bucket" and "pigeonhole" are synonyms. I cleased some carelless texts in a couple of places, but Expert attention is needed. `'Miikka 01:59, 4 July 2007 (UTC)

[edit] Memory Requirements

The article should mention the amount of memory the sort requires. (Some of the other sort articles do.) —Preceding unsigned comment added by 206.53.197.12 (talk) 01:02, 27 October 2007 (UTC)

[edit] Implementation link

I got this when ran implementation in C++:

Program received signal SIGSEGV, Segmentation fault.

[edit] implementation doubts

I don't understand this fully:

list sort(list s, typekey min, typekey max)
{
       int i;
       typekey div, maxb[M], minb[M];
       list head[M], t;
       struct rec aux;

       if (s == NULL)
               return(s); 
       if (max == min)
               return s;

       div = ceil( (max - min) / M ); /* Find dividing factor */ 

       for (i = 0; i < M; ++i)
               head[i] = NULL; /* Place records in buckets */

       while (s != NULL) {
               i = (s->k - min) / div;
               if (i < 0)
                       i = 0;
               else if (i >= M)
                       i = M - 1;
               t = s;
               s = s->next;
               t->next = head[i];
               if (head[i] == NULL)
                       minb[i] = maxb[i] = t->k;
               head[i] = t;
               if (t->k > maxb[i])
                       maxb[i] = t->k;
               if (t->k < minb[i])
                       minb[i] = t->k;
       } /* sort iteratively */

       t = &aux;
       for (i = 0; i < M; ++i)
               if (head[i] != NULL) {
                       t->next = sort(head[i], minb[i], maxb[i]);
                       for( ; t -> next != NULL; t = t->next )
                               ;
               }
       return aux.next;
}

What are: list, aux, typekey? I can see that aux is a structure, bur what are its basic elements? and I don't think this will run in TC as is (or with minor modifications) --Sylvestersteele (talk) 17:28, 29 April 2008 (UTC)

[edit] proof of linearity

Can someone put up a proof of linearity for bucket sort? I think that'll be very helpful. Expert needed --Sylvestersteele (talk) 17:30, 29 April 2008 (UTC)