Talk:Bucket sort
From Wikipedia, the free encyclopedia
Contents |
[edit] The algorithm given is not linear-time
The article currently says:
- Bucket sort runs in linear time ... It works as follows:
- Set up an array of initially empty "buckets" the size of the range.
- Go over the original array, putting each object in its bucket.
- Sort each non-empty bucket.
- Put elements from non-empty buckets back into the original array.
Correct me if I'm wrong, but because of the recursion in step 3, this algorithm is not O(n) as given. It is O(kn) where k is the recursion depth. --P3d0 07:12, Apr 30, 2005 (UTC)
- If k is determined at implementation, then it is a constant at run time and is ignored, leaving O(n).
- If k is determined at runtime as a function of n, then we have O(f(n)*n) which, if f(n) is linear, reduces back to O(n).
- If k is determined at runtime by a property of the data (not the amount of data), then things get tricky. For example, if you were doing a base-10 radix sort, then you would have O(f(n)*n) where f(n) is log-base10(max(n)).
- That's how I understand it.
- --Flatline 15:52, 2005 Apr 30 (UTC)
- Your treatment of the second case is wrong. If f(n) is linear in n, then O(f(n)*n) = O(n²). Aragorn2 13:28, 21 March 2006 (UTC)
I think you are right in this, but isn't it so that O(kn) is also O(n) when k is not n? When k is of siginicant size it will change the lineairity (don't know if that is proper English) of the algorithm.
Another flaw I think in the article is step 3: Sort each non-empty bucket. Isn't this what this algorithm is supposed to do? As I see it:
- Set up an array with initially empty buckets called B. You need to have a size of this bucket, which is usually represented as an array. To find a size you can walk through the set S of items that need to be sorted and seek the maximum sort-key k. This will take n operations, where n is the number of items in S;
- Go through S and put every item in the bucket that corresponds with the sort-key k.
- B now contains buckets that are either empty as non-empty, filled with an item;
- One can now put all the items from B back into S, being sorted.
I think this is a more correct version, but that's how I understand this algorithm. --wackmaniac 00:47, 2005 Nov 30 (CET)
okay after reading the above comments here is how bucket sort works
1. Let A be the i/p array and let B be the o/p array having same size as i/p array
Note:array B is more like a setinel. ie imagine each element of B to point to a bucket, that is a link list.
2. Based on a sort key,eg let this be the MSD of digits the i/p array A , place elements of A in B.
eg A{57,15,32,50,11,99,45,30,62,91} , then B{X,[15->11],X,[32->30],X,[57->50],[62],X,X,[99->91]}
3. Now sort the buckets individually in B based on the fav algo of your choice.
eg B{X,[11->15],X,[32->30],X,[50->57],[62],X,X,[91->99]}
4. Concat the results of B
eg [11->15]->[32->30]->[50->57]->[91->99]
Note: -> indicates a pointer by suneel, iit chicago
[edit] Rewrite
"Bucket sort runs in linear time (Θ(n)) when input is drawn from a uniform distribution."
I have no idea, what the author wants to tell me, but I get the distinct impression that I should rewrite this article.
"partitioning an array into a finite number of buckets and then sorting each bucket"
Oh my god. Rewrite from scratch I think...
Ah, I see now. There is no consesus which algorithm is called bucket sort. I am still searching for the original source of the name bucket sort to clear it up, but for now I have not found anything.
[edit] Source Code
While an example of an algorithm in source form is not a bad thing, in general pseudocode is preferable for a number of reasons. Wikipedia is not s source code repository, and once sample implementations start getting added people feel inclined to add more in their favourite language of choice. This ends up swamping articles on algorithms with an inordinate amount of source code that usually adds nothing not already provided by the pseudocode example. The best place for example implementations is something like Literate Programs which has a wiki specifically designed to hold documented source code in a wide variety of languages. Please note there are potential licensing issues involved as Wikipedia is GFDL while LP is not (though still an open license). Leland McInnes 06:35, 6 April 2006 (UTC)
[edit] Wrong algorithmic order?
Bucket sort is worst case O(n lg m) where m is the number of distinct possible elements. If m > number of buckets, then (as the article says) it must be recursively applied to the buckets. The recursion depth is O(lg m). COnsider sorting 16 bit ints. You can bucket sort with 256 buckets, applied twice. 24 bit ints will require 3 applications, 32 bit, 4 and so on.
Esentially, the number of passes (recursion depth) is linear in the key length, or logarithmic in the number of distinct keys. Serviscope Minor 01:56, 21 November 2006 (UTC)