Talk:Spreadsort

From Wikipedia, the free encyclopedia

Issues with Spreadsort:

Contents

[edit] Performance Comparisons

As these are hard to verify, when/how should they be included, if at all?

[edit] Sorting outside of RAM

How much should this be described, if at all? It's a significant strength of the algorithm.


[edit] Worst-case performance

As cited in the Spreadsort paper, Markku Tamminen's paper "Two is as good as any" proves that sorting algorithms of this type are O(n) for continuous integrable functions. It requires effort to design a distribution that makes this hit its worst-case performance. On evenly distributed random data, Spreadsort (like Tamminen's approach) breaks up the problem into tiny pieces which comparison-based algorithms can finish quickly. On data with a lot of overlap, it often can solve the problem quickly in a distributional fashion.

[edit] Parallel version

Anyone interested? It hasn't been developed, but should work well.


[edit] Value vs. "-"

Which is more general and/or usable? A Value function that returns a value to be subtracted, or a "-" function? Obviously a "-" function can be constructed as Value() - Value(). I left this alone since that's the way it is described in the paper.

[edit] Two are as Good as Any

Markku Tamminen proved that two recursive iterations of this type of algorithm will have O(n) time on any continuous integrable function in his paper "Two is as Good as Any". For performance reasons, this isn't always optimal, but it was an interesting result. Real data often isn't in the form of such a function, but a bell curve is a common example of a function that is integrable. It is trivial to make Spreadsort always do two iterations if the first iteration doesn't cut the problem down to a very small size, to attain this result. I'm not sure how much mention to make of this in the article; I consider it an important result.