
on Sun Jul 06 2008, "Steven Ross" <spreadsort-AT-gmail.com> wrote:
That may be true, but Introsort is significantly faster than Quicksort, both on average and in the worst case.
As I have always understood it (and as Musser's paper describes it), introsort *is* Quicksort until its behavior starts to stray from the average case, at which point it switches to heapsort as a fallback. How could it possibly have better average-case complexity than Quicksort?
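For reference, the mechanism is easy to sketch. What follows is a minimal illustration of Musser's scheme, not any particular library's code: quicksort runs normally, but each level of recursion spends a unit of a depth budget (about 2*log2(n)), and a range that exhausts its budget is finished with heapsort. The plain last-element pivot is for brevity; real implementations use median-of-3 and a final insertion-sort pass.

    #include <algorithm>

    // Sketch of introsort (after Musser): quicksort with a recursion-depth
    // budget; when the budget runs out, finish that range with heapsort.
    template <class RandomIt>
    void introsort_impl(RandomIt first, RandomIt last, int depth_limit) {
        if (last - first <= 1) return;
        if (depth_limit == 0) {              // straying from the average case
            std::make_heap(first, last);     // fall back to O(n log n) heapsort
            std::sort_heap(first, last);
            return;
        }
        auto pivot = *(last - 1);            // last-element pivot, for brevity
        RandomIt split = std::partition(first, last - 1,
            [&](const auto& x) { return x < pivot; });
        std::iter_swap(split, last - 1);     // move the pivot into place
        introsort_impl(first, split, depth_limit - 1);
        introsort_impl(split + 1, last, depth_limit - 1);
    }

    template <class RandomIt>
    void introsort(RandomIt first, RandomIt last) {
        int depth = 0;
        for (auto n = last - first; n > 1; n /= 2) ++depth;  // ~log2(n)
        introsort_impl(first, last, 2 * depth);  // Musser's depth budget
    }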
Three reasons: 1) Mean performance across all test cases includes more than just Quicksort's best case. Heapsort is a good general-purpose algorithm, and falling back to it improves the mean even if it doesn't change much on random data.
You're saying that by improving the worst cases, introsort improves the average. I guess the worst cases for median-of-3 quicksort are more common than I understood them to be? They would have to be pretty common to make introsort *significantly* faster on average, wouldn't they?
2) Introsort also uses insertion_sort, which is really fast on small data sets.
Yeah, but that's essentially an implementation-level optimization. Many Quicksort implementations do the same thing. I don't believe it changes the overall big-O behavior.
I used to use Quicksort for the recursive calls from Spreadsort, then found that for sizes up to 16 it was much better to use insertion_sort. So for the final sorting of the very small data sets left after it's finished cutting them up, introsort is faster. 3) In benchmark testing I find introsort to be about 3X as fast as the qsort library call. That should be the strongest argument of all.
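That cutoff is simple to express. Here is a minimal sketch; the threshold of 16 comes from the observation above, while the helper names are mine:

    #include <algorithm>

    // Plain insertion sort: low overhead and cache-friendly on tiny ranges.
    template <class RandomIt>
    void insertion_sort(RandomIt first, RandomIt last) {
        for (RandomIt i = first; i != last; ++i)
            for (RandomIt j = i; j != first && *j < *(j - 1); --j)
                std::iter_swap(j, j - 1);
    }

    // Recursive step of a hybrid sort: hand small ranges to insertion
    // sort, keep partitioning/binning the rest.
    template <class RandomIt>
    void hybrid_sort(RandomIt first, RandomIt last) {
        if (last - first <= 16) {
            insertion_sort(first, last);
            return;
        }
        // ... partition or bin as usual, then recurse on each piece ...
    }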
Well that's no surprise at all. You have to pay for a function call indirection with qsort.
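Concretely: qsort dispatches every comparison through a function pointer, while std::sort's comparison is a template parameter the compiler can inline. A minimal side-by-side, just to show the two call forms (it doesn't measure anything):

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    // qsort calls through this function pointer once per comparison.
    int compare_ints(const void* a, const void* b) {
        int lhs = *static_cast<const int*>(a);
        int rhs = *static_cast<const int*>(b);
        return (lhs > rhs) - (lhs < rhs);
    }

    int main() {
        std::vector<int> a = {3, 1, 2}, b = a;
        std::qsort(a.data(), a.size(), sizeof(int), compare_ints);
        std::sort(b.begin(), b.end());  // operator< is inlined; no call overhead
        return 0;
    }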
The first version of Spreadsort I wrote in 2001 allocated separate memory for each bin. With a linked-list approach instead, each sorting operation could be done in a single pass (as opposed to the three passes used in the version I'm turning into integer_sort), though without finding the maximum and minimum values. That would make more sense to use with a linked-list data set. It's not complicated to implement, but my testing shows that iterating through linked lists is slow because of frequent cache misses, and besides, binary search only makes sense on a data structure that supports random access.
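Purely to illustrate the single-pass distribution described above (my own sketch, not the 2001 code): nodes are spliced onto their bins in one traversal, with no per-element allocation, though every node hop is a potential cache miss.

    #include <array>
    #include <list>

    // Single-pass distribution of a linked list of unsigned ints into 256
    // bins keyed on one byte; splice moves nodes without allocating.
    std::array<std::list<unsigned>, 256>
    distribute(std::list<unsigned>& input, int shift) {
        std::array<std::list<unsigned>, 256> bins;
        while (!input.empty()) {
            unsigned key = (input.front() >> shift) & 0xFFu;
            bins[key].splice(bins[key].end(), input, input.begin());
        }
        return bins;  // each bin is then sorted recursively or with a fallback
    }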
You might consider why the STL provides binary search for forward iterators, then ;-)
That might save a little iteration compared with checking every element, but wouldn't it still be O(n)?
It saves no iteration at all. It is O(n) iterator steps but O(log n) comparisons. If comparison is expensive enough, it adds up.
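That is essentially how the standard specifies std::lower_bound for forward iterators; a sketch along those lines (count the range once, then halve):

    #include <iterator>

    // lower_bound on forward iterators: O(n) iterator steps in total, but
    // only O(log n) comparisons -- a win when comparison is expensive.
    template <class ForwardIt, class T>
    ForwardIt lower_bound_fwd(ForwardIt first, ForwardIt last, const T& value) {
        auto count = std::distance(first, last);  // one linear pass to count
        while (count > 0) {
            auto step = count / 2;
            ForwardIt mid = first;
            std::advance(mid, step);              // stepping, no comparisons
            if (*mid < value) {                   // one comparison per halving
                first = ++mid;
                count -= step + 1;
            } else {
                count = step;
            }
        }
        return first;
    }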
I'd think it's there just because it's possible, not because it has much use.
I have no problem with providing general capabilities just because they're possible.
Do you have a genuine need, in a real application, for a bidirectional (or unidirectional) iterator version of a hybrid sort?
I'm just generally interested in algorithms.
It would be at least a few hours of development plus testing to make sure it isn't broken. I'd be happy to tell you how to do it if you wish, but without a need, I don't see why I should bother. I have many things to do, and little time.
I'm not trying to get you to do it. I'm just pointing out that std::sort is overconstrained and should at least be made to work on bidirectional iterators (or have its worst-case complexity bound tightened in the case of random access iterators, or both).

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com