
on Mon Jul 07 2008, "Steven Ross" <spreadsort-AT-gmail.com> wrote:
Three reasons: 1) Mean performance across all test cases is not the same as Quicksort's best case. Heapsort is a good general algorithm, and it improves the average performance even if it doesn't change much on random data.
You're saying that by improving the worst cases, introsort improves the average. I guess the worst cases for median-of-3 quicksort are more common than I understood them to be? They would have to be pretty common to make introsort *significantly* faster on average, wouldn't they?
The best case for median-of-3 quicksort assumes that the median-of-3 evenly bisects the data. Depending on the distribution, that is not a reasonable assumption. If it divides the data roughly 1/4 - 3/4, it'll take roughly twice as long, and I would expect such uneven divisions to be relatively common. If it divides 1/8 - 7/8, it'll take roughly three times as long, and worse still if the lopsided split consistently falls on the same side. These situations are far from the worst case, but they explain why Quicksort is somewhat worse than Introsort even on average: some distributions are uneven enough that Introsort falls back to Heapsort.
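As a rough model (assuming every partition splits in the same fixed ratio \alpha : (1-\alpha)), the comparison count follows a standard recurrence:

    T(n) = T(\alpha n) + T((1-\alpha)n) + n
         \approx \frac{n \log_2 n}{H(\alpha)},
    \qquad H(\alpha) = -\alpha \log_2 \alpha - (1-\alpha)\log_2(1-\alpha)

H(1/2) = 1 for an even split, H(1/4) \approx 0.81, and H(1/8) \approx 0.54, so under this model persistent 1/4 - 3/4 splits cost about 1.2x an even split and persistent 1/8 - 7/8 splits about 1.8x; the factor only blows up as the split approaches the degenerate 0 - 1 case.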
I'm having a hard time believing that the average case for Quicksort is as bad as you say. I have done a good deal of searching, and I see quite a few test results showing that introsort and quicksort have approximately identical average-case behavior.
2) Introsort also uses insertion_sort, which is really fast on small data sets.
Yeah, but that's essentially an implementation detail optimization. Many Quicksort implementations do the same thing. I don't believe that changes the overall big-O behavior.
No, it doesn't, but the overhead of finding the median becomes more significant on overall performance on the smaller subsets, so it is a significant implementation detail.
It's not significant in the comparison with QS if QS implementations do it too!
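For reference, here is a minimal sketch of the introsort shape under discussion: depth-limited median-of-3 quicksort with a heapsort fallback, finished by a single insertion-sort pass. The cutoff constant and helper layout are illustrative choices, not any particular library's.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>

    namespace sketch {

    const std::ptrdiff_t kCutoff = 16;  // hand off tiny ranges to insertion sort

    template <class It>
    void insertion_sort(It first, It last) {
        for (It i = first; i != last; ++i)
            for (It j = i; j != first && *j < *(j - 1); --j)
                std::iter_swap(j, j - 1);
    }

    template <class It>
    void introsort_loop(It first, It last, int depth) {
        while (last - first > kCutoff) {
            if (depth-- == 0) {               // too many bad splits:
                std::make_heap(first, last);  // heapsort guarantees
                std::sort_heap(first, last);  // O(n log n) worst case
                return;
            }
            It mid = first + (last - first) / 2;  // median-of-3 pivot
            if (*mid < *first) std::iter_swap(mid, first);
            if (*(last - 1) < *first) std::iter_swap(last - 1, first);
            if (*(last - 1) < *mid) std::iter_swap(last - 1, mid);
            auto pivot = *mid;
            It lt = std::partition(first, last,
                                   [&](const auto& x) { return x < pivot; });
            It gt = std::partition(lt, last,
                                   [&](const auto& x) { return !(pivot < x); });
            introsort_loop(gt, last, depth);  // recurse on the right part,
            last = lt;                        // loop on the left part
        }
    }

    template <class It>
    void introsort(It first, It last) {
        if (last - first > 1) {
            introsort_loop(first, last, 2 * int(std::log2(last - first)));
            insertion_sort(first, last);  // every unsorted run is <= kCutoff long
        }
    }

    }  // namespace sketch

The final insertion_sort pass is the detail in question: it touches the whole range once, but does only O(kCutoff) work per element, because introsort_loop leaves nothing but short unsorted runs behind.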
3) In benchmark testing I find introsort to be about 3X as fast as the qsort library call. That should be the strongest argument of all.
Well that's no surprise at all. You have to pay for a function call indirection with qsort.
Then why hasn't the compiler/library provider inlined the qsort call so the uninformed don't have such a huge speed penalty?
Uh... because qsort is a 'C' library routine?
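For concreteness, the indirection in question (a minimal sketch):

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    // qsort's comparator is called through a function pointer on every
    // comparison, and the element size is only known at runtime.
    int cmp_int(const void* a, const void* b) {
        int x = *static_cast<const int*>(a);
        int y = *static_cast<const int*>(b);
        return (x > y) - (x < y);
    }

    int main() {
        std::vector<int> a{5, 2, 9, 1}, b = a;
        std::qsort(a.data(), a.size(), sizeof(int), cmp_int);  // indirect call each time
        std::sort(b.begin(), b.end());  // operator< on int, typically inlined
    }

With std::sort the comparison is a template parameter the compiler can see through; qsort can't be inlined without the compiler treating a 'C' library function specially.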
I'm not claiming Introsort is vastly superior to Quicksort; I'm just saying it is enough faster to be significant.
I don't believe that's true in the average case.
You might consider why the STL provides binary search for forward iterators, then ;-)
That might save a little iteration over a search over every element, but wouldn't it still be O(n)?
It saves no iteration at all. It is O(n) iterator steps but O(log n) comparisons. If comparisons are expensive enough, that adds up.
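A concrete illustration (std::lower_bound accepts forward iterators, so it works on a singly-linked sequence):

    #include <algorithm>
    #include <forward_list>
    #include <iostream>

    int main() {
        std::forward_list<int> fl{1, 3, 5, 7, 9};
        // O(n) iterator steps (internally via std::advance/std::distance),
        // but only O(log n) calls to the comparison.
        auto it = std::lower_bound(fl.begin(), fl.end(), 7);
        std::cout << *it << '\n';  // prints 7
    }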
That sounds like a corner case to me; somebody in a situation like that might want to consider a radix sort, if their data can be radix-sorted, and a trie-like structure to minimize comparisons would probably save some runtime. If radix operations were cheap enough relative to comparisons, they might want a pure radix sort and radix-based lookup.
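A minimal example of the comparison-free approach being suggested -- an LSD radix sort on unsigned 32-bit keys (illustrative only, not Spreadsort itself):

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Four counting-sort passes, one byte per pass, no comparisons at all.
    void radix_sort(std::vector<std::uint32_t>& v) {
        std::vector<std::uint32_t> buf(v.size());
        for (int shift = 0; shift < 32; shift += 8) {
            std::array<std::size_t, 257> count{};  // histogram for this byte
            for (std::uint32_t x : v) ++count[((x >> shift) & 0xFF) + 1];
            for (int i = 0; i < 256; ++i) count[i + 1] += count[i];
            for (std::uint32_t x : v) buf[count[(x >> shift) & 0xFF]++] = x;
            v.swap(buf);
        }
    }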
Not all data can be radix sorted. And anyway, what does sorting have to do with binary search?
Do you have a genuine desire, for a real application, to see a bidirectional (or unidirectional) iterator version of a hybrid sort?
I'm not trying to get you to do it. I'm just pointing out that std::sort is overconstrained and should at least be made to work on bidirectional iterators (or have its worst-case complexity bound tightened in the case of random access iterators, or both).
I thought about what would be required to make integer_sort work with forward iterators:

1) I would have to either eliminate the check to see whether the data set is small enough that std::sort would be better, require the user to provide a count of the number of items being sorted, or be able to determine that the iterator isn't random-access and thus not do the check.

2) I would have to eliminate recursive calls to std::sort, and instead use an algorithm that works safely using the iterator I'm provided for recursive comparison-based sorting calls.

3) I would need an inlined function call for going forward n places from an iterator that takes constant time for a random access iterator, and the minimum number of operations necessary for a more limited forward iterator.

With that said, it should add an additional pass through the data for a non-random-access iterator.
Given all that, I can make integer_sort work off a forward iterator, without impairing random-access iterator performance, and still have decent performance for a forward iterator. So if you can provide me all of these:

1) A way to determine that an iterator is not random-access.

2) An efficient generic comparison-based recursive algorithm that takes forward iterators and is competitive with Introsort in performance.

3) An efficient way to go forward n steps from an iterator that is just as fast as + n for a random access iterator, but also works for normal iterators.

then I could make integer_sort work with a plain forward iterator (see the sketch below for 1 and 3). I'd still have to be able to swap elements. For linked lists, a specialized version of Spreadsort would be significantly faster.
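Of these, 1) and 3) already have standard answers: iterator tag dispatch for detection, and std::advance/std::distance, whose cost adapts to the iterator category (O(1) steps for random access, O(n) otherwise). 2) is the genuinely open item. A sketch of the dispatch (all function names hypothetical):

    #include <algorithm>
    #include <iterator>

    template <class It>
    void hybrid_sort_impl(It first, It last, std::random_access_iterator_tag) {
        // last - first is O(1) here, so the small-input check is cheap:
        if (last - first < 1000)  // illustrative threshold only
            std::sort(first, last);
        // ... otherwise run the radix-based algorithm ...
    }

    template <class It>
    void hybrid_sort_impl(It first, It last, std::forward_iterator_tag) {
        // Counting the elements costs the extra pass mentioned above;
        // stepping k places is std::advance(it, k), O(k) for this category.
        typename std::iterator_traits<It>::difference_type n =
            std::distance(first, last);
        (void)n;  // ... run the forward-iterator variant ...
    }

    template <class It>
    void hybrid_sort(It first, It last) {
        hybrid_sort_impl(first, last,
            typename std::iterator_traits<It>::iterator_category());
    }

Since random_access_iterator_tag derives from forward_iterator_tag, overload resolution picks the random-access version whenever it applies and falls back to the forward version otherwise.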
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com