
on Sun Jul 06 2008, "Steven Ross" <spreadsort-AT-gmail.com> wrote:
That may be true, but Introsort is significantly faster than Quicksort, both on average and in the worst case.
As I have always understood it (and as Musser's paper describes it), introsort *is* Quicksort until its behavior starts to stray from the average case, at which point it switches to heapsort as a fallback. How could it possibly have better average-case complexity than Quicksort?
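For reference, the mechanism is easy to sketch. What follows is a minimal illustration of Musser's scheme, not any particular library's code: quicksort runs normally, but each level of recursion spends a unit of a depth budget (about 2*log2(n)), and a range that exhausts its budget is finished with heapsort. The plain last-element pivot is for brevity; real implementations use median-of-3 and a final insertion-sort pass.

    #include <algorithm>

    // Sketch of introsort (after Musser): quicksort with a recursion-depth
    // budget; when the budget runs out, finish that range with heapsort.
    template <class RandomIt>
    void introsort_impl(RandomIt first, RandomIt last, int depth_limit) {
        if (last - first <= 1) return;
        if (depth_limit == 0) {              // straying from the average case
            std::make_heap(first, last);     // fall back to O(n log n) heapsort
            std::sort_heap(first, last);
            return;
        }
        auto pivot = *(last - 1);            // last-element pivot, for brevity
        RandomIt split = std::partition(first, last - 1,
            [&](const auto& x) { return x < pivot; });
        std::iter_swap(split, last - 1);     // move the pivot into place
        introsort_impl(first, split, depth_limit - 1);
        introsort_impl(split + 1, last, depth_limit - 1);
    }

    template <class RandomIt>
    void introsort(RandomIt first, RandomIt last) {
        int depth = 0;
        for (auto n = last - first; n > 1; n /= 2) ++depth;  // ~log2(n)
        introsort_impl(first, last, 2 * depth);  // Musser's depth budget
    }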
Three reasons: 1) Mean performance across all test cases includes more than just Quicksort's best case. Heapsort is a good general-purpose algorithm, and falling back to it improves the mean even if it doesn't change much on random data.
You're saying that by improving the worst cases, introsort improves the average. I guess the worst cases for median-of-3 quicksort are more common than I understood them to be? They would have to be pretty common to make introsort *significantly* faster on average, wouldn't they?
2) Introsort also uses insertion_sort, which is really fast on small data sets.
Yeah, but that's essentially an implementation-level optimization. Many Quicksort implementations do the same thing. I don't believe it changes the overall big-O behavior.
I used to use Quicksort for the recursive calls from Spreadsort, then found that for sizes up to 16 it was much better to use insertion_sort. So for the final sorting of the very small data sets left after it's finished cutting them up, introsort is faster. 3) In benchmark testing I find introsort to be about 3X as fast as the qsort library call. That should be the strongest argument of all.
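That cutoff is simple to express. Here is a minimal sketch; the threshold of 16 comes from the observation above, while the helper names are mine:

    #include <algorithm>

    // Plain insertion sort: low overhead and cache-friendly on tiny ranges.
    template <class RandomIt>
    void insertion_sort(RandomIt first, RandomIt last) {
        for (RandomIt i = first; i != last; ++i)
            for (RandomIt j = i; j != first && *j < *(j - 1); --j)
                std::iter_swap(j, j - 1);
    }

    // Recursive step of a hybrid sort: hand small ranges to insertion
    // sort, keep partitioning/binning the rest.
    template <class RandomIt>
    void hybrid_sort(RandomIt first, RandomIt last) {
        if (last - first <= 16) {
            insertion_sort(first, last);
            return;
        }
        // ... partition or bin as usual, then recurse on each piece ...
    }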
Well that's no surprise at all. You have to pay for a function call indirection with qsort.
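Concretely: qsort dispatches every comparison through a function pointer, while std::sort's comparison is a template parameter the compiler can inline. A minimal side-by-side, just to show the two call forms (it doesn't measure anything):

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    // qsort calls through this function pointer once per comparison.
    int compare_ints(const void* a, const void* b) {
        int lhs = *static_cast<const int*>(a);
        int rhs = *static_cast<const int*>(b);
        return (lhs > rhs) - (lhs < rhs);
    }

    int main() {
        std::vector<int> a = {3, 1, 2}, b = a;
        std::qsort(a.data(), a.size(), sizeof(int), compare_ints);
        std::sort(b.begin(), b.end());  // operator< is inlined; no call overhead
        return 0;
    }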
The first version of Spreadsort I wrote in 2001 allocated separate memory for each bin. With a linked-list approach instead, each sorting operation could be done in a single pass (as opposed to the three passes used in the version I'm turning into integer_sort), though without finding the maximum and minimum values. That would make more sense to use with a linked-list data set. It's not complicated to implement, but my testing shows that iterating through linked lists is slow because of frequent cache misses, and besides, binary search only makes sense on a data structure that supports random access.
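Purely to illustrate the single-pass distribution described above (my own sketch, not the 2001 code): nodes are spliced onto their bins in one traversal, with no per-element allocation, though every node hop is a potential cache miss.

    #include <array>
    #include <list>

    // Single-pass distribution of a linked list of unsigned ints into 256
    // bins keyed on one byte; splice moves nodes without allocating.
    std::array<std::list<unsigned>, 256>
    distribute(std::list<unsigned>& input, int shift) {
        std::array<std::list<unsigned>, 256> bins;
        while (!input.empty()) {
            unsigned key = (input.front() >> shift) & 0xFFu;
            bins[key].splice(bins[key].end(), input, input.begin());
        }
        return bins;  // each bin is then sorted recursively or with a fallback
    }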
You might consider why the STL provides binary search for forward iterators, then ;-)
That might save a little iteration compared with checking every element, but wouldn't it still be O(n)?
It saves no iteration at all. It is O(n) iterator steps but O(log n) comparisons. If comparison is expensive enough, it adds up.
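That is essentially how the standard specifies std::lower_bound for forward iterators; a sketch along those lines (count the range once, then halve):

    #include <iterator>

    // lower_bound on forward iterators: O(n) iterator steps in total, but
    // only O(log n) comparisons -- a win when comparison is expensive.
    template <class ForwardIt, class T>
    ForwardIt lower_bound_fwd(ForwardIt first, ForwardIt last, const T& value) {
        auto count = std::distance(first, last);  // one linear pass to count
        while (count > 0) {
            auto step = count / 2;
            ForwardIt mid = first;
            std::advance(mid, step);              // stepping, no comparisons
            if (*mid < value) {                   // one comparison per halving
                first = ++mid;
                count -= step + 1;
            } else {
                count = step;
            }
        }
        return first;
    }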
I'd think it's there just because it's possible, not because it has much use.
I have no problem with providing general capabilities just because they're possible.
Do you have a genuine need, in a real application, for a bidirectional (or unidirectional) iterator version of a hybrid sort?
I'm just generally interested in algorithms.
It would be at least a few hours of development plus testing to make sure it isn't broken. I'd be happy to tell you how to do it if you wish, but without a need, I don't see why I should bother. I have many things to do, and little time.
I'm not trying to get you to do it. I'm just pointing out that std::sort is overconstrained and should at least be made to work on bidirectional iterators (or have its worst-case complexity bound tightened in the case of random access iterators, or both).

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com