
on Mon Jul 07 2008, "Steven Ross" <spreadsort-AT-gmail.com> wrote:
Three reasons: 1) Mean performance across all test cases is not the same as Quicksort's best case. Heapsort is a good general algorithm, and it improves the average performance even if it doesn't change much on random data.
You're saying that by improving the worst cases, introsort improves the average. I guess the worst cases for median-of-3 quicksort are more common than I understood them to be? They would have to be pretty common to make introsort *significantly* faster on average, wouldn't they?
The best case for median-of-3 quicksort assumes that the median-of-3 evenly bisects the data. Depending on the distribution, that is not a reasonable assumption. If it divides the data roughly 1/4 - 3/4, it'll take roughly twice as long, and I would expect such uneven divisions to be relatively common. If it divides 1/8 - 7/8, it'll take roughly three times as long, and worse still if the lopsided split consistently falls on the same side. These situations are far from the worst case, but they explain why Quicksort is somewhat worse than Introsort even on average: some distributions are uneven enough that Introsort falls back to Heapsort.
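As a rough model (assuming every partition splits in the same fixed ratio \alpha : (1-\alpha)), the comparison count follows a standard recurrence:

    T(n) = T(\alpha n) + T((1-\alpha)n) + n
         \approx \frac{n \log_2 n}{H(\alpha)},
    \qquad H(\alpha) = -\alpha \log_2 \alpha - (1-\alpha)\log_2(1-\alpha)

H(1/2) = 1 for an even split, H(1/4) \approx 0.81, and H(1/8) \approx 0.54, so under this model persistent 1/4 - 3/4 splits cost about 1.2x an even split and persistent 1/8 - 7/8 splits about 1.8x; the factor only blows up as the split approaches the degenerate 0 - 1 case.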
I'm having a hard time believing that the average case for Quicksort is as bad as you say. I have done a good deal of searching, and I see quite a few test results showing that introsort and quicksort have approximately identical average-case behavior.
2) Introsort also uses insertion_sort, which is really fast on small data sets.
Yeah, but that's essentially an implementation detail optimization. Many Quicksort implementations do the same thing. I don't believe that changes the overall big-O behavior.
No, it doesn't, but the overhead of finding the median becomes more significant on overall performance on the smaller subsets, so it is a significant implementation detail.
It's not significant in the comparison with QS if QS implementations do it too!
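For reference, here is a minimal sketch of the introsort shape under discussion: depth-limited median-of-3 quicksort with a heapsort fallback, finished by a single insertion-sort pass. The cutoff constant and helper layout are illustrative choices, not any particular library's.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>

    namespace sketch {

    const std::ptrdiff_t kCutoff = 16;  // hand off tiny ranges to insertion sort

    template <class It>
    void insertion_sort(It first, It last) {
        for (It i = first; i != last; ++i)
            for (It j = i; j != first && *j < *(j - 1); --j)
                std::iter_swap(j, j - 1);
    }

    template <class It>
    void introsort_loop(It first, It last, int depth) {
        while (last - first > kCutoff) {
            if (depth-- == 0) {               // too many bad splits:
                std::make_heap(first, last);  // heapsort guarantees
                std::sort_heap(first, last);  // O(n log n) worst case
                return;
            }
            It mid = first + (last - first) / 2;  // median-of-3 pivot
            if (*mid < *first) std::iter_swap(mid, first);
            if (*(last - 1) < *first) std::iter_swap(last - 1, first);
            if (*(last - 1) < *mid) std::iter_swap(last - 1, mid);
            auto pivot = *mid;
            It lt = std::partition(first, last,
                                   [&](const auto& x) { return x < pivot; });
            It gt = std::partition(lt, last,
                                   [&](const auto& x) { return !(pivot < x); });
            introsort_loop(gt, last, depth);  // recurse on the right part,
            last = lt;                        // loop on the left part
        }
    }

    template <class It>
    void introsort(It first, It last) {
        if (last - first > 1) {
            introsort_loop(first, last, 2 * int(std::log2(last - first)));
            insertion_sort(first, last);  // every unsorted run is <= kCutoff long
        }
    }

    }  // namespace sketch

The final insertion_sort pass is the detail in question: it touches the whole range once, but does only O(kCutoff) work per element, because introsort_loop leaves nothing but short unsorted runs behind.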
3) In benchmark testing I find introsort to be about 3X as fast as the qsort library call. That should be the strongest argument of all.
Well that's no surprise at all. You have to pay for a function call indirection with qsort.
Then why hasn't the compiler/library provider inlined the qsort call so the uninformed don't have such a huge speed penalty?
Uh... because qsort is a 'C' library routine?
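For concreteness, the indirection in question (a minimal sketch):

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    // qsort's comparator is called through a function pointer on every
    // comparison, and the element size is only known at runtime.
    int cmp_int(const void* a, const void* b) {
        int x = *static_cast<const int*>(a);
        int y = *static_cast<const int*>(b);
        return (x > y) - (x < y);
    }

    int main() {
        std::vector<int> a{5, 2, 9, 1}, b = a;
        std::qsort(a.data(), a.size(), sizeof(int), cmp_int);  // indirect call each time
        std::sort(b.begin(), b.end());  // operator< on int, typically inlined
    }

With std::sort the comparison is a template parameter the compiler can see through; qsort can't be inlined without the compiler treating a 'C' library function specially.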
I'm not claiming Introsort is vastly superior to Quicksort; I'm just saying it is enough faster to be significant.
I don't believe that's true in the average case.
You might consider why the STL provides binary search for forward iterators, then ;-)
That might save a little iteration over a search over every element, but wouldn't it still be O(n)?
It saves no iteration at all. It is O(n) iterator steps but O(log n) comparisons. If comparisons are expensive enough, that adds up.
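A concrete illustration (std::lower_bound accepts forward iterators, so it works on a singly-linked sequence):

    #include <algorithm>
    #include <forward_list>
    #include <iostream>

    int main() {
        std::forward_list<int> fl{1, 3, 5, 7, 9};
        // O(n) iterator steps (internally via std::advance/std::distance),
        // but only O(log n) calls to the comparison.
        auto it = std::lower_bound(fl.begin(), fl.end(), 7);
        std::cout << *it << '\n';  // prints 7
    }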
That sounds like a corner case to me; somebody in a situation like that might want to consider a radix sort, if their data can be radix-sorted, and a trie-like structure to minimize comparisons would probably save some runtime. If radix operations were cheap enough relative to comparisons, they might want a pure radix sort and radix-based lookup.
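A minimal example of the comparison-free approach being suggested -- an LSD radix sort on unsigned 32-bit keys (illustrative only, not Spreadsort itself):

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Four counting-sort passes, one byte per pass, no comparisons at all.
    void radix_sort(std::vector<std::uint32_t>& v) {
        std::vector<std::uint32_t> buf(v.size());
        for (int shift = 0; shift < 32; shift += 8) {
            std::array<std::size_t, 257> count{};  // histogram for this byte
            for (std::uint32_t x : v) ++count[((x >> shift) & 0xFF) + 1];
            for (int i = 0; i < 256; ++i) count[i + 1] += count[i];
            for (std::uint32_t x : v) buf[count[(x >> shift) & 0xFF]++] = x;
            v.swap(buf);
        }
    }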
Not all data can be radix sorted. And anyway, what does sorting have to do with binary search?
Do you have a genuine desire, for a real application, to see a bidirectional (or unidirectional) iterator version of a hybrid sort?
I'm not trying to get you to do it. I'm just pointing out that std::sort is overconstrained and should at least be made to work on bidirectional iterators (or have its worst-case complexity bound tightened in the case of random access iterators, or both).
I thought about what would be required to make integer_sort work with forward iterators:

1) I would have to either eliminate the check to see whether the data set is small enough that std::sort would be better, require the user to provide a count of the number of items being sorted, or be able to determine that the iterator isn't random-access and thus not do the check.

2) I would have to eliminate recursive calls to std::sort, and instead use an algorithm that works safely using the iterator I'm provided for recursive comparison-based sorting calls.

3) I would need an inlined function call for going forward n places from an iterator that takes constant time for a random access iterator, and the minimum number of operations necessary for a more limited forward iterator.

With that said, it should add an additional pass through the data for a non-random-access iterator.
Given all that, I can make integer_sort work off a forward iterator, without impairing random-access iterator performance, and still have decent performance for a forward iterator. So if you can provide me all of these:

1) A way to determine that an iterator is not random-access.

2) An efficient generic comparison-based recursive algorithm that takes forward iterators and is competitive with Introsort in performance.

3) An efficient way to go forward n steps from an iterator that is just as fast as + n for a random access iterator, but also works for normal iterators.

then I could make integer_sort work with a plain forward iterator (see the sketch below for 1 and 3). I'd still have to be able to swap elements. For linked lists, a specialized version of Spreadsort would be significantly faster.
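Of these, 1) and 3) already have standard answers: iterator tag dispatch for detection, and std::advance/std::distance, whose cost adapts to the iterator category (O(1) steps for random access, O(n) otherwise). 2) is the genuinely open item. A sketch of the dispatch (all function names hypothetical):

    #include <algorithm>
    #include <iterator>

    template <class It>
    void hybrid_sort_impl(It first, It last, std::random_access_iterator_tag) {
        // last - first is O(1) here, so the small-input check is cheap:
        if (last - first < 1000)  // illustrative threshold only
            std::sort(first, last);
        // ... otherwise run the radix-based algorithm ...
    }

    template <class It>
    void hybrid_sort_impl(It first, It last, std::forward_iterator_tag) {
        // Counting the elements costs the extra pass mentioned above;
        // stepping k places is std::advance(it, k), O(k) for this category.
        typename std::iterator_traits<It>::difference_type n =
            std::distance(first, last);
        (void)n;  // ... run the forward-iterator variant ...
    }

    template <class It>
    void hybrid_sort(It first, It last) {
        hybrid_sort_impl(first, last,
            typename std::iterator_traits<It>::iterator_category());
    }

Since random_access_iterator_tag derives from forward_iterator_tag, overload resolution picks the random-access version whenever it applies and falls back to the forward version otherwise.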
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com