
std::sort is fast for mostly-sorted data on x86 processors because it handles sorted data in an optimized fashion and minimizes the number of swaps. It iterates forward from the first unprocessed element until it finds one that is out of order, then iterates backward from the last unprocessed element until it finds one that is out of order, and then swaps those two (after which the loop repeats, starting from the next elements, until done). This approach has two benefits: 1) it minimizes the number of swaps, which are slow operations, and 2) if the data is mostly (or fully) sorted, the iteration to the first unordered element is extremely fast, as modern x86 processors handle long, predictable, simple iterations very efficiently. Hence the speed advantage std::sort has for mostly-sorted data: it does roughly log(n) very fast operations per element when the data is mostly sorted, while integer_sort does about 2 slow passes over mostly-sorted data. As the fast operations optimize much better, the log(n) approach can be faster.

To resolve the already-sorted-data performance issue, I took my find_extremes method, which finds the maximum and minimum of the data (enabling more efficient processing), and modified it. Since the maximum of a sorted contiguous list of elements is the last one (assuming sorting on <; for > it's just backwards and handled by the functors; look at the reverse integer sort sample) and the minimum is the first one, I can instead check whether the data is sorted to find the minimum and maximum, and if the data is sorted, this is faster. As soon as the check hits an out-of-order element, it stops, and I fall back on the conventional approach for the rest of the data. This has negligible impact on performance for unsorted data, but for fully-sorted (or identical) data, I find out the data is fully sorted and quit as soon as this quick check is done, saving lots of time. For mostly-sorted data, some iteration time is saved by iterating up to the first unsorted element before switching to the slower conventional min/max finding. (Yes, I've thought of using a single loop to find the min and a single loop to find the max; it was slower when I last tested it on my old Mac, but might be faster on x86.) That was the first optimization; it improved weighted-average performance on random data by 8%, improved already-sorted performance by roughly an order of magnitude, and cut mostly-sorted runtime by about 40%.

The second optimization was to preprocess the bins to minimize swapping. I keep track of the first unsorted element in a bin, and the change was to increment that position as long as the element in that position belongs where it currently is. This substantially reduces the number of swaps on mostly-sorted data. I only apply this optimization if the bins are at least a modest minimum size, to avoid adding unnecessary overhead when processing unsorted data. After this optimization I'm seeing a 15% speed penalty on mostly-sorted data relative to std::sort on average, and an 8% speed improvement in my worst-tested-case with mostly-sorted data. By contrast, I see a 34% speedup on average and a 58% speedup in the worst-tested-case for unsorted data.

An optimization I tried a while ago was to check whether the item I'm thinking of swapping with belongs where it is, and if it doesn't, to check the next element in the bin, iterating until I find an out-of-order element to swap with. As this adds a loop inside the innermost loop in the slowest part of the code, it doesn't optimize well, and I found it hurt performance.
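
For the curious, here is a minimal sketch of the two-ended partition scan described above (a Hoare-style partition; real std::sort implementations add pivot-selection heuristics, insertion-sort fallbacks, and depth limits, all omitted here):

#include <cstddef>
#include <utility>
#include <vector>

// Sketch only: partition data[lo..hi] around a pivot taken from the middle.
// Returns j such that [lo..j] holds elements <= pivot and [j+1..hi] holds
// elements >= pivot.
template <typename T>
std::size_t hoare_partition(std::vector<T>& data, std::size_t lo, std::size_t hi)
{
    const T pivot = data[lo + (hi - lo) / 2];
    std::size_t i = lo, j = hi;
    for (;;) {
        while (data[i] < pivot) ++i;   // fast, predictable forward scan
        while (pivot < data[j]) --j;   // fast, predictable backward scan
        if (i >= j)
            return j;                  // scans met: this range is partitioned
        std::swap(data[i], data[j]);   // one swap per out-of-order pair found
        ++i;
        --j;
    }
}

On already-sorted input the two scans meet without performing a single swap, which is why this step is so cheap for mostly-sorted data.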
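Here is a rough sketch of the modified min/max scan (illustrative only, not the actual find_extremes code in the library; the name and signature are made up for this example):

#include <cstddef>
#include <vector>

// Returns true if the whole range turned out to be sorted (ascending on <),
// in which case the caller can quit immediately.  Otherwise min and max are
// filled in, falling back to the conventional scan once the sorted check fails.
template <typename T>
bool find_extremes_sketch(const std::vector<T>& data, T& min, T& max)
{
    if (data.empty())
        return true;
    min = max = data[0];
    std::size_t i = 1;
    // While the data stays in order, the minimum is the first element and the
    // maximum is simply the latest element seen.
    for (; i < data.size(); ++i) {
        if (data[i] < data[i - 1])
            break;              // first out-of-order element: stop the sorted check
        max = data[i];
    }
    if (i == data.size())
        return true;            // fully sorted (or all identical): nothing left to do
    // Conventional min/max finding for the rest of the data.
    for (; i < data.size(); ++i) {
        if (max < data[i]) max = data[i];
        else if (data[i] < min) min = data[i];
    }
    return false;
}

The early return is what turns already-sorted input into a single cheap pass.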
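And a sketch of the bin-preprocessing idea (again, the names and data layout here are hypothetical, not the library's internals):

#include <cstddef>
#include <functional>
#include <vector>

// Before the swapping pass, advance each bin's "first unsorted" cursor past
// elements that already sit in the bin they belong to.  bin_of maps an element
// to its destination bin; first_unsorted[b] is the next position to fill in
// bin b; bin_end[b] is one past the end of bin b.  min_size keeps the extra
// check off small bins, so unsorted data pays essentially no overhead.
template <typename T>
void preprocess_bins(const std::vector<T>& data,
                     std::vector<std::size_t>& first_unsorted,
                     const std::vector<std::size_t>& bin_end,
                     const std::function<std::size_t(const T&)>& bin_of,
                     std::size_t min_size)
{
    for (std::size_t b = 0; b < first_unsorted.size(); ++b) {
        if (bin_end[b] - first_unsorted[b] < min_size)
            continue;                              // too small to be worth checking
        while (first_unsorted[b] < bin_end[b] &&
               bin_of(data[first_unsorted[b]]) == b)
            ++first_unsorted[b];                   // already in the right bin: no swap needed
    }
}

On mostly-sorted data most elements already sit in the right bin, so the swapping loop that follows has far less work to do.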
On Wed, May 6, 2009 at 8:24 PM, Jeffrey Bosboom <jbosboom@uci.edu> wrote:
I'm curious as to why std::sort was faster for mostly-sorted data, and what you did to your algorithm to match its speed.
--Jeffrey Bosboom