
std::sort is fast for mostly-sorted data on x86 processors because it handles sorted data in an optimized fashion and minimizes the number of swaps. It iterates forward from the first unprocessed element until it finds one that is out of order, then iterates backward from the last unprocessed element until it finds one that is out of order, and then swaps those two (after which the loop repeats, starting from the next elements, until done). This approach has two benefits: 1) it minimizes the number of swaps, which are slow operations, and 2) if the data is mostly (or fully) sorted, the iteration to the first unordered element is extremely fast, as modern x86 processors handle long, predictable, simple iterations very efficiently. Hence the speed advantage std::sort has for mostly-sorted data: it does roughly log(n) very fast operations per element when the data is mostly sorted, while integer_sort does about 2 slow passes over mostly-sorted data. As the fast operations optimize much better, the log(n) approach can be faster.

To resolve the already-sorted-data performance issue, I took my find_extremes method, which finds the maximum and minimum of the data (enabling more efficient processing), and modified it. Since the maximum of a sorted contiguous list of elements is the last one (assuming sorting on <; for > it's just backwards and handled by the functors; look at the reverse integer sort sample) and the minimum is the first one, I can instead check whether the data is sorted to find the minimum and maximum, and if the data is sorted, this is faster. As soon as the check hits an out-of-order element, it stops, and I fall back on the conventional approach for the rest of the data. This has negligible impact on performance for unsorted data, but for fully-sorted (or identical) data, I find out the data is fully sorted and quit as soon as this quick check is done, saving lots of time. For mostly-sorted data, some iteration time is saved by iterating up to the first unsorted element before switching to the slower conventional min/max finding. (Yes, I've thought of using a single loop to find the min and a single loop to find the max; it was slower when I last tested it on my old Mac, but might be faster on x86.) That was the first optimization; it improved weighted-average performance on random data by 8%, improved already-sorted performance by roughly an order of magnitude, and cut mostly-sorted runtime by about 40%.

The second optimization was to preprocess the bins to minimize swapping. I keep track of the first unsorted element in a bin, and the change was to increment that position as long as the element in that position belongs where it currently is. This substantially reduces the number of swaps on mostly-sorted data. I only apply this optimization if the bins are at least a modest minimum size, to avoid adding unnecessary overhead when processing unsorted data. After this optimization I'm seeing a 15% speed penalty on mostly-sorted data relative to std::sort on average, and an 8% speed improvement in my worst-tested-case with mostly-sorted data. By contrast, I see a 34% speedup on average and a 58% speedup in the worst-tested-case for unsorted data.

An optimization I tried a while ago was to check whether the item I'm thinking of swapping with belongs where it is, and if it doesn't, to check the next element in the bin, iterating until I find an out-of-order element to swap with. As this adds a loop inside the innermost loop in the slowest part of the code, it doesn't optimize well, and I found it hurt performance.
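
For the curious, here is a minimal sketch of the two-ended partition scan described above (a Hoare-style partition; real std::sort implementations add pivot-selection heuristics, insertion-sort fallbacks, and depth limits, all omitted here):

#include <cstddef>
#include <utility>
#include <vector>

// Sketch only: partition data[lo..hi] around a pivot taken from the middle.
// Returns j such that [lo..j] holds elements <= pivot and [j+1..hi] holds
// elements >= pivot.
template <typename T>
std::size_t hoare_partition(std::vector<T>& data, std::size_t lo, std::size_t hi)
{
    const T pivot = data[lo + (hi - lo) / 2];
    std::size_t i = lo, j = hi;
    for (;;) {
        while (data[i] < pivot) ++i;   // fast, predictable forward scan
        while (pivot < data[j]) --j;   // fast, predictable backward scan
        if (i >= j)
            return j;                  // scans met: this range is partitioned
        std::swap(data[i], data[j]);   // one swap per out-of-order pair found
        ++i;
        --j;
    }
}

On already-sorted input the two scans meet without performing a single swap, which is why this step is so cheap for mostly-sorted data.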
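Here is a rough sketch of the modified min/max scan (illustrative only, not the actual find_extremes code in the library; the name and signature are made up for this example):

#include <cstddef>
#include <vector>

// Returns true if the whole range turned out to be sorted (ascending on <),
// in which case the caller can quit immediately.  Otherwise min and max are
// filled in, falling back to the conventional scan once the sorted check fails.
template <typename T>
bool find_extremes_sketch(const std::vector<T>& data, T& min, T& max)
{
    if (data.empty())
        return true;
    min = max = data[0];
    std::size_t i = 1;
    // While the data stays in order, the minimum is the first element and the
    // maximum is simply the latest element seen.
    for (; i < data.size(); ++i) {
        if (data[i] < data[i - 1])
            break;              // first out-of-order element: stop the sorted check
        max = data[i];
    }
    if (i == data.size())
        return true;            // fully sorted (or all identical): nothing left to do
    // Conventional min/max finding for the rest of the data.
    for (; i < data.size(); ++i) {
        if (max < data[i]) max = data[i];
        else if (data[i] < min) min = data[i];
    }
    return false;
}

The early return is what turns already-sorted input into a single cheap pass.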
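And a sketch of the bin-preprocessing idea (again, the names and data layout here are hypothetical, not the library's internals):

#include <cstddef>
#include <functional>
#include <vector>

// Before the swapping pass, advance each bin's "first unsorted" cursor past
// elements that already sit in the bin they belong to.  bin_of maps an element
// to its destination bin; first_unsorted[b] is the next position to fill in
// bin b; bin_end[b] is one past the end of bin b.  min_size keeps the extra
// check off small bins, so unsorted data pays essentially no overhead.
template <typename T>
void preprocess_bins(const std::vector<T>& data,
                     std::vector<std::size_t>& first_unsorted,
                     const std::vector<std::size_t>& bin_end,
                     const std::function<std::size_t(const T&)>& bin_of,
                     std::size_t min_size)
{
    for (std::size_t b = 0; b < first_unsorted.size(); ++b) {
        if (bin_end[b] - first_unsorted[b] < min_size)
            continue;                              // too small to be worth checking
        while (first_unsorted[b] < bin_end[b] &&
               bin_of(data[first_unsorted[b]]) == b)
            ++first_unsorted[b];                   // already in the right bin: no swap needed
    }
}

On mostly-sorted data most elements already sit in the right bin, so the swapping loop that follows has far less work to do.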
On Wed, May 6, 2009 at 8:24 PM, Jeffrey Bosboom <jbosboom@uci.edu> wrote:
I'm curious as to why std::sort was faster for mostly-sorted data, and what you did to your algorithm to match its speed.
--Jeffrey Bosboom