
On Sat, Jul 12, 2008 at 11:20 AM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
We want to move element A to the position currently occupied by element B. We could swap them, but moving B to A's position is probably not useful. So let's first move B to a good position for it, and then move A into its space. Of course some other element C will be occupying the position where we want to put B, so we have to keep going until we find an element that wants to be in the position where A was.
Whether it actually improves performance will depend on quite how large the structs are. It certainly won't help with ints. So let's see your version that sorts structs!
This is an excellent idea. You need n + 2 copies to definitely put n elements in their correct position, and if you're lucky, n + 1 elements. With a swap, n = 1, so you need 3 copies. Most of the speedup will probably occur with a small n (n = 4 to 8?), so I will try an explicitly unrolled loop with a small number of elements to see how well this works.

I have the struct code commented out in sample.c, but no code change in spreadsort.hpp is required to handle larger structs. I don't really need to swap the elements into position; what I need is the fastest way to rearrange elements that are out of order, but whose correct positions I already know, in a minimum number of copies and checks.

The prior optimization we discussed, where I posted my new code for the swap loop, gave a net 16% speed improvement across my regressions, to the point where my regressions ran in less than half the time that std::sort took on them (for 4e7 elements), so we're already getting substantial improvement. I intend to try your version of that change at some point, but I'll have to verify that it also works well on Intel processors, and I'm skeptical from prior experience. Your latest suggestion is more promising and novel, so I will try it first.

Your idea may even be an improvement for integers, if I code it right, because most of the swapping time is spent doing roughly 512 swaps before an element that belongs in the current position is found. If I can find a way to parameterize the number of copies n based on sizeof(*RandomAccessIter), then I could tune this to the size of the data type, but even without that, a small number of iterations of this approach should get a sizeable speedup across all data types.
I had considered some ideas of this type in the past, but got stuck on the extra memory required and worried that might add overhead; with a small constant number of elements, though, the overhead should be minimal.

In effect, this placement algorithm (as improved with my last posting) works in two stages:

First stage, repeated n/512 times: swap until the right element is in position (roughly 512 swaps), then go to the next element. This stage should dominate runtime now, and it's the one your latest suggestion would improve.

Second stage: run along the bins, skip to the first element that is out of place, and do a few swaps for each out-of-place element until it is in the correct position. This stage is now very fast.

I was also working on fixing my worst-case calculation to be more accurate. That matters most for worst-case performance, but it also gave a consistent 0.1-0.2% improvement in my regressions and gave LOG_CONST more meaning. I'll include that fix with my next posting, but I intend to try out your idea the next time I work on integer_sort, and I expect to see a substantial improvement.

Thanks again, Phil. I think when we're done optimizing we'll have a much faster algorithm.

Steve