
On Sat, Jul 12, 2008 at 11:20 AM, Phil Endecott < spam_from_boost_dev@chezphil.org> wrote:
The problem with that is that you might recurse a lot and fill your stack, so in practice you'd want to limit it to a few iterations and then give up and put a "wrong" value back at the source. Also you need to be certain that 'tmp' gets some sort of return value optimisation and isn't copied during each return. An explicitly unrolled iterative version would probably be best:
Whether it actually improves performance will depend on quite how large the structs are. It certainly won't help with ints. So let's see your version that sorts structs!
Hi Phil, I implemented this change, using a 3-way swap (n = 2; 4 copies, 2 items placed in the right position) and sent it through my regressions, and saw a 7% speed improvement. It required also iterating over the bins, as you originally suggested. What I also discovered was that I could get a 6% speed improvement on my original algorithm just by replacing std::swap with an explicitly inlined swap and changing which item gets assigned to tmp in the swap (the order matters!), so the improvement (on integers) of your multi-way swap on its own is about 1%. By doing the 3-way swap I should get the majority of the speedup of your technique, as the marginal speedup rapidly degrades as you increase n. For my testing on integers, a 4-way swap seemed slower than a 3-way swap; my guess is that the 3-way swap is easier to optimize. I will upload two sample apps to www.sourceforge.net/projects/spreadsort in the file release; one with integers, and one with integer + data. I've attached two files: philspreadsort.hpp spreadsort.hpp (plus an updated constants.h) I'd welcome people comparing to see which is faster, and by how much, on various platforms. I'll note that I can add a memory optimization to philspreadsort.hpp (count no longer has to be in the bins for speed), but haven't done so for now. Assuming philspreadsort.hpp is at least as fast on average as spreadsort.hpp, I'll run with philspreadsort.hpp, on the assumption that the swapping of larger data types will be faster. We're getting close to making integer_sort faster than std::sort for 1000 elements! Steve