
Steven Ross wrote:
I am using MSVC 2005 standard edition with bjam's default release settings for it. I have not tried other compiler settings, besides 64-bit compilation, which has comparable performance. This is on a Core 2 Duo purchased earlier this year for Boost development. I saw the same behavior and the same magnitude of speedup for float_sort vs. std::sort on floats on a 6 year old refurbished Dell Windows system using gcc in Cygwin. With two different compilers on processors manufactured at least 6 years apart, I suspect it's an issue with the architecture. There may be Intel processors that are faster for this purpose, but they certainly aren't the two mass-market Intel processors I've tried, with default optimizations.
If anybody would like to look at my testcase, it is in the boost vault, inside algorithm_sorting.zip. Inside there, it is in libs/algorithm/sorting/samples/float_as_integer.cpp.
This version fails to compile for me, because it includes boost/static_warning.hpp, and it does not seem to be present in my SVN HEAD tree. I've changed include to boost/serialization/static_warning.hpp, but this can't be right. Am I missing something?
Even without using integer_sort, and just using std::sort on integers with the code I posted initially, the speedup is on the same order (80X). I tested using randomgen 7 16 1000000 (randomgen is compiled from my randomgen.cpp). 7 is the bit range among the "high" 16 bits, the second number (16) is the bit range among the "low" 16 bits, and 1000000 is the size of the vector to sort. 23 bits are used to exclude NaNs, which I assume to be uncommon in real data, and which are sorted arbitrarily, meaning that two correct sorted results containing NaNs will often not be identical. With mostly NaNs the speedup with this approach is about 2X.
From the libs/algorithm/sorting directory (once copied into the boost tree), you can test this way: bjam --tune randomgen randomgen 7 16 1000000 bjam --tune float_as_integer float_as_integer (uses the approach I described) float_as_integer -std (uses std::sort on floats) (I see a huge difference in runtime on x86, but not on PowerPC)
To bet bigger times that are less prone to noise, I have changed loopCount in your test to 10. Then, then time reported without -std is 1 second, with very small difference between each run. The time with -std is 43 seconds, again with negligible variation. If I add: cxxflags="-march=nocona -mfpmath=sse" then -std runs in 16 seconds, while the time for your algorithm is the same 1 second. Of course, this is still significant speedup, but smaller than number you have reported, so probably a deeper look is reasonable. Incidentally, does your library allow to sort only a collection of integers/floats, or it also allows to sort an arbitrary collection where each item has integer/float key? If so, can you point me at example? - Volodya