
On Tue, Dec 16, 2008 at 7:34 AM, Scott McMurray <me22.ca+boost@gmail.com>wrote:
Empirically, putting the following into the LLVM online demo (http://llvm.org/demo/) shows that, in llvm-g++ at least, the (intermediate) code generated from the memcpy is optimal:
#include <string.h>

int foo(float const y) {
    int x;
    memcpy(&x, &y, sizeof(int));
    return x;
}

define i32 @_Z3foof(float %y) nounwind readnone {
entry:
        %0 = bitcast float %y to i32            ; <i32> [#uses=1]
        ret i32 %0
}
So, since the memcpy is always safe and doesn't break the aliasing rules, it certainly seems like the best approach.
Thanks, Scott; I've added you to the list of contributors. I made that change after some string_sort improvements, and found:

1) float_sort now works on Cygwin.
2) There is no measurable performance penalty relative to my prior casting approach.
3) My Cygwin testcase that was crashing now runs over 100X as fast with float_sort as with std::sort, generating exactly the same (correct) results.

My speculation for the cause of the incredible speedup, based upon some other testcases where float_sort runs in 40% less time than std::sort, is that floating-point comparisons are implemented extremely inefficiently on my Cygwin test system: the testcase that is 100X as fast is purely radix-sorted by float_sort, while the ones that are only modestly faster make some recursive calls to std::sort, which uses comparisons.

Based upon these results, it looks to me like an LSB radix sort is the best approach to use on Cygwin for sorting sets of floating-point numbers larger than some basic minimum size (1000 elements?), as long as the additional memory overhead for a duplicate vector is acceptable. It could also just be my old processor (a refurbished Windows system purchased from Dell over 5 years ago); modern systems might not have this problem.

Does anyone have a competing theory as to why pure radix sorting is so much faster for floating-point numbers on an old Cygwin system?