
Thanks, Phil and Vladimir, for pointing out the denormalization issue. I had not seen it on PowerPC, which was my main test system for sorting for years, and I still don't understand why one processor handles denormalized values so differently from the other, but it clearly does. The impact is large enough that denormalized numbers double the runtime of std::sort on a large list (200 million elements) built from completely random bit patterns, even though only about 1 in 256 of those patterns is denormalized (the 8-bit exponent field is all zeros roughly 1 time in 256). That said, people rarely use denormalized values in practice, so they are not a realistic test case.

Given that feedback, I'm going to scrap my sort-as-integer approach and stick with the cast-to-integer approach already in my float_sort implementation, which has no additional memory overhead. I'll also do my testing with SSE optimizations enabled, and make sure my float_sort tests exclude denormalized values so that the performance results are more realistic. I expect the speedup to drop to roughly 2X, which is about what the other algorithms (integer_sort, string_sort) get. I apologize for being stubborn on this issue.
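For anyone following the thread, here is a minimal sketch of the general cast-to-integer trick (not the actual float_sort code; the function name and the std::sort comparator are just for illustration): the float's bits are copied into an unsigned integer and adjusted so that integer ordering matches float ordering, so the sort never performs a floating-point comparison and never pays the denormalization penalty.

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a float's bit pattern to an unsigned key whose integer ordering
// matches the float's ordering (NaNs aside).  No floating-point math is
// done, so denormalized values cost nothing extra.
inline std::uint32_t float_to_key(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // bit-for-bit copy, no conversion
    // Negative floats: flip every bit so larger magnitudes become smaller keys.
    // Positive floats: set the sign bit so they sort above all negatives.
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

int main() {
    std::vector<float> data = {3.5f, -0.25f, 1e-42f /* denormal */, -7.0f, 0.0f};
    std::sort(data.begin(), data.end(),
              [](float a, float b) { return float_to_key(a) < float_to_key(b); });
    // data is now -7.0, -0.25, 0.0, 1e-42, 3.5
}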
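And by excluding denormalized values from the test data, I mean something along these lines; a rough sketch, with the generator, seed, and function name purely illustrative:

#include <cmath>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

// Build random float test data from raw 32-bit patterns, rejecting
// denormalized values (and NaN/infinity, which aren't realistic either).
std::vector<float> make_test_data(std::size_t count) {
    std::mt19937 gen(42);  // fixed seed so runs are repeatable
    std::uniform_int_distribution<std::uint32_t> bits;  // full 32-bit range
    std::vector<float> data;
    data.reserve(count);
    while (data.size() < count) {
        std::uint32_t u = bits(gen);
        float f;
        std::memcpy(&f, &u, sizeof f);
        int c = std::fpclassify(f);
        if (c == FP_SUBNORMAL || c == FP_NAN || c == FP_INFINITE)
            continue;  // roughly 2 in 256 patterns get rejected
        data.push_back(f);
    }
    return data;
}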