
I've been investigating why float_sort is so much faster than std::sort on x86 processors, and why sorting integers is so much faster than float_sort, and came to the conclusion that floating-point comparisons are extremely slow on x86. Based upon that, I've written a different sorting approach that copies the input array of floats, casting them to integers, and sorts them as integers, dealing with the negative floats sorting in reverse order issue, completely eliminating floating-point comparisons. It takes O(n) additional memory, and is somewhat less general, as a different approach is required for key plus data sorting (as non-float data can't be safely cast this way), but it obtains integer-sorting speeds. On some problems it is as much as 120X as fast as std::sort. I doubt average results will be quite that good, but they could easily be well over 10X as fast. Does this sound like it's appropriate for Boost? If so, I'll add it to my proposed Boost.Algorithm.Sorting library, which is complete, aside from this issue. Example source (not templated yet): #define DATATYPE float #define CAST_TYPE int inline void float_as_integer_sort(std::vector<DATATYPE> &input) { if(input.size() < 2) return; std::vector<CAST_TYPE> cast_vec; cast_vec.resize(input.size()); unsigned neg_index = 0; int pos_index = input.size() - 1; //Casting to integer and presorting into negative and positive for(unsigned u = 0; u < input.size(); ++u) { int cast_val = float_mem_cast<DATATYPE, CAST_TYPE>(input[u]); if(cast_val < 0) cast_vec[neg_index++] = cast_val; else cast_vec[pos_index--] = cast_val; } assert((pos_index + 1) == neg_index); //Sort the negatives integer_sort(&(cast_vec[0]), &(cast_vec[neg_index])); //Then the positives integer_sort(&(cast_vec[neg_index]), &(cast_vec[0]) + cast_vec.size()); //casting back negatives for(unsigned u = 0; u < neg_index; ++u) std::memcpy(&(input[neg_index - u - 1]), &(cast_vec[u]), sizeof(DATATYPE)); //Then positives for(unsigned u = neg_index; u < cast_vec.size(); ++u) std::memcpy(&(input[u]), &(cast_vec[u]), sizeof(DATATYPE)); }