
On Tue, Dec 16, 2008 at 3:42 PM, Scott McMurray <me22.ca+boost@gmail.com>wrote:
As for the bit patterns, I can't think of any bit patterns that cannot be created by legal <<ing and ++ing, so I'm not convinced that invalid integer bit patterns exist (at least in unsigned types -- I don't know what ones-complement signed ints do with ~0, for example).
An IEEE floating-point number maps nicely to a signed int of the same size for the purposes of sorting. The main issue is that negative floats increase in the opposite order from the integer with the same bits, so I sort them in reverse (if you look at the source). Positive floats sort in the same order as the positive integer with the same bit representation. -0 ends up being less than 0 this way, though for std::sort they are equal. NANs sort in a definite order during the radix-sorting portion, unlike std::sort which dumps them arbitrarily throughout the resulting file. I assume that developers won't mind these discrepancies for -0 and NANs. This trick is commonly used by radix sorting implementations, and is especially valuable because floating-point compares are inefficiently implemented on some platforms. I should note that I named the operation "casting_float_sort" because I was concerned that not all theoretical platforms may support the cast. If people are concerned about sorting some non-IEEE floating-point number, they can use float_sort, which requires the user to specify the type they cast the float to, or the functor versions, where the user defines the >> operation (where the cast is). All versions of float_sort still assume that negatives need to be sorted in reverse order; it's a method specialized to sorting IEEE floating_point numbers as integers. Anything that can be sorted in a normal fashion based upon an integer return from the >> operator (or equivalent functor) can be sorted by integer_sort.
That said, how is it used in practise? Aliasing allows reading through an unsigned char*, the size of which is always a factor of the size of any other type, so would it be better to implement everything to always go through that? I know naïve radix sorts will often go CHAR_BIT bits at a time to take advantage of that, but I figure the better implementation may well need something more complex.
I could do that, but I would often need one more iteration than is strictly necessary, and I wouldn't be able to reuse integer_sort to sort the positive floats. I have no reason to believe that going 8 bits at a time would be any faster, unless it can save me a memory operation to work around a casting restriction that isn't enforced on all (any?) platforms.