
On Mon, Jul 6, 2009 at 8:37 AM, Vladimir Prus <vladimir@codesourcery.com> wrote:
With 23-bit random data, all
the exponents are the same. With 32-bit random data, there will be 512
different values for the exponents,
Are you sure? 32-bit IEEE format has 8 bits for the exponent.
+1 for the sign bit, so there are 2^9 = 512 distinct sign+exponent combinations.
splitting the identical-exponent problems much smaller, to the point where they are close to the size of the cache (in-cache sorting is extremely fast).

With realistic data, there will be multiple exponents, but not as many as there are data elements. More likely, there will be a small range of exponents and, for each of them, many elements with the same exponent. Also, a denial-of-service attack could easily be made against a system that sorts floats by feeding it floats with the same exponent but different coefficients.

When I test with 26-bit random data, where there are 8 different exponent values (a more realistic case), here are my results:

sorting floats as integers (my approach): 1.473s
std::sort: 21.511s

(I used randomgen 10 16 10000000 to generate the input data.)
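For anyone who hasn't seen the cast-to-integer trick, here is a minimal sketch of the general idea (my own illustration with my own names, not code from the library under discussion), assuming IEEE 754 binary32 floats: each float's bit pattern is mapped to an unsigned key whose order matches the numeric order, and the top 9 bits of that key (sign + exponent) are exactly the 512 possible first-pass bucket values mentioned above.

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a binary32 float to a 32-bit key whose unsigned order matches the
// float's numeric order: negative floats get all bits inverted (reversing
// their order), non-negative floats just get the sign bit set so they sort
// above all negatives.
std::uint32_t float_to_key(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// For illustration only: sort via the keys with std::sort. A radix/bucket
// sort would instead use the top key bits (sign + exponent, 512 buckets)
// for its first pass, which is where the exponent distribution matters.
void sort_floats_as_integers(std::vector<float>& data) {
    std::sort(data.begin(), data.end(),
              [](float a, float b) { return float_to_key(a) < float_to_key(b); });
}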
So not 120X, but still a substantial difference.
Also worth noting: as the element count increases, the speed difference grows, even on 32-bit random data (randomgen 16 16 200000000):
What real-world problems require sorting such huge arrays of floats?
Circuit layouts, if someone wants to sort them for various purposes. They are gigabytes in size, so they take a long time to sort unless casting to integer is done.

Yes, I guess I was using denormalized numbers for the 23-bit case, but it's interesting that there is such a huge speed difference, and I doubt it's because they're denormalized; there is no reason to special-case denormalized sorting. Notably, normalized numbers also show the same problem if their exponent range is small relative to the number of elements; this can be achieved by using a small number of different exponents, or simply a large number of floats (as shown with my large test).

What do you consider a reasonable range? I can easily make a test that excludes denormalized numbers. Even if the full exponent range is in use (which isn't realistic), wouldn't it be better to use an algorithm that doesn't blow up when n gets much larger than 10 million elements, and is still faster with a smaller number of elements?
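On the test-data point, a rough sketch of what a generator with a restricted exponent range could look like (my own illustration with assumed parameters, not the randomgen tool used for the timings above): it writes raw binary32 floats whose biased exponents are drawn from a small normalized range, so no denormals appear and many elements share each exponent.

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <random>

int main() {
    const int num_exponents = 8;          // assumed: 8 distinct exponent values
    const std::size_t count = 10000000;   // assumed: 10 million floats
    std::mt19937 rng(42);
    std::uniform_int_distribution<std::uint32_t> mantissa(0, (1u << 23) - 1);
    std::uniform_int_distribution<std::uint32_t> exponent(127, 127 + num_exponents - 1);
    std::uniform_int_distribution<std::uint32_t> sign(0, 1);

    for (std::size_t i = 0; i < count; ++i) {
        // Assemble sign | biased exponent | mantissa; biased exponents
        // 127..134 are all normalized, so no denormals are produced.
        std::uint32_t bits = (sign(rng) << 31) | (exponent(rng) << 23) | mantissa(rng);
        float f;
        std::memcpy(&f, &bits, sizeof f);
        std::fwrite(&f, sizeof f, 1, stdout);  // raw binary output
    }
    return 0;
}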