
Steven Ross wrote:
> On Sun, Jul 5, 2009 at 9:17 AM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
>> If the problem were simply that "x86 FP is very slow", then this problem would be evident with std::sort as well as your algorithm. I've therefore done a simple test with a program like the following:
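Phil's test program is not reproduced in the quote; a minimal sketch of the kind of benchmark he describes, with compare_floats_as_ints and the data setup reconstructed rather than taken from his post, might look like this (the vector<int> and plain vector<float> timings would come from changing the element type and dropping the comparator):

    #include <algorithm>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <ctime>
    #include <vector>

    // Reconstruction, not Phil's actual code: compare two floats by their
    // bit patterns taken as signed 32-bit ints.  This matches float order
    // for non-negative values; negatives would need the fixup Phil mentions.
    bool compare_floats_as_ints(float a, float b) {
        int ia, ib;
        std::memcpy(&ia, &a, sizeof ia);
        std::memcpy(&ib, &b, sizeof ib);
        return ia < ib;
    }

    int main() {
        std::vector<float> v(10000000);
        for (std::size_t i = 0; i < v.size(); ++i)
            v[i] = static_cast<float>(std::rand());  // stand-in for random_fill
        std::clock_t start = std::clock();
        std::sort(v.begin(), v.end(), compare_floats_as_ints);
        std::printf("%.2f s\n", double(std::clock() - start) / CLOCKS_PER_SEC);
        return 0;
    }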
> Agreed Phil. I do see it with std::sort. Vladimir also saw a large runtime difference with 23-bit random data, though not quite as large.
>> Note that I haven't actually checked if I get the right answers from this, so please let me know if you can see any mistakes! Also note that I have not done any fixup for negative floats, so some small additional time would be needed for that.
>> I've run variations of this code with the following results:
>> vector<int>, std::sort(begin,end) : 3.71 s
>> vector<float>, std::sort(begin,end) : 9.12 s
>> vector<float>, std::sort(begin,end,compare_floats_as_ints) : 4.19 s
> My results, with 32-bit random data, which is what you appear to be using (you did not provide your random_fill function; I used randomgen 16 16 10000000 to get these results):
>
> vector<float>, sorting as integers (my approach) : 1.417s, and the data is correctly sorted
> vector<float>, std::sort(begin, end) : 2.793s, and the data is correctly sorted
> vector<float>, std::sort(begin,end,compare_floats_as_ints) : 4.291s, and the data is NOT correctly sorted (because negatives aren't flipped)
>
> I set NaNs (I get 3975 of them) to 0.0 in my testing so as to have a unique correct sorted output. I'm using a Core 2 Duo.
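The fixup Steven refers to is commonly done with a bit trick like the sketch below (which may differ from his actual float_sort code): flip all bits of a negative float and set the sign bit of a non-negative one, so that unsigned integer order matches numeric float order.

    #include <cstring>

    // Sketch of the standard transform, not necessarily Steven's code:
    // maps a float's bits to an unsigned 32-bit key whose unsigned order
    // matches the float's numeric order (NaNs still need special handling).
    unsigned int float_to_ordered_key(float f) {
        unsigned int u;
        std::memcpy(&u, &f, sizeof u);
        return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
    }

Sorting such keys with any integer sort, or comparing them in a comparator, then gives the same order as std::less<float> on non-NaN data, including negatives.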
> I think I've figured out what's going on with my 23-bit random data: float comparison on x86 appears to have some form of optimization that looks at the exponent first and the coefficient afterwards, and the comparison is extremely slow when the exponents are the same.
Same, and equal to what? If your test ends up generating a pile of denormalized numbers, then it's rather broken regardless of what processor you are using.
> With 23-bit random data, all the exponents are the same (only the 23 mantissa bits vary). With 32-bit random data, there will be 512 different values for the exponents,
Are you sure? 32-bit IEEE format has 8 bits for the exponent.
> splitting the identical-exponent problems much smaller, to the point where they are close to the size of the cache (in-cache sorting is extremely fast). With realistic data, there will be multiple exponents, but not as many as there are data elements. More likely, there will be a small range of exponents and, for each of them, many elements with the same exponent. Also, a denial-of-service attack could easily be mounted against a system that sorts floats, simply by feeding it floats with the same exponent but different coefficients.
>
> When I test with 26-bit random data, where there are 8 different exponent values (a more realistic case), here are my results:
>
> sorting floats as integers (my approach) : 1.473s
> std::sort : 21.511s
>
> (I used randomgen 10 16 10000000 to generate the input data.)
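The randomgen tool itself does not appear in this thread; judging from the invocations above, its first two arguments give the number of random bits in the upper and lower 16 bits of each 32-bit value, so "16 16" yields fully random 32-bit data and "10 16" yields 26-bit data. A rough, hypothetical stand-in might be:

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    // Hypothetical stand-in for randomgen, inferred from this thread, not
    // Steven's actual tool: writes `count` 32-bit values with `high` random
    // bits in the upper half and `low` random bits in the lower half.
    // Note std::rand() only guarantees 15 random bits per call.
    int main(int argc, char **argv) {
        if (argc != 4) return 1;
        int high = std::atoi(argv[1]);
        int low = std::atoi(argv[2]);
        long count = std::atol(argv[3]);
        std::FILE *out = std::fopen("input.txt", "wb");
        if (!out) return 1;
        for (long i = 0; i < count; ++i) {
            std::uint32_t v =
                (static_cast<std::uint32_t>(std::rand() & ((1 << high) - 1)) << 16)
                | static_cast<std::uint32_t>(std::rand() & ((1 << low) - 1));
            std::fwrite(&v, sizeof v, 1, out);
        }
        std::fclose(out);
        return 0;
    }

With 23 random bits in total (e.g. "7 16"), every value has the same zero exponent field when reinterpreted as a float, which is Steven's worst case above.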
So not 120X, but still a substantial difference.
> Also worth noting is that when the element count increases, the speed difference increases, even on 32-bit random data (randomgen 16 16 200000000).
What real-world problems require sorting such huge arrays of floats?
> I'll test with SSE and other compiler settings next. Is there an easy way to add these with bjam?
Because bjam is actually a low-level build engine, it is pretty much impossible to do anything with it at all. Boost.Build allows you to pass any flags you see fit to the compiler; my previous post gives the actual syntax I used.

- Volodya
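For reference, and not necessarily the syntax from that earlier post, Boost.Build accepts extra compiler flags either on the command line:

    bjam cxxflags="-msse2 -mfpmath=sse"

or per-target in a Jamfile:

    exe benchmark : benchmark.cpp : <cxxflags>-msse2 <cxxflags>-mfpmath=sse ;

(The -msse2 and -mfpmath=sse flags are GCC's; "benchmark" is a placeholder target name.)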