
On Mon, Jul 6, 2009 at 8:37 AM, Vladimir Prus <vladimir@codesourcery.com> wrote:
With 23-bit random data, all
the exponents are the same. With 32-bit random data, there will be 512
different values for the exponents,
Are you sure? 32-bit IEEE format has 8 bits for the exponent.
+1 for the sign bit, so there are 2^9 = 512 distinct sign+exponent combinations.
splitting the identical-exponent problems much smaller, to the point where they are close to the size of the cache (in-cache sorting is extremely fast).

With realistic data, there will be multiple exponents, but not as many as there are data elements. More likely, there will be a small range of exponents and, for each of them, many elements with the same exponent. Also, a denial-of-service attack could easily be made against a system that sorts floats by feeding it floats with the same exponent but different coefficients.

When I test with 26-bit random data, where there are 8 different exponent values (a more realistic case), here are my results:

sorting floats as integers (my approach): 1.473s
std::sort: 21.511s

(I used randomgen 10 16 10000000 to generate the input data.)
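For anyone who hasn't seen the cast-to-integer trick, here is a minimal sketch of the general idea (my own illustration with my own names, not code from the library under discussion), assuming IEEE 754 binary32 floats: each float's bit pattern is mapped to an unsigned key whose order matches the numeric order, and the top 9 bits of that key (sign + exponent) are exactly the 512 possible first-pass bucket values mentioned above.

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a binary32 float to a 32-bit key whose unsigned order matches the
// float's numeric order: negative floats get all bits inverted (reversing
// their order), non-negative floats just get the sign bit set so they sort
// above all negatives.
std::uint32_t float_to_key(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// For illustration only: sort via the keys with std::sort. A radix/bucket
// sort would instead use the top key bits (sign + exponent, 512 buckets)
// for its first pass, which is where the exponent distribution matters.
void sort_floats_as_integers(std::vector<float>& data) {
    std::sort(data.begin(), data.end(),
              [](float a, float b) { return float_to_key(a) < float_to_key(b); });
}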
So not 120X, but still a substantial difference.
Also worth noting: as the element count increases, the speed difference grows, even on 32-bit random data (randomgen 16 16 200000000):
What real-world problems require sorting such huge arrays of floats?
Circuit layouts, if someone wants to sort them for various purposes. They are gigabytes in size, so they take a long time to sort unless casting to integer is done.

Yes, I guess I was using denormalized numbers for the 23-bit case, but it's interesting that there is such a huge speed difference, and I doubt it's because they're denormalized; there is no reason to special-case denormalized sorting. Notably, normalized numbers also show the same problem if their exponent range is small relative to the number of elements; this can be achieved by using a small number of different exponents, or simply a large number of floats (as shown with my large test).

What do you consider a reasonable range? I can easily make a test that excludes denormalized numbers. Even if the full exponent range is in use (which isn't realistic), wouldn't it be better to use an algorithm that doesn't blow up when n gets much larger than 10 million elements, and is still faster with a smaller number of elements?
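On the test-data point, a rough sketch of what a generator with a restricted exponent range could look like (my own illustration with assumed parameters, not the randomgen tool used for the timings above): it writes raw binary32 floats whose biased exponents are drawn from a small normalized range, so no denormals appear and many elements share each exponent.

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <random>

int main() {
    const int num_exponents = 8;          // assumed: 8 distinct exponent values
    const std::size_t count = 10000000;   // assumed: 10 million floats
    std::mt19937 rng(42);
    std::uniform_int_distribution<std::uint32_t> mantissa(0, (1u << 23) - 1);
    std::uniform_int_distribution<std::uint32_t> exponent(127, 127 + num_exponents - 1);
    std::uniform_int_distribution<std::uint32_t> sign(0, 1);

    for (std::size_t i = 0; i < count; ++i) {
        // Assemble sign | biased exponent | mantissa; biased exponents
        // 127..134 are all normalized, so no denormals are produced.
        std::uint32_t bits = (sign(rng) << 31) | (exponent(rng) << 23) | mantissa(rng);
        float f;
        std::memcpy(&f, &bits, sizeof f);
        std::fwrite(&f, sizeof f, 1, stdout);  // raw binary output
    }
    return 0;
}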