Rob Stewart: I've incorporated your suggestion about how to handle the spread_sort wrapper and resolved the code issues you pointed to and some of the documentation ones (thanks). I'll leave the invisible doc formatting issues (partially due to the 3 year wait for review) alone until after this has been reviewed. Are you or someone else willing to formally review this? I've put it up on git with the fixes: https://github.com/spreadsort/algorithm_sorting Note that: integer_sort works on anything with a >> and a - operator that works on the result of the >>, or equivalent functors. float_sort works on anfything with an IEEE floating-point number key that can be cast to an integer type for sorting, as long as sign is handled correctly (float, double). string_sort works on any array of individual elements, where the individual elements sort as integers (such as string or wstring), or functors are provided with equivalent behavior. On Thu, Jul 25, 2013 at 9:43 PM, Jeremy Murphy < jeremy.william.murphy@gmail.com> wrote:
On 25 July 2013 20:15, Steven Ross
wrote: Actually, RAND_MAX = INT_MAX = (1 << 31) - 1 on my system (glibc-2.15). So how do I generate random 64-bit numbers... (rand() << 32) + rand()? Bah, I should just be using Boost.Random!
Sounds good.
Yes, I understand that's the assumption that bucket sort works on. So I guess that's also an argument for having multiple, different implementations. I'm keen to do a comprehensive comparative performance test, all the algorithms across different distributions, but it will take some time.
You might want to look at the example suite my tune.pl script runs (libs/algorithm/sorting/tune.pl). It tests a bunch of interesting distributions, including various types of random data, the ALRbreaker distribution (which shows the danger of conventional radix sorts), sorted data, and mostly sorted data.
I've also been thinking that the LSD radix-sort can be optimized (especially for large-width integers) where k <= (k_max >> 1). That is, where some of the most-significant digits are not being used, the number of column/digit sort passes with counting-sort can potentially be reduced. Anyway, maybe in version 2.0. :)
spread_sort has that optimization (it's useful for making the worst-case
condition hard to construct/much rarer). All you need to do is find the maximum unsigned value in a quick linear pass through the data, and you can ignore all bits higher than that value.