Re: [boost] radix sort

26 Jul 2013

      Rob Stewart:
I've incorporated your suggestion about how to handle the spread_sort
wrapper and resolved the code issues you pointed to and some of the
documentation ones (thanks).  I'll leave the invisible doc formatting
issues (partially due to the 3 year wait for review) alone until after this
has  been reviewed.  Are you or someone else willing to formally review
this?
I've put it up on git with the fixes:
https://github.com/spreadsort/algorithm_sorting

Note that:
integer_sort works on anything with a >> and a - operator that works on the
result of the >>, or equivalent functors.
float_sort works on anfything with an IEEE floating-point number key that
can be cast to an integer type for sorting, as long as sign is handled
correctly (float, double).
string_sort works on any array of individual elements, where the individual
elements sort as integers (such as string or wstring), or functors are
provided with equivalent behavior.

On Thu, Jul 25, 2013 at 9:43 PM, Jeremy Murphy <
jeremy.william.murphy@gmail.com> wrote:
...
On 25 July 2013 20:15, Steven Ross <spreadsort@gmail.com> wrote:
Actually, RAND_MAX = INT_MAX = (1 << 31) - 1 on my system (glibc-2.15).  So
how do I generate random 64-bit numbers... (rand() << 32) + rand()?  Bah, I
should just be using Boost.Random!
Sounds good.
...
Yes, I understand that's the assumption that bucket sort works on.  So I
guess that's also an argument for having multiple, different
implementations.  I'm keen to do a comprehensive comparative performance
test, all the algorithms across different distributions, but it will take
some time.
You might want to look at the example suite my tune.pl script runs
(libs/algorithm/sorting/tune.pl).  It tests a bunch of interesting
distributions, including various types of random data, the ALRbreaker
distribution (which shows the danger of conventional radix sorts), sorted
data, and mostly sorted data.
...
I've also been thinking that the LSD radix-sort can be optimized
(especially for large-width integers) where k <= (k_max >> 1).  That is,
where some of the most-significant digits are not being used, the number of
column/digit sort passes with counting-sort can potentially be reduced.
 Anyway, maybe in version 2.0.  :)
spread_sort has that optimization (it's useful for making the worst-case
condition hard to construct/much rarer).  All you need to do is find the
maximum unsigned value in a quick linear pass through the data, and you can
ignore all bits higher than that value.