I've added fasterbzip2 to the develop doc directory: https://github.com/spreadsort/sort/blob/develop/doc/fasterbzip2-1.0.5.zip, but haven't referenced it from any documentation. bzip2 has a bizarre, highly optimized sort, and stuffing the Spreadsort algorithm into it was non-trivial. Still, I see a substantial speedup on highly compressible data sets (and little speedup on random or near-random data that doesn't compress much).

Antony,
> The implementation is not tuned. For example, take a look here:
> https://github.com/spreadsort/sort/blob/master/include/boost/sort/detail/spr...
> and here:
> https://github.com/spreadsort/sort/blob/master/include/boost/sort/detail/spr...
> Memory allocations occur at those points; however, they could easily be avoided.
> Here's a typical use case of the `size_bins` function:
> https://github.com/spreadsort/sort/blob/master/include/boost/sort/detail/str...
> Using a stack-based buffer would significantly improve sort times on small datasets.
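To make the suggestion concrete, here is a minimal sketch of that small-buffer optimization; the `bin_sizes` class, the 512-bin threshold, and the interface are illustrative assumptions, not the library's actual `size_bins` API:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: keep the bin counters in a fixed stack array for the
// common small-bin-count case, and fall back to the heap only when the bin
// count is large. Names and the threshold are assumptions for illustration.
template <std::size_t MaxStackBins = 512>
class bin_sizes {
public:
    // Returns zero-initialized storage for bin_count counters.
    std::size_t* get(std::size_t bin_count) {
        if (bin_count <= MaxStackBins) {
            for (std::size_t i = 0; i < bin_count; ++i)
                stack_[i] = 0;              // no allocator call on this path
            return stack_;
        }
        heap_.assign(bin_count, 0);         // heap fallback; storage is reused
        return heap_.data();
    }

private:
    std::size_t stack_[MaxStackBins];
    std::vector<std::size_t> heap_;
};
```

On small inputs the stack path never touches the allocator, which is where the small-dataset savings come from.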
The develop branch now has the change you suggested, and it sped up sort times (with the minimum threshold disabled) on 700-element arrays by 37%, making 700 elements the new crossover point. I'm going to test the impact on large arrays more, but so far I see no measurable difference there.

I could try the tricks from "Engineering Radix Sort" to logarithmically bound the bin_cache size, add additional stack space as needed via co-recursion, and put the bin_cache on the stack (or I could simply allocate the bin_cache on the stack every iteration). As int_min_split_count (min s) is tuned to 9, I doubt I'll be able to get the crossover point below around 500 elements.

Do you think bringing the crossover point down to 500 or so is worth significant additional code complexity or stack memory usage to avoid heap allocation completely?
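For reference on the "Engineering Radix Sort" option, here is a hedged sketch of the stack-resident-bins idea applied to a plain in-place 8-bit-radix MSD sort (American flag sort); this is not spreadsort itself, whose bin counts vary per level:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative sketch: each recursion level keeps its own bin arrays as
// locals, so all bin storage lives on the stack. With a fixed 256-way radix,
// each level uses roughly 8 KB and the depth for 32-bit keys is at most 4.
static void msd_radix_sort(std::uint32_t* a, std::size_t n, int shift = 24) {
    if (n < 2)
        return;
    std::size_t count[256] = {0};              // stack-resident bin sizes
    for (std::size_t i = 0; i < n; ++i)
        ++count[(a[i] >> shift) & 0xFF];
    std::size_t start[256], finish[256], next[256];
    std::size_t pos = 0;
    for (int b = 0; b < 256; ++b) {
        start[b] = next[b] = pos;
        pos += count[b];
        finish[b] = pos;
    }
    // In-place permutation: follow displacement cycles so each element is
    // written directly into its destination bin (the "American flag" pass).
    for (int b = 0; b < 256; ++b) {
        while (next[b] < finish[b]) {
            std::uint32_t v = a[next[b]];
            int d = static_cast<int>((v >> shift) & 0xFF);
            while (d != b) {
                std::uint32_t tmp = a[next[d]];
                a[next[d]++] = v;
                v = tmp;
                d = static_cast<int>((v >> shift) & 0xFF);
            }
            a[next[b]++] = v;
        }
    }
    if (shift > 0)                             // recurse on the next byte
        for (int b = 0; b < 256; ++b)
            if (finish[b] - start[b] > 1)
                msd_radix_sort(a + start[b], finish[b] - start[b], shift - 8);
}
```

With bins bounded like this, total stack use grows only with recursion depth and no heap allocation is needed at all; the cost of grafting the same trick onto spreadsort's variable bin counts is exactly the complexity trade-off the question above is asking about.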