On Sun, Nov 20, 2016 at 3:24 PM Christophe Henry <christophe.j.henry@gmail.com> wrote:
I have not yet managed to write a full review. One week is a bit short, so a few more days will be necessary. I am providing some performance tests first.
Thanks Christophe. We will be interested in incorporating your feedback to improve the library.
OTOH the single-threaded ones are interesting, especially the stable_sort and intro_sort for cases where usage of spreadsort is not possible.
I challenge you to find a sorting problem where it isn't possible to use spreadsort. string_sort can sort anything that std::sort can; you just need to convert the sort key comparison sequence into a sequence of bytes and feed string_sort the correct functors. That said, it may not be efficient to sort some problems with string_sort, and it'll probably be too much work to be worth the effort for many problems.
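To make the byte-sequence conversion concrete, here is a minimal sketch (not from either library): a hypothetical `key_to_bytes` helper that encodes a signed 32-bit key so that unsigned byte-wise lexicographic order matches numeric order, which is the form a string_sort-style get_char/get_length functor pair would consume. Flipping the sign bit moves negatives below positives; big-endian layout makes the most significant byte compare first.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: encode a signed 32-bit key so that comparing
// the resulting bytes lexicographically (as unsigned chars) gives the
// same order as comparing the keys numerically.
std::array<unsigned char, 4> key_to_bytes(std::int32_t v) {
    // XOR the sign bit: INT32_MIN maps to 0x00000000, INT32_MAX to
    // 0xFFFFFFFF, so unsigned order equals signed order.
    std::uint32_t u = static_cast<std::uint32_t>(v) ^ 0x80000000u;
    // Big-endian: most significant byte first, so earlier bytes
    // dominate the lexicographic comparison.
    return { static_cast<unsigned char>(u >> 24),
             static_cast<unsigned char>(u >> 16),
             static_cast<unsigned char>(u >> 8),
             static_cast<unsigned char>(u) };
}
```

The same idea extends to floats (flip all bits of negatives, only the sign bit of positives) and to composite keys (concatenate the encoded fields in priority order).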
A short summary:
100000000 uint64_t elements already sorted:
Boost parallel sort : 0.064965 secs
Asynchronous parallel sort : 0.0167134 secs (same with other algorithms)
Asynchronous provides a special optimization for this case.
This is a common case and your solution seems noticeably faster, so if this optimization is practical to incorporate into Boost parallel sort, I would like Francisco to consider it. That said, 0.06s isn't bad.
I added this one: 100000000 uint64_t elements reverse sorted
Boost parallel sort : 0.581381 secs
Asynchronous parallel sort : 0.0478701 secs
Asynchronous provides a special optimization for this case. I think the library should do it too; this case is pretty common and a popular DoS attack vector.
This optimization would be nice to have, if it's cheap.
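The two pre-scan guards under discussion (already sorted, reverse sorted) can be sketched as follows. `guarded_sort` is a hypothetical wrapper, not code from either library, and `std::sort` stands in for the parallel algorithm; each guard costs a single O(n) pass before any sorting work begins.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical pre-sort guard: return immediately on already-sorted
// input, reverse in O(n) on reverse-sorted input, otherwise fall back
// to a full sort.
template <class It, class Compare = std::less<>>
void guarded_sort(It first, It last, Compare comp = Compare{}) {
    if (std::is_sorted(first, last, comp))
        return;                        // already sorted: nothing to do
    // Reverse-sorted means sorted under the inverted comparison.
    if (std::is_sorted(first, last,
            [&](const auto& a, const auto& b) { return comp(b, a); })) {
        std::reverse(first, last);     // O(n) fix-up
        return;
    }
    std::sort(first, last, comp);      // stand-in for the real algorithm
}
```

Both scans can themselves be parallelized (split the range, check each chunk plus the chunk boundaries), so the guard stays cheap even at these input sizes.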
100000000 uint64_t elements randomly filled
Asynchronous parallel quickspreadsort : 0.587432 secs
Asynchronous quick_intro_sort : 0.728393 secs
This mix of a quicksort degrading into a spreadsort works best here.
I find this algorithm interesting; have you considered using MSD radix-based sorting to break up the data for the threads in parallel? For example, take T threads, have each thread take 1/T of the data and bucket it into N MSD-based buckets (just like Spreadsort), where T < N <= ~2^11, then merge the equivalent buckets, and as you merge them hand them off to another thread to sort. This will probably divide better if you find the max and min before bucketing using a parallel min/max search.
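A sequential sketch of the MSD-bucket split step described above, with hypothetical names. In the parallel scheme each thread would bucket its 1/T slice like this, equivalent buckets would then be merged, and each merged bucket handed to a thread; `std::sort` stands in for the recursive sort (or spreadsort) on each bucket. Because the bucket index is the most significant part of the key, concatenating the sorted buckets yields the sorted whole.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch: split by the most significant bits of (value - min), so the
// buckets partition the key range in order, then sort each bucket.
std::vector<std::uint64_t> msd_bucket_sort(std::vector<std::uint64_t> data,
                                           unsigned log_buckets = 11) {
    if (data.size() < 2) return data;
    // Finding min/max first (a parallel reduction in the real scheme)
    // tightens the range so the top bits actually discriminate.
    auto [mn, mx] = std::minmax_element(data.begin(), data.end());
    const std::uint64_t lo = *mn, range = *mx - lo;
    if (range == 0) return data;       // all keys equal
    // Choose a shift so that (v - lo) >> shift lands in [0, 2^log_buckets).
    unsigned shift = 0;
    while ((range >> shift) >= (1ull << log_buckets)) ++shift;
    std::vector<std::vector<std::uint64_t>> buckets(1u << log_buckets);
    for (auto v : data)
        buckets[(v - lo) >> shift].push_back(v);
    data.clear();
    for (auto& b : buckets) {
        std::sort(b.begin(), b.end());  // recurse / hand to a thread here
        data.insert(data.end(), b.begin(), b.end());
    }
    return data;
}
```

With N up to ~2^11 buckets and T threads, the merged buckets give far more than T independent work items, which should keep the threads busy even when the key distribution is skewed.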
10000000 strings randomly filled
OMP parallel sort : 1.05803 secs
Boost parallel sort : 0.933055 secs
OMP parallel stable sort : 1.12269 secs
Boost sample sort : 0.889216 secs
Boost parallel stable sort : 1.56564 secs
Asynchronous parallel quickspreadsort : 0.788856 secs
Asynchronous quick_intro_sort : 0.893652 secs
Asynchronous parallel stable sort : 1.23495 secs
Asynchronous boost::stable sort : 1.21817 secs
Similar results.
Let's move on to big objects:
1562500 elements of size 512, randomly filled (HEAVY COMPARISON):
OMP parallel sort : 0.308923 secs
Boost parallel sort : 0.342992 secs
Asynchronous parallel_quicksort : 0.246709 secs
Asynchronous quick_intro_sort : 0.269666 secs
Both quicksorts are best, with a slight advantage to intro_sort.
Results with a light comparison are similar.
Do we have your permission to copy and/or adapt your quick_intro_sort algorithm into the Boost.Sort library? How is the worst-case performance relative to Francisco's library?