Re: [boost] [parallel_sort] Proposal

2 Feb 2009

      On Mon, 02 Feb 2009 09:57:40 +0000, "Phil Endecott"
<spam_from_boost_dev@chezphil.org> wrote:
...
...
Current benchmarks on my Q6600 vs2008-64-bit default allocator:
2 threads : 160 % faster
4 threads : 260 % faster
So you're getting a super-linear speedup going from 1 to 2 processors?
That seems odd.
What size input does the benchmark relate to?  I'm curious to know
whether you still get a worthwhile speedup for relatively small inputs.
No sorry, this is wrongly said, I get factor 1.6 and 2.6. I meant 160 %
speed, not 160 % faster.

For small input it's very simple, anything under 1000 is not parallelized
because of the thread overhead cost.

This would require some precise benchmarking to find the right value.
...
...
It can be heavily customized. I already offer the possibility to choose
between quick sorting and merge sorting, and for quick sorting I offer
two
pivot algorithms : fast or secure. Then you can of course specify a
predicate and a fallback sorting algorithm for when you run out of
threads.
How about run-time selection between quicksort and mergesort, in the
style of introsort?  This seems hard to beat in uniprocessor
applications.
Not really needed for merge_sort, there is no problematic behaviour like
quicksort. The
reason why you would want to use quicksort is that if the extra memory
allocation is an issue.
...
...
Nevertheless, I think it's time to open up the code and start getting
feedback.
Yes please.  You haven't said anything about how it actually works...
Threads are specified as a template parameter. There are finite. If you
wish to have a mutlithreaded version that uses 2 threads you do

parallel_sort<2>(v.begin(), v.end());

4 threads ?

parallel_sort<4>(v.begin(), v.end());

What happens then is that you have a template recursive call that generates
the right amount of threading through recursion. That means apart from
thread creation and termination you have little overhead. 

For example, for 4 threads you have :

parallel_sort<4> -> parallel_sort<2> -> parallel_sort<1>
                                     -> parallel_sort<1>
                 -> parallel_sort<2> -> parallel_sort<1>
                                     -> parallel_sort<1>

This code is generated at compile time.

The one thread version use the fallback sorter which is by default
std::sort.

The default > 1 thread version merges the results of the underlying calls
(there is a version with quicksort).

Cheers.

-- 

EA

Re: [boost] [parallel_sort] Proposal

Edouard A.