
On Sun, 08 Mar 2009 22:36:16 +0000, "Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote:
> So you are doing all of the partitioning in the main thread, before starting any other threads?
> Have you profiled that?
Yes, I know this is suboptimal; see below.
> If that is sufficiently large, the performance of the threadpool queue (etc.) will not matter as much.
> I suggest that you also tweak your benchmarking so that it runs for a *lot* longer than 30 milliseconds.
Yes, I know; I have done benchmarks on larger data. To simplify:

- std::sort: x 1
- boost::tp::sort: x 2.5
- tbb::sort: x 4

I had some ideas this morning to improve my existing implementation. As you said, the partitioning phase is not parallelized at all, which is a real shame and would account for the huge difference.

I guess you can simply start a rec_pivot_partition in a thread without waiting for it, in which case there is no need for the sort to run in a different thread.

As for the block size, I guess the ideal size will vary from platform to platform.

--
EA
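A minimal sketch of the idea of starting the recursive partitioning in another thread without blocking. It assumes C++11 and uses std::async in place of the boost::tp thread pool, so it is not the tp API. The names parallel_quicksort, block_size and depth are hypothetical. The default cutoff of 10000 elements and the depth of 4 are arbitrary placeholders, since the ideal block size is platform dependent. Each call partitions its own range, hands the left part to a new task, and recurses into the right part immediately. It joins only at the end, so that the caller knows when the whole range is sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <vector>

// Hypothetical sketch: partitioning is done inside each task rather than
// up front in the main thread. Ranges no larger than `block_size`, or
// reached after `depth` levels of splitting, fall back to std::sort.
template <typename RandomIt>
void parallel_quicksort(RandomIt first, RandomIt last,
                        std::size_t block_size, int depth)
{
    typedef typename std::iterator_traits<RandomIt>::value_type T;
    auto n = std::distance(first, last);
    if (n <= static_cast<decltype(n)>(block_size) || depth <= 0) {
        std::sort(first, last);
        return;
    }

    // Median-of-three pivot, copied by value so it stays valid while
    // std::partition moves elements around.
    RandomIt mid = first + n / 2;
    T pivot = std::max(std::min(*first, *mid),
                       std::min(std::max(*first, *mid), *(last - 1)));

    // Three-way split: [first, lo) < pivot, [lo, hi) == pivot,
    // [hi, last) > pivot. The middle part is never empty, so every
    // call makes progress even with many duplicate keys.
    RandomIt lo = std::partition(first, last,
                                 [&](const T& x) { return x < pivot; });
    RandomIt hi = std::partition(lo, last,
                                 [&](const T& x) { return !(pivot < x); });

    // Start the left half in another thread without waiting for it...
    std::future<void> left = std::async(std::launch::async, [=] {
        parallel_quicksort(first, lo, block_size, depth - 1);
    });
    // ...and keep partitioning the right half in this thread.
    parallel_quicksort(hi, last, block_size, depth - 1);
    left.get();  // join, and rethrow any exception from the other task
}

// Convenience overload with assumed defaults (tune per platform).
template <typename RandomIt>
void parallel_quicksort(RandomIt first, RandomIt last)
{
    parallel_quicksort(first, last, 10000, 4);
}
```

std::async with std::launch::async starts a fresh thread for every split. That is exactly the overhead a thread pool is meant to remove. The depth limit caps it at 2^depth - 1 extra threads (15 with the defaults above). Swapping the std::async call for a pool submission should keep the same structure.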