
----- Original Message ----- From: "Edouard A." <edouard@fausse.info> To: <boost@lists.boost.org> Sent: Monday, February 02, 2009 8:41 PM Subject: Re: [boost] [parallel_sort] Proposal
It seems fundamentally runtime to me, since different machines or just different runs will want different levels of concurrency. The overhead ought to just be a few compares and arithmetic operations, which would be swamped by the effort involved in the sorting.
I think normally I'd want to just use parallel_sort(b, e, thread::hardware_concurrency())
Maybe you're right. That's not a major design change to make it runtime, I can easily give it a try. We spend most of our time sorting, making an additional check is no big deal.
As for TBB, I did a quick benchmark.
As expected, TBB is faster, but not a lot.
TBB is roughly 3 times faster than std::sort. Again, I didn't expect to get a x 4 gain because of the design of the Q6600 CPU.
My implementation is 2.6 times faster than std::sort.
Although I don't slice the data, my synchronization mechanisms are extremely simple, which explain why the difference isn't very big (15% advantage for tbb).
I have however noticed that the behavior on sorted input is not very satisfactory on my side, probably for the reasons described by Phil. tbb::parallel_sort behavior is constant from what I can see.
I'm going to:
* write a parallel_merge * see what I can do to optimize memory allocations * think about the slicing issue * probably play left4dead instead of all the above :(
Hi, There is also Boost.ThreadPool on the queue schedule. In case this can help you? Vicente