
On Sunday 08 March 2009 19:48:11, Edouard A. wrote:
I have implemented a parallel_sort with threadpool that is a bit slower than tbb's (see below for the code).

threadpool uses a single-lock global queue: if a thread enters the queue to enqueue or dequeue a task, it acquires the lock, so no other thread can enter the queue. With a lock-free implementation it should work faster.
This is quite possible. I'm looking forward to testing it with a lock-free queue. I've also seen a lot of .lock() / .unlock() calls in the code... Sometimes holding the lock is faster than releasing it and acquiring it again.
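To make the two points above concrete, here is a minimal sketch of a queue guarded by one global mutex. It is not threadpool's actual queue: single_lock_queue and try_pop_batch are hypothetical names, and I use the std:: primitives in place of their Boost equivalents so the snippet is self-contained. Every push and pop serializes on the same mutex, and try_pop_batch shows the "hold the lock instead of re-acquiring it" idea by taking several tasks in one acquisition.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical illustration only, not the queue used by threadpool.
class single_lock_queue {
public:
    using task = std::function<void()>;

    // Every producer takes the one global lock.
    void push(task t) {
        std::lock_guard<std::mutex> guard(mtx_);
        tasks_.push_back(std::move(t));
    }

    // Every consumer takes the same lock, so pushes and pops never overlap.
    bool try_pop(task& out) {
        std::lock_guard<std::mutex> guard(mtx_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

    // Takes up to max_count tasks under a single lock acquisition,
    // instead of locking and unlocking once per task.
    std::vector<task> try_pop_batch(std::size_t max_count) {
        assert(max_count > 0);
        std::vector<task> batch;
        std::lock_guard<std::mutex> guard(mtx_);
        while (batch.size() < max_count && !tasks_.empty()) {
            batch.push_back(std::move(tasks_.front()));
            tasks_.pop_front();
        }
        return batch;
    }

private:
    std::mutex mtx_;          // the single lock shared by all threads
    std::deque<task> tasks_;  // FIFO of pending tasks
};
```

Whether batching actually helps depends on how long the tasks are and how many workers contend for the queue, so this would have to be measured.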
pool::shutdown() is not correct here: use boost::wait_for_all() or boost::wait_for_any() from Anthony Williams' futures library (I've added the library to the threadpool archive).
task< int > tsk1 = pool.submit(...);
task< string > tsk2 = pool.submit(...);
task< int > tsk3 = pool.submit(...);

wait_for_all( tsk1.result(), tsk2.result(), tsk3.result());
// here all tasks are finished
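For readers without the threadpool archive at hand, the same collect-and-wait pattern can be sketched with standard futures. This is only an analogy under stated assumptions: std::async stands in for pool.submit(), and wait_for_all_sketch is a hypothetical helper, not the boost::wait_for_all implementation.

```cpp
#include <future>
#include <string>

// Hypothetical stand-in for wait_for_all: block until every future is ready.
template <typename... Futures>
void wait_for_all_sketch(Futures&... fs) {
    // Pack expansion inside a braced list calls wait() on each future in order.
    int dummy[] = {0, (fs.wait(), 0)...};
    (void)dummy;
}

// Launches three tasks with different result types, waits for all of them,
// then combines the results.
int run_three_tasks() {
    std::future<int> f1 = std::async(std::launch::async, [] { return 1; });
    std::future<std::string> f2 =
        std::async(std::launch::async, [] { return std::string("two"); });
    std::future<int> f3 = std::async(std::launch::async, [] { return 3; });

    wait_for_all_sketch(f1, f2, f3);  // here all tasks are finished

    return f1.get() + static_cast<int>(f2.get().size()) + f3.get();
}
```

Note that the caller has to keep every future around until the wait, which is the bookkeeping cost of this style.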
I'm not sure it's a good idea to force the user to collect the tasks and wait for them. I thought threadpool could abstract that out. I mean, it's nice to have the option of waiting for specific tasks, but generally you just throw work into the pool and want that work to be finished...

Perhaps a task_group object could solve this problem? You could also "link" tasks. I.e., you need the dependent tasks to be finished when you wait on one.

Anyway, with wait_for_all, it's clearly slower than with shutdown(). Not really surprising, as you go through more abstraction layers and you have to store the tasks' results in a structure. IMHO, the fastest way would probably be to have a condition_variable bound to the number of running tasks in the pool itself.

With wait_for_all:

std::fill 0..1000000 ok : 1 elapsed: 0.01
tbb::parallel_for fill 0..1000000 ok : 1 elapsed: 0.01
std::sort reverse 0..1000000 ok : 1 elapsed: 0.111
tbb::parallel_sort reverse 0..1000000 ok : 1 elapsed: 0.038
boost::tp::sort reverse 0..1000000 ok : 1 elapsed: 0.055

Regards.

--
EA
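P.S. A rough sketch of the condition_variable idea mentioned above, in case it helps the discussion. Everything here is an assumption for illustration: pending_counter and run_demo are hypothetical names, plain std::thread workers stand in for pool threads, and the std:: primitives replace their Boost equivalents. The pool would bump a counter on submit, decrement it when a task finishes, and wake any waiter once the counter reaches zero.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: tracks tasks in flight and lets a caller block
// until none are left. Not part of threadpool.
class pending_counter {
public:
    void task_submitted() {
        std::lock_guard<std::mutex> guard(mtx_);
        ++pending_;
    }

    void task_finished() {
        std::lock_guard<std::mutex> guard(mtx_);
        assert(pending_ > 0);
        if (--pending_ == 0) all_done_.notify_all();  // wake waiters at zero
    }

    // Blocks until the number of in-flight tasks is zero.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mtx_);
        all_done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    std::mutex mtx_;
    std::condition_variable all_done_;
    std::size_t pending_ = 0;
};

// Usage sketch: returns how many tasks were observed as completed
// at the moment wait_idle() returned.
int run_demo(int task_count) {
    pending_counter counter;
    std::atomic<int> done(0);
    std::vector<std::thread> workers;
    for (int i = 0; i < task_count; ++i) {
        counter.task_submitted();  // count the task before it can run
        workers.emplace_back([&counter, &done] {
            ++done;                // the "work"
            counter.task_finished();
        });
    }
    counter.wait_idle();           // no per-task result objects to collect
    int seen = done.load();
    for (auto& w : workers) w.join();
    return seen;
}
```

The caller never stores per-task handles, which is the abstraction I had in mind; a task_group could be built as one such counter per group.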