
Could you please be more precise: which kind of hidden system locks do you mean?
False sharing, see my other answer. Classic problem.
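For readers who have not run into it: false sharing happens when threads write to distinct variables that sit on the same cache line, so the cores keep invalidating each other's copy even though no data is logically shared. A minimal sketch of the usual padding fix follows. The 64-byte line size, `PaddedCounter` and `count_in_parallel` are assumptions made up for illustration, and the alignment of the vector's storage is only guaranteed from C++17 on.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is common but not universal.
constexpr std::size_t kCacheLine = 64;

// alignas pads each counter out to a full (assumed) cache line, so two
// threads never write to the same line.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};

// Hypothetical helper: each thread increments only its own counter,
// and the per-thread counts are summed after all threads have joined.
long count_in_parallel(std::size_t threads, long iterations) {
    std::vector<PaddedCounter> counters(threads);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&counters, t, iterations] {
            for (long i = 0; i < iterations; ++i) ++counters[t].value;
        });
    }
    for (auto& th : pool) th.join();

    long total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

Removing the `alignas` packs the counters next to each other again, which is the layout that typically shows the slowdown.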
I'm working on an Asynchronous Execution framework that you can get from http://www.boostpro.com/vault/index.php?action=downloadfile&filename=in terthreads.zip&directory=Concurrent%20Programming&. It provides a wait_for_all function that forks every function except the last one, which is executed in the current thread. So if you make 4 partitions, you would use it as
wait_for_all(ae, bind(inplace_solve, at_c<0>(partition)), bind(inplace_solve, at_c<1>(partition)), bind(inplace_solve, at_c<2>(partition)), bind(inplace_solve, at_c<3>(partition)));
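To make the "fork all but the last" behaviour concrete, here is a rough sketch using only the standard library. It is not the framework's implementation: `run_all_last_inline` is a made-up name, and `std::async` stands in for whatever the `ae` executor does.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for wait_for_all: forks every task except the
// last one, runs the last one in the calling thread, then waits for all.
void run_all_last_inline(const std::vector<std::function<void()>>& tasks) {
    if (tasks.empty()) return;

    std::vector<std::future<void>> forked;
    forked.reserve(tasks.size() - 1);

    // Fork tasks 0 .. n-2, each on its own thread.
    for (std::size_t i = 0; i + 1 < tasks.size(); ++i)
        forked.push_back(std::async(std::launch::async, tasks[i]));

    // The last task reuses the current thread instead of idling in a wait.
    tasks.back()();

    // Block until every forked task is done; get() rethrows its exception.
    for (auto& f : forked) f.get();
}
```

Running the last task inline saves one thread per call, which matters when the number of partitions matches the number of cores.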
I'll try to implement this overload:
template <typename AE, typename F, typename Sequence> typename result_of::wait_for_all<AE, F, Sequence>::type wait_for_all(AE& ae, F f, Sequence seq);
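As a rough idea of what such a sequence overload could do, the sketch below applies one function to every element of a `std::tuple` in parallel and collects the results in a tuple. This is only an illustration under several assumptions: `std::tuple` stands in for the Fusion sequence, `std::async` for the `ae` executor, `apply_to_each` is a made-up name, `f` must return a non-void value, and, unlike the real wait_for_all, every element is forked (none runs in the current thread). It needs C++14.

```cpp
#include <cstddef>
#include <future>
#include <tuple>
#include <utility>

// Launches one asynchronous call of f per tuple element, then gathers
// the results in element order.
template <typename F, typename Tuple, std::size_t... I>
auto apply_to_each_impl(F f, Tuple& seq, std::index_sequence<I...>) {
    // One future per element; each task gets its own copy of f and a
    // reference to the sequence, which outlives the tasks.
    auto futures = std::make_tuple(
        std::async(std::launch::async,
                   [f, &seq]() mutable { return f(std::get<I>(seq)); })...);

    // get() blocks until the corresponding task has finished.
    return std::make_tuple(std::get<I>(futures).get()...);
}

// Hypothetical entry point: apply f to each element of seq in parallel.
template <typename F, typename... Ts>
auto apply_to_each(F f, std::tuple<Ts...>& seq) {
    return apply_to_each_impl(f, seq, std::index_sequence_for<Ts...>{});
}
```

With this shape the four-partition call would shrink to something like `apply_to_each(inplace_solve, partition)`, provided `inplace_solve` returns a value.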
This opens up many possibilities... Thanks, I will have a look, but I need to implement parallel_merge first. ;) With your library, it seems that slicing the input and scheduling the threads will be pretty straightforward. -- EA