General design question about threading/concurrency

Hello, I am writing a program that needs to run N simulations, with N > 10 000. Obviously, I cannot instantiate N threads at the same time, but I cannot run the simulations sequentially either. Can anyone advise me on how to design my program using the Boost threading library and its concepts? Thanks in advance, Julien.

On 24 Oct 2010, at 22:44, Julien Martin wrote:
Hi Julien, if you have an 8-core machine you could, for example, create 8 threads and keep 8 simulations running at a time, each to completion. If the simulations are independent, though, I would rather suggest that you use Boost.MPI and distribute them across a cluster. A 10000-core cluster would then let you run all 10000 simulations simultaneously. We routinely do that with several thousand simulations. Matthias
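A minimal sketch of the "fixed number of threads" idea, assuming C++11 std::thread as a stand-in for boost::thread. The names run_simulation and run_all are hypothetical and not from the thread; the simulation body is a placeholder. Each worker claims the next unclaimed simulation index from a shared atomic counter until none remain, so only n_threads simulations run at any moment.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one independent simulation. It returns a
// deterministic value so that the result can be checked.
double run_simulation(std::size_t index) {
    return static_cast<double>(index) * 0.5;
}

// Run n_sims simulations on n_threads worker threads.
std::vector<double> run_all(std::size_t n_sims, unsigned n_threads) {
    assert(n_threads > 0);
    std::vector<double> results(n_sims);
    std::atomic<std::size_t> next(0);  // next simulation index to claim

    auto worker = [&]() {
        for (;;) {
            std::size_t i = next.fetch_add(1);
            if (i >= n_sims) break;          // nothing left to claim
            results[i] = run_simulation(i);  // each slot is written by one thread only
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();  // wait until every simulation has finished
    return results;
}
```

Calling run_all(10000, 8) would correspond to the 8-core case described above. Because workers pull indices on demand, simulations of uneven length still keep all threads busy.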

I'm not aware of any task management facilities that come with the Boost libraries. boost::thread is pretty much a straightforward wrapper around lower-level threading concepts. I would usually build some kind of task queue/stack, either locking or concurrent and non-locking, to manage the tasks. Then you can start M threads, where M is the number of cores on your machine. These threads can then "steal" work from the task queue, and/or add new tasks to it, until all the tasks are finished. TBB (Intel Threading Building Blocks) has some of this already in place to make it easier to write these kinds of programs, but I typically roll my own and use Boost threads directly, as I have seen better performance that way. Brian
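The locking variant of the task queue described above could look roughly like this. It is only a sketch: TaskQueue and drain are hypothetical names, and std::thread and std::mutex (C++11) are used as stand-ins for the Boost.Thread equivalents.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A minimal locking task queue (hypothetical name, not a Boost class).
class TaskQueue {
public:
    void push(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Moves the next task into 'out'. Returns false if the queue is empty.
    bool try_pop(std::function<void()>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Start m worker threads that take tasks from the queue until it is empty,
// then wait for all of them to finish.
void drain(TaskQueue& queue, unsigned m) {
    assert(m > 0);
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < m; ++i) {
        workers.emplace_back([&queue]() {
            std::function<void()> task;
            while (queue.try_pop(task)) task();
        });
    }
    for (auto& w : workers) w.join();
}
```

A task may push further tasks while it runs; the worker that pushed them will pick them up on its next loop iteration, although other workers that already saw an empty queue will have exited. A production version would probably add a condition variable so idle workers can wait for new work instead.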

FWIW, Boost.Thread now has futures: http://www.boost.org/doc/libs/1_44_0/doc/html/thread/synchronization.html#th... and there's ASIO, which already provides the kind of queue you're talking about.
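To illustrate the futures idea, here is a small sketch that uses std::future and std::async from C++11 rather than the Boost.Thread futures linked above (an assumption made to keep the example self-contained; the concepts are closely related). The names simulate_path, run_chunk and run_with_futures are hypothetical. The N paths are split into a few chunks, one future per chunk, rather than one future per path.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for one simulated path.
double simulate_path(std::size_t i) {
    return static_cast<double>(i % 10);
}

// Sum the results of paths in the half-open range [begin, end).
double run_chunk(std::size_t begin, std::size_t end) {
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) sum += simulate_path(i);
    return sum;
}

// Split n_paths into n_chunks contiguous ranges, run each range
// asynchronously, and combine the partial sums once all are ready.
double run_with_futures(std::size_t n_paths, std::size_t n_chunks) {
    assert(n_chunks > 0);
    std::vector<std::future<double>> futures;
    for (std::size_t c = 0; c < n_chunks; ++c) {
        std::size_t begin = n_paths * c / n_chunks;
        std::size_t end = n_paths * (c + 1) / n_chunks;
        futures.push_back(std::async(std::launch::async, run_chunk, begin, end));
    }
    double total = 0.0;
    for (auto& f : futures) total += f.get();  // get() blocks until the chunk is done
    return total;
}
```

The final loop is the "wait until all simulations are finished, then act on the results" step: each get() returns only when its chunk has completed.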

Hi Julien, On Oct 24, 2010, at 2:44 PM, Julien Martin wrote:
> I am writing a program that needs to run N simulations with N >10 000.
This sounds a lot like Monte Carlo. Are the N simulations essentially independent of each other, until perhaps all N complete and then some additional processing occurs?
> Obviously, I cannot instantiate N threads at the same time but I cannot run the simulations sequentially either. Can anyone please advise me how to design my program using the boost threading library and concepts please?
If so, then instead of using threads, I'd use MPI if you can, but I don't know enough about your application domain. -- Noel

Thanks all for your replies. Yes, it is a Monte Carlo simulation that I am trying to build, with all simulations independent of each other. I am going to look into all the Boost libraries you suggested, especially the MPI one. I'll post further questions if required. Regards, Julien.

Hello Noel, Yes, I am indeed running a Monte Carlo simulation that simulates N paths and then, when all are finished, takes action based on the results. What classes, methods or concepts should I look for in MPI? Thanks, J.

Hi Julien, On Oct 25, 2010, at 4:55 AM, Julien Martin wrote:
So are you new to MPI and threading? If you don't have at least some experience with parallel code development you could be jumping into the deep end (but don't let me discourage you)! Here are some random questions you might consider before adopting a particular approach. How long does each independent simulation run (seconds, minutes, hours)? Is there a throughput or turnaround time requirement? What kinds of hardware are you going to run on (SMP dual, quad, hex)? Do you have access to other compute machines on the network where you'll run? If distributed machines are available, can you ssh into them, are the machines homogeneous or heterogeneous, and do they have shared file systems? -- Noel

To Noel, I use a single 4-core machine for now but could experiment with a cluster using my second machine. The two machines are heterogeneous. No shared file system... I have used parallel_for with success but don't know yet how to tune TBB. I am going to read the TBB docs. I am going to have a detailed look at the code you provided too. To Matthias, What you say about clustering is interesting. I might try that actually. Thanks, J.

On Sunday, October 24, 2010 6:08 PM, Belcourt, Kenneth wrote:
On Oct 24, 2010, at 2:44 PM, Julien Martin wrote:
> I am writing a program that needs to run N simulations with N >10 000.
I know this isn't a boost solution, but have you looked into Intel's Threading Building Blocks? (found at http://www.threadingbuildingblocks.org/ )

If you stay on the same computer (i.e. a plain x86 PC with multiple cores or multiple processors), I would recommend Intel's TBB, which splits up tasks automatically and tries to keep the processors busy. It would be nice if Boost or standard C++ had such a parallelism library, because it complements the use of plain threads.

Hi Julien, On Oct 25, 2010, at 4:53 AM, Julien Martin wrote:
Start with the Intel TBB book; it's quite good and walks you through the product in fairly detailed fashion. In our application we do the following in a top-level routine (like main):

    empty_task *root = 0;
    task_scheduler_init *init = 0;
    if (1 < n_tasks) {
        // Only set up the TBB scheduler when there is more than one task.
        init = new task_scheduler_init();
        root = new (tbb::task::allocate_root()) empty_task;
        root->set_ref_count(n_tasks+1);
    }

We're telling the TBB thread manager how many TBB::threads we expect to create (so n_tasks is the number of SMP cores). Next we round up the number of realizations we're going to run, to load balance equally across the system. The number of epistemic realizations is your N.

    if (1 < n_tasks && n_epistemic_realizations % n_tasks) {
        // Round up to the next multiple of n_tasks.
        n_epistemic_realizations += n_tasks - (n_epistemic_realizations % n_tasks);
    }

Our epistemic class inherits from tbb::thread, so this code constructs a thread for each core (task).

    unsigned int n = n_epistemic_realizations / n_tasks;
    epistemic* ep = new (root->allocate_child()) epistemic(n);

For an 8-core blade (n_tasks = 8) with 10^4 realizations (N = 10000), n is 1250. Note that, depending on how long your discrete simulations run, you may eventually want to use hybrid parallelism (MPI to leverage resources on your network and threads for each local SMP machine), though you can also use just straight MPI to manage both within-box and cross-box parallelism. -- Noel
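The load-balancing arithmetic above can be checked in isolation. The sketch below is plain C++ with no TBB dependency; the function names round_up_to_multiple and realizations_per_task are hypothetical and only mirror the rounding step and the division described in the message.

```cpp
#include <cassert>

// Round n_realizations up to the next multiple of n_tasks, using the same
// condition as the snippet above (no rounding when there is a single task).
unsigned round_up_to_multiple(unsigned n_realizations, unsigned n_tasks) {
    assert(n_tasks > 0);
    unsigned remainder = n_realizations % n_tasks;
    if (n_tasks > 1 && remainder != 0) {
        n_realizations += n_tasks - remainder;
    }
    return n_realizations;
}

// Number of realizations each task runs after rounding up.
unsigned realizations_per_task(unsigned n_realizations, unsigned n_tasks) {
    return round_up_to_multiple(n_realizations, n_tasks) / n_tasks;
}
```

With n_tasks = 8 and N = 10000 no rounding is needed, since 10000 is already a multiple of 8, and each task runs 1250 realizations. With N = 10001 the total is rounded up to 10008, so each task runs 1251 and seven extra realizations are simulated in exchange for an even split.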
participants (9)
- Andrew Holden
- Belcourt, K. Noel
- Belcourt, Kenneth
- Brian Budge
- gast128
- Igor R
- Julien Martin
- Matthias Troyer
- Ray Burkholder