General design question about threading/concurrency

Julien Martin

24 Oct 2010

8:44 p.m.

Hello, I am writing a program that needs to run N simulations with N >10 000. Obviously, I cannot instantiate N threads at the same time but I cannot run the simulations sequentially either. Can anyone please advise me how to design my program using the boost threading library and concepts please? Thanks in advance, Julien.

Matthias Troyer

24 Oct

8:57 p.m.

New subject: General design question about threading/concurrency

On 24 Oct 2010, at 22:44, Julien Martin wrote:

...

Hi Julien, if you have an 8-core machine you could, e.g. create 8 threads and always run 8 simulations to an end. If the simulations are independent I would rather suggest that you use Boost.MPI and distribute it on a cluster. A 10000-core cluster can then allow you to run all 10000 simulations simultaneously. We are routinely doing that with several thousand simulations. Matthias
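The "8 threads, always 8 simulations running" idea can be sketched as below. This is only an illustration under stated assumptions: it uses std::thread and std::atomic from the C++ standard library in place of boost::thread so that it has no external dependency, and run_simulation is a hypothetical placeholder for one independent simulation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one independent simulation: the result depends
// only on the index, so the output is deterministic.
double run_simulation(std::size_t index) {
    return static_cast<double>(index) * 0.5;
}

// Runs `count` simulations on `workers` threads. Each thread repeatedly
// claims the next unclaimed index from a shared atomic counter, so at most
// `workers` simulations are in flight at any moment.
std::vector<double> run_all(std::size_t count, unsigned workers) {
    if (workers == 0) workers = 1;  // hardware_concurrency() may report 0
    std::vector<double> results(count);
    std::atomic<std::size_t> next(0);

    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= count) break;
            results[i] = run_simulation(i);  // each slot has exactly one writer
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < workers; ++t) pool.emplace_back(worker);
    for (std::thread& th : pool) th.join();

    assert(next.load() >= count);  // every index was claimed before the join
    return results;
}
```

A call such as run_all(10000, std::thread::hardware_concurrency()) would size the pool to the machine; on an 8-core box that gives the 8 threads mentioned above.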

Brian Budge

9 p.m.

I'm not aware of any task management stuff that comes with the boost libraries. boost::thread is pretty much a straightforward wrapper around lower-level threading concepts. I would usually build up some kind of task queue/stack, either locking or concurrent and non-locking, that manages the tasks. Then you can start M threads, where M is the number of cores on your machine. These threads can then "steal" work from the task queue, and/or add new tasks to it, until the tasks are finished.

TBB (Intel Threading Building Blocks) has some of this already in place to make it easier to write these kinds of programs, but I typically roll my own and use boost threads directly, as I have seen better performance.

Brian

On Sun, Oct 24, 2010 at 1:44 PM, Julien Martin <balteo@gmail.com> wrote:

...
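A minimal sketch of the locking variant of such a task queue follows. It is not Brian's implementation: TaskQueue and drain are hypothetical names, and std::thread and std::mutex stand in for their Boost.Thread counterparts so the example needs nothing beyond the standard library.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A locking task queue: a std::queue guarded by a single mutex.
class TaskQueue {
public:
    void push(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Moves the next task into `out`; returns false if the queue is empty.
    bool try_pop(std::function<void()>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Starts `workers` threads that pull tasks until the queue is empty, then
// joins them. A worker exits as soon as it sees an empty queue, so this
// sketch is meant for the case where all tasks are queued up front.
void drain(TaskQueue& queue, unsigned workers) {
    assert(workers > 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < workers; ++t) {
        pool.emplace_back([&queue]() {
            std::function<void()> task;
            while (queue.try_pop(task)) task();
        });
    }
    for (std::thread& th : pool) th.join();
}
```

For the Monte Carlo case, one would push one task per simulation (or per batch of simulations) and then call drain with M equal to the number of cores. The single mutex is the simplest choice; it can become a bottleneck when tasks are very short, which is where the non-locking variants come in.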

Igor R

9:09 p.m.

...

FWIW, Boost.Thread now has futures: http://www.boost.org/doc/libs/1_44_0/doc/html/thread/synchronization.html#th... and there's ASIO, which already provides the kind of queue that you're talking about.
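To illustrate the futures idea, here is a rough sketch that splits the paths into a few ranges and collects one future per range. As an assumption made for portability, it uses std::future and std::async from the standard library rather than the Boost.Thread future types; simulate_path is a hypothetical placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical simulation: the result depends only on the path index.
double simulate_path(std::size_t index) {
    return static_cast<double>(index % 10);
}

// Sums the results of paths in the half-open range [begin, end).
double simulate_range(std::size_t begin, std::size_t end) {
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) sum += simulate_path(i);
    return sum;
}

// Splits `count` paths into `chunks` contiguous ranges, launches one
// asynchronous task per range, and adds up the partial sums.
double simulate_all(std::size_t count, std::size_t chunks) {
    assert(chunks > 0);
    std::vector<std::future<double>> partial;
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = count * c / chunks;
        const std::size_t end = count * (c + 1) / chunks;
        partial.push_back(
            std::async(std::launch::async, simulate_range, begin, end));
    }
    double total = 0.0;
    for (std::future<double>& f : partial) total += f.get();  // waits if needed
    return total;
}
```

Launching one future per chunk rather than one per simulation keeps the number of threads close to the number of cores, which matters when N is above 10 000.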

Ray Burkholder

9:10 p.m.

...

Try boost::asio, it has the concept of a server and of reusable worker threads.

Belcourt, Kenneth

10:08 p.m.

Hi Julien, On Oct 24, 2010, at 2:44 PM, Julien Martin wrote:

...

I am writing a program that needs to run N simulations with N >10 000.

This sounds a lot like Monte Carlo. Are each of the N simulations essentially independent of each other, until perhaps all N complete and then some additional processing occurs? If so, then instead of using threads,

...

Obviously, I cannot instantiate N threads at the same time but I cannot run the simulations sequentially either. Can anyone please advise me how to design my program using the boost threading library and concepts please?

I'd use MPI if you can, but I don't know enough about your application domain. -- Noel

Julien Martin

25 Oct

8:53 a.m.

Thanks all for your replies. Yes, it is a Monte Carlo simulation I am trying to build, with all simulations being independent of each other. I am going to look into all the Boost libraries you advised, especially the MPI one. I'll post further questions if required.

Regards,
Julien.

2010/10/25 Belcourt, Kenneth <kbelco@sandia.gov>

...

Julien Martin

10:55 a.m.

Hello Noel,

Yes, I am indeed running a Monte Carlo simulation which simulates N paths and then, when all are finished, takes action based upon the results. What class, method or concepts should I look for in MPI please?

Thanks,
J.

2010/10/25 Belcourt, Kenneth <kbelco@sandia.gov>

...
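For independent paths, the core of an MPI design is deciding which paths each process simulates. The sketch below shows only that arithmetic and makes no MPI calls; paths_for_rank is a hypothetical helper. In Boost.MPI the rank and size would presumably come from boost::mpi::communicator::rank() and size(), with the per-process results combined afterwards by a collective such as boost::mpi::reduce or boost::mpi::gather.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Returns the half-open range [first, last) of path indices that process
// `rank` out of `size` processes would simulate. The first (total % size)
// ranks receive one extra path, so the counts differ by at most one.
std::pair<std::size_t, std::size_t>
paths_for_rank(std::size_t total, std::size_t rank, std::size_t size) {
    assert(size > 0 && rank < size);
    const std::size_t base = total / size;
    const std::size_t extra = total % size;
    const std::size_t first = rank * base + (rank < extra ? rank : extra);
    const std::size_t count = base + (rank < extra ? 1 : 0);
    return std::make_pair(first, first + count);
}
```

With N = 10000 paths on 3 processes, rank 0 would take paths 0 to 3333 and the other two ranks 3333 paths each. Every process runs the same program, computes its own range from its rank, and only communicates when the results are collected.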

Belcourt, K. Noel

4:03 p.m.

Hi Julien, On Oct 25, 2010, at 4:55 AM, Julien Martin wrote:

...

So are you new to MPI and threading? If you don't have at least some experience with parallel code development you could be jumping into the deep end (but don't let me discourage you)!

Here are some random questions you might consider before adopting a particular approach. How long does each independent simulation run (seconds, minutes, hours)? Is there a throughput or turnaround time requirement? What kinds of hardware are you going to run on (smp dual, quad, hex)? Do you have access to other compute machines on the network where you'll run? If distributed machines are available, can you ssh into them, are the machines homogeneous or heterogeneous, and do they have shared file systems?

-- Noel

Julien Martin

26 Oct

4:13 p.m.

To Noel,

I use a single 4-core machine for now but could experiment with a cluster using my second machine. The two machines are heterogeneous. No shared file system... I used parallel_for with success but don't know yet how to tune TBB. I am going to read the TBB docs. I am going to have a detailed look at the code you provided too.

To Matthias,

What you say about clustering is interesting. I might try that actually.

Thanks,
J.

2010/10/25 Belcourt, K. Noel <kbelco@sandia.gov>

...

Andrew Holden

25 Oct

2:24 p.m.

...

On Oct 24, 2010, at 2:44 PM, Julien Martin wrote:

...

I am writing a program that needs to run N simulations with N >10 000.

On Sunday, October 24, 2010 6:08 PM, Belcourt, Kenneth wrote:

...

This sounds a lot like Monte Carlo. Are each of the N simulations essentially independent of each other, until perhaps all N complete and then some additional processing occurs? If so, then instead of using threads,

...

Obviously, I cannot instantiate N threads at the same time but I cannot run the simulations sequentially either. Can anyone please advise me how to design my program using the boost threading library and concepts please?

I'd use MPI if you can, but I don't know enough about your application domain.

I know this isn't a boost solution, but have you looked into Intel's Threading Building Blocks? (found at http://www.threadingbuildingblocks.org/ )

gast128

10:29 a.m.

If you stay on the same computer (i.e. a plain x86 PC with multiple cores or multiple processors), I would recommend Intel's TBB, which splits up tasks automatically and tries to keep the processors busy. It would be nice if Boost or standard C++ had such a parallelism library, because it is a useful addition to using threads directly.

Julien Martin

10:53 a.m.

Thanks gast128, I just installed TBB. Can you please tell me what to look for (which class, method or concept) in TBB? It is pretty vast...

J.

2010/10/25 gast128 <gast128@hotmail.com>

...

gast128

12:03 p.m.

Julien Martin <balteo <at> gmail.com> writes:

...

Hello, tbb::parallel_for is probably what you will be interested in if the individual tasks are truly independent.

Julien Martin

2:24 p.m.

Ok. I will try that! Thank you, J. 2010/10/25 gast128 <gast128@hotmail.com>

...

Belcourt, K. Noel

3:24 p.m.

Hi Julien, On Oct 25, 2010, at 4:53 AM, Julien Martin wrote:

...

Start with the Intel TBB book, it's quite good and walks you through the product in fairly detailed fashion. In our application we do this, in a top-level routine (like main):

    empty_task *root = 0;
    task_scheduler_init *init = 0;
    if (1 < n_tasks) {
      init = new task_scheduler_init();
      root = new (tbb::task::allocate_root()) empty_task;
      root->set_ref_count(n_tasks+1);
    }

We're telling the TBB thread manager how many child tasks we expect to create (so n_tasks is the number of smp cores). Next we round up the number of realizations we're going to run, to load balance equally across the system. The number of epistemic realizations is your N.

    if (1 < n_tasks && n_epistemic_realizations % n_tasks) {
      n_epistemic_realizations += n_tasks - (n_epistemic_realizations % n_tasks);
    }

Our epistemic class inherits from tbb::task (the type that allocate_child() is used with), so this code constructs a task for each core.

    unsigned int n = n_epistemic_realizations / n_tasks;
    epistemic* ep = new (root->allocate_child()) epistemic(n);

For an 8-core blade (n_tasks = 8) with 10^4 realizations (N = 10000), n is 1250.

Note that depending on how long your discrete simulations run, you may eventually want to use hybrid parallelism (MPI to leverage resources on your network and threads for each local smp machine), though you can also use just straight MPI to manage both within-box and cross-box parallelism.

-- Noel
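The round-up step can be checked in isolation. The following is a small sketch of just that arithmetic, with hypothetical function names and no TBB dependency:

```cpp
#include <cassert>
#include <cstddef>

// Rounds `realizations` up to the next multiple of `tasks`, so that every
// task receives the same number of realizations.
std::size_t round_up_to_tasks(std::size_t realizations, std::size_t tasks) {
    assert(tasks > 0);
    const std::size_t remainder = realizations % tasks;
    return remainder == 0 ? realizations
                          : realizations + (tasks - remainder);
}

// Number of realizations each task runs after rounding up.
std::size_t per_task(std::size_t realizations, std::size_t tasks) {
    return round_up_to_tasks(realizations, tasks) / tasks;
}
```

With 10000 realizations and 8 tasks nothing needs rounding and each task runs 1250. With 10001 realizations the total would be padded to 10008, so each task runs 1251 and seven extra realizations are simulated.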

15 comments

9 participants

Andrew Holden
Belcourt, K. Noel
Belcourt, Kenneth
Brian Budge
gast128
Igor R
Julien Martin
Matthias Troyer
Ray Burkholder