[Thread] Beginner question regarding thread groups
Hi, I am considering rewriting part of our ray tracing code to use the Boost thread library. As many of you might know, ray tracing is a task that lends itself well to a parallel approach since each ray is independent from the other. However, the code right now is in Fortran which doesn't support task level parallelism easily ... The thread_group class seems ideally suited to my needs (one thread == one ray) but there is at least one problem I envision that I would like to solve before embarking on this fairly time-consuming rewrite. Just to give you some background, what we are doing is not ray tracing in the visualization/render sense: rather, it is a scientific application used to determine the light emission pattern of certain devices without waveguides (such as LEDs). So keep in mind that not all the conventional ray tracing approaches will work for me. I also come from a physics/engineering background so I am not up to date on all the newer object-oriented techniques and concepts. Now to the heart of the matter: in our approach, each of the initial rays is split into a reflected/refracted ray every time a material interface is reached which generates a secondary ray. Each secondary ray can generate other secondary rays when they hit an interface and so on ... This goes on until the power in each ray has decayed sufficiently (the material is usually absorbing), all rays exit the device or a maximum number of rays has been reached. That means that I do not know ahead of time how many rays (or threads) that I will have. This brings me to my question: can I add new threads to thread group while it is running ? All of the code samples for thread_group seem to call add_thread() on a fixed number of objects before calling join_all(). So it seems that the execution of the thread does not start at the moment it is added to the group but at the moment join_all() is called. In my case, I would have a join_all() first and then periodically add new threads: so this is basically a dynamic queue with new rays being pushed at the end. How can I start these threads automatically if add_thread does not start the execution ? Should I call join() on the new threads before adding them to the thread group instead of using join_all() ? Should I give up on thread_group and store my thread objects in some other container ? Any alternative solutions or a code sample with a dynamically changing thread_group would be appreciated. I am hoping someone has faced a similar problem in another context ... Regards, Michel Lestrade Crosslight Software P.S. Since I am a new user of Boost, I will go ahead and use the latest version (1.36.0). I already built the binaries for my tool set (VS 2008).
On Sun, Aug 24, 2008 at 18:58, Michel Lestrade
I am considering rewriting part of our ray tracing code to use the Boost thread library. As many of you might know, ray tracing is a task that lends itself well to a parallel approach since each ray is independent from the other. However, the code right now is in Fortran which doesn't support task level parallelism easily ... The thread_group class seems ideally suited to my needs (one thread == one ray) but there is at least one problem I envision that I would like to solve before embarking on this fairly time-consuming rewrite.
It's worth pointing out that, as I recall, you don't really want more continually-active threads than, say, twice your number of cores. It would be nice to run one thread per ray, but the switching and OS management overhead will kill you if you attempt to run thousands of threads at once, which I assume you'd need with 1:1. (Erlang, for example, which is conceptually based on many, many communicating processes, uses its own implementation, rather than OS threads of processes.) I think you'd be much better off creating a "ray queue" of some sort, then using a thread_group with cores+1 threads that all continually poll the ray queue for which one to work on. So that's not exactly an answer to the question you asked, but I think that's the usual wisdom with regards to threads. HTH, ~ Scott
Date: Sun, 24 Aug 2008 21:56:49 -0400 From: me22.ca+boost@gmail.com To: boost-users@lists.boost.org Subject: Re: [Boost-users] [Thread] Beginner question regarding thread groups
On Sun, Aug 24, 2008 at 18:58, Michel Lestrade wrote:
I am considering rewriting part of our ray tracing code to use the Boost thread library. As many of you might know, ray tracing is a task that lends itself well to a parallel approach since each ray is independent from the other. However, the code right now is in Fortran which doesn't support task level parallelism easily ... The thread_group class seems ideally suited to my needs (one thread == one ray) but there is at least one problem I envision that I would like to solve before embarking on this fairly time-consuming rewrite.
It's worth pointing out that, as I recall, you don't really want more continually-active threads than, say, twice your number of cores. It would be nice to run one thread per ray, but the switching and OS management overhead will kill you if you attempt to run thousands of threads at once, which I assume you'd need with 1:1. (Erlang, for example, which is conceptually based on many, many communicating processes, uses its own implementation, rather than OS threads of processes.)
I think you'd be much better off creating a "ray queue" of some sort,
If you want some ideas for reference, I did a quick check on the intel site and they have some papers and apparently even an openMP Fortran compiler if that helps you, [ obviously their comments are specific to their products but quite generally useful esp if you are looking for ideas ] http://cache-www.intel.com/cd/00/00/21/92/219292_hyperthreading_extract.pdf ( http://www.google.com/search?hl=en&q=site%3Aintel.com+openmp+performance+optimization ) I haven't looked at ray tracing at all since going to SIGGRAPH circa 1983 but I have seen several posts recently on threading as a solution to everything and would like to point out that there may be better approaches that yield greater improvements. Have you tried to "think locally, act globally?" That is, consider ways of organizing your approach to increase various types of locality that minimize cache thrashing? While you say that rays are independent, if you do classical physical optics, nearby rays tend to have similar trajectories etc. Rather than let an ignorant but fair thread scheduler decide what piece of memory to access next, if you are cache aware, you could even consider something like sorting the rays to get the best locality and making them dependent with a transform scheme that recognizes they are similar if nearby etc. Random access memory is random access but if you look at the overall architecture you can take a big performance hit for not keeping stuff sequential. Again, I'm not sure any of this helps you here and if you have a lot of processors then maybe threading is the easiest solution for you but I still don't see a lot of discussion on memory access optimization which becomes a big limitation in many cases. got a few hits here FWIW, http://citeseerx.ist.psu.edu/search?q=%22ray+tracing%22+AND++locality+AND+cache&sort=rel Mike Marchywka 586 Saint James Walk Marietta GA 30067-7165 415-264-8477 (w)<- use this 404-788-1216 (C)<- leave message 989-348-4796 (P)<- emergency only marchywka@hotmail.com Note: If I am asking for free stuff, I normally use for hobby/non-profit information but may use in investment forums, public and private. Please indicate any concerns if applicable. Note: hotmail is getting cumbersom, try also marchywka@yahoo.com _________________________________________________________________ Be the filmmaker you always wanted to be—learn how to burn a DVD with Windows®. http://clk.atdmt.com/MRT/go/108588797/direct/01/
On Mon, Aug 25, 2008 at 07:45, Mike Marchywka
I haven't looked at ray tracing at all since going to SIGGRAPH circa 1983 but I have seen several posts recently on threading as a solution to everything and would like to point out that there may be better approaches that yield greater improvements. Have you tried to "think locally, act globally?" That is, consider ways of organizing your approach to increase various types of locality that minimize cache thrashing? While you say that rays are independent, if you do classical physical optics, nearby rays tend to have similar trajectories etc. Rather than let an ignorant but fair thread scheduler decide what piece of memory to access next, if you are cache aware, you could even consider something like sorting the rays to get the best locality and making them dependent with a transform scheme that recognizes they are similar if nearby etc.
It may be possible to do fairly well on locality by on a split, rather than putting both new rays into the queue and doing a new fetch from the queue, continuing on with the ray most similar to the incident ray, only adding the other ray(s?) to the queue, so the frame of reference only jumps when a ray dies. (Not unlike half-tailing quicksort.)
participants (3)
-
Michel Lestrade
-
Mike Marchywka
-
Scott McMurray