Date: Sun, 24 Aug 2008 21:56:49 -0400
From: me22.ca+boost@gmail.com
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] [Thread] Beginner question regarding thread groups

On Sun, Aug 24, 2008 at 18:58, Michel Lestrade wrote:
> I am considering rewriting part of our ray tracing code to use the Boost thread library. As many of you might know, ray tracing is a task that lends itself well to a parallel approach since each ray is independent of the others. However, the code right now is in Fortran, which doesn't support task-level parallelism easily ... The thread_group class seems ideally suited to my needs (one thread == one ray), but there is at least one problem I envision that I would like to solve before embarking on this fairly time-consuming rewrite.

It's worth pointing out that, as I recall, you don't really want more continually active threads than, say, twice your number of cores. It would be nice to run one thread per ray, but the context-switching and OS management overhead will kill you if you attempt to run thousands of threads at once, which I assume you'd need with 1:1. (Erlang, for example, which is conceptually based on many, many communicating processes, uses its own implementation rather than OS threads or processes.)

I think you'd be much better off creating a "ray queue" of some sort,
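
To make the "few threads, many rays" idea concrete, here is a minimal sketch, not a definitive implementation. It uses C++11 std::thread as a stand-in for boost::thread_group, and a shared atomic counter as about the simplest possible "ray queue". The names trace_ray, trace_all and default_worker_count are hypothetical, and trace_ray is only a placeholder for the real per-ray work.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical placeholder for tracing one ray; a real tracer would
// intersect the ray with the scene here.
double trace_ray(std::size_t ray_index) {
    return static_cast<double>(ray_index) * 0.5;
}

// Trace num_rays rays with a small, fixed number of worker threads.
// Each worker repeatedly claims the next unprocessed ray index from a
// shared atomic counter, so no thread is ever created per ray.
std::vector<double> trace_all(std::size_t num_rays, unsigned num_workers) {
    assert(num_workers > 0);  // assumed precondition for this sketch
    std::vector<double> results(num_rays);
    std::atomic<std::size_t> next_ray(0);

    auto worker = [&]() {
        for (;;) {
            std::size_t i = next_ray.fetch_add(1);  // claim one ray
            if (i >= num_rays) break;               // queue is drained
            results[i] = trace_ray(i);  // each slot is written by one thread only
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < num_workers; ++w) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();
    return results;
}

// One worker per core as a starting point; hardware_concurrency() may
// return 0 when the core count is unknown, so fall back to 2.
unsigned default_worker_count() {
    unsigned cores = std::thread::hardware_concurrency();
    return cores == 0 ? 2u : cores;
}
```

Calling trace_all(num_rays, default_worker_count()) keeps the thread count near the core count no matter how many rays there are. Claiming rays in small batches instead of one at a time would reduce contention on the counter if the per-ray work is cheap.
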
If you want some ideas for reference, I did a quick check on the Intel site. They have some papers and apparently even an OpenMP Fortran compiler, if that helps you. (Obviously their comments are specific to their products, but they are quite generally useful, especially if you are looking for ideas.)

http://cache-www.intel.com/cd/00/00/21/92/219292_hyperthreading_extract.pdf
( http://www.google.com/search?hl=en&q=site%3Aintel.com+openmp+performance+optimization )

I haven't looked at ray tracing at all since going to SIGGRAPH circa 1983, but I have seen several posts recently on threading as a solution to everything, and I would like to point out that there may be better approaches that yield greater improvements. Have you tried to "think locally, act globally"? That is, have you considered ways of organizing your approach to increase the various types of locality and so minimize cache thrashing?

While you say that rays are independent, in classical physical optics nearby rays tend to have similar trajectories. Rather than let an ignorant but fair thread scheduler decide what piece of memory to access next, if you are cache aware you could even consider something like sorting the rays to get the best locality and making them dependent with a transform scheme that recognizes that nearby rays are similar. Random access memory is random access, but if you look at the overall architecture you can take a big performance hit for not keeping memory accesses sequential.

Again, I'm not sure any of this helps you here, and if you have a lot of processors then maybe threading is the easiest solution for you, but I still don't see a lot of discussion on memory access optimization, which becomes a big limitation in many cases.

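
As a rough illustration of the ray-sorting idea, here is a minimal sketch under stated assumptions. The message does not say how to sort, so the key used here (direction octant first, then a coarse grid cell of the origin) is just one simple choice of mine. The Ray struct, the function names and the cell_size parameter are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical ray layout: origin and direction.
struct Ray {
    float ox, oy, oz;  // origin
    float dx, dy, dz;  // direction
};

// Map a coordinate to a grid cell index clamped to 0..255.
std::uint32_t quantize(float v, float cell_size) {
    float c = std::floor(v / cell_size);
    if (c < 0.0f) c = 0.0f;
    if (c > 255.0f) c = 255.0f;
    return static_cast<std::uint32_t>(c);
}

// Coarse locality key: the direction octant (three sign bits) in the
// high bits, then the quantized origin cell. Rays with equal or nearby
// keys are likely to touch similar parts of the scene.
std::uint32_t locality_key(const Ray& r, float cell_size) {
    std::uint32_t octant = (r.dx < 0.0f ? 1u : 0u)
                         | (r.dy < 0.0f ? 2u : 0u)
                         | (r.dz < 0.0f ? 4u : 0u);
    return (octant << 24)
         | (quantize(r.ox, cell_size) << 16)
         | (quantize(r.oy, cell_size) << 8)
         |  quantize(r.oz, cell_size);
}

// Reorder rays so that similar rays are processed back to back.
void sort_rays_for_locality(std::vector<Ray>& rays, float cell_size) {
    assert(cell_size > 0.0f);  // assumed precondition for this sketch
    std::stable_sort(rays.begin(), rays.end(),
        [cell_size](const Ray& a, const Ray& b) {
            return locality_key(a, cell_size) < locality_key(b, cell_size);
        });
}
```

Whether this helps depends on the scene data layout and would need measuring. The sort itself costs time, so it only pays off if the improved cache behavior outweighs it.
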
Got a few hits here, FWIW:
http://citeseerx.ist.psu.edu/search?q=%22ray+tracing%22+AND++locality+AND+cache&sort=rel

Mike Marchywka