Re: [Boost-users] [Thread] Beginner question regarding thread groups

25 Aug 2008

...
Date: Sun, 24 Aug 2008 21:56:49 -0400
From: me22.ca+boost@gmail.com
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] [Thread] Beginner question regarding thread groups
On Sun, Aug 24, 2008 at 18:58, Michel Lestrade wrote:
...
I am considering rewriting part of our ray tracing code to use the Boost
thread library. As many of you might know, ray tracing is a task that lends
itself well to a parallel approach since each ray is independent from the
other. However, the code right now is in Fortran which doesn't support task
level parallelism easily ... The thread_group class seems ideally suited to
my needs (one thread == one ray) but there is at least one problem I
envision that I would like to solve before embarking on this fairly
time-consuming rewrite.
It's worth pointing out that, as I recall, you don't really want more
continually-active threads than, say, twice your number of cores. It
would be nice to run one thread per ray, but the switching and OS
management overhead will kill you if you attempt to run thousands of
threads at once, which I assume you'd need with 1:1. (Erlang, for
example, which is conceptually based on many, many communicating
processes, uses its own implementation, rather than OS threads or
processes.)
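To make the thread-count point concrete, here is a minimal sketch rather than a definitive implementation: a small, fixed set of workers shares all the rays, instead of one thread per ray. It uses std::thread in place of boost::thread_group only so that it compiles without Boost. The names trace_ray, trace_all and suggested_worker_count are hypothetical, and the "twice the cores" factor is just the rule of thumb above, not a measured optimum.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for tracing one ray; a real tracer does far more work.
inline double trace_ray(std::size_t ray_index) {
    return static_cast<double>(ray_index) * 0.5;
}

// Trace ray_count rays with a fixed number of worker threads.
// Each worker repeatedly claims the next untraced ray index from a shared counter.
std::vector<double> trace_all(std::size_t ray_count, unsigned worker_count) {
    assert(worker_count > 0);
    std::vector<double> results(ray_count);
    std::atomic<std::size_t> next_ray(0);

    auto worker = [&]() {
        for (;;) {
            std::size_t i = next_ray.fetch_add(1);
            if (i >= ray_count) break;
            results[i] = trace_ray(i);  // each index is written by exactly one thread
        }
    };

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < worker_count; ++t) workers.emplace_back(worker);
    for (auto& w : workers) w.join();
    return results;
}

// Rule of thumb: about twice the core count (an assumption, not a tuned value).
// hardware_concurrency() may return 0 when the count is unknown, so assume 1 core then.
unsigned suggested_worker_count() {
    unsigned cores = std::thread::hardware_concurrency();
    return 2 * std::max(1u, cores);
}
```

A call such as trace_all(ray_count, suggested_worker_count()) keeps the number of OS threads bounded no matter how many rays there are.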
I think you'd be much better off creating a "ray queue" of some sort,
If you want some ideas for reference, I did a quick check on the Intel
site and they have some papers and apparently even an OpenMP Fortran
compiler, if that helps you.
 [ Obviously their comments are specific to their products, but they are quite
generally useful, especially if you are looking for ideas. ]
http://cache-www.intel.com/cd/00/00/21/92/219292_hyperthreading_extract.pdf

( http://www.google.com/search?hl=en&q=site%3Aintel.com+openmp+performance+optimization )

I haven't looked at ray tracing at all since going to SIGGRAPH circa 1983,
but I have seen several posts recently
on threading as a solution to everything and would like to point out that there
may be better approaches that yield greater improvements. Have you
tried to "think locally, act globally"? That is, have you considered ways of
organizing your approach to increase the various types of locality and so
minimize cache thrashing?
While you say that rays are independent, if you do classical physical optics,
nearby rays tend to have similar trajectories. Rather than let an ignorant
but fair thread scheduler decide what piece of memory to access next, if you
are cache aware, you could even consider sorting the rays
to get the best locality, or making them dependent through a transform
scheme that recognizes that nearby rays are similar.
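
As one possible illustration of sorting rays for locality (a sketch under stated assumptions, not a recommendation of this particular ordering), the snippet below orders rays by a Morton (Z-order) key built from the pixel each ray starts at, so that rays close together on the image plane end up close together in the array. The Ray struct and the function names are hypothetical. A real tracer might sort on direction or on the scene cell being traversed instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical ray record: only the image-plane pixel the ray starts from.
struct Ray {
    std::uint16_t px;
    std::uint16_t py;
};

// Spread the 16 bits of v so they occupy the even bit positions of a 32-bit word.
inline std::uint32_t spread_bits(std::uint16_t v) {
    std::uint32_t x = v;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Morton (Z-order) key: interleave the bits of px (even positions) and py (odd positions).
inline std::uint32_t morton_key(const Ray& r) {
    return spread_bits(r.px) | (spread_bits(r.py) << 1);
}

// Reorder rays so that neighbours on the image plane are neighbours in memory.
void sort_rays_for_locality(std::vector<Ray>& rays) {
    auto by_key = [](const Ray& a, const Ray& b) {
        return morton_key(a) < morton_key(b);
    };
    std::sort(rays.begin(), rays.end(), by_key);
    assert(std::is_sorted(rays.begin(), rays.end(), by_key));  // debug-only sanity check
}
```

After sorting, handing out contiguous chunks of the array means each chunk covers a compact patch of the image, which is where similar trajectories are most likely.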

Random access memory is nominally random access, but once you look at the overall
architecture (caches, prefetching), you can take a big performance hit for not
keeping accesses sequential.
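
A small sketch of what "sequential" means here, assuming a grid stored row-major in one flat array (the function names are hypothetical): both functions compute the same sum, but the first walks memory in order while the second jumps cols elements between consecutive accesses. On large grids the second pattern typically makes much poorer use of the cache. No timings are claimed here, since they depend on the machine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a rows x cols grid stored row-major, visiting elements in memory order.
double sum_sequential(const std::vector<double>& grid,
                      std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += grid[r * cols + c];
    return total;
}

// Same sum, column by column: consecutive accesses are cols elements apart.
double sum_strided(const std::vector<double>& grid,
                   std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += grid[r * cols + c];
    return total;
}
```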

Again, I'm not sure any of this helps you here, and if you have a lot of processors
then maybe threading is the easiest solution for you. But I still don't
see a lot of discussion of memory access optimization, which becomes
a big limitation in many cases.

I got a few hits here, FWIW:

http://citeseerx.ist.psu.edu/search?q=%22ray+tracing%22+AND++locality+AND+cache&sort=rel

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
415-264-8477 (w)<- use this
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka@hotmail.com
Note: If I am asking for free stuff, I normally use for hobby/non-profit
information but may use in investment forums, public and private.
Please indicate any concerns if applicable.
Note: Hotmail is getting cumbersome; try also marchywka@yahoo.com
