On Mon, Aug 25, 2008 at 07:45, Mike Marchywka
I haven't looked at ray tracing at all since going to SIGGRAPH circa 1983, but I have seen several posts recently on threading as a solution to everything, and I would like to point out that other approaches may yield greater improvements. Have you tried to "think locally, act globally"? That is, have you considered ways of organizing your approach to increase the various types of locality and so minimize cache thrashing? While you say that the rays are independent, in classical physical optics nearby rays tend to have similar trajectories. Rather than letting an ignorant but fair thread scheduler decide which piece of memory to access next, a cache-aware design could even sort the rays to get the best locality and make them dependent, using a transform scheme that recognizes when nearby rays are similar.
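One way to make the sorting idea concrete is to give each ray a key that places rays with similar directions and origins next to each other, then sort by that key. This is a minimal sketch under stated assumptions: unit-length direction vectors, a cubic scene bounded by the hypothetical parameters `scene_lo` and `scene_hi`, and a Morton (Z-order) key, which is a swapped-in technique chosen for illustration rather than anything the post specifies. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Ray:
    origin: Vec3
    direction: Vec3  # assumed unit length, so each component lies in [-1, 1]


def _quantize(c: float, lo: float, hi: float, bits: int = 10) -> int:
    """Map c from [lo, hi] to an integer in [0, 2**bits - 1], clamping outliers."""
    t = (c - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)
    return int(t * ((1 << bits) - 1))


def _spread_bits(v: int, bits: int = 10) -> int:
    """Insert two zero bits between each of the low `bits` bits of v."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def _morton3(p: Vec3, lo: float, hi: float) -> int:
    """Z-order index of a 3-D point: nearby points tend to get nearby indices."""
    x, y, z = (_quantize(c, lo, hi) for c in p)
    return _spread_bits(x) | (_spread_bits(y) << 1) | (_spread_bits(z) << 2)


def coherence_key(ray: Ray, scene_lo: float, scene_hi: float) -> Tuple[int, int]:
    """Group rays by direction first, then by origin within the assumed scene bounds."""
    return (_morton3(ray.direction, -1.0, 1.0), _morton3(ray.origin, scene_lo, scene_hi))


def sort_rays_for_coherence(rays: List[Ray], scene_lo: float, scene_hi: float) -> List[Ray]:
    """Return the rays reordered so that similar rays are adjacent."""
    return sorted(rays, key=lambda r: coherence_key(r, scene_lo, scene_hi))
```

The intent is that consecutive rays in the sorted order tend to touch the same parts of the scene data, so a worker (or batch) processing them in order would reuse cached data; whether this beats plain threading for a given tracer is something to measure.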
It may be possible to do fairly well on locality as follows: on a split, rather than putting both new rays into the queue and fetching a new ray from the queue, continue with the ray most similar to the incident ray and add only the other ray(s) to the queue, so the frame of reference jumps only when a ray dies. (Not unlike half-tailing quicksort.)
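A minimal sketch of this split-handling strategy, assuming a hypothetical `interact(ray)` callback that returns the child rays produced at the next hit (an empty list when the ray dies), and using the dot product of directions as the measure of similarity to the incident ray. The `toy_interact` function is a purely illustrative stand-in for real reflection/refraction: it splits every ray into a mirrored and a straight-through child until depth 2.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Ray:
    direction: Vec3
    depth: int = 0


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def trace_all(initial: List[Ray], interact: Callable[[Ray], List[Ray]]) -> List[Ray]:
    """Process every ray, returning them in the order they were traced."""
    queue: Deque[Ray] = deque(initial)
    order: List[Ray] = []
    while queue:
        ray = queue.popleft()  # the frame of reference jumps only here
        while True:
            order.append(ray)
            children = interact(ray)
            if not children:
                break  # the ray died; go back to the queue
            # Continue with the child most similar to the incident ray...
            best = max(children, key=lambda c: dot(c.direction, ray.direction))
            # ...and defer only the others.
            queue.extend(c for c in children if c is not best)
            ray = best
    return order


def toy_interact(ray: Ray, max_depth: int = 2) -> List[Ray]:
    """Hypothetical hit model: split into mirrored and straight-through rays."""
    if ray.depth >= max_depth:
        return []
    dx, dy, dz = ray.direction
    mirrored = Ray((-dx, dy, dz), ray.depth + 1)
    straight = Ray((dx, dy, dz), ray.depth + 1)
    return [mirrored, straight]


order = trace_all([Ray((1.0, 0.0, 0.0))], toy_interact)
for r in order:
    print(r.direction, r.depth)
```

Because only the less similar children are queued, the tracer follows one lineage until it dies before jumping elsewhere, much as quicksort can loop on one partition and defer the other. A plain FIFO that enqueued both children would instead process the rays breadth-first, switching context on every fetch.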