
Well, have you considered certainty of memory aliasing? In particular, gcc supports the restrict keyword (e.g. double *__restrict__ c), indicating that the memory pointed to by c will never be accessed by anything /but/ c, allowing it to make load-store and register usage optimizations it couldn't otherwise. In the manually indexed case, it's 100% certain that a[0] will never refer to b[1]; the compiler can't be as sure in the looped version.

Just a thought, that may or may not pan out. All it takes to try is a quick addition of __restrict__, however, so it's not a tough test.

- Greg Link
Penn State University
York College of Pennsylvania

On May 8, 2006, at 2:23 PM, Brian Budge wrote:
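For reference, a minimal sketch of the __restrict__ suggestion in Greg's reply above. The function name add_inplace and the use of double are assumptions for illustration, not code from this thread; __restrict__ is a GCC extension in C++ rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not from the thread: adds a[i] into c[i].
// __restrict__ (a GCC extension in C++) promises the compiler that the
// memory reached through c is not also reached through a, so it may keep
// values in registers and reorder loads and stores across iterations.
void add_inplace(double* __restrict__ c,
                 const double* __restrict__ a,
                 std::size_t n) {
    // Passing overlapping ranges would break the promise made above.
    assert(c != a);
    for (std::size_t i = 0; i < n; ++i) {
        c[i] += a[i];
    }
}
```

Whether this changes the generated code for a given class depends on the compiler version and flags, so comparing the assembly with and without the qualifier is the only real test.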
Thanks for the ideas guys.
Compile options are like so: g++ -O3 -msse -mfpmath=sse
I tried the metaprogramming technique (which is pretty nifty :) ), and got interesting results.
Basically, it made my += operator run twice as SLOW, while making my + operator run twice as FAST.
I have a feeling that this is all due to the different optimizations that gcc is doing at multiple stages of compilation. For example, it may be doing autovectorization of the simple loop case of +=, which it can't figure out with the metaprogramming technique. I'm still stumped as to why I'm roughly an order of magnitude slower with + than with +=.
Any more insights?
Thanks again for the ideas so far! Brian
On 5/8/06, John Maddock wrote:
Any ideas how to increase the performance of the new code here? A factor of 10 makes it seem like I am just missing something important.
I would suspect it's the loop that's at fault, although I'm very surprised it's a factor of 10. Your original code had the loop unrolled, so you might try a bit of template metaprogramming to achieve the same effect here. Otherwise you're going to have to do a bit of debugging and/or inspection of the assembly generated.
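One way the template metaprogramming idea can look, as a rough sketch only. The names UnrolledAdd and Vec, the float element type, and the fixed-size array layout are all assumptions; the actual class under discussion is not shown in this thread.

```cpp
#include <cstddef>

// Hypothetical unrolling helper: UnrolledAdd<0, N>::apply expands at
// compile time into N separate "dst[I] += src[I]" statements, so no
// run-time loop counter or branch remains once everything is inlined.
template <std::size_t I, std::size_t N>
struct UnrolledAdd {
    static void apply(float* dst, const float* src) {
        dst[I] += src[I];
        UnrolledAdd<I + 1, N>::apply(dst, src);
    }
};

// Terminating specialization: I has reached N, nothing left to do.
template <std::size_t N>
struct UnrolledAdd<N, N> {
    static void apply(float*, const float*) {}
};

// Hypothetical fixed-size vector standing in for the real class.
template <std::size_t N>
struct Vec {
    float v[N];

    Vec& operator+=(const Vec& rhs) {
        UnrolledAdd<0, N>::apply(v, rhs.v);
        return *this;
    }
};
```

Usage would be along the lines of Vec<4> a = {{1, 2, 3, 4}}; a += b;. This only pays off if the compiler inlines every apply call, so it is still worth checking the generated assembly.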
BTW the measurements you made were in release mode, right? If inline expansions are turned off (debug mode for example), the operators-based version may well pass through many more function calls. Of course these all disappear as long as your compiler does a reasonable job of inlining.
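To illustrate where the extra calls come from, here is a hand-written stand-in for an operator+ generated from operator+=. It is an assumption about the general pattern, not the Boost.Operators implementation, and Vec3 is a hypothetical type.

```cpp
// Hypothetical three-component vector, not the class from the thread.
struct Vec3 {
    float x, y, z;

    Vec3& operator+=(const Vec3& r) {
        x += r.x;
        y += r.y;
        z += r.z;
        return *this;
    }
};

// operator+ written in terms of operator+=: the left operand is copied
// (it is taken by value), then the call forwards to +=. Without inlining
// that is a copy plus an extra function call for every +; with inlining
// both layers can collapse into three plain additions.
inline Vec3 operator+(Vec3 lhs, const Vec3& rhs) {
    lhs += rhs;
    return lhs;
}
```

The temporary created here is one plausible reason a + b can cost more than a += b, though it would not by itself explain a factor of 10.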
HTH, John.
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users