Hi all -

I just tried out the operators library for the first time today. I was quite happy with how easily it helped me implement my test class. Unfortunately, I'm running into some performance issues, in particular with operator +.

The interesting thing is that the new operator += is actually faster than my old += by about a factor of 2, but my operator + is slower than my old + by a factor of nearly 10!

I'm new here, and not really sure how much code is appropriate to post to the list, so here's a minimal rundown:

--------------------------------------------------------------------------
Old code:

    template<typename real>
    Vector3<real> operator + (const Vector3<real> &a, const Vector3<real> &b);

    template<typename real>
    class Vector3 {
    protected:
        real m_v[3];
        .
        .
        friend Vector3 operator +<> (const Vector3 &a, const Vector3 &b);
    };

    template<typename real>
    inline Vector3<real> operator + (const Vector3<real> &a, const Vector3<real> &b){
        Vector3<real> t;
        t[0] = a[0] + b[0];
        t[1] = a[1] + b[1];
        t[2] = a[2] + b[2];
        return t;
    }

--------------------------------------------------------------------------
New code (note that the new code is templatized on size, and that my instantiations were of size 3 to match the above code):

    template<typename T, unsigned int S>
    class vec
        : boost::additive< vec<T, S>
        , boost::multiplicative< vec<T, S>, T > >
    {
    private:
        typedef unsigned int uint;
        typedef const T &const_reference;
        typedef T &reference;

    private:
        boost::array<T,S> m_v;

        template<typename U>
        vec &operator += (const vec<U,S> &t){
            for(uint i = 0; i < S; ++i){
                m_v[i] += t.m_v[i];
            }
            return *this;
        }
    };

Any ideas how to increase the performance of the new code here? A factor of 10 makes it seem like I am just missing something important.

Thanks,
Brian
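For readers unfamiliar with how an operators-style base class supplies operator + from a user-written operator +=, the following is a minimal sketch of the general technique. It is not Boost's actual source, and Boost's implementation differs in detail. The names addable_sketch and vec_sketch are hypothetical and exist only for this illustration. The sketch shows that a derived + involves a copy of the left operand followed by an in-place +=, whereas the hand-written + in the old code writes directly into a fresh temporary. Whether that difference accounts for the slowdown reported above is not established in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an operators-style base class (NOT Boost's code).
// It derives a non-member operator+ from the derived class's operator+=.
template <typename Derived>
struct addable_sketch {
    // lhs is taken by value: one copy, then += in place, then return.
    friend Derived operator+(Derived lhs, const Derived& rhs) {
        lhs += rhs;
        return lhs;
    }
};

// Hypothetical fixed-size vector that only implements operator+= itself.
template <typename T, std::size_t S>
class vec_sketch : addable_sketch<vec_sketch<T, S> > {
public:
    vec_sketch() : m_v() {}  // value-initialize all elements

    T& operator[](std::size_t i) { return m_v[i]; }
    const T& operator[](std::size_t i) const { return m_v[i]; }

    vec_sketch& operator+=(const vec_sketch& t) {
        for (std::size_t i = 0; i < S; ++i) {
            m_v[i] += t.m_v[i];
        }
        return *this;
    }

private:
    T m_v[S];
};
```

With this in place, an expression such as a + b on two vec_sketch<double, 3> objects resolves to the friend function found through argument-dependent lookup, even though the base class is inherited privately.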
On Monday 08 May 2006 06:05, Brian Budge wrote:
Any ideas how to increase the performance of the new code here? A factor of 10 makes it seem like I am just missing something important.
I don't know anything about the operators library, but I noticed that the fast implementation does not have the loop that the slow one has. Loops typically take a while to set up, so that could be the problem.

Fred
Any ideas how to increase the performance of the new code here? A factor of 10 makes it seem like I am just missing something important.
I would suspect it's the loop that's at fault, although I'm very surprised it's a factor of 10. Your original code had the loop unrolled, so you might try a bit of template metaprogramming to achieve the same effect here. Otherwise you're going to have to do a bit of debugging and/or inspection of the generated assembly.

BTW, the measurements you made were in release mode, right? If inline expansions are turned off (debug mode, for example), the operators-based version may well pass through many more function calls. Of course, these all disappear as long as your compiler does a reasonable job of inlining.

HTH, John.
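The template metaprogramming suggestion can be sketched roughly as follows. This is one possible way to unroll a fixed-size element-wise += at compile time, not necessarily what John had in mind. The names unrolled_add and add_in_place are hypothetical. The recursion over the index I is expanded by the compiler into straight-line code, provided the calls are inlined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: expands at compile time to
//   a[0] += b[0]; a[1] += b[1]; ... a[N-1] += b[N-1];
template <std::size_t I, std::size_t N>
struct unrolled_add {
    template <typename T>
    static void apply(T* a, const T* b) {
        a[I] += b[I];
        unrolled_add<I + 1, N>::apply(a, b);  // recurse on the next index
    }
};

// Terminating specialization: I == N, nothing left to do.
template <std::size_t N>
struct unrolled_add<N, N> {
    template <typename T>
    static void apply(T*, const T*) {}
};

// Convenience wrapper that deduces the element type and array size.
template <typename T, std::size_t N>
void add_in_place(T (&a)[N], const T (&b)[N]) {
    unrolled_add<0, N>::apply(a, b);
}
```

Calling add_in_place(a, b) on two double[3] arrays adds b into a element by element without a run-time loop counter. Whether this beats a plain loop depends on the compiler and flags, so it is worth measuring both.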
Thanks for the ideas guys.

Compile options are like so:

    g++ -O3 -msse -mfpmath=sse

I tried the metaprogramming technique (which is pretty nifty :) ), and got interesting results.

Basically, it made my += operator run twice as SLOW, while making my + operator run twice as FAST.

I have a feeling that this is all due to the different optimizations that gcc is doing at multiple stages of compilation. For example, it may be doing autovectorization of the simple loop case of +=, which it can't figure out with the metaprogramming technique. I'm still stumped as to why I'm roughly an order of magnitude slower with + than with +=.

Any more insights?

Thanks again for the ideas so far!
Brian

On 5/8/06, John Maddock <john@johnmaddock.co.uk> wrote:
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
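The thread does not show how the timings were taken. As a rough illustration only, a comparison of + against += could be set up along the following lines. Everything here is an assumption made for the sketch: the Vec3 type, the run_plus_assign and run_plus helpers, and the use of std::chrono (which postdates this thread). Returning the accumulated result keeps the loop observable, so the compiler cannot simply discard the work being timed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical 3-vector used only for this timing sketch.
struct Vec3 {
    double v[3];
};

inline Vec3& operator+=(Vec3& a, const Vec3& b) {
    for (std::size_t i = 0; i < 3; ++i) a.v[i] += b.v[i];
    return a;
}

// operator+ written in terms of operator+= (copy, then add in place).
inline Vec3 operator+(const Vec3& a, const Vec3& b) {
    Vec3 t = a;
    t += b;
    return t;
}

// Adds `step` to an accumulator `iters` times using operator+=.
// Elapsed wall-clock time in seconds is written to `seconds`.
inline Vec3 run_plus_assign(const Vec3& step, std::size_t iters, double& seconds) {
    Vec3 acc = {{0.0, 0.0, 0.0}};
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i) acc += step;
    const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    seconds = std::chrono::duration<double>(stop - start).count();
    return acc;
}

// Same accumulation, but through operator+.
inline Vec3 run_plus(const Vec3& step, std::size_t iters, double& seconds) {
    Vec3 acc = {{0.0, 0.0, 0.0}};
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i) acc = acc + step;
    const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    seconds = std::chrono::duration<double>(stop - start).count();
    return acc;
}
```

Both helpers should be built with the same flags as the real code (here, g++ -O3 -msse -mfpmath=sse) and run with a large iteration count, since very short runs mostly measure timer overhead.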
Well, have you considered certainty of memory aliasing? In particular, gcc supports the restrict keyword (e.g. double *__restrict__ c), indicating that the memory spaces pointed to by c will never be accessed by anything /but/ c, allowing it to make load-store and register usage optimizations it couldn't otherwise. In particular, it's 100% certain in the manually indexed case that a[0] will never ever refer to b[1]. Then again, it can't be as sure in the looped version.

Just a thought, that may or may not pan out. All it takes to try is a quick addition of __restrict__ however, so it's not a tough test.

- Greg Link
Penn State University
York College of Pennsylvania

On May 8, 2006, at 2:23 PM, Brian Budge wrote:
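Greg's aliasing suggestion can be illustrated with a small free function. This is only a sketch of how __restrict__ is spelled on pointer parameters, not a change to the operators library. The function name add3_restrict is hypothetical, and __restrict__ is a GCC extension (also accepted by Clang), not standard C++. The qualifier is a promise by the caller that the output array does not overlap the inputs. Breaking that promise is undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: out[i] = a[i] + b[i] for three elements.
// __restrict__ (GCC/Clang extension) tells the compiler that, within this
// function, the memory written through `out` is not accessed through `a`
// or `b`, so loads from a and b need not be repeated after stores to out.
inline void add3_restrict(double* __restrict__ out,
                          const double* __restrict__ a,
                          const double* __restrict__ b) {
    for (std::size_t i = 0; i < 3; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The function must be called with an output array that is distinct from both inputs. For a loop this small the compiler may already generate the same code with or without the qualifier, so comparing the generated assembly is the only way to know whether it helps.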
Thanks for the idea, Greg. I thought for sure you were on to something, but I tried adding the __restrict__ keyword in the operators.hpp binary_ops functions, and it made no difference :(

On 5/8/06, Greg Link <link@cse.psu.edu> wrote:
On Mon, 8 May 2006, Brian Budge wrote:
Any more insights?
Did you try with -funroll-loops? I once did a few tests with vectors, one version with loops and the other with manually unrolled loops, and with options -O3 -funroll-loops, the generated code was identical. But then again, that was with g++-3.3.

As for the use of boost::operators, I don't know. I did a small test using the following, and the generated code with g++-4.0 and g++-4.1 (with options -O3 -msse -mfpmath=sse, and with and without -DUSE_OP) is identical (diff reports no difference).

    #include <cstddef>  // for ::std::size_t

    #ifdef USE_OP
    #include <boost/operators.hpp>
    #endif

    template < typename T >
    class vector
    #ifdef USE_OP
        : boost::addable< vector< T > >
    #endif
    {
    private :
        T data_[ 3 ] ;

    public :
        vector() { }

        explicit vector( const T* v )
        {
            data_[ 0 ] = v[ 0 ] ;
            data_[ 1 ] = v[ 1 ] ;
            data_[ 2 ] = v[ 2 ] ;
        }

        const T& operator [] ( ::std::size_t i ) const
        {
            return data_[ i ] ;
        }

        vector& operator += ( const vector& rhs )
        {
            data_[ 0 ] += rhs[ 0 ] ;
            data_[ 1 ] += rhs[ 1 ] ;
            data_[ 2 ] += rhs[ 2 ] ;
            return * this ;
        }

    #ifndef USE_OP
        // Hand-written operator +, used only when boost::addable is not.
        friend vector operator + ( const vector& lhs , const vector& rhs )
        {
            vector r ;
            r.data_[ 0 ] = lhs[ 0 ] + rhs[ 0 ] ;
            r.data_[ 1 ] = lhs[ 1 ] + rhs[ 1 ] ;
            r.data_[ 2 ] = lhs[ 2 ] + rhs[ 2 ] ;
            return r ;
        }
    #endif
    } ;

--
François Duranleau
LIGUM, Université de Montréal

"Any sufficiently advanced technology is indistinguishable from magic"
- Arthur C. Clarke
participants (5)
- Brian Budge
- François Duranleau
- Fred Labrosse
- Greg Link
- John Maddock