boost.proto optimization on MSVC (2005/2008)
Hi,
I've been trying out boost.proto lately and I've noticed that for my
expressions it tends to force my operands to be created on the stack. For
example, instead of optimizing out an object completely when everything
about it
is known at compile-time, the compiler (MSVC 2005 or 2008) is creating that
object on the stack anyway.. even though it's not necessary. Even worse, it
actually folds the simpler expressions at compile-time and never even
touches
the data that it just put on the stack. The problem is compounded with
larger
expressions involving many operands. It even seems to build the larger
expression objects on the stack unnecessarily (and this looks like it starts
to
break constant folding).
Here's some of the generated assembly:
int main()
{
00401800 sub esp,18h
vector2< float[3] > v1 = { 0, 1, 2 };
00401803 xorps xmm0,xmm0
vector2< float[3] > v2 = { 3, 4, 5 };
vector2< float[3] > v3 = { 6, 7, 8 };
// Add two vectors lazily and get the 2nd element.
#if 1
// this sticks temporary junk all over the stack, even with 2 terms
std::cout << (v1 + v2)[0];
00401806 fld dword ptr [__real@40400000 (402128h)]
0040180C movss dword ptr [esp],xmm0
00401811 movss xmm0,dword ptr [__real@3f800000 (402124h)]
00401819 movss dword ptr [esp+4],xmm0
0040181F movss xmm0,dword ptr [__real@40000000 (402120h)]
00401827 movss dword ptr [esp+8],xmm0
0040182D movss xmm0,dword ptr [__real@40400000 (402128h)]
00401835 movss dword ptr [esp+0Ch],xmm0
0040183B movss xmm0,dword ptr [__real@40800000 (40211Ch)]
00401843 push ecx
00401844 mov ecx,dword ptr [__imp_std::cout (402044h)]
0040184A fstp dword ptr [esp]
0040184D movss dword ptr [esp+14h],xmm0
00401853 movss xmm0,dword ptr [__real@40a00000 (402118h)]
0040185B movss dword ptr [esp+18h],xmm0
00401861 call dword ptr
[__imp_std::basic_ostream
::type::value_type result_type;
result_type operator ()(Expr const & expr, subscript_context
const & ctx) const
{
return proto::arg(expr)[ctx.i];
}
};
};
// Here is the domain-specific expression wrapper, which overrides
// operator [] to evaluate the expression using the subscript_context.
template< typename Expr >
struct vector_expr : proto::extends
{ typedef proto::extends< Expr, vector_expr< Expr >, VectorDomain > base_type; vector_expr(Expr const & expr) : base_type(expr) { } // Use the subscript_context to implement subscripting // of a lazy vector expression tree. typename proto::result_of::eval< Expr, subscript_context const
::type operator [](std::size_t i) const { subscript_context const ctx(i); return proto::eval(*this, ctx); } };
// Tell proto that in the VectorDomain, all // expressions should be wrapped in vector_expr<> struct VectorDomain : proto::domain< proto::generator< vector_expr >, VectorGrammar > { }; template< typename T > struct is_vector : mpl::false_ { }; template< typename T > struct is_vector< vector2< T > > : mpl::true_ { }; BOOST_PROTO_DEFINE_BINARY_OPERATOR(+, boost::proto::tag::plus, is_vector, VectorDomain); int main() { vector2< float[3] > v1 = { 0, 1, 2 }; vector2< float[3] > v2 = { 3, 4, 5 }; vector2< float[3] > v3 = { 6, 7, 8 }; // Add two vectors lazily and get the 2nd element. #if 1 // this sticks write-only temporary junk all over the stack, even with 2 terms std::cout << (v1 + v2)[0];// + v3 + v3 + v3 + v3 + v3)[0]; #else // this optimizes out completely for 2 terms... expanding to 3 terms does not typedef proto::terminal< vector2< float[3] > >::type expr_type; std::cout << (vector_expr< expr_type >( expr_type::make(v2) ) + vector_expr< expr_type >( expr_type::make(v3) ))[0]; #endif std::cin.get(); return 0; } Thanks in advance. // AJD < n.outpost@gmail.com >
AJD
Here's some of the generated assembly:
int main() { 00401800 sub esp,18h vector2< float[3] > v1 = { 0, 1, 2 }; 00401803 xorps xmm0,xmm0 vector2< float[3] > v2 = { 3, 4, 5 }; vector2< float[3] > v3 = { 6, 7, 8 };
// Add two vectors lazily and get the 2nd element. #if 1 // this sticks temporary junk all over the stack, even with 2 terms std::cout << (v1 + v2)[0]; 00401806 fld dword ptr [__real <at> 40400000 (402128h)] 0040180C movss dword ptr [esp],xmm0 00401811 movss xmm0,dword ptr [__real <at> 3f800000 (402124h)] 00401819 movss dword ptr [esp+4],xmm0 0040181F movss xmm0,dword ptr [__real <at> 40000000 (402120h)] 00401827 movss dword ptr [esp+8],xmm0 0040182D movss xmm0,dword ptr [__real <at> 40400000 (402128h)] 00401835 movss dword ptr [esp+0Ch],xmm0 0040183B movss xmm0,dword ptr [__real <at> 40800000 (40211Ch)] 00401843 push ecx 00401844 mov ecx,dword ptr [__imp_std::cout (402044h)] 0040184A fstp dword ptr [esp] 0040184D movss dword ptr [esp+14h],xmm0 00401853 movss xmm0,dword ptr [__real <at> 40a00000 (402118h)] 0040185B movss dword ptr [esp+18h],xmm0 00401861 call dword ptr [__imp_std::basic_ostream
::operator<< (402040h)] std::cin.get(); 00401867 mov ecx,dword ptr [__imp_std::cin (40203Ch)] 0040186D call dword ptr [__imp_std::basic_istream
::get (402038h)] return 0; 00401873 xor eax,eax } 00401875 add esp,18h 00401878 ret
Just to be clear, the following is the expected assembly:
00401800 fld dword ptr [__real@41100000 (402118h)]
00401806 push ecx
00401807 mov ecx,dword ptr [__imp_std::cout (402044h)]
0040180D fstp dword ptr [esp]
00401810 call dword ptr
[__imp_std::basic_ostream
"AJD"
Hi, I've been trying out boost.proto lately and I've noticed that for my expressions it tends to force my operands to be created on the stack.
Hi AJD, What compiler (MSVC 2005 or 2008) and what compiler options did you use to generate the assembly code you posted? Also, are you using Proto from release 1.34.1 or from Trunk? Thanks, Dave Jenkins
AJD wrote:
Hi,
I've been trying out boost.proto lately and I've noticed that for my expressions it tends to force my operands to be created on the stack. <snip>
The way you were defining your vector terminals confused proto into
thinking they needed to be wrapped in your vector_expr wrapper. The
extra layer of indirection is what is flummoxing msvc's optimizer. I can
only assume you went through those contortions in order to preserve the
nice aggregate initialization for your vector terminals. Try the
following instead.
#include <cstddef>
#include <iostream>
#include
"Eric Niebler"
template< typename T, typename U = proto::is_proto_expr > struct vector2;
template< typename T > struct vector2
{ BOOST_PROTO_EXTENDS(typename proto::terminal ::type, vector2 , VectorDomain) typedef T value_type;
T & operator [](std::size_t i) { return proto::arg(*this)[i]; }
T const & operator [](std::size_t i) const { return proto::arg(*this)[i]; } };
// Tell proto that in the VectorDomain, all // expressions should be wrapped in vector_expr<> struct VectorDomain : proto::domain< proto::pod_generator< vector_expr > , VectorGrammar > { };
Thanks, this works great. The only problem is that vector<> is intended as a wrapper class, and I just implemented it as an array for convenience in this example. It could have any underlying type, not necessarily an array. Is there a proper way to do this without exposing vector<>'s underlying type?
AJD wrote:
Thanks, this works great. The only problem is that vector<> is intended as a wrapper class, and I just implemented it as an array for convenience in this example. It could have any underlying type, not necessarily an array. Is there a proper way to do this without exposing vector<>'s underlying type?
I'm confused. So your vector2<> type is intended as a wrapper for any
kind of terminal, not arrays? So how is your example different than any
of the examples in proto's docs? Have you tried using proto::extends<>
(or BOOST_PROTO_EXTENDS) to extend terminal<T>::type instead of
terminal
"Eric Niebler"
I'm confused. So your vector2<> type is intended as a wrapper for any kind of terminal, not arrays?
vector2<> doesn't wrap terminals, it just wraps regular types to provide
uniform access to them with a vector (math vector) interface. This way
vector2
So how is your example different than any of the examples in proto's docs?
I based it off of the lazyvector example, but the code does not optimize as expected. (In fact, all I really did was swap out the vector type.)
Have you tried using proto::extends<> (or BOOST_PROTO_EXTENDS) to extend terminal<T>::type instead of terminal
::type ? You'll also need to patch up your grammar to accept any kind of terminal.
I'm not sure that's the correct approach... proto should not have to know about the internal type of vector2<> at all. I could do this with a helper type where vector2 extends vector2_internal which contains all the functionality, but then I'm faced with the problem that the many specializations of vector2_internal<> could have different constructors which would no longer be accessible through the vector2 interface.
AJD wrote:
"Eric Niebler"
wrote in message news:475AEB69.9070304@boost-consulting.com... I'm confused. So your vector2<> type is intended as a wrapper for any kind of terminal, not arrays?
vector2<> doesn't wrap terminals, it just wraps regular types to provide uniform access to them with a vector (math vector) interface. This way vector2
, vector2 , and vector2<D3DXVECTOR> could all be used with expressions regardless of their underlying type, and additional types can be used just by specializing vector2<T> and providing the subscript operator as well as some traits.
OK...
So how is your example different than any of the examples in proto's docs?
I based it off of the lazyvector example, but the code does not optimize as expected. (In fact, all I really did was swap out the vector type.)
Your code doesn't optimize because you replaced the terminal type with something that proto doesn't recognize as a terminal.
Have you tried using proto::extends<> (or BOOST_PROTO_EXTENDS) to extend terminal<T>::type instead of terminal
::type ? You'll also need to patch up your grammar to accept any kind of terminal. I'm not sure that's the correct approach... proto should not have to know about the internal type of vector2<> at all. I could do this with a helper type where vector2 extends vector2_internal which contains all the functionality, but then I'm faced with the problem that the many specializations of vector2_internal<> could have different constructors which would no longer be accessible through the vector2 interface.
Proto builds trees. Trees have terminals and non-terminals. Any type
which proto doesn't recognize is assumed to be a terminal and is
automatically wrapped. Types which proto recognizes are:
- proto::expr<>
- Anything that inherits from proto::extends<>
- Anything type that uses the BOOST_PROTO_EXTENDS macro
Your vector terminals must conform to one of those. Otherwise, proto
will make it conform by wrapping it, and that makes your optimizer's job
harder.
The code I sent has a specialization for vector2
participants (3)
-
AJD
-
Dave Jenkins
-
Eric Niebler