[bind] performance tuning for a bad compiler

I'm trying to make local modifications to bind to improve the code generated by the compiler I have to use. I have written this small test file:

////////////////////////////////
#include <boost/bind.hpp>

class tester
{
public:
    struct value_type
    {
        float x, y, z;
    };
    typedef const value_type& const_reference;

    value_type test( const value_type& a );
};

tester::value_type test_bind( tester& t, tester::const_reference x )
{
    return boost::bind( &tester::test, &t, _1 )( x );
}

struct handspun
{
    handspun( tester& t ) : m_t(t) {}

    tester::value_type operator()( tester::const_reference x )
    {
        return m_t.test(x);
    }

    tester& m_t;
};

tester::value_type test_handspun( tester& t, tester::const_reference x )
{
    return handspun(t)(x);
}

tester::value_type test_unrolled( tester& t, tester::const_reference x )
{
    return t.test( x );
}
////////////////////////////////

Judging by the generated assembly, VC8 produces essentially the same code for test_bind and test_unrolled. The compiler I have to use (RVCT 2.2 sp1), however, generates the following code.
_Z9test_bindR6testerRKNS_10value_typeE PROC
        PUSH     {lr}
        LDR      r3,|L1.152|
        LDR      r12,|L1.156|
        MOV      lr,r2
        LDM      r3,{r2,r3} ; <Anon1>, <Anon1>
        SUB      sp,sp,#0x2c
        LDRB     r12,[r12,#0]
        STR      r2,[sp,#0xc]
        STR      r1,[sp,#8]
        ADD      r2,sp,#0xc
        STR      r12,[sp,#0x1c]
        STR      r3,[sp,#0x10]
        STR      r1,[sp,#0x14]
        LDM      r2,{r2,r3,r12}
        ADD      r1,sp,#0x20
        STM      r1,{r2,r3,r12}
        MOV      r1,#0
        LDR      r3,[sp,#0x24]
        STR      r1,[sp,#0x18]
        STR      lr,[sp,#0x14]
        LDR      r1,[sp,#0x28]
        TST      r3,#1
        LDRNE    r12,[sp,#0x20]
        ADD      r1,r1,r3,ASR #1
        LDRNE    r3,[r1,#0]
        BICNE    r12,r12,#3
        LDRNE    r3,[r3,r12]
        LDREQ    r3,[sp,#0x20]
        MOV      r2,lr
        BLX      r3
        ADD      sp,sp,#0x2c
        POP      {pc}
        ENDP

_Z13test_handspunR6testerRKNS_10value_typeE PROC
        PUSH     {r3,lr}
        STR      r1,[sp,#0]
        BL       _ZN6tester4testERKNS_10value_typeE
        POP      {r12,pc}
        ENDP

_Z13test_unrolledR6testerRKNS_10value_typeE PROC
        B        _ZN6tester4testERKNS_10value_typeE
        ENDP

        AREA ||.constdata||, DATA, READONLY, ALIGN=2
||.constdata$1||
||<Anon1>||
        DCD      _ZN6tester4testERKNS_10value_typeE
        DCD      0x00000000

I want to use bind in some performance sensitive areas because I don't want to have to get all the details of writing my functors or algorithms right since Boost and STL have already done that for me. Unless I can coerce my compiler into generating better code I'll have no choice but to abandon bind for this project.

It isn't very easy for me to follow the bind sources. Does anyone have any hints on where I might start looking?

Thanks,
Michael Marcin

Michael Marcin:
I'm trying to make local modifications to bind to improve the code generated by the compiler I have to use.
...
tester::value_type test_bind( tester& t, tester::const_reference x )
{
    return boost::bind( &tester::test, &t, _1 )( x );
}
I'd start with a code example that is closer to your actual use, as there is no reason to use bind at all in the above. Typically boost::bind, which is the inefficient part, is called once, and the (x) call is done multiple times inside a for_each or similar. You might find the performance adequate for some uses. There will certainly be cases where boost::bind won't cut it, but you'd be able to selectively replace just those uses with handwritten function objects, or even with handwritten loops.

It's not easy to optimize the general boost::bind case while still allowing the tests to pass; it might be better to write a separate leaner version that is more limited but still serves the majority of your specific needs.
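
For illustration, the "bind once, call many times" pattern described above might look roughly like the sketch below. It is only a sketch under stated assumptions: std::bind (C++11) stands in for boost::bind so that it builds without Boost, the m_scale member and the body of tester::test are invented so the result is observable, and transform_all is a hypothetical helper name that does not come from the original post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for the tester class from the original post. The m_scale member
// and the body of test() are assumptions made for this sketch.
class tester
{
public:
    struct value_type { float x, y, z; };

    explicit tester(float scale) : m_scale(scale) {}

    value_type test(const value_type& a) const
    {
        value_type r = { a.x * m_scale, a.y * m_scale, a.z * m_scale };
        return r;
    }

private:
    float m_scale;
};

// Hypothetical helper: the bound function object is constructed once, and
// std::transform then invokes it once per element.
void transform_all(const tester& t,
                   const std::vector<tester::value_type>& in,
                   std::vector<tester::value_type>& out)
{
    assert(out.size() >= in.size());  // caller provides the output storage

    using namespace std::placeholders;
    auto f = std::bind(&tester::test, &t, _1);             // built once
    std::transform(in.begin(), in.end(), out.begin(), f);  // called in.size() times
}
```

Whether the per-element call is cheap enough still depends on how well the compiler inlines through the bound object, which is the open question in this thread.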

Peter Dimov wrote:
I'd start with a code example that is closer to your actual use, as there is no reason to use bind at all in the above. Typically boost::bind, which is the inefficient part, is called once, and the (x) call is done multiple times inside a for_each or similar. You might find the performance adequate for some uses. There will certainly be cases where boost::bind won't cut it, but you'd be able to selectively replace just those uses with handwritten function objects, or even with handwritten loops.
It's not easy to optimize the general boost::bind case while still allowing the tests to pass; it might be better to write a separate leaner version that is more limited but still serves the majority of your specific needs.
That code started as my use case, but I removed a lot of complexity to make it fit for consumption on the list. Essentially it is a function object for a std::transform that moves 3D vertices from model space to view space for all vertices in a model.

The unrolled and bind versions were close to identical under VC8, so I figured I might be able to make RVCT behave if I dropped in a few magic __forceinline keywords here and there. It's probably too much effort for me anyway, so I'll just use a hand-coded functor for now and move on.

Thanks,
Michael Marcin
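
A hand-coded functor of the kind mentioned above could be sketched as follows. The actual vertex and matrix types were not posted, so vec3, mat34 (a row-major 3x4 affine transform with the translation in the last column), to_view_space, and model_to_view are all hypothetical names chosen for this example. The functor holds a reference, as handspun does in the original test file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex type; the real one was not shown in the thread.
struct vec3 { float x, y, z; };

// Hypothetical row-major 3x4 affine transform: a 3x3 linear part plus a
// translation stored in column 3.
struct mat34 { float m[3][4]; };

// Hand-written function object, analogous to handspun in the test file.
struct to_view_space
{
    explicit to_view_space(const mat34& model_view) : m_mv(model_view) {}

    vec3 operator()(const vec3& v) const
    {
        vec3 r = {
            m_mv.m[0][0] * v.x + m_mv.m[0][1] * v.y + m_mv.m[0][2] * v.z + m_mv.m[0][3],
            m_mv.m[1][0] * v.x + m_mv.m[1][1] * v.y + m_mv.m[1][2] * v.z + m_mv.m[1][3],
            m_mv.m[2][0] * v.x + m_mv.m[2][1] * v.y + m_mv.m[2][2] * v.z + m_mv.m[2][3]
        };
        return r;
    }

    const mat34& m_mv;  // must outlive the functor and any copies of it
};

// Transforms every vertex of a model into caller-provided storage.
void model_to_view(const mat34& mv,
                   const std::vector<vec3>& model,
                   std::vector<vec3>& view)
{
    assert(view.size() >= model.size());
    std::transform(model.begin(), model.end(), view.begin(), to_view_space(mv));
}
```

Because operator() is defined inside the class, it is implicitly inline, which gives the compiler a direct call to optimize instead of a call through a stored pointer to member function.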