Why use 'functional composition' with boost::bind
Hi,
as a casual/newbie boost-user I'm trying to come to
grips with why/when nested functional composition with
boost::bind might be of help - following Karlsson's
book, in its fifth printing.
----------------------
1) It seems he suggests that if one is
doing something like
T faa(int);
T foo(const T&);
int i;
vector<T> v(10); // container
for(i=0; i<10; i++) {
v[i] = faa(i); // set it
v[i] = foo(v[i]); // operate on it
}
then instead of the 'old style' for-loops it would be
more desirable to use 'STL style' algorithms with
the functors created from boost::bind, say like
transform(v.begin(), v.end(), v.begin(), boost::bind(&foo, _1));
instead of the 2nd line in the previous for-loop, a.s.o.
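(To make that concrete, the complete little sketch I have in mind - assuming T = double and some trivial foo(); note that std::transform also wants the output iterator:)

#include <algorithm>
#include <vector>
#include <boost/bind.hpp>

double foo(double x) { return x; } // some trivial foo, just for the sketch

int main()
{
    std::vector<double> v(10, 1.0);
    // replaces the 2nd line of the for-loop above
    std::transform(v.begin(), v.end(), v.begin(), boost::bind(&foo, _1));
}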
----------------------
2) Then, it is suggested furthermore that if foo()
is a (simple?) nested function, say T=double and,
double foo(double x) { return 2*x; }
then one should do 'functional composition', i.e. create the
functor for the previous point 1) by
boost::bind(std::multiplies<double>(),2,_1)
(Karlsson has this example on adding and subtracting 10%.)
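(Correspondingly, the composed functor dropped straight into the algorithm from point 1 - again just a sketch with T = double, continuing the snippet above:)

#include <functional> // std::multiplies

// v[i] = 2 * v[i] for every element, without writing a foo() at all
std::transform(v.begin(), v.end(), v.begin(),
               boost::bind(std::multiplies<double>(), 2.0, _1));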
-----------------------
Now, my question is, if such nested construction of functors
at the call site, which is suggested also in the 'Bind Summary'
in Karlsson's book, will actually lead to code which has no
relevant performance loss as compared to 'old-style' for-loops?
To check this, I do something rather simple (simplistic?).
Namely, I take a vector of some size, then I fill it with
random doubles between 0 and M_PI, then (stupid me) I test
if sin(v[i])^2 + cos(v[i])^2 = 1 for each element of the
vector. The code is attached.
I do this many many times and measure the runtime.
I do this for two main cases:
case 1: with old-style for loops
case 3: with STL algorithms and boost::bind
At least for this little program, the old-style code is
faster by roughly a factor of 2
as compared to the boost::bind approach.
(compiled with gcc version 4.3.2 and optimized)
Case 2 is just there to show that this is not due to more sin()/cos()
function calls within the nested functor, since it is also a
factor of 2 faster than the boost variant.
So, in conclusion: when/how should I really use functional composition?
wb
---------------------------------
#include <iostream>
#include <vector>
#include
AMDG
Wolfram Brenig wrote:
----------------------- Now, my question is, if such nested construction of functors at the call site, which is suggested also in the 'Bind Summary' in Karlsson's book, will actually lead to code which has no relevant performance loss as compared to 'old-style' for-loops? <snip> bind(multiplies<double>(),bind(sin,_1),bind(sin,_1)), bind(multiplies<double>(),bind(cos,_1),bind(cos,_1))
These calls to sin and cos are through function pointers.
In Christ, Steven Watanabe
Hi,

Even if boost::bind does so in this expression, what should I conclude from that in the context of my question? Does this already make up for the performance difference?

Can you suggest a different nested bind at the call site for this example which performs similarly to old-style for-loops? (and please: by code, not by casuistry)

Cheers, wb
On Tue, Aug 3, 2010 at 12:27 PM, WB wrote:
AMDG
Wolfram Brenig wrote:
----------------------- Now, my question is, if such nested construction of functors at the call site, which is suggested also in the 'Bind Summary' in Karlsson's book, will actually lead to code which has no relevant performance loss as compared to 'old-style' for-loops? <snip> bind(multiplies<double>(),bind(sin,_1),bind(sin,_1)), bind(multiplies<double>(),bind(cos,_1),bind(cos,_1))
These calls to sin and cos are through function pointers.
In Christ, Steven Watanabe
Hi,
Even if boost::bind does so in this expression, what should I conclude from that in the context of my question? Does this already make up for the performance difference?
Can you suggest a different nested bind at the call site for this example which performs similarly to old-style for-loops? (and please: by code, not by casuistry)
You all really should look at Boost.Phoenix (currently part of the Boost.Spirit namespace, soon to become a full top-level library); it is designed for just what you are trying to do, and does it much more cleanly and efficiently. The documentation for Phoenix2 is at: http://www.boost.org/doc/libs/1_43_0/libs/spirit/phoenix/doc/html/index.html Phoenix3 is coming out soon; it will keep the same interface as Phoenix2 but be even more capable.
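For example, the whole thing from the original post boils down to something like this (a rough sketch against Phoenix2 as shipped with Boost 1.43; the exact header names are an assumption and may differ in your installation):

#include <algorithm>
#include <vector>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>

int main()
{
    using boost::phoenix::arg_names::arg1;
    std::vector<double> v(10, 1.0);
    // arg1 builds an unnamed function object; 2.0 * arg1 doubles its argument
    std::transform(v.begin(), v.end(), v.begin(), 2.0 * arg1);
}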
On 03/08/2010 15:35, Wolfram Brenig wrote:
Hi,
as a casual/newbie boost-user I'm trying to come to grips with why/when nested functional composition with boost::bind might be of help - following Karlsson's book, in its fifth printing.
----------------------
1) It seems he suggests that if one is doing something like

T faa(int);
T foo(const T&);

int i;
vector<T> v(10); // container
for(i=0; i<10; i++) {
  v[i] = faa(i);    // set it
  v[i] = foo(v[i]); // operate on it
}

then instead of the 'old style' for-loops it would be more desirable to use 'STL style' algorithms with the functors created from boost::bind, say like

transform(v.begin(), v.end(), v.begin(), boost::bind(&foo, _1));
You might as well write boost::transform(v, v.begin(), foo);

But, personally, since that's a fairly trivial algorithm, I don't find that much more interesting than

foreach(int& i, v) i = foo(i);

There are other situations, though, where using higher-order programming really shines.
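Spelled out, for reference (just a sketch; it assumes Boost >= 1.43 for the range algorithms, and foreach above stands for BOOST_FOREACH):

#include <vector>
#include <boost/foreach.hpp>
#include <boost/range/algorithm/transform.hpp>

double foo(double x) { return x; } // stand-in for whatever foo really does

int main()
{
    std::vector<double> v(10, 1.0);
    boost::transform(v, v.begin(), foo);     // Boost.Range algorithm, output iterator needed
    BOOST_FOREACH(double& x, v) x = foo(x);  // the foreach version
}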
2) Then, it is suggested furthermore that if foo() is a (simple?) nested function, say T=double and,
double foo(double x) { return 2*x; }
then one should do 'functional composition', i.e. create the functor for the previous point 1) by
boost::bind(std::multiplies<double>(),2,_1)
Replace all that by just 2*_1
Now, my question is, if such nested construction of functors at the call site, which is suggested also in the 'Bind Summary' in Karlsson's book, will actually lead to code which has no relevant performance loss as compared to 'old-style' for-loops?
Chances are it will be *more* efficient, actually. Algorithms typically do their task in the best and fastest way (e.g. the way you wrote the iteration in your original post isn't really the best way to iterate, although most compilers will optimize that), and for simple algorithms they will be fully inlined. With some compilers, function objects actually get inlined better than function pointers. One other advantage is that algorithms are generic, idiomatic, and more high-level, and thus better reflect what the code does and reduce potential bugs.
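To illustrate the function-pointer point (a sketch of the difference, not a measurement):

#include <algorithm>
#include <vector>

double twice_fn(double x) { return 2 * x; }                               // ordinary function
struct twice_fo { double operator()(double x) const { return 2 * x; } };  // function object

int main()
{
    std::vector<double> v(10, 1.0);
    // passed as a pointer: the call may stay an indirect call
    std::transform(v.begin(), v.end(), v.begin(), &twice_fn);
    // passed as a function object: a direct call, trivially inlinable
    std::transform(v.begin(), v.end(), v.begin(), twice_fo());
}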
To check this, I do something rather simple (simplistic?). Namely, I take a vector of some size, then I fill it with random doubles between 0 and M_PI, then (stupid me) I test if sin(v[i])^2 + cos(v[i])^2 = 1 for each element of the vector. The code is attached.
I do this many many times and measure the runtime.
I do this for two cases:
case 1: with old-style for loops
case 3: with STL algorithms and boost::bind
At least for this little program, the old-style code is faster by roughly a factor of 2 as compared to the boost::bind approach. (compiled with gcc version 4.3.2 and optimized)
I ran your test. Case 3 runs in 0.65s, cases 1 and 2 in 0.82s. That's with GCC 4.4.3. Comparing 3 and 1 is not even fair, since 3 does the unnecessary sin and cos calculations like 2, but the compiler seems to optimize these. Your code, however, is needlessly complicated (std::multiplies etc.). By making it use Boost.Lambda, however, it goes up to 0.78s. I suppose it would fare better with Phoenix. I admit I have no idea whatsoever why all these cases don't perform the same. Still, none of these is as optimized as it could be.
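For reference, this is roughly what the Boost.Lambda variant looks like (a sketch, not your attached code; w is assumed to be a second vector of the same size, and the function-pointer variables are there only because std::sin/std::cos are overloaded):

#include <algorithm>
#include <cmath>
#include <vector>
#include <boost/lambda/lambda.hpp>
#include <boost/lambda/bind.hpp>

int main()
{
    using namespace boost::lambda; // brings in _1 and lambda's bind
    std::vector<double> v(1000, 1.0), w(1000);
    double (*psin)(double) = std::sin;
    double (*pcos)(double) = std::cos;
    // sin(x)^2 + cos(x)^2 for every element, written as one composed lambda expression
    std::transform(v.begin(), v.end(), w.begin(),
                   bind(psin, _1) * bind(psin, _1) + bind(pcos, _1) * bind(pcos, _1));
}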
Hi,
2) Then, it is suggested furthermore that if foo() is a (simple?) nested function, say T=double and,
double foo(double x) { return 2*x; }
then one should do 'functional composition', i.e. create the functor for the previous point 1) by
boost::bind(std::multiplies<double>(),2,_1)
Replace all that by just 2*_1
As you say, all this is very trivial, and was meant to understand why exactly Karlsson is doing this, i.e. replacing something like 2*_1 with bind(std::multiplies<double>(), 2, _1). (It's on page 256 in my copy, in the chapter 'Library 9: Bind'.) So, I was trying to see if the compiler transforms the latter into the former - which seemingly does not happen. (Which leaves me a little nervous about what boost::bind will do to you in the end anyway.)
the way you did it your iteration in your original post isn't really the best way to iterate, albeit most compilers will optimize that
as said, the code was not meant to be optimal, rather I was (naively) hoping that either case 1 or 2 would be comparable to case 3
At least for this little program, the old-style code is faster by roughly a factor of 2 as compared to the boost::bind approach. (compiled with gcc version 4.3.2 and optimized)
I ran your test. Case 3 runs in 0.65s, case 1 and 2 in 0.82s. That's with GCC 4.4.3.
That is really strange! For me case 3 runs in ~1.8...1.9s, cases 1 and 2 in 0.81...0.84s (I compile with gcc -O3 -Wall -lstdc++, nothing special). This leaves me even more nervous.
Comparing 3 and 1 is not even fair, since 3 does the unnecessary sin and cos calculations like 2, but the compiler seems to optimize these.
Well, that is why I explicitly included both cases 1 and 2 in the code, and already commented on this there, in order to show what you seem to have found yourself: it does not seem to be the number of calls to cos and sin which accounts for the runtime of case 3 vs. cases 1 and 2, since, as for you, cases 1 and 2 need approximately the same time for me.

Do you suggest that if I looked at the assembler I'd find only one sin and one cos call for cases 1 and 2, but more calls for them in case 3? Again, this would leave me nervous about boost::bind and gcc.

Moreover, I have included case 0, where I perform no access to the vector<T> at all and only loop over the sin(), cos(), and random() stuff. For me this case runs in ~0.16...0.19s. I.e., it seems that the 'old-fashioned' C things are definitely not the problem, at least with gcc.

Finally, I don't understand at all why for you case 3 runs even shorter than cases 1 and 2. All this is very 'miraculous' - and I understand even less now when to go for functional composition with boost::bind ;)

wb
Hi,
I ran your test. Case 3 runs in 0.65s, case 1 and 2 in 0.82s. That's with GCC 4.4.3.
That is really strange? For me Case 3 runs in ~1.8...1.9s, case 1 and 2 in 0.81...0.84s (I compile with gcc -O3 -Wall -lstdc++, nothing special) This leaves me even more nervous.
just one more follow-up on this. I tried Intel's icc on the same thing. Then it's:

~0.18s for case 0
~0.57s for cases 1 and 2
~0.99s for case 3

on my machine. I.e., things are a factor of ~2 slower when using boost::bind, not only with gcc. (And it is well known that icc can make things run much faster than gcc.) Now I'm really curious what your compiler is?

wb
On 04/08/10 07:43, WB wrote:
As you say, all this is very trivial, and was meant to understand why exactly Karlsson is doing this, i.e. replacing something like 2*_1 with bind(std::multiplies<double>(), 2, _1). (It's on page 256 in my copy, in the chapter 'Library 9: Bind'.) So, I was trying to see if the compiler transforms the latter into the former - which seemingly does not happen. (Which leaves me a little nervous about what boost::bind will do to you in the end anyway.)
You don't seem to understand what 2*_1 is. It's a lambda expression. It generates a function object that takes one argument and returns 2 times that argument. It's basically the same as bind(multiplies<double>(), 2, _1), except multiplies is lazy and is replaced by operator* for nicer syntax. You also seem to be confusing Boost.Lambda, Boost.Bind, and the standard function object helpers.
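In code (a sketch; note it has to be boost::lambda::_1 - the _1 from boost/bind.hpp does not overload operator*):

#include <algorithm>
#include <vector>
#include <boost/lambda/lambda.hpp>

int main()
{
    using boost::lambda::_1;
    std::vector<double> v(10, 1.0);
    // 2.0 * _1 builds an unnamed function object roughly equivalent to
    // struct { double operator()(double x) const { return 2.0 * x; } }
    std::transform(v.begin(), v.end(), v.begin(), 2.0 * _1);
}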
That is really strange? For me Case 3 runs in ~1.8...1.9s, case 1 and 2 in 0.81...0.84s (I compile with gcc -O3 -Wall -lstdc++, nothing special) This leaves me even more nervous.
I used g++ -O3 -fomit-frame-pointer -march=native Ubuntu Linux, Intel Core 2 Duo.
Moreover I have included case 0, where I perform no access to the vector<T> at all and only loop over the sin(), cos(), and random() stuff. For me this case runs ~0.16...0.19s. I.e. it seems that the 'old-fashioned' C-things are definitely not the problem, at least with gcc.
Nothing prevents the compiler from evaluating your whole code at compile-time. For that reason, they're not really good performance tests. Chances are, the difference comes from the fact that the compiler managed to do more at compile-time in one of the cases.
On Wed, 04 Aug 2010 17:56:43 +0100, Mathias Gaunard wrote:
You don't seem to understand what 2*_1 is. It's a lambda expression. You also seem to be confusing Boost.Lambda, Boost.Bind, and the standard function object helpers.
hmm? http://www.boost.org/doc/libs/1_43_0/doc/html/lambda/s08.html#id1365369 <snip> Basically, the Boost Bind library (BB in the sequel) implements the bind expression part of BLL. <snap> And I did not experience any difference whether using lambda's bind or the original bind.
I ran your test. Case 3 runs in 0.65s, case 1 and 2 in 0.82s.
Case 3 runs in ~1.8...1.9s, case 1 and 2 in 0.81...0.84s (I compile with gcc -O3 -Wall -lstdc++, nothing special)
Now I'm really curious what your compiler is ?
I used g++ -O3 -fomit-frame-pointer -march=native Ubuntu Linux, Intel Core 2 Duo.
OK, so I used your g++ -O3 -fomit-frame-pointer -march=native, on SuSE Linux, Intel Core 2 Duo P9500 2.53GHz, and I get: case 3 runs in ~1.8...1.9s, cases 1 and 2 in 0.81...0.84s. This is the same ratio I already had (at least I'm not confused about that). So your 0.65s for case 3 is really amazing. wb
On Aug 4, 2010, at 3:22 PM, WB wrote:
Case 3 runs in ~1.8...1.9s, case 1 and 2 in 0.81...0.84s. This is the same ratio I already had (at least I'm not confused about that). So your 0.65s for case 3 is really amazing.
Knowing nothing about your particular code... chances are good that somebody is not measuring what he thinks he's measuring. If you want a sample of the kinds of issues you should consider, have a look at http://github.com/boost-lib/parameter/blob/master/test/efficiency.cpp

-- David Abrahams, BoostPro Computing, http://boostpro.com
On Wed, Aug 4, 2010 at 2:06 PM, David Abrahams wrote:
On Aug 4, 2010, at 3:22 PM, WB wrote:
Case 3 runs in ~1.8...1.9s, case 1 and 2 in 0.81...0.84s. This is the same ratio I already had (at least I'm not confused about that). So your 0.65s for case 3 is really amazing.
Knowing nothing about your particular code... chances are good that somebody is not measuring what he thinks he's measuring. If you want a sample of the kinds of issues you should consider, have a look at http://github.com/boost-lib/parameter/blob/master/test/efficiency.cpp
Again, try this with Phoenix: boost::transform(v, v.begin(), arg1 * 2);

The arg1 * 2 expression creates essentially this:

struct anonymousThing {
  template<typename T>
  T operator()(const T& t) const { return t * 2; }
};

So the above executes something like:

boost::transform(v, v.begin(), anonymousThing());

which the compiler should optimize out completely; at least it always does for me in VC. You can also use _1 in Phoenix (*IF* you pull it from the phoenix namespace and do *NOT* include boost/bind.hpp), but arg1 is better as it is completely unambiguous, so you have no worry.
On 08/04/2010 09:22 PM, WB wrote:
and I get Case 3 runs in ~1.8...1.9s, case 1 and 2 in 0.81...0.84s. This is the same ratio I already had (at least I'm not confused about that). So your 0.65s for case 3 is really amazing.
I'm getting:

Using a Xeon W5590 at 3.33GHz, with GCC 4.1.2:
0 -> 0.08
1 -> 0.7
2 -> 0.7
3 -> 0.98

Using a Xeon X5650 at 2.67GHz, with GCC 4.4.3:
0 -> 0.09
1 -> 0.61
2 -> 0.61
3 -> 0.52

On the same server, using GCC 4.1.3:
0 -> 0.09
1 -> 0.81
2 -> 0.81
3 -> 1.15

Regards
Gaetano Mendola
Just for the curious,
I'm trying to use the shared memory functionality of the Boost.Interprocess library for implementing a client and a server-like process, where the server calculates something for the client. I use condition variables for the process synchronization. The implementation works initially, but I am getting deadlocks now.
In the meantime I found a solution on my own. The lock on the mutex
must be made directly at the beginning of each process, so that the
mutex is only unlocked if wait() is called.
Regards
Kai
Something like this works:
class SharedData
{
// Executed by the client process
void calc() {
scoped_lock
At Thu, 05 Aug 2010 14:03:54 +0200, Kai Benndorf wrote:
The lock on the mutex must be made directly at the beginning of each process, so that the mutex is only unlocked if wait() is called.
Isn't that just the normal rule for using condition variables?

-- Dave Abrahams, BoostPro Computing, http://www.boostpro.com
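I.e. the usual wait-under-a-predicate pattern; sketched here with Boost.Interprocess types (the member names and layout are assumptions, not Kai's actual code):

#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/interprocess_condition.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

struct SharedData
{
    boost::interprocess::interprocess_mutex     mutex;
    boost::interprocess::interprocess_condition cond;
    bool ready; // the predicate the waiter checks

    // waiter: take the lock first; wait() atomically releases the mutex
    // and re-acquires it before returning, so checking the predicate is safe
    void wait_for_result()
    {
        boost::interprocess::scoped_lock<boost::interprocess::interprocess_mutex> lock(mutex);
        while (!ready)
            cond.wait(lock);
    }

    // producer: change the shared state under the same lock, then notify
    void publish_result()
    {
        boost::interprocess::scoped_lock<boost::interprocess::interprocess_mutex> lock(mutex);
        ready = true;
        cond.notify_one();
    }
};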
On 08/04/2010 03:02 AM, Mathias Gaunard wrote:
2) Then, it is suggested furthermore that if foo() is a (simple?) nested function, say T=double and,
double foo(double x) { return 2*x; }
then one should do 'functional composition', i.e. create the functor for the previous point 1) by
boost::bind(std::multiplies<double>(),2,_1)
Replace all that by just 2*_1
Does it work? I was not able to: error: no match for ‘operator*’ in ‘2 * boost::lambda::<unnamed>::_1’ Regards Gaetano Mendola
Mathias Gaunard wrote:
On 03/08/2010 15:35, Wolfram Brenig wrote:
Hi,
as a casual/newbie boost-user I'm trying to come to grips with why/when nested functional composition with boost::bind might be of help - following Karlsson's book, in its fifth printing.
---------------------- 1) It seems that he suggests, that if one is into doing something like
T faa(int); T foo(const T&);
int i; vector<T> v(10); // container for(i=0; i<10; i++) { v[i] = faa(i); // set it v[i] = foo(v[i]); // operate on it }
then instead of the 'old style' for-loops it would be more desirable to use 'STL style' algorithms with the functors created from boost::bind, say like
transform(v.begin(), v.end(), v.begin(), boost::bind(&foo, _1));
You might as well write boost::transform(v, v.begin(), foo);
But, personally, since that's a fairly trivial algorithm, I don't find that much more interesting than foreach(int& i, v) i = foo(i);
There are other situations, though, where using higher-order programming really shines.
2) Then, it is suggested furthermore that if foo() is a (simple?) nested function, say T=double and,
double foo(double x) { return 2*x; }
then one should do 'functional composition', i.e. create the functor for the previous point 1) by
boost::bind(std::multiplies<double>(),2,_1)
Replace all that by just 2*_1
Huh, with which lib? Is _1 in the global namespace, or is this assuming a using namespace? Thanks, Jeff
Gaetano Mendola wrote:
On 08/04/2010 01:49 PM, Jeff Flinn wrote:
Huh, with which lib? Is _1 in the global namespace, or is this assuming
a using namespace?
#include
boost::lambda::_1

boost.lambda placeholders don't work with boost.bind. "::_1" should solve this problem (or don't pull in everything from the boost::lambda namespace).
Regards Gaetano Mendola
On 04/08/10 14:55, Thomas Heller wrote:
Gaetano Mendola wrote:
On 08/04/2010 01:49 PM, Jeff Flinn wrote:
Huh, with which lib? Is _1 in the global namespace, or is this assuming
a using namespace?
#include
boost::lambda::_1
boost.lambda placeholders don't work with boost.bind. "::_1" should solve this problem (or don't pull in everything from the boost::lambda namespace).
In the code of the original poster, he used boost::lambda::bind, even if he referred to it as boost::bind.
On 3 Aug 2010, at 15:35, Wolfram Brenig wrote:
Hi,
as a casual/newbie boost-user I'm trying to come to grips with why/when nested functional composition with boost::bind might be of help - following Karlsson's book, in its fifth printing.
You are not comparing like with like in your example; the compiler is over-optimising.
I tried comparing these functions:
void f1(vector<int>& v)
{
for(int i=0;i
On 05/08/2010 11:40, Christopher Jefferson wrote:
One very useful g++ flag is "-fdump-tree-all". This will dump out a number of files which show you how g++ is optimising the code. The format isn't documented, but it is fairly readable.
Isn't it GIMPLE? That's quite well documented.
participants (11)
- Christopher Jefferson
- David Abrahams
- Gaetano Mendola
- Jeff Flinn
- Kai Benndorf
- Mathias Gaunard
- OvermindDL1
- Steven Watanabe
- Thomas Heller
- WB
- Wolfram Brenig