
I'm a newbie to the list; if an answer to my question is in a FAQ I'm not aware of, please let me know, but I've googled the documentation and don't see it. In fact, I see little regarding performance factors to consider when using boost, so I may not be looking in the right places.

We're running a performance-sensitive application and so I've begun to find points to optimize in the code. The following is a digest of our experience bringing boost functionality into this effort (forgive the generic code; it was sanitized for intellectual property preservation, etc.). I am posting this experience in hopes that someone can confirm our reasoning and perhaps point us in the right direction for information on thinking through these kinds of issues in the future. Any advice is appreciated.

You'll see that, in part, my woes came from violating the cardinal rule of only optimizing the code that is most expensive; nonetheless, this specific instance is an involved enough example that I thought it worth pursuing...

==============

During the writing of an interface, a function was added to a templated class, which has-a stl vector, to sum SomeValue over its contained elements:

const unsigned allElementsValue() const
{
    unsigned result = 0;

    typename std::vector<T>::const_iterator thisElement(sequence_.begin());
    const typename std::vector<T>::const_iterator endElement(sequence_.end());

    while (thisElement != endElement)
    {
        result += thisElement->getSomeValue();
        ++thisElement;
    }

    return result;
}

At the time this interface was being written, it was a departure from a prior approach, so it was important that performance not be significantly affected. During this effort, we used KCachegrind and found that this call was (relatively) more expensive than hoped. Since std::accumulate was supposed to be a quicker way to walk through elements, that approach was tried and the following function resulted:

const unsigned allElementsValue() const
{
    return std::accumulate(
        boost::make_transform_iterator(sequence_.begin(),
                                       boost::mem_fn(&T::getSomeValue)),
        boost::make_transform_iterator(sequence_.end(),
                                       boost::mem_fn(&T::getSomeValue)),
        unsigned(0));
}

Note the use of make_transform_iterator and of a 0-ary member function of the constituent element objects. This code is semantically equivalent and yet yielded a relatively significant performance improvement. The performance increase can be attributed, in part, to the fact that the STL and boost use templates, so some of what would otherwise have to be done at run-time can now be done at compile-time.
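In case it helps anyone reproduce or profile this outside of our code base, here is a minimal, self-contained sketch of the two forms on a toy element type. The names here (Element, sumByLoop, sumByAccumulate) are invented for the example and are not from our code:

-----------
#include <iostream>
#include <numeric>
#include <vector>

#include <boost/iterator/transform_iterator.hpp>
#include <boost/mem_fn.hpp>

// Toy element type standing in for T.
struct Element
{
    explicit Element(unsigned v) : value_(v) {}
    unsigned getSomeValue() const { return value_; }
    unsigned value_;
};

// Hand-written while-loop form.
unsigned sumByLoop(const std::vector<Element>& sequence)
{
    unsigned result = 0;
    std::vector<Element>::const_iterator thisElement(sequence.begin());
    const std::vector<Element>::const_iterator endElement(sequence.end());
    while (thisElement != endElement)
    {
        result += thisElement->getSomeValue();
        ++thisElement;
    }
    return result;
}

// std::accumulate over transform_iterators that apply the 0-ary member.
unsigned sumByAccumulate(const std::vector<Element>& sequence)
{
    return std::accumulate(
        boost::make_transform_iterator(sequence.begin(),
                                       boost::mem_fn(&Element::getSomeValue)),
        boost::make_transform_iterator(sequence.end(),
                                       boost::mem_fn(&Element::getSomeValue)),
        unsigned(0));
}

int main()
{
    std::vector<Element> sequence;
    for (unsigned i = 0; i < 10; ++i)
        sequence.push_back(Element(i));

    // Both forms should print 45.
    std::cout << sumByLoop(sequence) << " "
              << sumByAccumulate(sequence) << std::endl;
    return 0;
}
-----------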
So we naively took the moral of the story to be that std::accumulate is Good and walking through elements manually with a loop is Bad.

The following code was later examined. Note that this particular code happens to be executed many, many times in our run, so any performance change to this code has a major effect on the overall performance of the program. Certainly a ripe and ready candidate for applying this new conclusion. The code was changed from this...

-----------
// Note that a and b (referred to in calcSomeValue below) are
// references to other objects passed into the function that
// has this code.
float result = 0.0;

// SomeObjectPtrCont is a typedef for a stl vector that contains
// boost::shared_ptrs managing pointers to SomeObject.
// SomeObjectPtrs_ is an instance of SomeObjectPtrCont.
SomeObjectPtrCont::const_iterator thisIter(SomeObjectPtrs_.begin());
const SomeObjectPtrCont::const_iterator endIter(SomeObjectPtrs_.end());

while (thisIter != endIter)
{
    result += (*thisIter)->calcSomeValue(a, b);
    ++thisIter;
}
-----------

...to this:

-----------
// Note that a and b (referred to in calcSomeValue below) are
// references to other objects passed into the function that
// has this code.
float result = 0.0;

// Use a typedef so that we can handle overloading of calcSomeValue.
typedef float (SomeObject::* calcFunct)(const T&, const OtherObjB&) const;

// Use stl accumulate to do the same thing as the while loop.
result = std::accumulate(
    boost::make_transform_iterator(
        SomeObjectPtrs_.begin(),
        boost::bind(static_cast<calcFunct>(&SomeObject::calcSomeValue), _1, a, b)),
    boost::make_transform_iterator(
        SomeObjectPtrs_.end(),
        boost::bind(static_cast<calcFunct>(&SomeObject::calcSomeValue), _1, a, b)),
    float(0.0));
-----------

Note that this is a significantly more sophisticated use of std::accumulate with boost::make_transform_iterator and boost::bind. Indeed, bind must now take a placeholder argument, marked by the _1, to link the function being called, calcSomeValue, with each element in the collection being walked through. The calcSomeValue() function itself is binary and, since the container holds shared_ptrs, it is not invoked directly on the constituent elements. Nonetheless, this is semantically equivalent to the more conventional while-loop form.

Remembering our earlier simplistic conclusion that std::accumulate is Good and that while-loops are Bad, this change was made, checked in, and pride swelled within the author (silly me) for so cleverly using boost, et al. The expectation at the time was that since templates would be involved more, much of the logic would be resolved at compile-time rather than run-time, giving us a performance increase just like with the simpler accumulation approach (basically, since one form of accumulation is good, then all must be better).

We discovered that the revised code now ran 4 times SLOWER. KCachegrind showed the problem was the above code. So what the *&^% gives?

Rather than analyze the situation, it's always better to tinker :), so the next attempt involved an approach that std::accumulate also allows:

-----------
// Note that a and b (referred to in calcSomeValue below) are
// references to other objects passed into the function that
// has this code.
float result = 0.0;

// Use a typedef so that we can handle overloading of calcSomeValue.
typedef float (SomeObject::* calcFunct)(const T&, const OtherObjB&) const;

// Alternate accumulation approach: plain iterators, with the
// transformation folded into the binary operation.
result = std::accumulate(
    SomeObjectPtrs_.begin(),
    SomeObjectPtrs_.end(),
    float(0.0),
    boost::bind(std::plus<float>(),
                _1,
                boost::bind(static_cast<calcFunct>(&SomeObject::calcSomeValue),
                            _2, a, b)));
-----------

This sped things up, but the code was still measured to be 3 times slower than the original while-loop form, though no longer 4 times as slow. Hardly a ringing endorsement for the use of std::accumulate.

Our best conclusion so far is that involving boost::bind in this kind of operation causes accumulate to go through additional function redirection at run-time to reach the calcSomeValue() function we ultimately want. This more than offsets any potential gain that might have been realized from std::accumulate, because its reliance on compile-time template resolution is thwarted by all of this run-time redirection.

Note that because of the arguments involved, the compiled code under boost::bind must now move variables onto the stack so that, when the underlying accumulate calls this implicit functor, those values are part of its state. All of this is done at run-time, whereas before, without having to move such variables, it was really just a change of a function pointer's value.
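If the extra run-time redirection really is the culprit, one experiment we have not yet tried would be to keep std::accumulate but replace boost::bind with a small hand-written function object that carries a and b as state while giving the compiler a plain operator() to see through. A sketch of the idea (the stand-in type names here are invented for illustration, not taken from our code):

-----------
#include <numeric>
#include <vector>

#include <boost/shared_ptr.hpp>

// Invented stand-ins for our real classes, just to make the sketch
// self-contained.
struct OtherObjA {};
struct OtherObjB {};

struct SomeObject
{
    float calcSomeValue(const OtherObjA&, const OtherObjB&) const { return 1.0f; }
};

typedef std::vector< boost::shared_ptr<SomeObject> > SomeObjectPtrCont;

// Hand-written accumulation functor: like boost::bind, it holds a and b
// as state, but its operator() is a single concrete function that the
// compiler has a good chance of inlining.
class AddCalcSomeValue
{
public:
    AddCalcSomeValue(const OtherObjA& a, const OtherObjB& b) : a_(a), b_(b) {}

    float operator()(float runningTotal,
                     const boost::shared_ptr<SomeObject>& element) const
    {
        return runningTotal + element->calcSomeValue(a_, b_);
    }

private:
    const OtherObjA& a_;
    const OtherObjB& b_;
};

float sumCalcSomeValue(const SomeObjectPtrCont& SomeObjectPtrs_,
                       const OtherObjA& a, const OtherObjB& b)
{
    // std::accumulate with the functor as the binary operation; no
    // transform_iterator and no bind-generated call wrappers involved.
    return std::accumulate(SomeObjectPtrs_.begin(),
                           SomeObjectPtrs_.end(),
                           float(0.0),
                           AddCalcSomeValue(a, b));
}
-----------

Whether this actually closes the gap is something we would still have to measure with KCachegrind.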
So the new current thinking is that std::accumulate is Better than simple while-loops that walk through elements (which are still Bad), but that boost::bind under std::accumulate is Worse. At this point, the best advice seems to be that one should use std::accumulate whenever simple accumulations are needed; however, once one has to start involving boost::bind on 1+-ary functions, or anything else that requires additional run-time function redirection, then retreating back to a while-loop is the better-performing solution.

Thank you all for reading this far...

Kind regards,

Richard Newman
Crowley Davis Research
richard@nospam.cdres.com (take out the nospam. to email me directly)