
On Sun, Apr 3, 2011 at 1:23 PM, Lorenzo Caminiti <lorcaminiti@gmail.com> wrote:
I have done a (hopefully more correct) benchmark of Boost.Local's performance compared with the alternative methods -- please check my work :)
In summary:
1) Boost.Phoenix, global functors, and local functors run in ~15s.
2) Boost.Lambda runs in ~40s.
3) Boost.Local runs in ~53s.
4) I don't have a C++0x lambda compiler so I could not benchmark C++0x lambdas.
I don't know why Boost.Lambda takes longer than 1). Boost.Local seems to take longer than 1) because of the extra virtual function call introduced by the trick that allows the local struct to be passed as a template parameter -- I will follow up with another email to explain this point in detail.
This is the trick used by Boost.Local to call a function defined within a local struct from a functor passed as a template parameter:

1) The local struct inherits from abstract_function and implements operator().
2) The local function is of a non-local type `function` so it can be passed as a template parameter.
3) The functor `function` calls abstract_function's virtual operator(), which in turn calls the local struct's implementation of operator().

The following code shows the idea:

    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <cstddef>

    #define N 10000

    // Non-local abstract base: gives the local struct a virtual entry point.
    template<typename R, typename A0>
    struct abstract_function {
        virtual R operator()(A0) = 0;
    };

    // Non-local functor that forwards to the abstract base via a pointer.
    template<typename R, typename A0>
    struct function {
        function(abstract_function<R, A0>& ref): ptr_(&ref) {}
        R operator()(A0 a0) { return (*ptr_)(a0); }
    private:
        abstract_function<R, A0>* ptr_;
    };

    int main() {
        double sum = 0.0;
        int factor = 10;

        // Local struct holding the bound variables and the function body.
        struct add_function: abstract_function<void, const double&> {
            add_function(double& _sum, const int& _factor):
                    sum_(_sum), factor_(_factor) {}
            void operator()(const double& num) {
                return body(num, sum_, factor_);
            }
        private:
            double& sum_;
            const int& factor_;
            void body(const double& num, double& sum, const int& factor) {
                sum += factor * num;
            }
        };
        add_function functor_add(sum, factor);
        function<void, const double&> add(functor_add);

        std::vector<double> v(N * 100);
        std::fill(v.begin(), v.end(), 10);

        for (size_t n = 0; n < N; ++n) {
    //        for (size_t i = 0; i < v.size(); ++i) {
    //            functor_add(v[i]); // (1)
    //            add(v[i]); // (2)
    //        }
            std::for_each(v.begin(), v.end(), add); // (3) OK add as tparam!
        }

        std::cout << sum << std::endl;
        return 0;
    }

The call at line (3) goes through

    function::operator() --<<virtual>>--> abstract_function::operator() --> add_function::operator()

so the local struct's operator() is called.

A) Now if I comment out line (3), and uncomment the for-loop and line (2):

    ...
    for (size_t n = 0; n < N; ++n) {
        for (size_t i = 0; i < v.size(); ++i) {
    //        functor_add(v[i]); // (1)
            add(v[i]); // (2)
        }
    //    std::for_each(v.begin(), v.end(), add); // (3) OK add as tparam!
    }
    ...

This runs in 49s.

B) But if I use line (1) instead of (2):

    ...
    for (size_t n = 0; n < N; ++n) {
        for (size_t i = 0; i < v.size(); ++i) {
            functor_add(v[i]); // (1)
    //        add(v[i]); // (2)
        }
    //    std::for_each(v.begin(), v.end(), add); // (3) OK add as tparam!
    }
    ...

This runs in 15s!!

As far as I can see, line (1) is faster because it does not have the overhead of the virtual function call that line (2) has:

A) Line (2) call: function::operator() --<<virtual>>--> abstract_function::operator() --> add_function::operator() (runs in 49s)
B) Line (1) call: add_function::operator() (runs in 15s)

What do you think? Is there any way I can make A) faster (i.e., with a run time similar to B))?

Thank you very much.

--
Lorenzo
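
One possible direction for the question about making A) faster is to replace the virtual call with a plain function pointer to a static member function of the local struct. The sketch below is an assumption for illustration only: it is not part of the original post, it is not necessarily how Boost.Local works, and it has not been benchmarked here. The names `function_ref`, `call`, and `weighted_sum` are hypothetical. The call still goes through one indirection, so whether it gets closer to B) depends on the compiler and optimization level.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical alternative to the `function` wrapper in the post: it stores
// an untyped object pointer plus a plain function pointer, so the call is
// one indirect (non-virtual) call with no abstract base class involved.
template<typename R, typename A0>
struct function_ref {
    typedef R (*call_type)(void*, A0);
    function_ref(void* obj, call_type call): obj_(obj), call_(call) {}
    R operator()(A0 a0) const { return call_(obj_, a0); }
private:
    void* obj_;
    call_type call_;
};

// Returns the sum of factor * x over all x in v, using a local struct
// in the same way as the example in the post.
double weighted_sum(const std::vector<double>& v, int factor) {
    double sum = 0.0;

    struct add_function {
        add_function(double& sum_ref, const int& factor_ref):
                sum_(sum_ref), factor_(factor_ref) {}
        void operator()(const double& num) { sum_ += factor_ * num; }
        // Static trampoline: recovers the object and forwards the call.
        static void call(void* obj, const double& num) {
            (*static_cast<add_function*>(obj))(num);
        }
    private:
        double& sum_;
        const int& factor_;
    };

    add_function functor_add(sum, factor);
    // function_ref is a non-local type, so it is valid as a template argument.
    function_ref<void, const double&> add(&functor_add, &add_function::call);
    std::for_each(v.begin(), v.end(), add);
    return sum;
}
```

Calling `weighted_sum(v, 10)` from main() with the vector from the post performs one pass of the inner loop; wrapping that call in the outer `for (size_t n = 0; n < N; ++n)` loop would allow timing it against A) and B) on the same machine.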