
Peter Bartlett wrote:
Quoting bjorn@4roald.org:
Ok, does anybody know of other compilers/libs that implement loop unrolling in std::for_each?
Are you looking for optimizations specifically for std::for_each other than those done by the compiler for general for loops?
Neither, or both; I am interested in what effect loop unrolling has on performance in general, and in any straightforward, practical means of achieving that effect, provided it is worth the while.
If so, possibly of interest is the fact that new in GCC 4.3 is the (experimental) ability to call parallel versions of certain algorithms in <algorithm> and <numeric> via OpenMP:
http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html
I think this is for completely different use cases than those of the benchmark in question. Concurrency in algorithms is certainly of interest, but I would intuitively use it only when the body of the concurrent part has some significant size. Loop unrolling is for the case where the body of the loop is tiny. I would be surprised if the OpenMP-based concurrency in a benchmark like this were not less efficient in wall-clock time, and in CPU time it certainly will be. With optimal partitioning of the loop iterations an OpenMP version might win the wall-clock battle, but I doubt that is something a library can achieve. I may be wrong. Also, it is not clear to me how concurrency in std algorithms will affect the correctness of my programs.
This could put some distance between BOOST_FOREACH and std::for_each for certain use cases though I haven't experimented myself.
Ok, I gave it a quick shot for the fun of it, but the results only confuse me. It looks to me like the OpenMP code and flags have no effect on std::for_each, nor do explicit calls to __gnu_parallel::for_each give different results. It looks as if it is falling back to the serialized version; I have not spent any time figuring out why. I added:

int main()
{
    // don't know if these are needed, I think there may be sensible defaults
    const int threads_wanted = 4; // I have a 4 core CPU
    omp_set_dynamic(false);
    omp_set_num_threads(threads_wanted);

    ....

#ifdef _GLIBCXX_PARALLEL
    {
        boost::timer timer;
        sum_func sum = __gnu_parallel::for_each( data.begin(), data.end(), sum_func() );
        double elapsedtime = timer.elapsed();
        std::cout << "__gnu_parallel::for_each took " << elapsedtime
                  << " seconds." << std::endl;
        force_calculation(sum.sum);
    }
#endif

to the benchmark, and compiled and executed it like this:

g++ -D_GLIBCXX_PARALLEL -I../../boost_1_37_0 -march=native -fopenmp -O3 foreach_benchmark.cpp -o foreach_benchmark && foreach_benchmark

Iterator accumulate took 0.09 seconds.
Pointer accumulate took 0.09 seconds.
Iterator for loop took 0.09 seconds.
Pointer for loop took 0.09 seconds.
Index for loop took 0.09 seconds.
Index for_each took 0.09 seconds.
Pointer for_each took 0.09 seconds.
BOOST_FOREACH took 0.09 seconds.
std::for_each took 0.1 seconds.
__gnu_parallel::for_each took 0.08 seconds.

Nothing exciting; can anybody tell me why? I guess that boost::timer may not be appropriate for measuring this if we do in fact run in parallel threads.

--
Bjørn