Re: [boost] Accelerating algorithms with SIMD - Segmented iteratorsand alternatives

15 Oct 2010

      On 10/15/2010 11:06 AM, Robert Ramey wrote:
...
David Abrahams wrote:
...
...
By the way, I did run a comparison between a segmented iteration,
with nested for-loops, and a flattened iteration.  I was very much
surprised in the runtime difference.  It obviously depends on the
operation you do per iteration, but for just incrementing a counter,
the difference was 2 or 3 orders of magnitude.  Needless to say, I'm
more convinced of the value of exposing the segmentation of data
structures now.
2 or 3 orders of magnitude? 100 to 1000 times faster? is that right?
Robert Ramey
No, it's not right; upon looking at the assembly output (which I 
should've done prior to making hasty conclusions), I think the original 
nested for-loop test I tried was too easy for the compiler to optimize. 
  I changed the test to actually iterate over a segmented data structure 
(I was just doing the looping constructs originally, without any actual 
data structure), and the results are much less dramatic: the nested 
for-loop iteration is about 20% faster.  I think this is in line with 
the Austern paper.  Here is the code:

#include <iostream>
#include <vector>

#include <boost/timer.hpp>

int main(int argc, char* argv[])
{
     typedef std::vector< int > v_type;
     typedef v_type::iterator v_it_type;
     typedef std::vector< v_type > vv_type;
     typedef vv_type::iterator vv_it_type;
     vv_type vv(1000);
     for(int i = 0; i != 1000; ++i)
         vv[i].resize(1000);
     {
         boost::timer timer;
         for(int i = 0; i != 1000; ++i)
             for(vv_it_type vv_it = vv.begin(), vv_end = vv.end(); vv_it 
!= vv_end; ++vv_it) {
                 v_type& v = *vv_it;
                 for(v_it_type v_it = vv_it->begin(), v_end = 
vv_it->end(); v_it != v_end; ++v_it)
                     *v_it = 42;
             }
         const double elapsed = timer.elapsed();
         std::cout << elapsed << std::endl;
     }
     {
         boost::timer timer;
         for(int i = 0; i != 1000; ++i) {
             vv_it_type vv_it = vv.begin(), vv_end = vv.end();
             v_it_type v_it = vv_it->begin(), v_end = vv_it->end();
             while(vv_it != vv_end) {
                 *v_it = 42;
                 ++v_it;
                 if(v_it == v_end) {
                     ++vv_it;
                     if(vv_it != vv_end)
                         v_it = vv_it->begin(), v_end = vv_it->end();
                 }
             }
         }
         const double elapsed = timer.elapsed();
         std::cout << elapsed << std::endl;
     }
     return 0;
}

Still faster, and if you replace the std::vector's with, e.g., 
boost::iterator_range< boost::counting_iterator< ... > >'s, you might 
get a better measure of the actual loop overhead, as you'd avoid the 
memory accesses (speculation).

- Jeff

Re: [boost] Accelerating algorithms with SIMD - Segmented iteratorsand alternatives

Jeffrey Lee Hellrung, Jr.