
On 10/15/2010 11:06 AM, Robert Ramey wrote:
David Abrahams wrote:
By the way, I did run a comparison between a segmented iteration, with nested for-loops, and a flattened iteration. I was very much surprised in the runtime difference. It obviously depends on the operation you do per iteration, but for just incrementing a counter, the difference was 2 or 3 orders of magnitude. Needless to say, I'm more convinced of the value of exposing the segmentation of data structures now.
2 or 3 orders of magnitude? 100 to 1000 times faster? is that right?
Robert Ramey
No, it's not right; upon looking at the assembly output (which I should've done prior to making hasty conclusions), I think the original nested for-loop test I tried was too easy for the compiler to optimize. I changed the test to actually iterate over a segmented data structure (I was just doing the looping constructs originally, without any actual data structure), and the results are much less dramatic: the nested for-loop iteration is about 20% faster. I think this is in line with the Austern paper. Here is the code: #include <iostream> #include <vector> #include <boost/timer.hpp> int main(int argc, char* argv[]) { typedef std::vector< int > v_type; typedef v_type::iterator v_it_type; typedef std::vector< v_type > vv_type; typedef vv_type::iterator vv_it_type; vv_type vv(1000); for(int i = 0; i != 1000; ++i) vv[i].resize(1000); { boost::timer timer; for(int i = 0; i != 1000; ++i) for(vv_it_type vv_it = vv.begin(), vv_end = vv.end(); vv_it != vv_end; ++vv_it) { v_type& v = *vv_it; for(v_it_type v_it = vv_it->begin(), v_end = vv_it->end(); v_it != v_end; ++v_it) *v_it = 42; } const double elapsed = timer.elapsed(); std::cout << elapsed << std::endl; } { boost::timer timer; for(int i = 0; i != 1000; ++i) { vv_it_type vv_it = vv.begin(), vv_end = vv.end(); v_it_type v_it = vv_it->begin(), v_end = vv_it->end(); while(vv_it != vv_end) { *v_it = 42; ++v_it; if(v_it == v_end) { ++vv_it; if(vv_it != vv_end) v_it = vv_it->begin(), v_end = vv_it->end(); } } } const double elapsed = timer.elapsed(); std::cout << elapsed << std::endl; } return 0; } Still faster, and if you replace the std::vector's with, e.g., boost::iterator_range< boost::counting_iterator< ... > >'s, you might get a better measure of the actual loop overhead, as you'd avoid the memory accesses (speculation). - Jeff