
Dear boosters, In the "looping construct" thread, a while ago David Abrahams said that we should do a test, to compare all looping solutions. I've made such a test. It tests against: - keeping internal functions in crange<> - virtual functions - BOOST_FOREACH comparing to using raw iterators. The tests are pretty heavy. I've tested on a Win2000 box/512 Mb of RAM/1.8Ghz Intel processor (laptop), VC7.1 with all optimizations on. It seems that Dave, you were right about the virtual functions - the vtable can be cached. But it seems to happen only when the loop contents are pretty trivial (if within the loop there are some not-so-simple instructions, the vtable aren't cached - see the 'set_to_az' test). Please if any of you can spare about 15 minutes to run the tests, it would be great! Also, any improvement to the tests is welcome. I've attached my results and the code to test. Conclusions (mine :D): - BOOST_FOREACH rocks big time - vtables can be cached, in case the instructions within the loop are not too compilcated - the overhead is pretty large for both vtables and internal functions (at least for vc7.1) I'm eager to see how the tests run on different platforms. Best, John -- John Torjo Freelancer -- john@torjo.com -- http://www.torjo.com/logview/ - viewing/filtering logs is just too easy! ***************************************************************************** ** ** ** WARNING: This email contains an attachment of a very suspicious type. ** ** You are urged NOT to open this attachment unless you are absolutely ** ** sure it is legitmate. Opening this attachment may cause irreparable ** ** damage to your computer and your files. If you have any questions ** ** about the validity of this message, PLEASE SEEK HELP BEFORE OPENING IT. ** ** ** *****************************************************************************

John Torjo <john.lists@torjo.com> writes:
Dear boosters, In the "looping construct" thread, a while ago David Abrahams said that we should do a test, to compare all looping solutions.
I've made such a test. It tests against: - keeping internal functions in crange<> - virtual functions - BOOST_FOREACH comparing to using raw iterators.
Where can we see the results?
The tests are pretty heavy. I've tested on a Win2000 box/512 Mb of RAM/1.8Ghz Intel processor (laptop), VC7.1 with all optimizations on.
It seems that Dave, you were right about the virtual functions - the vtable can be cached.
I didn't suggest that it'd be chached, but that some compiler might notice that the actual derived type was fixed at compile-time and eliminate the function pointer indirection, and that this was a little more likely with the compiler's virtual tables than with some simulation that we happen to write. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
John Torjo <john.lists@torjo.com> writes:
Dear boosters, In the "looping construct" thread, a while ago David Abrahams said that we should do a test, to compare all looping solutions.
I've made such a test. It tests against: - keeping internal functions in crange<> - virtual functions - BOOST_FOREACH comparing to using raw iterators.
Where can we see the results?
I noticed they were included in the .zip file John sent. Here is the summary: Averages for 128000elements (except first test): raw time : 229.127 secs func calls : 323.871 secs (avg percentage 183.074) vtable calls: 307.038 secs (avg percentage 174.966) foreach calls: 236.767 secs (avg percentage 104.929) -- Eric Niebler Boost Consulting www.boost-consulting.com

"Eric Niebler" <eric@boost-consulting.com> writes:
David Abrahams wrote:
John Torjo <john.lists@torjo.com> writes:
Dear boosters, In the "looping construct" thread, a while ago David Abrahams said that we should do a test, to compare all looping solutions.
I've made such a test. It tests against: - keeping internal functions in crange<> - virtual functions - BOOST_FOREACH comparing to using raw iterators. Where can we see the results?
I noticed they were included in the .zip file John sent.
I didn't see an attachment.
Here is the summary:
Averages for 128000elements (except first test): raw time : 229.127 secs func calls : 323.871 secs (avg percentage 183.074) vtable calls: 307.038 secs (avg percentage 174.966) foreach calls: 236.767 secs (avg percentage 104.929)
Note too that other compilers may exhibit different tradeoffs. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

John Torjo wrote:
In the "looping construct" thread, a while ago David Abrahams said that we should do a test, to compare all looping solutions.
I've made such a test. It tests against: - keeping internal functions in crange<> - virtual functions - BOOST_FOREACH comparing to using raw iterators.
The tests are pretty heavy. I've tested on a Win2000 box/512 Mb of RAM/1.8Ghz Intel processor (laptop), VC7.1 with all optimizations on.
Here are my test results (I had to add some typename's to make it compile on gcc): -------------------------------------------------------------------- 1) Pentium M, 1.6 GHz, 1GByte RAM (Laptop), Intel V8.0 -O3, W2K: Test 1 of 1 (for 128000 elements) Test set_to_idx, vector<int> took RAW : 1.071 secs func calls : 2.694 secs (251.541%) vtable calls: 3.095 secs (288.982%) foreach calls: 1.882 secs (175.724%) Test set_to_idx, list<int> took RAW : 6.61 secs func calls : 9.965 secs (150.756%) vtable calls: 7.33 secs (110.893%) foreach calls: 6.61 secs (100%) Test set_to_idx, deque<int> took RAW : 1.722 secs func calls : 3.465 secs (201.22%) vtable calls: 3.114 secs (180.836%) foreach calls: 1.723 secs (100.058%) Test set_to_idx, vector<seq> took RAW : 3.035 secs func calls : 3.035 secs (100%) vtable calls: 3.035 secs (100%) foreach calls: 3.035 secs (100%) Test set_to_idx, list<seq> took RAW : 2.574 secs func calls : 2.594 secs (100.777%) vtable calls: 2.603 secs (101.127%) foreach calls: 2.574 secs (100%) Test set_to_idx, deque<seq> took RAW : 2.613 secs func calls : 2.634 secs (100.804%) vtable calls: 2.613 secs (100%) foreach calls: 2.613 secs (100%) Test multiply_by_2, vector<int> took RAW : 1.122 secs func calls : 2.784 secs (248.128%) vtable calls: 2.573 secs (229.323%) foreach calls: 1.132 secs (100.891%) Test multiply_by_2, list<int> took RAW : 6.609 secs func calls : 6.69 secs (101.226%) vtable calls: 6.629 secs (100.303%) foreach calls: 6.609 secs (100%) Test multiply_by_2, deque<int> took RAW : 1.993 secs func calls : 3.715 secs (186.402%) vtable calls: 3.836 secs (192.474%) foreach calls: 2.063 secs (103.512%) Test multiply_by_max20, vector<int> took RAW : 6.93 secs func calls : 10.736 secs (154.921%) vtable calls: 9.804 secs (141.472%) foreach calls: 6.93 secs (100%) Test multiply_by_max20, list<int> took RAW : 10.225 secs func calls : 10.755 secs (105.183%) vtable calls: 11.416 secs (111.648%) foreach calls: 10.225 secs (100%) Test multiply_by_max20, deque<int> took RAW : 5.888 secs func calls : 9.214 secs (156.488%) vtable calls: 8.432 secs (143.207%) foreach calls: 6.209 secs (105.452%) Test add_10, vector<int> took RAW : 0.601 secs func calls : 1.522 secs (253.245%) vtable calls: 1.292 secs (214.975%) foreach calls: 0.611 secs (101.664%) Test add_10, list<int> took RAW : 3.525 secs func calls : 3.595 secs (101.986%) vtable calls: 3.525 secs (100%) foreach calls: 3.525 secs (100%) Test add_10, deque<int> took RAW : 1.092 secs func calls : 2.103 secs (192.582%) vtable calls: 1.962 secs (179.67%) foreach calls: 1.092 secs (100%) Test add_one_char, vector<seq> took RAW : 1.041 secs func calls : 1.412 secs (135.639%) vtable calls: 1.462 secs (140.442%) foreach calls: 1.291 secs (124.015%) Test add_one_char, list<seq> took RAW : 1.372 secs func calls : 1.372 secs (100%) vtable calls: 1.402 secs (102.187%) foreach calls: 1.562 secs (113.848%) Test add_one_char, deque<seq> took RAW : 1.261 secs func calls : 1.472 secs (116.733%) vtable calls: 1.522 secs (120.698%) foreach calls: 1.462 secs (115.94%) Test set_to_len, vector<seq> took RAW : 2.323 secs func calls : 2.353 secs (101.291%) vtable calls: 2.324 secs (100.043%) foreach calls: 2.353 secs (101.291%) Test set_to_len, list<seq> took RAW : 1.592 secs func calls : 1.602 secs (100.628%) vtable calls: 1.593 secs (100.063%) foreach calls: 1.592 secs (100%) Test set_to_len, deque<seq> took RAW : 1.892 secs func calls : 1.913 secs (101.11%) vtable calls: 1.893 secs (100.053%) foreach calls: 1.892 secs (100%) Test set_to_az, vector<seq> took RAW : 6.97 secs func calls : 6.97 secs (100%) vtable calls: 6.97 secs (100%) foreach calls: 6.97 secs (100%) Test set_to_az, list<seq> took RAW : 6.149 secs func calls : 6.209 secs (100.976%) vtable calls: 6.229 secs (101.301%) foreach calls: 6.149 secs (100%) Test set_to_az, deque<seq> took RAW : 6.149 secs func calls : 6.199 secs (100.813%) vtable calls: 6.209 secs (100.976%) foreach calls: 6.149 secs (100%) Test noop, vector<int> took RAW : 0.742 secs func calls : 3.074 secs (414.286%) vtable calls: 2.754 secs (371.159%) foreach calls: 0.742 secs (100%) Test noop, list<int> took RAW : 3.826 secs func calls : 4.817 secs (125.902%) vtable calls: 5.098 secs (133.246%) foreach calls: 3.846 secs (100.523%) Test noop, deque<int> took RAW : 1.753 secs func calls : 3.856 secs (219.966%) vtable calls: 3.865 secs (220.479%) foreach calls: 1.753 secs (100%) Test noop, vector<seq> took RAW : 0.29 secs func calls : 1.202 secs (414.483%) vtable calls: 1.152 secs (397.241%) foreach calls: 0.29 secs (100%) Test noop, list<seq> took RAW : 3.005 secs func calls : 3.515 secs (116.972%) vtable calls: 3.946 secs (131.314%) foreach calls: 3.345 secs (111.314%) Test noop, deque<seq> took RAW : 0.671 secs func calls : 1.462 secs (217.884%) vtable calls: 1.402 secs (208.942%) foreach calls: 0.671 secs (100%) -------------------------------------------------------------------- 2) Dual P4, 2.8GHz, 1GHz RAM, Intel V8.0 -O3, Linux Heavy Load! Test 1 of 1 (for 128000 elements) Test set_to_idx, vector<int> took RAW : 0.72 secs func calls : 2.68 secs (372.222%) vtable calls: 3.13 secs (434.722%) foreach calls: 0.72 secs (100%) Test set_to_idx, list<int> took RAW : 2.89 secs func calls : 3.61 secs (124.913%) vtable calls: 3.75 secs (129.758%) foreach calls: 2.89 secs (100%) Test set_to_idx, deque<int> took RAW : 1.11 secs func calls : 3.77 secs (339.64%) vtable calls: 4.22 secs (380.18%) foreach calls: 1.34 secs (120.721%) Test set_to_idx, vector<seq> took RAW : 5.82 secs func calls : 5.82 secs (100%) vtable calls: 5.82 secs (100%) foreach calls: 5.82 secs (100%) Test set_to_idx, list<seq> took RAW : 6.02 secs func calls : 6.02 secs (100%) vtable calls: 6.02 secs (100%) foreach calls: 6.02 secs (100%) Test set_to_idx, deque<seq> took RAW : 6.39 secs func calls : 6.39 secs (100%) vtable calls: 6.39 secs (100%) foreach calls: 6.39 secs (100%) Test multiply_by_2, vector<int> took RAW : 1.12 secs func calls : 3.59 secs (320.536%) vtable calls: 3.62 secs (323.214%) foreach calls: 1.12 secs (100%) Test multiply_by_2, list<int> took RAW : 2.68 secs func calls : 4.17 secs (155.597%) vtable calls: 4.15 secs (154.851%) foreach calls: 2.74 secs (102.239%) Test multiply_by_2, deque<int> took RAW : 1.93 secs func calls : 4.28 secs (221.762%) vtable calls: 5.09 secs (263.731%) foreach calls: 2.32 secs (120.207%) Test multiply_by_max20, vector<int> took RAW : 19.06 secs func calls : 24.11 secs (126.495%) vtable calls: 24.81 secs (130.168%) foreach calls: 19.22 secs (100.839%) Test multiply_by_max20, list<int> took RAW : 14.7 secs func calls : 18.65 secs (126.871%) vtable calls: 18.84 secs (128.163%) foreach calls: 14.7 secs (100%) Test multiply_by_max20, deque<int> took RAW : 15.2 secs func calls : 19.29 secs (126.908%) vtable calls: 19.51 secs (128.355%) foreach calls: 15.77 secs (103.75%) Test add_10, vector<int> took RAW : 0.56 secs func calls : 1.75 secs (312.5%) vtable calls: 1.87 secs (333.929%) foreach calls: 0.56 secs (100%) Test add_10, list<int> took RAW : 1.39 secs func calls : 2 secs (143.885%) vtable calls: 2 secs (143.885%) foreach calls: 1.39 secs (100%) Test add_10, deque<int> took RAW : 0.97 secs func calls : 2.19 secs (225.773%) vtable calls: 2.44 secs (251.546%) foreach calls: 1.06 secs (109.278%) Test add_one_char, vector<seq> took RAW : 0.57 secs func calls : 0.72 secs (126.316%) vtable calls: 0.75 secs (131.579%) foreach calls: 0.72 secs (126.316%) Test add_one_char, list<seq> took RAW : 0.69 secs func calls : 0.69 secs (100%) vtable calls: 0.69 secs (100%) foreach calls: 0.69 secs (100%) Test add_one_char, deque<seq> took RAW : 0.6 secs func calls : 0.78 secs (130%) vtable calls: 0.73 secs (121.667%) foreach calls: 0.76 secs (126.667%) Test set_to_len, vector<seq> took RAW : 4.16 secs func calls : 4.36 secs (104.808%) vtable calls: 4.37 secs (105.048%) foreach calls: 4.18 secs (100.481%) Test set_to_len, list<seq> took RAW : 4.07 secs func calls : 4.42 secs (108.6%) vtable calls: 4.51 secs (110.811%) foreach calls: 4.3 secs (105.651%) Test set_to_len, deque<seq> took RAW : 4.72 secs func calls : 4.72 secs (100%) vtable calls: 4.89 secs (103.602%) foreach calls: 4.76 secs (100.847%) Test set_to_az, vector<seq> took RAW : 5.7 secs func calls : 5.72 secs (100.351%) vtable calls: 5.7 secs (100%) foreach calls: 5.7 secs (100%) Test set_to_az, list<seq> took RAW : 5.75 secs func calls : 5.75 secs (100%) vtable calls: 5.75 secs (100%) foreach calls: 5.75 secs (100%) Test set_to_az, deque<seq> took RAW : 5.83 secs func calls : 6.1 secs (104.631%) vtable calls: 5.83 secs (100%) foreach calls: 5.83 secs (100%) Test noop, vector<int> took RAW : 0.84 secs func calls : 3.86 secs (459.524%) vtable calls: 4.14 secs (492.857%) foreach calls: 0.87 secs (103.571%) Test noop, list<int> took RAW : 1.48 secs func calls : 4.25 secs (287.162%) vtable calls: 4.47 secs (302.027%) foreach calls: 1.71 secs (115.541%) Test noop, deque<int> took RAW : 2.08 secs func calls : 5 secs (240.385%) vtable calls: 5.33 secs (256.25%) foreach calls: 2.18 secs (104.808%) Test noop, vector<seq> took RAW : 0.33 secs func calls : 1.51 secs (457.576%) vtable calls: 1.62 secs (490.909%) foreach calls: 0.37 secs (112.121%) Test noop, list<seq> took RAW : 2.02 secs func calls : 2.75 secs (136.139%) vtable calls: 2.35 secs (116.337%) foreach calls: 2.02 secs (100%) Test noop, deque<seq> took RAW : 0.9 secs func calls : 2.04 secs (226.667%) vtable calls: 2.37 secs (263.333%) foreach calls: 1 secs (111.111%) -------------------------------------------------------------------- 3) Dual P4, 2.8GHz, 1GHz RAM, gcc 3.2.2 -O3, Linux Heavy Load! Test 1 of 1 (for 128000 elements) Test set_to_idx, vector<int> took RAW : 0.58 secs func calls : 3.59 secs (618.966%) vtable calls: 3.69 secs (636.207%) foreach calls: 0.77 secs (132.759%) Test set_to_idx, list<int> took RAW : 2.75 secs func calls : 3.83 secs (139.273%) vtable calls: 3.97 secs (144.364%) foreach calls: 2.82 secs (102.545%) Test set_to_idx, deque<int> took RAW : 1.02 secs func calls : 3.87 secs (379.412%) vtable calls: 3.7 secs (362.745%) foreach calls: 1.02 secs (100%) Test set_to_idx, vector<seq> took RAW : 8.56 secs func calls : 9.35 secs (109.229%) vtable calls: 8.56 secs (100%) foreach calls: 8.56 secs (100%) Test set_to_idx, list<seq> took RAW : 8.23 secs func calls : 8.23 secs (100%) vtable calls: 8.74 secs (106.197%) foreach calls: 8.63 secs (104.86%) Test set_to_idx, deque<seq> took RAW : 9.34 secs func calls : 9.34 secs (100%) vtable calls: 9.34 secs (100%) foreach calls: 9.34 secs (100%) Test multiply_by_2, vector<int> took RAW : 1.76 secs func calls : 4.56 secs (259.091%) vtable calls: 4.79 secs (272.159%) foreach calls: 1.87 secs (106.25%) Test multiply_by_2, list<int> took RAW : 3.53 secs func calls : 4.96 secs (140.51%) vtable calls: 5.08 secs (143.909%) foreach calls: 3.73 secs (105.666%) Test multiply_by_2, deque<int> took RAW : 2.52 secs func calls : 5.3 secs (210.317%) vtable calls: 5.51 secs (218.651%) foreach calls: 2.52 secs (100%) Test multiply_by_max20, vector<int> took RAW : 25.63 secs func calls : 29.64 secs (115.646%) vtable calls: 29.97 secs (116.933%) foreach calls: 25.63 secs (100%) Test multiply_by_max20, list<int> took RAW : 19.96 secs func calls : 23.97 secs (120.09%) vtable calls: 23.34 secs (116.934%) foreach calls: 19.96 secs (100%) Test multiply_by_max20, deque<int> took RAW : 18.89 secs func calls : 23.05 secs (122.022%) vtable calls: 23.61 secs (124.987%) foreach calls: 19.96 secs (105.664%) Test add_10, vector<int> took RAW : 0.75 secs func calls : 2.04 secs (272%) vtable calls: 2.39 secs (318.667%) foreach calls: 0.78 secs (104%) Test add_10, list<int> took RAW : 1.89 secs func calls : 2.66 secs (140.741%) vtable calls: 2.79 secs (147.619%) foreach calls: 1.89 secs (100%) Test add_10, deque<int> took RAW : 1.1 secs func calls : 2.4 secs (218.182%) vtable calls: 2.63 secs (239.091%) foreach calls: 1.1 secs (100%) Test add_one_char, vector<seq> took RAW : 3.16 secs func calls : 3.67 secs (116.139%) vtable calls: 3.67 secs (116.139%) foreach calls: 3.53 secs (111.709%) Test add_one_char, list<seq> took RAW : 3.75 secs func calls : 3.79 secs (101.067%) vtable calls: 3.95 secs (105.333%) foreach calls: 3.87 secs (103.2%) Test add_one_char, deque<seq> took RAW : 3.84 secs func calls : 4.49 secs (116.927%) vtable calls: 4.58 secs (119.271%) foreach calls: 3.94 secs (102.604%) Test set_to_len, vector<seq> took RAW : 8.4 secs func calls : 9.04 secs (107.619%) vtable calls: 9.34 secs (111.19%) foreach calls: 8.98 secs (106.905%) Test set_to_len, list<seq> took RAW : 9.07 secs func calls : 9.47 secs (104.41%) vtable calls: 9.12 secs (100.551%) foreach calls: 9.07 secs (100%) Test set_to_len, deque<seq> took RAW : 9.1 secs func calls : 9.2 secs (101.099%) vtable calls: 9.26 secs (101.758%) foreach calls: 9.25 secs (101.648%) Test set_to_az, vector<seq> took RAW : 12.75 secs func calls : 12.75 secs (100%) vtable calls: 12.75 secs (100%) foreach calls: 12.75 secs (100%) Test set_to_az, list<seq> took RAW : 11.66 secs func calls : 11.82 secs (101.372%) vtable calls: 11.71 secs (100.429%) foreach calls: 11.67 secs (100.086%) Test set_to_az, deque<seq> took RAW : 11.81 secs func calls : 12.47 secs (105.588%) vtable calls: 13.12 secs (111.092%) foreach calls: 12.89 secs (109.145%) Test noop, vector<int> took RAW : 1.29 secs func calls : 4.78 secs (370.543%) vtable calls: 4.93 secs (382.171%) foreach calls: 1.38 secs (106.977%) Test noop, list<int> took RAW : 1.86 secs func calls : 4.92 secs (264.516%) vtable calls: 5.17 secs (277.957%) foreach calls: 1.95 secs (104.839%) Test noop, deque<int> took RAW : 1.51 secs func calls : 5.14 secs (340.397%) vtable calls: 5.25 secs (347.682%) foreach calls: 1.56 secs (103.311%) Test noop, vector<seq> took RAW : 0.48 secs func calls : 1.93 secs (402.083%) vtable calls: 2.03 secs (422.917%) foreach calls: 0.54 secs (112.5%) Test noop, list<seq> took RAW : 0.72 secs func calls : 2.04 secs (283.333%) vtable calls: 2 secs (277.778%) foreach calls: 0.75 secs (104.167%) Test noop, deque<seq> took RAW : 0.64 secs func calls : 1.96 secs (306.25%) vtable calls: 2.14 secs (334.375%) foreach calls: 0.68 secs (106.25%) Regards Hartmut

John Torjo wrote:
I've made such a test. It tests against:
It does not compile with gcc 3.4 unless the following patch is applied. Results for gcc 3.4 are also attached.
Conclusions (mine :D): - BOOST_FOREACH rocks big time
That's true :-) The only benchmark we'd need to for compilation time ;-) - Volodya

On 5 May 2004, at 19:14, John Torjo wrote:
I'm eager to see how the tests run on different platforms.
I've run the tests on MacOS X using CW8.3 (haven't got CW9) and GCC 3.3. I had to make minor changes for both compilers to build the test. I applied Vladimir's changes for GCC, and had to modify the is_const construct to use Metrowerks::is_const for CodeWarrior. Regards, Martin ***************************************************************************** ** ** ** WARNING: This email contains an attachment of a very suspicious type. ** ** You are urged NOT to open this attachment unless you are absolutely ** ** sure it is legitmate. Opening this attachment may cause irreparable ** ** damage to your computer and your files. If you have any questions ** ** about the validity of this message, PLEASE SEEK HELP BEFORE OPENING IT. ** ** ** *****************************************************************************

Martin Taylor <martin.taylor@artwork-systems.com> writes:
On 5 May 2004, at 19:14, John Torjo wrote:
I'm eager to see how the tests run on different platforms.
I've run the tests on MacOS X using CW8.3 (haven't got CW9) and GCC 3.3. I had to make minor changes for both compilers to build the test. I applied Vladimir's changes for GCC, and had to modify the is_const construct to use Metrowerks::is_const for CodeWarrior.
It surprises me that BOOST_FOREACH is ever measurably different from a hand-coded loop. Can anyone explain that? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
It surprises me that BOOST_FOREACH is ever measurably different from a hand-coded loop. Can anyone explain that?
I could hazard a guess. BOOST_FOREACH uses two nested for loops. The sole purpose of the inner loop is to rebind the loop variable in case it is a reference. There is some fancy footwork to make sure the inner loop executes exactly once for each iteration of the outer loop. In addition, if the inner loop is exited with a "break," that information must be relayed to the outer loop. This is all accomplished with the help of a hidden bool, which is set, checked and reset at each iteration. It's the only overhead. For all but the most trivial loops, the cost of reading and writing a bool is lost in the noise. IMO, being able have reference loop variables is worth the very small price. -- Eric Niebler Boost Consulting www.boost-consulting.com

"Eric Niebler" <eric@boost-consulting.com> writes:
For all but the most trivial loops, the cost of reading and writing a bool is lost in the noise. IMO, being able have reference loop variables is worth the very small price.
"There's gotta be a better way" ;-) -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
"Eric Niebler" <eric@boost-consulting.com> writes:
For all but the most trivial loops, the cost of reading and writing a bool is lost in the noise. IMO, being able have reference loop variables is worth the very small price.
"There's gotta be a better way" ;-)
There isn't, and it's easy to see why. If you want to support reference loop variables, you need to be able to rebind the reference at each iteration. References aren't rebindable, so you need a declaration. In C++, there is only one way to declare a varible, inject it into a subsequent scope and *not* evaluate it in boolean context: the for loop. Hence, the inner for loop is necessary, as is a mechanism for executing the loop only once and for handling break statements. If the compiler had excellent flow analysis *and* the loop body did not contain a break statement, it should be possible for the compiler to optimize the boolean away. But it looks like compiler technology isn't quite there yet. -- Eric Niebler Boost Consulting www.boost-consulting.com

"Eric Niebler" <eric@boost-consulting.com> writes:
David Abrahams wrote:
"Eric Niebler" <eric@boost-consulting.com> writes:
For all but the most trivial loops, the cost of reading and writing a bool is lost in the noise. IMO, being able have reference loop variables is worth the very small price. "There's gotta be a better way" ;-)
There isn't, and it's easy to see why. If you want to support reference loop variables, you need to be able to rebind the reference at each iteration. References aren't rebindable, so you need a declaration. In C++, there is only one way to declare a varible, inject it into a subsequent scope and *not* evaluate it in boolean context: the for loop. Hence, the inner for loop is necessary, as is a mechanism for executing the loop only once and for handling break statements.
That isn't enough to convince me that a boolean is needed, yet. Maybe something could be done with "goto"? I haven't looked at BOOST_FOREACH carefully, but I'm imagining something like: for (C::iterator s = c.begin(), f = c.end(); s != f; ++s) // outer { for(C::reference r = *s;;) // inner { // Body goes here goto continue_; } break; // if there's a break in the body, break out all the way continue_: } Not sure if you can generate those labels reliably, but it seems possible as long as nobody wants to put two FOREACHes on a single line. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
Maybe something could be done with "goto"? I haven't looked at BOOST_FOREACH carefully, but I'm imagining something like:
for (C::iterator s = c.begin(), f = c.end(); s != f; ++s) // outer { for(C::reference r = *s;;) // inner { // Body goes here goto continue_; } break; // if there's a break in the body, break out all the way continue_: }
That won't work. Look how BOOST_FOREACH is used: BOOST_FOREACH( int &i, int_vector ) { // body goes here } BOOST_FOREACH must expand to something such that the following scope is treated as the loop body. There is no place to put the continue_ label, and no place to put the goto. I thought about putting a goto somewhere in the inner for loop, like "for( ...; ; goto continue_ )" but C++ doesn't allow that. And even if that worked, it's still borken because a break in the loop body would skip the goto. -- Eric Niebler Boost Consulting www.boost-consulting.com

David Abrahams <dave@boost-consulting.com> writes:
"Eric Niebler" <eric@boost-consulting.com> writes:
David Abrahams wrote:
"There's gotta be a better way" ;-)
There isn't, and it's easy to see why. If you want to support reference loop variables, you need to be able to rebind the reference at each iteration. References aren't rebindable, so you need a declaration. In C++, there is only one way to declare a varible, inject it into a subsequent scope and *not* evaluate it in boolean context: the for loop. Hence, the inner for loop is necessary, as is a mechanism for executing the loop only once and for handling break statements.
That isn't enough to convince me that a boolean is needed, yet. Maybe something could be done with "goto"? I haven't looked at BOOST_FOREACH carefully, but I'm imagining something like:
for (C::iterator s = c.begin(), f = c.end(); s != f; ++s) // outer { for(C::reference r = *s;;) // inner { // Body goes here goto continue_; } break; // if there's a break in the body, break out all the way continue_: }
Not sure if you can generate those labels reliably, but it seems possible as long as nobody wants to put two FOREACHes on a single line.
Err, this could only work if you were willing to pass the loop block to the macro. Might be worth the pain, though (?) -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
Err, this could only work if you were willing to pass the loop block to the macro. Might be worth the pain, though (?)
If you are willing to pass the loop body to the macro, you don't need the inner for loop at all. for (C::iterator s = c.begin(), f = c.end(); s != f; ++s) { C::reference r = *s; // Body goes here } -- Rainer Deyke - rainerd@eldwood.com - http://eldwood.com

Rainer Deyke wrote:
David Abrahams wrote:
Err, this could only work if you were willing to pass the loop block to the macro. Might be worth the pain, though (?)
If you are willing to pass the loop body to the macro, you don't need the inner for loop at all.
That's all true, but IMO it isn't worth the pain. I'm OK with BOOST_FOREAH being 4% slower than a hand-coded loop on average (until compilers get better flow control analysis). In exchange, users get a nice, first-class feel. It behaves like a new keyword instead of a macro, which is intended. It's certainly a judgement call, but the perf/usability trade-off was a conscious one. The tiny perf hit seems a small price. -- Eric Niebler Boost Consulting www.boost-consulting.com
participants (7)
-
David Abrahams
-
Eric Niebler
-
hartmutkaiser@t-online.de
-
John Torjo
-
Martin Taylor
-
Rainer Deyke
-
Vladimir Prus