Eric Niebler wrote:
> Michael Marcin wrote:
>> Erik wrote:
>>> Or is it possible to configure BOOST_FOREACH to be as efficient as my macro?
>> I don't know but that is a good question.
>> I considered using BOOST_FOREACH until I checked its generated output... which was worse than std::for_each with a boost::bind which was worse than std::for_each with a hand coded functor which was worse than a hand coded for loop like yours above.
> I won't deny that the abstraction penalty of BOOST_FOREACH is not zero, but have either of you actually measured the overhead? I have, and I found BOOST_FOREACH to be about 5% slower than the equivalent hand-coded loop when compiler optimizations are turned on. It's really very small, and that 5% buys you a lot of expressivity. YMMV.

I have not made any timings, but looked at the number of instructions in
the assembly. I assume this corresponds to code size. The good news is
that with g++-4.2.0 -O3 there appears to be no abstraction penalty at
all. BOOST_FOREACH is equivalent to the handcoded loop and my macro.
They both have 21 instructions (which appear to be equivalent; some of
them are reordered and some have their parameters swapped). But with -Os
it is possible to get only 20 instructions with handcoded/my macro,
while BOOST_FOREACH actually gives 57 instructions (-Os giving more
instructions than -O3 for ANY code looks like a compiler bug to me). And
then I tried -O2 of course. handcoded/my macro yield 21 instructions
while BOOST_FOREACH yields 31.
With g++-4.1.2 handcoded/my macro gives the same result as with
g++-4.2.0. But with BOOST_FOREACH it is another story. At -O3
BOOST_FOREACH yields as many as 28 instructions, at -O2 35 instructions
and at -Os 90 instructions.
So I suppose that as long as everyone uses g++-4.2.0 (or higher, I
suppose, but did not test) and -O3 it should be fine to use
BOOST_FOREACH. Those who rely on getting small code with
-Os can forget about using BOOST_FOREACH with any of those compiler
versions.
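
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of what could be compiled and inspected. It is not the code measured above: FOR_EACH_SKETCH is a hypothetical stand-in for the macro discussed in this thread (the real one is not shown here), the function names are made up for illustration, and the macro assumes a compiler that supports C++11 `auto`. The BOOST_FOREACH variant is only compiled when USE_BOOST_FOREACH is defined, so the file also builds without Boost installed.

```cpp
#include <vector>

// Hypothetical stand-in for the macro discussed in the thread (assumption:
// the real macro is not shown here). Caches end() once, like the hand-coded loop.
#define FOR_EACH_SKETCH(it, container)                                   \
    for (auto it = (container).begin(), it##_end = (container).end();   \
         it != it##_end; ++it)

// Variant 1: hand-coded iterator loop.
int sum_handcoded(const std::vector<int>& v) {
    int sum = 0;
    for (std::vector<int>::const_iterator it = v.begin(), end = v.end();
         it != end; ++it) {
        sum += *it;
    }
    return sum;
}

// Variant 2: the same loop written with the hypothetical macro.
int sum_macro(const std::vector<int>& v) {
    int sum = 0;
    FOR_EACH_SKETCH(it, v) {
        sum += *it;
    }
    return sum;
}

#ifdef USE_BOOST_FOREACH
#include <boost/foreach.hpp>

// Variant 3: BOOST_FOREACH, built only when Boost is available.
int sum_boost_foreach(const std::vector<int>& v) {
    int sum = 0;
    BOOST_FOREACH(int x, v) {
        sum += x;
    }
    return sum;
}
#endif
```

Compiling this with, for example, `g++ -O3 -S`, `g++ -O2 -S` and `g++ -Os -S` (adding `-DUSE_BOOST_FOREACH` where Boost is installed) produces assembly in which the instructions of each function can be counted by hand. The numbers will depend on the compiler version and the loop body, so they need not match the ones reported above.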
So it seems like the gcc people did quite a good job of optimizing this
very complex hack. I suppose I will suggest that we start using
BOOST_FOREACH in our project. We already use -O3 for release builds and
in a few months (or maybe a year) almost everyone will have at least
gcc-4.2.0. But it would be interesting if others could verify my
measurements and maybe test other examples of code. I did some tests and
got the exact same results for the following use of BOOST_FOREACH (and
the corresponding use of my macro and a handcoded loop):
//////////////////////////////////////////////////////
#include