Re: [boost] Re: Formal Review: FOREACH macro

In-Reply-To: <4276462D.1020501@boost-consulting.com> eric@boost-consulting.com (Eric Niebler) wrote (abridged):
It injects 5 local variables. I haven't checked, but I suspect that some can be optimized away sometimes.
I doubt they will be in debug mode, which is when I need to look at them.
I'm not sure why users will need to understand how FOREACH uses its hidden variables. Can you explain?
When I am debugging, I need to understand what the code is doing else how I can I understand what it is doing wrong? For example, from looking at the implementation it seems that the macro caches the end() value in a variable called _foreach_end. This means that if some random memory corruption changes that variable the loop may never terminate. I need to know. Incidently, if it is caching then code like: BOOST_FOREACH( int i, vec ) if (condition(i)) vec.push_back( filter(i) ); will not work the same as: for (iterator i = vec.begin(); i != vec.end(); ++i) if (condition(*i)) vec.push_back( filter(*i) ); Which at least needs to be documented. Again the implementation is complex enough that I can't tell at a glance whether these are the same, so I may have this wrong, but I strongly suspect they are different. -- Dave Harris, Nottingham, UK.

Dave Harris wrote:
In-Reply-To: <4276462D.1020501@boost-consulting.com> eric@boost-consulting.com (Eric Niebler) wrote (abridged):
I'm not sure why users will need to understand how FOREACH uses its hidden variables. Can you explain?
When I am debugging, I need to understand what the code is doing else how I can I understand what it is doing wrong?
For example, from looking at the implementation it seems that the macro caches the end() value in a variable called _foreach_end. This means that if some random memory corruption changes that variable the loop may never terminate. I need to know.
OK, that's true. But the same can be said of any third-party library. I guess the only problem is that most debuggers don't let you step into a macro.
Incidently, if it is caching then code like:
BOOST_FOREACH( int i, vec ) if (condition(i)) vec.push_back( filter(i) );
will not work the same as:
for (iterator i = vec.begin(); i != vec.end(); ++i) if (condition(*i)) vec.push_back( filter(*i) );
Neither of these loops "work" -- they are both broken in the same way. They both invalidate the iterator they are using. I don't think it's reasonable to assume rational behavior when you are altering the sequence while you are iterating it. But perhaps it should be worth noting in the docs that FOREACH doesn't perform any heroics to make this work. -- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler schrieb:
Incidently, if it is caching then code like:
BOOST_FOREACH( int i, vec ) if (condition(i)) vec.push_back( filter(i) );
will not work the same as:
vec.reserve(a lot);
for (iterator i = vec.begin(); i != vec.end(); ++i) if (condition(*i)) vec.push_back( filter(*i) );
Neither of these loops "work" -- they are both broken in the same way. They both invalidate the iterator they are using.
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end. i is never following the insertion point. -- Stefan Strasser

Stefan Strasser wrote:
Eric Niebler schrieb:
Incidently, if it is caching then code like:
BOOST_FOREACH( int i, vec ) if (condition(i)) vec.push_back( filter(i) );
will not work the same as:
vec.reserve(a lot);
Haha! OK.
for (iterator i = vec.begin(); i != vec.end(); ++i) if (condition(*i)) vec.push_back( filter(*i) );
Neither of these loops "work" -- they are both broken in the same way. They both invalidate the iterator they are using.
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
i is never following the insertion point.
Right, *now* you have an interesting point. Disregarding the fact (opinion) that writing loops that add/remove elements from a sequence while iterating it is playing with fire, you're absolutely right that the second loops is guaranteed to work by the standard (for proper values of "a lot") whereas the first does not. How serious an issue is this? It's a sincere question. For some sequence types (std::list, all the associative containers) the end iterator is stable even when adding/removing elements from the sequence. It certainly seems worth making a note about this in the FOREACH docs, but what should I say? An outright ban on adding/removing elements from the sequence during iteration seems too extreme. Perhaps something like: "If adding/removing elements from the collection type can cause iterator invalidation under any circumstances, the effect of doing such from the body of FOREACH is undefined." Or maybe it would be better to define FOREACH's behavior in terms of the equivalent for loop: for( iterator-type begin = begin(col), end = end(col); begin != end; ++begin ) { loop-variable = *begin; loop-body } That way people can work out for themselves whether iterator invalidation will occur or not. -- Eric Niebler Boost Consulting www.boost-consulting.com

Dave Harris schrieb:
For example, from looking at the implementation it seems that the macro caches the end() value in a variable called _foreach_end.
(...)
Incidently, if it is caching then code like:
BOOST_FOREACH( int i, vec ) if (condition(i)) vec.push_back( filter(i) );
will not work the same as:
for (iterator i = vec.begin(); i != vec.end(); ++i) if (condition(*i)) vec.push_back( filter(*i) );
it is cached, but why? end is end and stays end, unless it is invalidated. -- Stefan Strasser

Stefan Strasser wrote:
Dave Harris schrieb:
For example, from looking at the implementation it seems that the macro caches the end() value in a variable called _foreach_end.
it is cached, but why? end is end and stays end, unless it is invalidated.
Performance. Even when writing loops by hand, it's a good idea to save the result of .end() in a local to avoid repeated function calls that are not necessary. It's called "hoisting". It's one of the side-benefits of using FOREACH and std:: algorithms. -- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler schrieb:
Stefan Strasser wrote:
Dave Harris schrieb:
For example, from looking at the implementation it seems that the macro caches the end() value in a variable called _foreach_end.
it is cached, but why? end is end and stays end, unless it is invalidated.
Performance. Even when writing loops by hand, it's a good idea to save the result of .end() in a local to avoid repeated function calls that are not necessary. It's called "hoisting". It's one of the side-benefits of using FOREACH and std:: algorithms.
I think it's not worth it here. besides that it isn't a performance overhead on most containers you don't expect that from a keyword. and it is not consistent: char str[]={'a','b',0,0}; foreach(char c,str){ str[2]='c'; std::cerr << c << std::endl; } output is "abc". std::string str="ab"; foreach(char c,str){ if(str.length() == 2) str+='c'; std::cerr << c << std::endl; } output is "ab". -- Stefan Strasser

Stefan Strasser wrote:
Eric Niebler schrieb:
Even when writing loops by hand, it's a good idea to save the result of .end() in a local to avoid repeated function calls that are not necessary. It's called "hoisting". It's one of the side-benefits of using FOREACH and std:: algorithms.
I think it's not worth it here. besides that it isn't a performance overhead on most containers you don't expect that from a keyword.
Huh, *I* would expect that from a keyword.
and it is not consistent:
char str[]={'a','b',0,0}; foreach(char c,str){ str[2]='c'; std::cerr << c << std::endl; }
output is "abc".
Right. When iterating a null-terminated string, there isn't any need to find the end of the sequence up front. I just quit at the first null-terminator.
std::string str="ab"; foreach(char c,str){ if(str.length() == 2) str+='c'; std::cerr << c << std::endl; }
output is "ab".
See the discussion about iterator invalidation. This code has undefined behavior. You have found an interesting inconsistency, though. In the case of null-terminated strings, I felt I could be smarter than Boost.Range (which uses std::strlen() to find the end). You can only detect the difference in this one scenario, though, and I consider the perf win to be worth it. But I don't have strong feelings on the issue. -- Eric Niebler Boost Consulting www.boost-consulting.com

"Eric Niebler" <eric@boost-consulting.com> writes:
Stefan Strasser wrote:
Eric Niebler schrieb:
Even when writing loops by hand, it's a good idea to save the result of .end() in a local to avoid repeated function calls that are not necessary. It's called "hoisting". It's one of the side-benefits of using FOREACH and std:: algorithms.
I think it's not worth it here. besides that it isn't a performance overhead on most containers you don't expect that from a keyword.
Huh, *I* would expect that from a keyword.
Me too. Furthermore, I wouldn't use this library unless it did hoisting. -- Dave Abrahams Boost Consulting www.boost-consulting.com

Eric Niebler schrieb:
I think it's not worth it here. besides that it isn't a performance overhead on most containers you don't expect that from a keyword.
Huh, *I* would expect that from a keyword.
can you explain that? why do expect a keyword to create temporaries and use outdated values? (and what are the semantics in other languages which have that keyword?)
std::string str="ab"; foreach(char c,str){ if(str.length() == 2) str+='c'; std::cerr << c << std::endl; }
See the discussion about iterator invalidation. This code has undefined behavior.
you're right (even the SGI stl documentation admits that the std::string invalidation rules are strange). -- Stefan Strasser
participants (4)
-
brangdon@cix.compulink.co.uk
-
David Abrahams
-
Eric Niebler
-
Stefan Strasser