RE: [boost] Wave C++ Review Begins Today - February 7, 2005

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Hartmut Kaiser Sent: woensdag 9 februari 2005 14:42 To: boost@lists.boost.org Subject: RE: [boost] Wave C++ Review Begins Today - February 7, 2005
Michiel Salters wrote:
one serious bug in wave\cpplexer\re2clex\cpp_re2c_lexer.hpp: template <typename IteratorT, typename PositionT> inline lexer<IteratorT, PositionT>::lexer(IteratorT const &first, IteratorT const &last, PositionT const &pos, boost::wave::language_support language) : filename(pos.get_file()), at_eof(false) { memset(&scanner, '\0', sizeof(Scanner)); scanner.fd = -1; scanner.eol_offsets = aq_create(); scanner.first = scanner.act = (uchar *)&(*first); scanner.last = (uchar *)&(*last); ... [ IteratorT = std::string::iterator in the example ]
This assumes that the range [first,last] is dereferenacable and contiguous in memory. This of course is not mandated by the standard.
Yes, this is a known problem (see the Changelog file), I didn't realise, that it will assert on some systems. Seems, that I'll have to fix that soon ;-). This shouldn't be a problem on other systems, though, if you're using iterators acting on a continuous chunk of memory (such as std::string::iterator).
No, you're mistaken here and in the Changelog as well. Even std::string::iterator doesn't have to refer to contiguous memory. std::vector<char>::iterator would, though, so I built a string around that and found the actual bug (and a VC8 bug as well I think). The bug is that context<IteratorT> stores the two iterators first and last as const&. The cpp example passed .begin() and .end(), which are temporaries and they go out of scope. The result is that the context ends up with references to destructed iterators. Again, the VC8 STL iterator debugging found the problem on first access. Line 281, cpp_context.hpp I really have to say Dinkumware did a fine job here, this would have been very hard to find without these checks.
OTOH this isn't exactly a problem in the preprocessor library, it's merely a problem of the re2c generated lexer used in the driver. The other provided lexer (SLEX based) won't expose this problem (i.e. the cpp_tokens example shouldn't assert).
Might be (didn't get there yet), but it too uses cpp_context.hpp, right? BTW, if you try VC8, mail me for the string class I used. Using std::vector<char> directly appears to trigger a VC 8(beta1) bug with typenames longer than 2KB Regards, Michiel Salters This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.

On Feb 9, 2005, at 10:50 AM, Michiel Salters wrote:
Michiel Salters wrote:
one serious bug in wave\cpplexer\re2clex\cpp_re2c_lexer.hpp: template <typename IteratorT, typename PositionT> inline lexer<IteratorT, PositionT>::lexer(IteratorT const &first, IteratorT const &last, PositionT const &pos, boost::wave::language_support language) : filename(pos.get_file()), at_eof(false) { memset(&scanner, '\0', sizeof(Scanner)); scanner.fd = -1; scanner.eol_offsets = aq_create(); scanner.first = scanner.act = (uchar *)&(*first); scanner.last = (uchar *)&(*last); ... [ IteratorT = std::string::iterator in the example ]
This assumes that the range [first,last] is dereferenacable and contiguous in memory. This of course is not mandated by the standard.
Yes, this is a known problem (see the Changelog file), I didn't realise, that it will assert on some systems. Seems, that I'll have to fix that soon ;-). This shouldn't be a problem on other systems, though, if you're using iterators acting on a continuous chunk of memory (such as std::string::iterator).
No, you're mistaken here and in the Changelog as well. Even std::string::iterator doesn't have to refer to contiguous memory. std::vector<char>::iterator would, though, so I built a string around that and found the actual bug (and a VC8 bug as well I think).
The bug is that context<IteratorT> stores the two iterators first and last as const&. The cpp example passed .begin() and .end(), which are temporaries and they go out of scope. The result is that the context ends up with references to destructed iterators. Again, the VC8 STL iterator debugging found the problem on first access.
Line 281, cpp_context.hpp
Even dereferencing *last could be a bug, if it is past-the-end. For iterators referencing contiguous memory (vectors and pointers only, as you've said), one would need to write scanner.last = scanner.first + std::distance(first, last);
I really have to say Dinkumware did a fine job here, this would have been very hard to find without these checks.
FWIW, STLport and newer versions of GCC (Apple GCC 3.3, FSF GCC 3.4) have similar debugging modes. They've been invaluable in rooting out subtle bugs like those that stem from iterator invalidation. Doug

Doug Gregor wrote:
Even dereferencing *last could be a bug, if it is past-the-end. For iterators referencing contiguous memory (vectors and pointers only, as you've said), one would need to write
scanner.last = scanner.first + std::distance(first, last);
Duh, just another bug! I'll fix that asap. The re2c lexing component has to be rewritten to make proper use of iterators anyway. That's already on my todo list for some time.
I really have to say Dinkumware did a fine job here, this would have been very hard to find without these checks.
FWIW, STLport and newer versions of GCC (Apple GCC 3.3, FSF GCC 3.4) have similar debugging modes. They've been invaluable in rooting out subtle bugs like those that stem from iterator invalidation.
Nice to know! Regards Hartmut

Michiel Salters wrote:
This assumes that the range [first,last] is dereferenacable and contiguous in memory. This of course is not mandated by the standard.
Yes, this is a known problem (see the Changelog file), I didn't realise, that it will assert on some systems. Seems, that I'll have to fix that soon ;-). This shouldn't be a problem on other systems, though, if you're using iterators acting on a continuous chunk of memory (such as std::string::iterator).
No, you're mistaken here and in the Changelog as well. Even std::string::iterator doesn't have to refer to contiguous memory. std::vector<char>::iterator would, though, so I built a string around that and found the actual bug (and a VC8 bug as well I think).
I stand corrected here.
The bug is that context<IteratorT> stores the two iterators first and last as const&. The cpp example passed .begin() and .end(), which are temporaries and they go out of scope. The result is that the context ends up with references to destructed iterators. Again, the VC8 STL iterator debugging found the problem on first access.
Line 281, cpp_context.hpp
Thanks for catching this! I'll remove the references, so the iterators get copied. As only I've installed my own copy of vc8 and verified the correctness of my changes I'll upload a new version of Wave (since IMHO this bug is too serious to leave it alone for the review process).
I really have to say Dinkumware did a fine job here, this would have been very hard to find without these checks.
Indeed!
OTOH this isn't exactly a problem in the preprocessor library, it's merely a problem of the re2c generated lexer used in the driver. The other provided lexer (SLEX based) won't expose this problem (i.e. the cpp_tokens example shouldn't assert).
Might be (didn't get there yet), but it too uses cpp_context.hpp, right?
The lexer components are completely independent from the preprocessor and freely interchangable. Using the SLEX based lexer at least would circumvent the problem regarding the continuous memory.
BTW, if you try VC8, mail me for the string class I used. Using std::vector<char> directly appears to trigger a VC 8(beta1) bug with typenames longer than 2KB
Could you send it directly to me? Many thanks and regards Hartmut
participants (3)
-
Doug Gregor
-
Hartmut Kaiser
-
Michiel Salters