Wave: how to recover after errors?
I have a simple problem: wave warnings (specifically, macro re-definition) are reported as exceptions thrown from [potentially, any of, at least I don't yet know better] the methods of pp_iterator. It is very simple to catch one and print a message, however I can't figure out how to recover after one.
Can't seem to find any mention of error recovery in documentation, tried some variations of last_known_good_iterator (store a copy before every increment, restore back to known copy in exception handler) -- it recovers simple cases but on more complex files there appears to be internal state corruption. Also, even when this recovery works, it leaves macro in the pre-redefinition state whereas the usual cpp policy is to allow redefinition and print a warning.
I thought of using a preventive strategy -- undefine the existing macro from a policy method -- but unfortunately the exception is thrown before defined_macro() is called.
Looks like it's time to ask the owners of the library: how do I do this? :-)
Thanks in advance,
...Max...
PS. I am using 1.33.0; should I _really_ _really_ be upgrading to 1.33.1 to get this resolved?
Max Motovilov wrote:
I have a simple problem: wave warnings (specifically, macro re-definition) are reported as exceptions thrown from [potentially, any of, at least I don't yet know better] the methods of pp_iterator. It is very simple to catch one and print a message, however I can't figure out how to recover after one.
Can't seem to find any mention of error recovery in documentation, tried some variations of last_known_good_iterator (store a copy before every increment, restore back to known copy in exception handler) -- it recovers simple cases but on more complex files there appears to be internal state corruption. Also, even when this recovery works, it leaves macro in the pre-redefinition state whereas the usual cpp policy is to allow redefinition and print a warning.
I thought of using a preventive strategy -- undefine the existing macro from a policy method -- but unfortunately the exception is thrown before defined_macro() is called.
Looks like it's time to ask the owners of the library: how do I do this? :-)
You hit the nail on the head (like we Germans say). Wave currently doesn't really support error recovery. The main focus of Wave was conformance and not usability. Adding error recovery is one of my high priority tasks for Wave, but I'm not sure yet how to design this. I'm afraid I don't have any better answer for you. Regards Hartmut
Hartmut,
You hit the nail on the head (like we Germans say). Wave currently doesn't really support error recovery. The main focus of Wave was conformance and not usability. Adding error recovery is one of my high priority tasks for Wave, but I'm not sure yet how to design this.
Perhaps you could give me some insight into what is going on within the Wave context when an error condition arises? Currently I am only interested in very common situations; perhaps macro re-definition IS the only error recovery case I'll ever need to support. I was thinking along the lines of a last_known_good paradigm, but as I have absolutely no idea whether all necessary state information is indeed copied along with an iterator, or which methods trigger the next_token() step, I can't do it in an intelligent way. I guess my questions to you would be:
- Is there any way to return the state of the Wave context back to a specific position? I probably don't even care much if such an operation would be expensive since error conditions like that are relatively rare in real C++ code, as long as it is not as expensive as restarting the parsing from the beginning.
- Which methods cause next_token() to be called? operator ==/!=, operator*, operator++? Only some of them?
- Also, it would be nice if I could extract the cause of error at the point where an exception is caught, but it looks like the current codebase doesn't preserve the error itself, only severity. It would be enough for me to know that the last two tokens consumed were <#define> and <Identifier>, but since those are processed by Wave itself I will not know about it. So, any suggestions how I could determine that the cause of error was indeed macro re-definition?
A bit of background: I am trying to use Wave for a small project that might or might not grow into something useful [e.g. for Boost] and am in no way pressed for time, so if you think you might be able to answer some of these concerns later, in subsequent versions of Wave, please let me know. Also I am perfectly willing to try out any of the alpha quality (or worse :) ) code you might be developing. My platform for this project is Visual C++ 8.0 (VStudio 2005); I don't plan to port it elsewhere until I decide the whole thing I am trying to put together is indeed worthwhile.
Regards,
...Max...
Max Motovilov wrote:
You hit the nail on the head (like we Germans say). Wave currently doesn't really support error recovery. The main focus of Wave was conformance and not usability. Adding error recovery is one of my high priority tasks for Wave, but I'm not sure yet how to design this.
Perhaps you could give me some insight into what is going on within Wave context when an error condition arises? Currently I am only interested in very common situations, perhaps macro re-definition IS the only error recovery case I'll ever need to support. I was thinking along the lines of last_known_good paradigm but as I have absolutely no idea whether all necessary state information is indeed copied along with an iterator and which methods trigger the next_token() step, I can't do it in an intelligent way. I guess, my questions to you would be:
The iterators carry _all_ of the context information, so from this point of view I don't expect any difficulties. The problem I'm not able to solve currently is how to _reliably_ find a synchronisation point where the preprocessor has to be restarted.
- Is there any way to return the state of the Wave context back to a specific position? I probably don't even care much if such an operation would be expensive since error conditions like that are relatively rare in real C++ code, as long as it is not as expensive as restarting the parsing from the beginning.
- Which methods cause next_token() to be called? operator ==/!=, operator*, operator++? Only some of them?
It's in the operator!=() and operator++(). So most, if not all of the exceptions are generated here.
- Also, it would be nice if I could extract the cause of error at the point where an exception is caught, but it looks like the current codebase doesn't preserve the error itself, only severity. It would be enough for me to know that the last two tokens consumed were <#define> and <Identifier> but since those are processed by the Wave itself I will not know about it. So, any suggestions how I could determine that the cause of error was indeed macro re-definition?
That's a very good point. I've added it to the exception classes.
A bit of background: I am trying to use Wave for a small project that might or might not grow into something useful [e.g. for Boost] and am in no way pressed for time, so if you think you might be able to answer some of these concerns later, in subsequent versions of Wave, please let me know. Also I am perfectly willing to try out any of the alpha quality (or worse :) ) code you might be developing. My platform for this project is Visual C++ 8.0 (VStudio 2005), I don't plan to port it elsewhere until I decide the whole thing I am trying to put together is indeed worthwhile.
Understood. Generally Wave issues two different types of errors:
- errors occurring during a processing step that does not generate tokens on the context::iterator level. For example, the macro redefinition errors you are interested in belong to this group of errors. It is quite easy to handle these. Just use the following code snippet instead of the simple "while (first != last) { ++first; }" loop. The following snippet is taken from the wave driver, where I've implemented the new techniques:
    // loop over all generated tokens outputting the generated text
    bool finished = false;
    do {
        try {
            while (first != last) {
                // store the last known good token position
                current_position = (*first).get_position();
                // print out the current token value
                output << (*first).get_value();
                // advance to the next token
                ++first;
            }
            finished = true;
        }
        catch (boost::wave::cpp_exception const &e) {
            // some preprocessing error
            cerr << e.file_name() << "(" << e.line_no() << "): "
                 << e.description() << endl;
        }
    } while (!finished);
- errors occurring during processing of language constructs that are supposed to produce tokens on the context::iterator level. A good example of this is errors occurring during macro expansion. It is a lot harder to recover from these errors. The solution given above will not always work in this case.
As a first simple attempt I've added a function
    bool is_recoverable(cpp_exception const&)
which reports whether it is possible to recover using the solution above. This allows you to write
    catch (boost::wave::cpp_exception const &e) {
        // some preprocessing error
        if (boost::wave::is_recoverable(e)) {
            cerr << e.file_name() << "(" << e.line_no() << "): "
                 << e.description() << endl;
        }
        else {
            throw;
        }
    }
Certainly you will have to add an additional catch clause outside of the loop. BTW, most of the thrown exceptions are recoverable (that was a surprise for me). Please note that lexing_exceptions are currently non-recoverable, because there is no distinction of where they were thrown from.
HTH Regards Hartmut
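The retry loop described above can be tried out without Boost.Wave. What follows is only a minimal sketch under stated assumptions: toy_source, toy_error, run_result and run are hypothetical names invented for the illustration, and the local is_recoverable merely mimics the role of the function Hartmut mentions. The toy source steps past a failing "directive" before it throws, which is the assumption that lets the outer loop simply resume.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a preprocessing exception; not a Boost.Wave type.
struct toy_error : std::runtime_error {
    bool recoverable;
    toy_error(const std::string& what, bool r)
        : std::runtime_error(what), recoverable(r) {}
};

// Hypothetical stand-in playing the role of boost::wave::is_recoverable().
inline bool is_recoverable(const toy_error& e) { return e.recoverable; }

// Toy token source. An item starting with '!' models a directive that raises
// a recoverable error, '~' a non-recoverable one. Neither yields a token.
class toy_source {
public:
    explicit toy_source(std::vector<std::string> items)
        : items_(std::move(items)) {}

    // Plays the role of "first != last": this is where errors surface.
    bool has_token() {
        if (pos_ < items_.size() && !items_[pos_].empty()) {
            const char tag = items_[pos_][0];
            if (tag == '!' || tag == '~') {
                const std::string msg = items_[pos_].substr(1);
                ++pos_;  // assumption: the directive is consumed before the throw
                throw toy_error(msg, tag == '!');
            }
        }
        return pos_ < items_.size();
    }
    const std::string& current() const { return items_[pos_]; }
    void advance() { ++pos_; }

private:
    std::vector<std::string> items_;
    std::size_t pos_ = 0;
};

struct run_result {
    std::string output;                 // concatenated token values
    std::vector<std::string> warnings;  // messages of recovered errors
};

// Same shape as the driver loop: report recoverable errors and resume,
// rethrow everything else to an outer handler.
run_result run(toy_source src) {
    run_result r;
    bool finished = false;
    do {
        try {
            while (src.has_token()) {
                r.output += src.current();
                src.advance();
            }
            finished = true;
        } catch (const toy_error& e) {
            if (!is_recoverable(e)) throw;
            r.warnings.push_back(e.what());
        }
    } while (!finished);
    return r;
}
```

Note that the end test sits inside the try block, since that is one of the two places where, per the reply above, exceptions are generated.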
Hartmut,
That's a very good point. I've added it to the exception classes.
But that's not in the CVS yet, right? 'Cause I didn't see it there :)
these. Just use the following code snippet instead of the simple "while (first != last) { ++first; }" loop. The following snippet is taken from the wave driver, where I've implemented the new techniques:
Not sure what's the point of storing the current_position in this snippet as I don't see you using it anywhere at the time of error recovery. In any case, I didn't see any suggestion of being able to position the scanner directly using position_type (and I doubt that's feasible considering that every logical position in the file has semantic state of the preprocessor associated with it). I was thinking of achieving recovery with a copy of the _iterator_ but it doesn't look like this does what I expected it to. Here's my sample snippet:
    try {
        wave_context::iterator_type p = ctx.begin(), last_known_good = p;
        bool done = false;
        while( !done ) {
            try {
                done = p == ctx.end();
                std::cout << p->get_value();
                ++p;
                last_known_good = p;
            }
            catch( boost::wave::preprocess_exception& err ) {
                if( err.get_severity() >= boost::wave::util::severity_error ) throw;
                complain( err );
                p = last_known_good;
            }
        }
    }
    catch( boost::wave::cpp_exception& err ) {
        complain( err );
        return false;
    }
I half-expected it to go into an infinite loop (as returning back to the failure point should cause the same error to repeat itself), then I realized that tokens that would NOT normally be returned to the user (body of the re-defined macro, in my case) may not be part of the state associated with the iterator. Yet something fishy happens with the state I can't quite put my finger on. Here's my example input:
=============
#define Foo(x) bar##x
Foo(foo1)
#define Foo(x) x##bar
#define Bar(x) x##foo
Foo(foo2)
Bar(bar1)
=============
This is the result:
=============
#line 2 "..\\test.cpp"
barfoo1
#define Bar(x) x##foo
barfoo2
Bar(bar1)
=============
Note that the definition of Bar went through verbatim, which means that recovery was not achieved properly. My guess was that parsing of the body of the second definition of Foo somehow consumed the newline and didn't return it back to the stream, thus invalidating the production for the next #define (which, I imagine, was <EOL> <#> <Keyword>).
I tried the modified version:
=============
#define Foo(x) bar##x
Foo(foo1)
#define Foo(x) x##bar

#define Bar(x) x##foo
Foo(foo2)
Bar(bar1)
=============
and, sure enough, I got:
=============
#line 2 "..\\test.cpp"
barfoo1
#line 6 "..\\test.cpp"
barfoo2
bar1foo
=============
So the newline was indeed consumed irrecoverably [no pun intended!] and recovery failed. But, that's NOT the most interesting part. The most interesting part is that when I commented out all use of last_known_good from the above snippet, the code behaved EXACTLY THE SAME in both cases. Which makes me doubt very much that state has indeed been preserved within the copy of the iterator. Perhaps Wave iterator has shallow copy? I'd deduce that from the code but it's been a long day... maybe tomorrow.
Regards,
...Max...
On 12/5/05 11:28 PM, "Max Motovilov"
Here's my sample snippet:
    try {
        wave_context::iterator_type p = ctx.begin(), last_known_good = p;
        bool done = false;
        while( !done ) {
            try {
                done = p == ctx.end();
There's a problem with the previous line. You can't dereference "p" if you're at the end, but you don't prevent it.
                std::cout << p->get_value();
                ++p;
                last_known_good = p;
            }
            catch( boost::wave::preprocess_exception& err ) {
                if( err.get_severity() >= boost::wave::util::severity_error ) throw;
                complain( err );
                p = last_known_good;
            }
        }
    }
    catch( boost::wave::cpp_exception& err ) {
        complain( err );
        return false;
    }
[TRUNCATE]
Maybe your problems are caused by dereference violation. Try something like:
//========================================================================
using namespace boost::wave;
try {
    typedef wave_context::iterator_type iterator;
    iterator last_known_good = ctx.begin();
    iterator const e = ctx.end();
    for ( iterator i = last_known_good ; e != i ; ) {
        try {
            std::cout << i->get_value();
            last_known_good = ++i;
        }
        catch ( preprocess_exception & err ) {
            if ( err.get_severity() >= util::severity_error ) throw;
            complain( err );
            i = last_known_good;
        }
    }
    //...
}
catch ( cpp_exception & err ) {
    complain( err );
    return false;
}
//========================================================================
Looking at this, I still don't see how you avoid reading the problem context element over and over....
--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
Daryle,
There's a problem with the previous line. You can't dereference "p" if you're at the end, but you don't prevent it.
Point taken. Here's the modified part:
    for(;;) {
        try {
            if( p == ctx.end() ) break;
The only detected change is in the version that preserves the last_known_good: there is now an extra empty line after "barfoo1" in both cases. Incorrect processing of #define did not go away (EOL still consumed and not returned).
Looking at this, I still don't see how you avoid reading the problem context element over and over....
Heh! My point exactly! Yet it is not being read over and over and the difference between preserving and restoring the iterator and simply ignoring the exception is cosmetic as far as results are concerned. Looks like I'll have to dig through Hartmut's code to see for myself whether I can achieve what I want with the current version or forget it and wait for ??? next one. Regards, ...Max...
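The difference between the two loop shapes discussed above can be shown with a standard container instead of a Wave context. This is a sketch for illustration only: loop_stats, flag_then_use and test_then_use are hypothetical names, and the dereference in the first function is deliberately guarded so that the toy stays well-defined while counting how often the body runs with the iterator at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct loop_stats {
    std::string output;          // tokens that were printed
    std::size_t end_derefs = 0;  // times the body ran with the iterator at end
};

// Shape of the first snippet: the end test only sets a flag, so the body
// still runs once with p == end. The real loop would dereference there.
loop_stats flag_then_use(const std::vector<std::string>& tokens) {
    loop_stats s;
    std::vector<std::string>::const_iterator p = tokens.begin();
    bool done = false;
    while (!done) {
        done = (p == tokens.end());
        if (p == tokens.end()) {  // guard added for the toy only
            ++s.end_derefs;
            continue;
        }
        s.output += *p;
        ++p;
    }
    return s;
}

// Shape of the modified snippet: test for the end first and leave the loop
// before the iterator is touched.
loop_stats test_then_use(const std::vector<std::string>& tokens) {
    loop_stats s;
    std::vector<std::string>::const_iterator p = tokens.begin();
    for (;;) {
        if (p == tokens.end()) break;
        s.output += *p;
        ++p;
    }
    return s;
}
```

Both functions produce the same output for valid positions; only the first one reaches the body with the iterator at the end.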
Max Motovilov wrote:
That's a very good point. I've added it to the exception classes.
But that's not in the CVS yet, right? 'Cause I didn't see it there :)
It should be there. Perhaps you were hit by the delay of the anonymous access to the CVS?
these. Just use the following code snippet instead of the simple "while (first != last) { ++first; }" loop. The following snippet is taken from the wave driver, where I've implemented the new techniques:
Not sure what's the point of storing the current_position in this snippet as I don't see you using it anywhere at the time of error recovery. In any case, I didn't see any suggestion of being able to position the scanner directly using position_type (and I doubt that's feasible considering that every logical position in the file has semantic state of the preprocessor associated with it). I was thinking of achieving recovery with a copy of the _iterator_ but it doesn't look like this does what I expected it to.
The current position is used by the error reporting in the wave driver in case of unexpected exceptions et.al. (see cpp.cpp of the wave applet). It does not relate to the error recovery as discussed. I should have removed it from the code in my last mail.
I half-expected it to go into an infinite loop (as returning back to the failure point should cause the same error to repeat itself), then I realized that tokens that would NOT normally be returned to the user (body of the re-defined macro, in my case) may not be part of the state associated with the iterator.
That was my first expectation as well. But as I pointed out in my last mail, there are two different kinds of errors: errors reported from preprocessing stages that are not going to produce any tokens, and errors from processing stages that are supposed to produce tokens. These are very different in terms of error recovery.
Yet something fishy happens with the state I can't quite put my finger on. Here's my example input:
=============
#define Foo(x) bar##x
Foo(foo1)
#define Foo(x) x##bar
#define Bar(x) x##foo
Foo(foo2)
Bar(bar1)
=============
[snip] Yes, I'll have to look into this. My current implementation was a first shot and I expected to run into problems. Wave was not written with error recovery in mind; I'll have to invest some time to insert reliable synchronisation points.
So the newline was indeed consumed irrecoverably [no pun intended!] and recovery failed.
It's a problem of internal lookahead. The newline is not the problem but the #define. It already was consumed by the preprocessor.
But, that's NOT the most interesting part. The most interesting part is that when I commented out all use of last_known_good from the above snippet, the code behaved EXACTLY THE SAME in both cases. Which makes me doubt very much that state has indeed been preserved within the copy of the iterator.
Yes, I expected that.
Perhaps Wave iterator has shallow copy? I'd deduce that from the code but it's been a long day... maybe tomorrow.
Yes, the Wave iterators are shallow copied. To do a deep copy would be too expensive. But I'll try to look into your issue asap.
Regards Hartmut
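Why a shallow-copied iterator cannot serve as a last_known_good checkpoint can be seen in a small model. This is a sketch only and says nothing about how Wave is actually implemented: shared_state, shallow_iterator, deep_copy, restore_from_shallow and restore_from_deep are hypothetical names, and the shared state is assumed to be just a token list plus a position.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: all copies of the iterator point at one shared state,
// so copying is cheap but does not take a snapshot.
struct shared_state {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
};

class shallow_iterator {
public:
    explicit shallow_iterator(std::vector<std::string> tokens)
        : state_(std::make_shared<shared_state>()) {
        state_->tokens = std::move(tokens);
    }
    const std::string& operator*() const { return state_->tokens[state_->pos]; }
    shallow_iterator& operator++() { ++state_->pos; return *this; }

    // A deep copy duplicates the whole state, which is what makes it costly.
    shallow_iterator deep_copy() const {
        shallow_iterator c(*this);
        c.state_ = std::make_shared<shared_state>(*state_);
        return c;
    }

private:
    std::shared_ptr<shared_state> state_;
};

// Save a plain copy, advance twice, then "restore" from the copy.
std::string restore_from_shallow(std::vector<std::string> tokens) {
    shallow_iterator it(std::move(tokens));
    shallow_iterator saved = it;  // shares state with it
    ++it;
    ++it;
    it = saved;                   // no rewind: saved moved along as well
    return *it;
}

// Same steps, but the checkpoint owns its own copy of the state.
std::string restore_from_deep(std::vector<std::string> tokens) {
    shallow_iterator it(std::move(tokens));
    shallow_iterator saved = it.deep_copy();
    ++it;
    ++it;
    it = saved;                   // rewinds to the saved position
    return *it;
}
```

In this model, assigning the saved copy back changes nothing, which matches the observation in the thread that removing last_known_good made no difference to the results.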
participants (3)
- Daryle Walker
- Hartmut Kaiser
- Max Motovilov