[Wave] extracting preprocessing grammar and comments from sourcefile
After looking at the documentation of Wave, I believe it can be used to extract the preprocessor grammar, so I am looking for some tips on how to get started. I want to adapt Wave into a mechanism for extracting all preprocessor grammar from a source file without modifying the library itself. What I want to do is:

* extract the position and value of preprocessor tokens such as #ifdef, #define, #warning etc.
* evaluate the preprocessor conditionals
* after evaluating a preprocessor conditional, extract the portion which was evaluated as false as a string:

    #define positive #ifdef int positive #endif /*Extract this false part as string*/ int x; #endif

* extract the value and position of unexpanded macros
* extract the C/C++ statement or expression that the unexpanded (and expanded) macro is a part of
* extract the value and position of all C and C++ comments

Which work should I base my work upon, and which data structures should I reimplement, and for what? I have looked at the documentation and the code; some parts seem easy to do, but others are not obvious to me at this point.

Thanks,
Andreas Saebjoernsen
Andreas Sæbjørnsen wrote:
After looking at the documentation of Wave, I believe it can be used to extract the preprocessor grammar, so I am looking for some tips on how to get started. I want to adapt Wave into a mechanism for extracting all preprocessor grammar from a source file without modifying the library itself. What I want to do is to *extract the position and value of preprocessor tokens such as #ifdef, #define, #warning etc.
Let me elaborate a bit. Wave is built as a layered iterator. At the bottom (on top of the iterators of the input stream) there is the lexer component, which constructs the C++ tokens from the input. The tokens you are looking for (#ifdef etc.) are contained in the token sequence generated by the lexer iterators. On top of the lexer we have the preprocessing component, which does the actual preprocessing (as you might have expected). The token sequence produced by the preprocessing component obviously doesn't contain these tokens anymore.

What you could do in this situation is:
- build your own lexer that intercepts the tokens you're interested in and stores the information you need somewhere else. This makes it very difficult to track down the (virtual) position of these tokens in the preprocessed token stream.
- add some additional hooks to the library that notify you about these tokens. I'm not sure of the implications, though, and how much this differs from the first bullet :-P
- Perhaps you have another idea on this?
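To make the first option more concrete, here is a minimal standalone sketch of recording the position and text of directive lines before any preprocessing happens. It does not use Wave at all: DirectiveInfo and find_directives are hypothetical names invented for this illustration, and the scan is purely line based, so it ignores line continuations and directives hidden inside comments, both of which a real lexer handles.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not part of Wave.
struct DirectiveInfo {
    std::size_t line;    // 1-based line number
    std::size_t column;  // 1-based column of the '#'
    std::string name;    // e.g. "ifdef", "define", "warning"
    std::string text;    // the whole line, unexpanded
};

// Record every line whose first non-blank character is '#'.
// Line continuations and comments are deliberately ignored.
std::vector<DirectiveInfo> find_directives(const std::string& source) {
    std::vector<DirectiveInfo> result;
    std::istringstream in(source);
    std::string line;
    std::size_t line_no = 0;
    while (std::getline(in, line)) {
        ++line_no;
        std::size_t pos = line.find_first_not_of(" \t");
        if (pos == std::string::npos || line[pos] != '#') continue;
        // Skip blanks between '#' and the directive name.
        std::size_t name_begin = line.find_first_not_of(" \t", pos + 1);
        std::string name;
        if (name_begin != std::string::npos) {
            std::size_t name_end = name_begin;
            while (name_end < line.size() &&
                   std::isalpha(static_cast<unsigned char>(line[name_end])))
                ++name_end;
            name = line.substr(name_begin, name_end - name_begin);
        }
        result.push_back({line_no, pos + 1, name, line});
    }
    return result;
}
```

The positions recorded this way refer to the original file only, which is exactly the limitation mentioned above: nothing here relates them to positions in the preprocessed token stream.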
*evaluate the preprocessor conditionals
What do you have in mind? Do you mean the (macro-)expanded conditional expression?
*after evaluating the preprocessor conditional, extract the portion which was evaluated as false as a string
#define positive #ifdef int positive #endif /*Extract this false part as string*/ int x; #endif
Hmmm. This one is tough. The preprocessor is designed to skip this information, so I'll have to look at the code base how to best access the corresponding code fragments. Perhaps a special hook could be introduced to get called for every skipped token.
*I am also interested in the value and position of unexpanded macros
Undefined macros?
*extracting the C/C++ statement or expression that the unexpanded macro (and expanded) is a part of.
This conceptually isn't possible at the preprocessor level because it has no notion of a C++ statement/expression.
*extract the value and position of all C and C++ comments
This one is easy. Just enable the preserve comments mode and all the comment tokens will be part of the generated output token sequence.
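For comparison, and independently of Wave's preserve comments mode, the information being asked for (value and position of each comment) can be sketched in a few lines of standalone C++. CommentInfo and find_comments are hypothetical names used only for this illustration, and the scan does not know about string or character literals, so a "//" inside a literal would be misreported; a real lexer does not have that problem.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not part of Wave.
struct CommentInfo {
    std::size_t line;  // 1-based line on which the comment starts
    std::string text;  // the comment including its delimiters
};

// Collect all // and /* */ comments together with their starting line.
// String and character literals are NOT recognised by this sketch.
std::vector<CommentInfo> find_comments(const std::string& src) {
    std::vector<CommentInfo> out;
    std::size_t line = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src.compare(i, 2, "//") == 0) {
            // C++ comment: runs up to (not including) the newline.
            std::size_t end = src.find('\n', i);
            if (end == std::string::npos) end = src.size();
            out.push_back({line, src.substr(i, end - i)});
            i = end;
        } else if (src.compare(i, 2, "/*") == 0) {
            // C comment: runs up to the closing "*/" or end of input.
            std::size_t close = src.find("*/", i + 2);
            std::size_t end =
                (close == std::string::npos) ? src.size() : close + 2;
            std::string text = src.substr(i, end - i);
            out.push_back({line, text});
            for (char c : text)
                if (c == '\n') ++line;  // keep the line counter in sync
            i = end;
        } else {
            if (src[i] == '\n') ++line;
            ++i;
        }
    }
    return out;
}
```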
Which work should I bace my work upon and which data structures should I reimplement and for what? I have looked at the documentation and the code, and it seems easy to do some parts but other are not obvious to me at this point.
Generally, you should look at the existing preprocessing hooks and check whether they can provide you with sufficient information. It should be quite straightforward to add additional hooks to the library, so any suggestions are welcome.

HTH
Regards Hartmut
Thanks, Andreas Saebjoernsen
On 11/1/05, Hartmut Kaiser
Let me elaborate a bit. Wave is built as a layered iterator. At the bottom (on top of the iterators of the input stream) there is the lexer component which constructs the C++ tokens from the input. The tokens you are looking for (#ifdef etc.) are contained in the token sequence generated by the lexer iterators. On top of the lexer we have the preprocessing component which does the actual preprocessing (as you might have expected). The token sequence produced by the preprocessing component obviously doesn't contain these tokens anymore.
What you could do in this situation is - build your own lexer intercepting the tokens you're interested in and storing the information you need somewhere else. This makes it very difficult to track down the (virtual) position of these tokens in the preprocessed token stream. - adding some additional hooks to the library allowing to get notified on these tokens. I'm not sure of the implications, though and how much this is different from the first bullet :-P - Perhaps you have another idea on this?
I was hoping to avoid modifying the lexer itself, so I have been more inclined towards the approach of adding hooks to the library. I was thinking of doing something similar to what struct default_preprocessing_hooks (in preprocessing_hooks.hpp) does for macros, since the user can reimplement this for the instantiation of the template<> class context. Another option is to add this to template<> class context, much like the way it is currently done for macros. What leads me to favour the struct default_preprocessing_hooks solution over modifying template<> class context is that it already handles similar problems, and you could also argue that these hooks do not fit in template<> class context.

So my two options are:
- add more hooks to struct default_preprocessing_hooks
- add more member functions which work as hooks within template<> class context

The hooks must be provided with all the necessary information for extracting the preprocessor grammar and evaluating the preprocessor conditionals. Which do you think is the best solution, how feasible is it, and how do you think it would fit into the Wave preprocessor library?
*evaluate the preprocessor conditionals
What do you have in mind? Do you mean the (macro-)expanded conditional expression?
Yes, so that I can extract the unexpanded preprocessor conditional expression and, once this expression is (macro-)expanded, whether the result is positive or negative. For instance, in the example

    #define BAR
    #ifdef BAR
    int x;
    #ifdef FOO
    int y;
    #endif
    #endif

I would be interested in extracting '#ifdef BAR' and also that it is evaluated as true. I would also be interested in extracting '#ifdef FOO' and that it is evaluated as false.
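The information requested for this example can be illustrated with a toy evaluator. This is only a sketch under strong assumptions: it understands nothing but "#define NAME", "#ifdef NAME" and "#endif" at the start of a line, and ConditionalLog and evaluate_ifdefs are hypothetical names, not Wave API. It pairs each conditional directive, as written, with the value it evaluated to.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Each entry: the directive text as written and the value it evaluated to.
using ConditionalLog = std::vector<std::pair<std::string, bool>>;

// Toy evaluator, not Wave: handles only "#define NAME", "#ifdef NAME"
// and "#endif", each starting in column 1.
ConditionalLog evaluate_ifdefs(const std::string& source) {
    ConditionalLog log;
    std::set<std::string> defined;
    std::vector<bool> active;  // one entry per currently open #ifdef
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string directive, name;
        words >> directive >> name;
        // A line is live only if every enclosing #ifdef was true.
        bool live = true;
        for (bool b : active) live = live && b;
        if (directive == "#define") {
            if (live) defined.insert(name);
        } else if (directive == "#ifdef") {
            bool value = live && defined.count(name) != 0;
            // Conditionals nested inside a false block are not evaluated,
            // so they are not logged.
            if (live) log.emplace_back(line, value);
            active.push_back(value);
        } else if (directive == "#endif") {
            if (!active.empty()) active.pop_back();
        }
    }
    return log;
}
```

Run on the example above, the log holds '#ifdef BAR' with true followed by '#ifdef FOO' with false.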
*after evaluating the preprocessor conditional,
extract the portion which was evaluated as false as a string
#define positive #ifdef int positive #endif /*Extract this false part as string*/ int x; #endif
Hmmm. This one is tough. The preprocessor is designed to skip this information, so I'll have to look at the code base how to best access the corresponding code fragments. Perhaps a special hook could be introduced to get called for every skipped token.
That would be great! :) Do you think this can be done through struct default_preprocessing_hooks?
*I am also interested in the value and position
of unexpanded macros
Undefined macros?
I am only interested in defined macros. To be more specific, I am interested in when the preprocessor recognises a macro. For each macro it is interesting to extract the value and position in the file. The macro can be found in two forms: the one before macro expansion and the one after. Both forms are interesting. But from what I have seen, this is already handled in struct default_preprocessing_hooks.

    1: #define FOO int x;
    2: FOO

On line 2 in this example code the macro FOO is found. This macro can be expanded to 'int x', which to the preprocessor is equivalent to the unexpanded macro FOO found on line 1.
*extracting the C/C++ statement or expression
that the unexpanded macro (and expanded) is a part of.
This conceptually isn't possible at the preprocessor level because it has no notion of a C++ statement/expression.
I do not want to use Wave as a C/C++ parser, only to understand a subset of its grammar. Let me elaborate on why I think it is doable and why the information necessary to do this is already easily available. What I was thinking was:
- since braces ('{' and '}') and semicolons are found within the tokens from the lexer, you should, as far as I can see, be able to define the grammar necessary to recognize what can be an expression or statement. For instance, a variable declaration statement in C/C++ always ends with a ';'.
- a function definition statement has a basic block (body) which is always delimited by braces ({...}).
- reference expressions also tend to end with a ';', for instance a function reference expression " foo(); ".

Therefore I would argue that this should be doable, since I do not think you need a full understanding of C/C++ syntax, only a fairly limited view of the C/C++ grammar, and the information for this is in the token stream returned from the lexer. I think it can be a little bit difficult, though, so I have to draw on your expertise here. Do you have any ideas for this?
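The delimiter idea above can be sketched on a plain list of token strings. This is a rough illustration only: enclosing_statement is a hypothetical helper, the tokens are plain strings rather than Wave tokens, and the heuristic treats every ';', '{' and '}' as a boundary, so it gives wrong answers for constructs such as for (;;) headers or brace initializers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the tokens of the "statement" around tokens[index], using only
// ';', '{' and '}' as boundaries. A heuristic, not a C/C++ parser.
std::vector<std::string> enclosing_statement(
        const std::vector<std::string>& tokens, std::size_t index) {
    assert(index < tokens.size());
    auto is_boundary = [](const std::string& t) {
        return t == ";" || t == "{" || t == "}";
    };
    // Walk back to just after the previous boundary token.
    std::size_t begin = index;
    while (begin > 0 && !is_boundary(tokens[begin - 1])) --begin;
    // Walk forward to the next boundary token.
    std::size_t end = index;
    while (end < tokens.size() && !is_boundary(tokens[end])) ++end;
    // A terminating ';' belongs to the statement; a brace does not.
    if (end < tokens.size() && tokens[end] == ";") ++end;
    return std::vector<std::string>(tokens.begin() + begin,
                                    tokens.begin() + end);
}
```

For the token list int x ; FOO ( 1 ) ; } and the index of FOO, the helper returns FOO ( 1 ) ;.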
*extract the value and position of all C and
C++ comments
This one is easy. Just enable the preserve comments mode and all the comment tokens will be part of the generated output token sequence.
Great. :) What about making a hook for this within struct default_preprocessing_hooks also?
Which work should I bace my work upon and which
data structures should I reimplement and for what? I have looked at the documentation and the code, and it seems easy to do some parts but other are not obvious to me at this point.
Generally you should look at the existing preprocessing hooks and if these can provide you with sufficient information. It should be quite straight forward to add additional hooks to the library, so any suggestions are welcome.
It would be very interesting to do some work on this, and it would be useful to hear what you think about adding the additional hooks we have been talking about. Maybe these hooks should be better specified. Regards Andreas
Andreas Sæbjørnsen wrote:
I was hoping to avoid modifying the lexer itself, so I have been more reclined towards the approach of adding hooks to the library. I was thinking more in the direction of doing something similar to what struct default_preprocessing_hooks (in preprocessing_hooks.hpp) does for macros, since the user can reimplement this for the instantiation of the template<> class context. Another option is to add this to template<> class context much similar to the way it is currently done for macros. What leads me to favour the struct default_preprocessing_hooks solution over modifying template<> class context is that it already handles similar problems and you could also argue that these hooks does not fit in template<> class context. So my two options are: - add more hooks to struct default_preprocessing_hooks - add more member functions which work as hooks within template<> class context The hooks must be provided with all the necessary information for extracting the preprocessor grammar and evaluating the preprocessor conditionals. What do you think is the best solution, its feasibility and how do you think it would fit into the wave preprocessor library?
I was thinking of adding new hooks to the preprocessing hooks template from the very beginning. Sorry for not being concise enough. For consistency reasons I'd suggest adding a hook:

    template <typename TokenT>
    void found_directive(TokenT const& directive);

where directive refers to the token containing the found pp directive.
*evaluate the preprocessor conditionals
What do you have in mind? Do you mean the (macro-)expanded conditional expression?
yes, so that I can extract the unexpanded preprocessor conditional expression and when this expression is (macro-)expanded if the result is positive or negative. For instance in the example #define BAR #ifdef BAR int x; #ifdef FOO int y; #endif #endif I would be interested in extracting '#ifdef BAR' and also that it is evaluated as true. I would also be interested in extracting '#ifdef FOO' and that it is evaluated as false.
What about:

    template <typename ContainerT>
    void evaluated_conditional_expression(ContainerT const& expression, bool expression_value);

where:
- expression contains the (not expanded) expression token sequence and
- expression_value is the result of the evaluation of this expression.
*after evaluating the preprocessor conditional, extract the portion which was evaluated as false as a string
#define positive #ifdef int positive #endif /*Extract this false part as string*/ int x; #endif
Hmmm. This one is tough. The preprocessor is designed to skip this information, so I'll have to look at the code base how to best access the corresponding code fragments. Perhaps a special hook could be introduced to get called for every skipped token.
That would be great! :) Do you think this can be done through struct default_preprocessing_hooks?
    template <typename TokenT>
    void skipped_token(TokenT const& token);

where token is the skipped token. This hook will be called for each token which gets skipped due to a false preprocessing condition.
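To show how a user-side hooks class with the three proposed shapes might collect the requested information, here is a self-contained sketch. Everything except the three hook names and their parameter lists is an assumption made for illustration: Token, RecordingHooks and run_toy_preprocessor are hypothetical, the driver is a toy standing in for the preprocessor (it knows only "#define NAME", "#ifdef NAME" and "#endif"), and the hooks that finally land in the library may take additional arguments.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a Wave token: text plus source line.
struct Token {
    std::string value;
    int line;
};

// Hooks with the shapes proposed in this thread; they only record calls.
struct RecordingHooks {
    std::vector<std::string> directives;
    std::vector<std::string> conditionals;  // "<expr> -> true|false"
    std::string skipped;                    // text of false blocks

    template <typename TokenT>
    void found_directive(TokenT const& directive) {
        directives.push_back(directive.value);
    }
    template <typename ContainerT>
    void evaluated_conditional_expression(ContainerT const& expression,
                                          bool expression_value) {
        std::string text;
        for (auto const& tok : expression) text += tok.value;
        conditionals.push_back(
            text + (expression_value ? " -> true" : " -> false"));
    }
    template <typename TokenT>
    void skipped_token(TokenT const& token) {
        if (!skipped.empty()) skipped += ' ';
        skipped += token.value;
    }
};

// Toy driver, NOT Wave: line based, whitespace-separated "tokens".
template <typename HooksT>
void run_toy_preprocessor(const std::string& source, HooksT& hooks) {
    std::set<std::string> defined;
    std::vector<bool> active;  // one entry per open #ifdef
    auto live = [&active]() -> bool {
        for (bool b : active) if (!b) return false;
        return true;
    };
    auto skip_line = [&hooks](const std::string& text, int line_no) {
        std::istringstream words(text);
        std::string w;
        while (words >> w) hooks.skipped_token(Token{w, line_no});
    };
    std::istringstream in(source);
    std::string line;
    int line_no = 0;
    while (std::getline(in, line)) {
        ++line_no;
        std::istringstream words(line);
        std::string first, second;
        words >> first >> second;
        if (first == "#endif") {
            if (!active.empty()) active.pop_back();
            // An #endif nested inside a false block is itself skipped.
            if (live()) hooks.found_directive(Token{first, line_no});
            else skip_line(line, line_no);
        } else if (!live()) {
            skip_line(line, line_no);
            if (first == "#ifdef") active.push_back(false);
        } else if (first == "#ifdef") {
            hooks.found_directive(Token{first, line_no});
            bool value = defined.count(second) != 0;
            hooks.evaluated_conditional_expression(
                std::vector<Token>{Token{second, line_no}}, value);
            active.push_back(value);
        } else if (first == "#define") {
            hooks.found_directive(Token{first, line_no});
            defined.insert(second);
        }
    }
}
```

Collecting the false block in skipped_token, one token at a time, is what makes it possible to recover that block as a string without the preprocessor having to buffer it.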
*I am also interested in the value and position of unexpanded macros
Undefined macros?
I am only interested in defined macros. To be more specific I am interested in when the preprocessor recognises a macro. For each macro it is interesting to extract the value and position in the file. The macro can be found in two forms; the one before macro-expansion and the one after. Both forms are interesting. But this is from what I have seen already handled in struct default_preprocessing_hooks. 1: #define FOO int x; 2: FOO On line 2 in this example code the macro FOO is found. This macro can be expanded to 'int x', which to the preprocessor is equivalent to the unexpanded macro FOO found on line 1.
Yeah, I was already wondering how you might want to decide whether an identifier actually is an 'undefined' macro :-P This information should already be available through the existing preprocessing hooks.
*extracting the C/C++ statement or expression that the unexpanded macro (and expanded) is a part of.
This conceptually isn't possible at the preprocessor level because it has no notion of a C++ statement/expression.
I do not want to use Wave as a C/C++ parser, only to understand a subset of its grammar. Let me elaborate on why I think it is doable and why the information necessary to do this is already easily available. What I was thinking was: - since braces ('{' and '}') and semicolons are found within the tokens from the lexer, you should, as far as I can see, be able to define the grammar necessary to recognize what can be an expression or statement. For instance, a variable declaration statement in C/C++ always ends with a ';'. - a function definition statement has a basic block (body) which is always delimited by braces ({...}). - reference expressions also tend to end with a ';', for instance a function reference expression " foo(); ". Therefore I would argue that this should be doable, since I do not think you need a full understanding of C/C++ syntax, only a fairly limited view of the C/C++ grammar, and the information for this is in the token stream returned from the lexer. I think it can be a little bit difficult, though, so I have to draw on your expertise here. Do you have any ideas for this?
I would not like to put any functionality into the library which does not belong to its purpose: preprocessing C++. I'm pretty sure, that you'll be able to build this on top of Wave.
*extract the value and position of all C and C++ comments
This one is easy. Just enable the preserve comments mode and all the comment tokens will be part of the generated output token sequence.
Great. :) What about making a hook for this within struct default_preprocessing_hooks also?
Why? The preprocessing hooks are there to allow to access information not available from the generated token stream itself, i.e. information about the actual work inside the preprocessor. But the comments are available in the generated token stream already. If I would add such a hook, somebody else would like to have a special hook for line endings etc.
It would be very interesting to do some work on this, and it would be useful to hear what you think about adding the additional hooks we have been talking about. Maybe these hooks should be better specified.
Do these new hooks satisfy your needs? Regards Hartmut
The hooks you are proposing satisfy my needs very well, and I appreciate
that you are adding these hooks to Wave, as it will make my life a lot easier.
I agree with your point on the comments and can build the rest I need on top
of Wave. When and how should I expect to be able to test a new Wave version
with these hooks?
Thanks,
Andreas
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Andreas Sæbjørnsen wrote:
The hooks you are proposing satisfies my needs very well and appreciate that you add these hooks to Wave as it makes and will make my life a lot easier. I agree to your point on the comments and can build the rest I need on top of Wave. When and how should I expect to be able to test a new Wave version with these hooks?
I'm not sure, how long it will take me to implement the new stuff. I'll find some time for that hopefully not later than the next weekend. As soon as I have something ready I'll check it into the Boost CVS::HEAD, from where you should be able to get and test it. Regards Hartmut
Andreas Sæbjørnsen wrote:
The hooks you are proposing satisfies my needs very well and appreciate that you add these hooks to Wave as it makes and will make my life a lot easier. I agree to your point on the comments and can build the rest I need on top of Wave. When and how should I expect to be able to test a new Wave version with these hooks?
I added the discussed preprocessing hooks to the Wave library (see the Boost CVS::HEAD). Additionally, I added a new sample application demonstrating the new hooks (it's called advanced_hooks). This sample outputs not only the preprocessed tokens, but also any conditional directive found and the complete non-expanded source code from false conditional blocks. I.e. for the following snippet

    #define TEST 1
    #if defined(TEST)
    "TEST was defined: " TEST
    #else
    "TEST was not defined!"
    #endif

the generated output looks like:

    //"#if defined(TEST)
    "TEST was defined: " 1
    //"#else
    //"TEST was not defined!"
    //"#endif

HTH
Regards Hartmut
Thank you very much. This is really excellent work, and it is really useful
for us. We are making an open source tool with a BSD-style license, called
ROSE, for easily building C/C++ translators which take source code as input
and output source code. Our tool understands the C/C++ syntax; unlike normal
compilers, it can understand the syntax of a library, and you can easily work
within the AST to analyze and change it. The tool is aimed at scientific
computing, especially optimization of code which uses high-level
abstractions, but it is general and can be applied to a lot of software
engineering problems.
Since our intermediate representation is an AST and our frontend is a
compiler frontend, we lost some information about the parts of the
preprocessor directives which are evaluated as negative. We had knowledge
about the macro declarations but no knowledge about the macro calls, and
also no simple solution for some details such as how many digits the
floating point values are declared with in the code (we handle a floating
point value as a value, not a string, but now we will probably have both).
It seems like we are now able to extract this information using Wave, and
that is excellent, but we have yet to hook it up to our tool. So expect us to
use Wave on some of the most advanced C/C++ codes out there. :)
Thanks
Andreas
Andreas Sæbjørnsen wrote:
Thank you very much. This is really excellent work, and for us it is really useful. We are making an open source BSD license style tool called ROSE for easily building C/C++ translators which takes the source code as input and outputs source code. Our tool understand the C/C++ syntax, unlike normal compilers it can understand the syntax of a library and you can easily work within the AST to analyze and change the AST. The aim of the tool is scientific computing and especially optimization of code which uses high level abstractions, but the tool is general and can be applied to a lot of software engineering problems.
Do you have a link?
Since our intermediate representation is an AST and our frontend is a compiler frontend we lost some information about the part of the preprocessor directives which is evaluated as negative, we had knowledge about the macro declarations but no knowledge about the macros calls and also no simple solution to some details about how many digits the floating point values is declared with in the code (we handle the floating point value as value not string, but now we will probably have both). It seems like we are now able to extract this information using Wave and that is excellent, but we are yet to hook it up to our tool. So expect us to use Wave on some of the most advanced C/C++ codes out there. :)
Nice to hear. Thanks! Regards Hartmut
The project is in the process of being released for open download, but for
now it is only available to people collaborating with the ROSE team. A
preliminary link can be found at:
http://www.llnl.gov/CASC/rose/
but this is an incomplete webpage and the final project will probably be
hosted elsewhere.
Andreas