
Hi,

I need some advice regarding Wave. I want to run Wave over a token stream and extract the tokens that represent macro calls after expansion. The twist is that I want the token stream after expansion to carry the positions that another preprocessor would see when reading the already expanded output from Wave. For example:

    #define MACRO_CALL int x;
    int main(){
    MACRO_CALL
    };

expands into

    #line 2
    int main(){
    int x;
    };

Here 'int x' should have a position on line 3 instead of line 1 after expansion. I also want the column number to represent what we see on line 3 instead of what we see on line 1.

Is this possible?

Thanks
Andreas

IIUC you want to discard the information about the macro definition and make the token stream look as if it were the original input stream. This is not possible with Wave as it is.

My first thought was to use boost::transform_iterator on top of the Wave iterators and to maintain a separate file position based on the tokens passed through it. But this doesn't work, because the transformation kicks in every time the transform_iterator is dereferenced, so you end up correcting the current file position each time. Clearly indirection doesn't work easily here...

So I'm not sure yet how to achieve this, but I at least have some ideas about the points at which this could be injected into Wave. Please give me some more time to contemplate...

Do you have any thoughts on that?

Regards Hartmut
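The dereference problem described above can be illustrated with a minimal sketch. It does not use boost::transform_iterator or Wave; `ColumnTracker` and `TransformingCursor` are hypothetical stand-ins that only mimic the relevant behavior, namely that the transformation runs on every dereference rather than once per element.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a position-correcting functor: it hands out the
// column where the next token starts and then advances past that token.
struct ColumnTracker {
    std::size_t next_column = 1;
    std::size_t operator()(const std::string& token) {
        std::size_t column = next_column;
        next_column += token.size();
        return column;
    }
};

// Minimal "transforming" cursor: like a transform iterator, it applies the
// functor on every dereference, not once per element.
struct TransformingCursor {
    const std::vector<std::string>* tokens;
    std::size_t index;
    ColumnTracker* tracker;

    std::size_t operator*() const { return (*tracker)((*tokens)[index]); }
    TransformingCursor& operator++() { ++index; return *this; }
};

// Dereference the same token twice, as any algorithm is free to do.
inline bool double_dereference_disagrees() {
    std::vector<std::string> tokens = {"int", "x", ";"};
    ColumnTracker tracker;
    TransformingCursor cursor{&tokens, 0, &tracker};
    std::size_t first = *cursor;   // tracker hands out the start column
    std::size_t second = *cursor;  // tracker has already advanced
    return first != second;
}
```

Because the tracker's state moves forward on each call, the same token reports two different columns, which is why a stateful correction needs a point in the pipeline that is visited exactly once per token.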

Hi Hartmut,

> IIUC you want to discard the information about the macro definition and make the token stream look like it was the original input stream. This is not possible with Wave as it is.

It seems that we agree, but let me restate my problem description to make sure I put it clearly. I am using preprocessor hooks to fetch the macro before and after expansion. My problem is that the macro after expansion has the positions of the macro definition, while I want it to have the positions of the current line. The reason is that I want to compare the positions of tokens in Wave with the positions of constructs in an Abstract Syntax Tree, so that e.g. the identifier 'x' in the token stream is found to have the same position as the identifier 'x' of a variable declaration in the AST.

> My first thought was to use boost::transform_iterator on top of the wave iterators and to maintain a separate file position based on the tokens passed through it. But this doesn't work, because this iterator kicks in every time the transform_iterator is dereferenced, so you end up correcting the current file position each time. Clearly indirection doesn't work easily here...
>
> So I'm not sure yet, how to achieve this, but at least I have some ideas, at which points this could be injected into Wave. Please give me some more time to contemplate...

Thank you very much for looking into this! It is very much appreciated.

> Do you have any thoughts on that?

I am not versed enough in the implementation details to feel confident suggesting much, but I will think about this too. A separate position would be great, but in the absence of that maybe the positions could be changed in a normalization stage after macro expansion, although I am worried that this would not work too well with the preprocessor hooks.

Thanks
Andreas

Andreas,
Ok, I've added a new pp hook (what else? :-P) to Wave:
///////////////////////////////////////////////////////////////////////////
//
// The function 'generated_token' will be called by the library whenever a
// token is about to be returned from the library.
//
// The parameter 'ctx' is a reference to the context object used for
// instantiating the preprocessing iterators by the user.
//
// The parameter 't' is the token about to be returned from the library.
// This function may alter the token, but in this case it must be
// implemented with a corresponding signature:
//
// Token const&
// generated_token(Context const& ctx, Token& t);
//
// which makes it possible to modify the token in place.
//
// The default behavior is to return the token passed as the parameter
// without modification.
//
///////////////////////////////////////////////////////////////////////////
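template <typename Context, typename Token>
Token const&
generated_token(Context const& ctx, Token const& t)
{ return t; }

This should help solve your problem.

To show how this hook function may help you I added a new sample 'real_positions' to Wave demonstrating its use. Everything is in the Boost CVS::HEAD.

HTH
Regards Hartmut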
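To make the role of this hook more concrete, here is a small self-contained mock. It is not Wave code: `Token`, `Context`, `RealPositionHooks` and `emit_all` are hypothetical stand-ins, and only the shape of `generated_token` (the "modify in place" signature from the comment above) follows the description. The sketch assumes every token passes through the hook exactly once, right before it is returned.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified token: text plus a line/column position.
struct Token {
    std::string text;
    int line;
    int column;
};

// Hypothetical stand-in for the context object passed to the hooks.
struct Context {};

// Hooks object using the "modify in place" form of generated_token: it
// overwrites each position with the one the token has in the emitted output.
struct RealPositionHooks {
    int line = 1;
    int column = 1;

    Token const& generated_token(Context const&, Token& t) {
        t.line = line;
        t.column = column;
        if (t.text == "\n") {          // a newline token starts the next line
            ++line;
            column = 1;
        } else {
            column += static_cast<int>(t.text.size());
        }
        return t;
    }
};

// Mock of the library side: every token passes through the hook exactly
// once, immediately before it is handed to the caller.
template <typename Hooks>
std::vector<Token> emit_all(std::vector<Token> tokens, Hooks& hooks) {
    Context ctx;
    for (Token& t : tokens) {
        hooks.generated_token(ctx, t);
    }
    return tokens;
}
```

Feeding the tokens of an expanded `int x;` (which still carry their definition-site positions) through `emit_all` leaves them with positions counted along the emitted output instead.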

Wow! You are fast! I have experimented a little with this code and it looks very good, but there are two things which need to be fixed before I can use it:
* the position of a token in the original source code is gone (default in standard Wave)
* the line positioning is incorrect/non-standard (see below for an example).

The column information seems correct to me. But the line information is not correct, as it does not behave like a standard preprocessor, which would insert #line directives to map the positions to the original source file(s). Example:

    //BEGIN TEST.C
    #include "test.h"
    #define FOO int x;

    int main(){
    FOO; /*comment */double y;
    };
    //END TEST.C

    //BEGIN TEST.h
    class Foo{
    };
    //END TEST.h

You would expect the line position of 'double' in line 5 of test.C to still be 5, but instead it is 6 because
1. the line in test.C containing just a "\n" does not count in the line positioning (-1 line in position)
2. there is no separation between the lines of test.h and test.C (+2 lines in position)

thanks
Andreas
On 11/7/06, Hartmut Kaiser wrote:
> Andreas,
>
> Ok, I've added a new pp hook (what else? :-P) to Wave:
>
> ///////////////////////////////////////////////////////////////////////////
> //
> // The function 'generated_token' will be called by the library whenever a
> // token is about to be returned from the library.
> //
> // The parameter 'ctx' is a reference to the context object used for
> // instantiating the preprocessing iterators by the user.
> //
> // The parameter 't' is the token about to be returned from the library.
> // This function may alter the token, but in this case it must be
> // implemented with a corresponding signature:
> //
> // Token const&
> // generated_token(Context const& ctx, Token& t);
> //
> // which makes it possible to modify the token in place.
> //
> // The default behavior is to return the token passed as the parameter
> // without modification.
> //
> ///////////////////////////////////////////////////////////////////////////
> template <typename Context, typename Token>
> Token const&
> generated_token(Context const& ctx, Token const& t)
> { return t; }
>
> This should help solving your problem.
>
> To show how this hook function may help you I added a new sample 'real_positions' to Wave demonstrating its use. Everything is in the Boost CVS::HEAD.
>
> HTH
> Regards Hartmut
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Andreas Sæbjørnsen wrote:
> Wow! You are fast! I have experimented a little bit with this code and it looks very good, but there are two things which need to be fixed before I can use it:
> * the position of a token in the original source code is gone (default in standard Wave)

What do you mean by that?

> * the line positioning is incorrect/non-standard (see below for an example).

This does not depend on the Wave library itself; it is a matter of the real_positions example. Would you mind submitting a fix for that? (I have now changed the sample to output the detailed token information, which makes it easier to track the output.)

> The column information seems correct to me. But the line information is not correct as it does not behave like a standard preprocessor which would insert #line directives to map the positions to the original source file(s). Example:
>
>     //BEGIN TEST.C
>     #include "test.h"
>     #define FOO int x;
>
>     int main(){
>     FOO; /*comment */double y;
>     };
>     //END TEST.C
>
>     //BEGIN TEST.h
>     class Foo{
>     };
>     //END TEST.h
>
> You would expect the line position of 'double' in test.C line 5 to still be 5, but instead it is 6 because
> 1. the line in test.C with just an "\n" does not count in the line positioning (-1 line in position)
> 2. there is no separation between the lines in test.h and test.C (+2 lines in position)

The real_positions example does not take #line directives into account; this could easily be added, though. I simply didn't know whether you need it.

Regards Hartmut
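Taking #line directives into account amounts to mapping every line of the preprocessed output back to a file and line number. The following is only a sketch of that bookkeeping, independent of Wave and of the real_positions sample; `Origin` and `map_output_lines` are hypothetical names, and the code assumes directives have exactly the form `#line N` or `#line N "file"` with no spaces in the file name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Where one line of preprocessed output came from.
struct Origin {
    std::string file;
    int line;
};

// Walk preprocessed text and compute, for every non-directive output line,
// the file and line that the preceding #line directives map it to.
inline std::vector<Origin> map_output_lines(const std::string& output,
                                            const std::string& initial_file) {
    std::vector<Origin> origins;
    std::string file = initial_file;
    int line = 1;
    std::istringstream in(output);
    std::string text;
    while (std::getline(in, text)) {
        if (text.compare(0, 6, "#line ") == 0) {
            std::istringstream directive(text.substr(6));
            directive >> line;                   // the next line gets number N
            std::string name;
            if (directive >> name && name.size() >= 2) {
                file = name.substr(1, name.size() - 2);  // strip the quotes
            }
            continue;                            // a directive emits no line
        }
        origins.push_back(Origin{file, line});
        ++line;
    }
    return origins;
}
```

With directives of this kind in the output of the test.C example, the line containing `double y;` maps back to line 5 of test.C regardless of how many lines test.h contributed before it.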

I'm confused... looking at the documentation for the "techniques" used in the preprocessor library (found at http://www.boost.org/libs/preprocessor/doc/topics/techniques.html), I see the following statements in the discussion of BOOST_PP_EMPTY:

    How: BOOST_PP_EMPTY() expands to nothing and can be used as an unused parameter.
    Note: BOOST_PP_EMPTY with the () never gets expanded. The () is necessary to invoke a function-like macro.

The first statement suggests that BOOST_PP_EMPTY() is expanded. The second statement asserts that BOOST_PP_EMPTY with the () never gets expanded. (Perhaps the "with" here should be "without"?)

Is this a conflict/bug in the documentation, or am I missing something?

Thanks,
Mike
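The general preprocessor rule behind the question can be checked without Boost: the name of a function-like macro is only expanded when it is followed by parentheses. The sketch below uses `MY_EMPTY`, `MY_STR`, `MY_DECLARE` and `MY_CONST`, all hypothetical stand-ins defined here; it does not include or quote the Boost.Preprocessor headers, and it says nothing about what the documentation's authors intended.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in with the same shape as BOOST_PP_EMPTY: a
// function-like macro that takes no arguments and expands to nothing.
#define MY_EMPTY()

// Two-level stringizing so the argument is macro-expanded before `#`.
#define MY_STR_I(x) #x
#define MY_STR(x) MY_STR_I(x)

// The name followed by () is an invocation and expands to nothing.
static const char* const with_parens = MY_STR(MY_EMPTY());

// The bare name is not an invocation, so it is left untouched.
static const char* const without_parens = MY_STR(MY_EMPTY);

// Passing the bare name as an argument defers the expansion until the
// receiving macro appends the parentheses itself.
#define MY_DECLARE(type, qualifier, name) type qualifier() name
#define MY_CONST() const

MY_DECLARE(int, MY_EMPTY, plain_value) = 1;     // int plain_value = 1;
MY_DECLARE(int, MY_CONST, constant_value) = 2;  // int const constant_value = 2;
```

Under this rule the reading "without the ()" is the one that matches the behavior: the bare name survives as an argument, and the parentheses added later trigger the expansion to nothing.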

>> * the position of a token in the original source code is gone (default in standard Wave)
> What do you mean by that?

What I need is to have both the original position and the position from the real_positions example available on a token. This makes it possible to locate a token using the real_positions position (which is equivalent to its position in the output of a preprocessor) and then get the original source code position from that token. Do you see any other way of achieving this?

>> * the line positioning is incorrect/non-standard (see below for an example).
> This does not depend on the Wave library itself, but it's a matter of the real_positions example. Do you mind to submit a fix for that (I now changed the sample to output the detailed token information, which makes it easier to track the output)?

Yes. It was very easy to work with your code, so that should be no problem. Are the real_positions set before or after the tokens are sent to the preprocessing hooks?

Thanks
Andreas

Andreas Sæbjørnsen wrote:
>>> * the position of a token in the original source code is gone (default in standard Wave)
>> What do you mean by that?
> What I need is both the original position and the position from the real_positions example available on a token. This makes it possible to locate a token using the real_positions (which is equivalent to positions on output from a preprocessor) and then get the original source code position from that token. Do you see any other way of achieving this?

I changed the real_positions example to use a new token type carrying both the original and the corrected positions. This makes it a bit more tricky, though, because now you have to explicitly instantiate some of the internal Wave library template types for this new token type. These explicit instantiations are normally done in the Wave library code (for the default token type).

Note: the new token type is essentially a copy of the default token type with a second position instance added. For simplicity I removed the class allocators, though. You'll have to re-add these if needed. (Please look at the file cpp_lex_token.hpp for reference.)

>>> * the line positioning is incorrect/non-standard (see below for an example).
>> This does not depend on the Wave library itself, but it's a matter of the real_positions example. Do you mind to submit a fix for that (I now changed the sample to output the detailed token information, which makes it easier to track the output)?
> Yes. It was very easy to work with your code so that should be no problem. Are the real_positions set before or after the tokens are sent to the preprocessing hooks?

The corrected position is set inside the generated_token() pp hook. IIUC you're asking whether the corrected token position will be available in other hooks as well. The answer is no. The generated_token pp hook is called at the very last point before the preprocessed token gets returned to the calling application. I see no other way to do this, because only at this point do I know everything about the final token sequence. Any ideas?

HTH
Regards Hartmut
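The use case from the quoted question (locate a token by its corrected position, then read off where it was originally written) can be sketched with a token that carries two positions. This is a simplified illustration, not the token type from the real_positions sample; `SourcePosition`, `DualPositionToken` and `original_position_at` are hypothetical names, and the file-name part of a real position is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified position (no file name).
struct SourcePosition {
    int line;
    int column;
};

inline bool operator==(SourcePosition const& a, SourcePosition const& b) {
    return a.line == b.line && a.column == b.column;
}

// Hypothetical token carrying both positions.
struct DualPositionToken {
    std::string text;
    SourcePosition original;   // where the token was written in the source
    SourcePosition corrected;  // where it appears in the preprocessed output
};

// Find the token sitting at a given output position and report where it
// came from. Returns nullptr when no token starts at that position.
inline SourcePosition const* original_position_at(
        std::vector<DualPositionToken> const& tokens,
        SourcePosition output_pos) {
    for (DualPositionToken const& t : tokens) {
        if (t.corrected == output_pos) {
            return &t.original;
        }
    }
    return nullptr;
}
```

For the first example in this thread, the identifier `x` written at line 1, column 24 of the macro definition would be found through its output position on line 3.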

> I changed the real_positions example to use a new token type carrying both, the original and the corrected positions. This makes it a bit more tricky, though, because now you have to explicitly instantiate some of the internal Wave library template types for this new token type. These explicit instantiations are normally done in the Wave library code (for the default token type).

I think this is probably the optimal solution, and it is not difficult to instantiate the templates in my library, although I expect it to slow down compilation considerably (?).

> Note: the new token type is essentially a copy of the default token type which has added the second position instance. For simplicity reasons I removed the class allocators, though. You'll have to re-add these, if needed. Please look at the file cpp_lex_token.hpp for reference.

In which cases would I need to use these class allocators? What are they used for in Wave with the standard token type?

>> Yes. It was very easy to work with your code so that should be no problem. Are the real_positions set before or after the tokens are sent to the preprocessing hooks?
> The corrected position is set inside the generated_token() pp hook. IIUC you're asking, whether the corrected token position will be available in other hooks as well. The answer is no. The generated_token pp hook is called at the very last point before the preprocessed token gets returned to the calling application. I see no other way to do this, because only at this point I know everything about the final token sequence. Any ideas?

An idea would be to allow generated_token() to be called from one of the other hooks. For instance, if in rescanned_macro() you know when the first macro has been fully expanded, then you can call generated_token() at that point on every token provided to rescanned_macro(). The reason why I say first macro is that a macro can call another macro in its definition, so the first macro calls a second macro, etc.

The suggestion above can probably also be considered an optimization (?), because you could also imagine recomputing the correct positions on every call to rescanned_macro(). Do you see any reason for this not working?

Thanks
Andreas

Andreas,

>> I changed the real_positions example to use a new token type carrying both, the original and the corrected positions. This makes it a bit more tricky, though, because now you have to explicitly instantiate some of the internal Wave library template types for this new token type. These explicit instantiations are normally done in the Wave library code (for the default token type).
> I think this probably is the optimal solution and it is not difficult to instantiate the templates in my library although I expect it to slow down compilation considerably (?).

These need to be recompiled only if you have changed the token class, which should happen rarely.

>> Note: the new token type is essentially a copy of the default token type which has added the second position instance. For simplicity reasons I removed the class allocators, though. You'll have to re-add these, if needed. Please look at the file cpp_lex_token.hpp for reference.
> In which instances would I need to use these class allocators? What are they used for in Wave with the standard token type?

The default token type in Wave uses a boost::pool based class allocator to improve performance. Please compare the real_positions_token and the default Wave token type in the file cpp_lex_token.hpp.

>>> Yes. It was very easy to work with your code so that should be no problem. Are the real_positions set before or after the tokens are sent to the preprocessing hooks?
>> The corrected position is set inside the generated_token() pp hook. IIUC you're asking, whether the corrected token position will be available in other hooks as well. The answer is no. The generated_token pp hook is called at the very last point before the preprocessed token gets returned to the calling application. I see no other way to do this, because only at this point I know everything about the final token sequence. Any ideas?
> An idea would be to allow calling of generated_token() from one of the other hooks. For instance, if you in rescanned_macro() know when the first macro has been fully expanded then you can call generated_token() at that point on every token provided to rescanned_macro(). The reason why I say first macro is that a macro in its formal definition can call another macro, so first macro calls second macro etc.
>
> The suggestion I made above can probably also be considered an optimization (?) because you could also imagine recomputing the correct positions on every call to rescanned_macro. Do you see any reason for this not working?

If generated_token() gets called in, for instance, the rescanned_macro() hook, you won't be able to track the resulting position of the token, because it's impossible to tell at which expansion level inside a macro the current token was generated. Am I missing something?

Regards Hartmut

>> An idea would be to allow calling of generated_token() from one of the other hooks. For instance, if you in rescanned_macro() know when the first macro has been fully expanded then you can call generated_token() at that point on every token provided to rescanned_macro(). The reason why I say first macro is that a macro in its formal definition can call another macro, so first macro calls second macro etc.
>>
>> The suggestion I made above can probably also be considered an optimization (?) because you could also imagine recomputing the correct positions on every call to rescanned_macro. Do you see any reason for this not working?
> If the generated_token() gets called for instance in the rescanned_macro() hook you won't be able to track the resulting position of the token because it's impossible to tell, at which expansion level inside a macro the current token got generated. Do I miss something?

You know that for each call to the hook expand_*_like_macro() there is one call to rescanned_macro(), so by using this information I think that you can know the expansion level. To avoid multiple calls to generated_token(), it is possible to set a bit in the token which tells that its position was already set, or maybe the corrected positions should be negative to begin with (?). Do you see any problems with such an approach?
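The bookkeeping proposed in this message could look roughly like the sketch below. It only models the proposal and is not tied to Wave's hook signatures: `ExpansionDepth`, `FlaggedToken` and `correct_once` are hypothetical, and the sketch takes as given the assumption stated above that every "expanding" notification is matched by exactly one "rescanned" notification. Whether that pairing is enough to know when a token is final is exactly what is being asked here.

```cpp
#include <cassert>

// Hypothetical token flag: set once the corrected position has been
// computed, so a later pass can skip the token.
struct FlaggedToken {
    int line = 0;
    int column = 0;
    bool position_corrected = false;
};

// Hypothetical depth counter driven by paired notifications.
class ExpansionDepth {
public:
    void on_expanding() { ++depth_; }

    // Returns true when the outermost expansion has just finished.
    bool on_rescanned() {
        if (depth_ > 0) --depth_;
        return depth_ == 0;
    }

    int depth() const { return depth_; }

private:
    int depth_ = 0;
};

// Correct a token at most once; later calls leave it untouched.
inline bool correct_once(FlaggedToken& t, int line, int column) {
    if (t.position_corrected) return false;
    t.line = line;
    t.column = column;
    t.position_corrected = true;
    return true;
}
```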

Andreas,

>> If the generated_token() gets called for instance in the rescanned_macro() hook you won't be able to track the resulting position of the token because it's impossible to tell, at which expansion level inside a macro the current token got generated. Do I miss something?
> You know that for each call to the hook expand_*_like_macro() there is one call to rescanned_macro(), so by using this information I think that you can know the expansion level. In order to avoid multiple calls to generated_token() it is possible to set a bit in the token which tells that it was already set or maybe the correct positions should be negative to begin with(?). Do you see any problems with such an approach?

Even if macro expansion in Wave is implemented as a purely recursive process, it isn't one in reality. It is more like a sliding window moving over the input token stream, steadily producing tokens. I see no way to determine (at least not yet, I will have to think about it) whether a certain token 'is finished' or whether it will still get modified by a pending rescanning step. In any case the correct position information will be available only _after_ the last rescanning.

The question is when exactly you _need_ the corrected position information of the tokens. If you need to access the corrected token information only _after_ the preprocessing, my current approach will work fine, because even if you store some of the tokens during macro expansion and the position information gets corrected only later, the corrected position information will be available. The reason is that the real_positions_token stores the position information 'by reference', which makes it available to stored copies of the same token.

Does this make sense?

Regards Hartmut
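The effect of storing the position 'by reference' can be shown with a small sketch. How the real_positions sample implements this is not shown in the thread; the version below assumes shared ownership through std::shared_ptr purely for illustration, and `LinePosition` and `SharedPositionToken` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct LinePosition {
    int line = 0;
    int column = 0;
};

// Hypothetical token whose corrected position lives in shared storage, so
// every copy of the token refers to the same LinePosition object.
class SharedPositionToken {
public:
    explicit SharedPositionToken(std::string text)
        : text_(std::move(text)),
          corrected_(std::make_shared<LinePosition>()) {}

    std::string const& text() const { return text_; }
    LinePosition const& corrected() const { return *corrected_; }

    // Const on purpose: only the shared position changes, not the handle.
    void set_corrected(int line, int column) const {
        corrected_->line = line;
        corrected_->column = column;
    }

private:
    std::string text_;
    std::shared_ptr<LinePosition> corrected_;
};
```

A copy stored during macro expansion therefore reports the corrected position once the original token has passed through the final correction step, even though the copy was made earlier.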
participants (3)
- Andreas Sæbjørnsen
- Hartmut Kaiser
- Young, Michael