Boost.Wave getting raw input tokens for code transformation
Hi!
I'm trying to use Wave for simple code transformations. My starting point is an identity mapping through Wave: read in a source file, tokenize it, and write it out again. The output should match the input byte for byte. I modified the cpp_lexer example to output just the token.value() texts. From that I discovered that the whitespace in "# define" is dropped. OK, I fixed this by modifying the Wave source code.
Now my main problem is line continuations using a backslash, as in multiline macro definitions. Those get processed at a very low level, I guess. The backslash is never reported as a token, and the adjacent characters are fused into one token. Is there a way to define a "raw" token mode that would report the backslash and keep the adjacent characters as distinct tokens? I don't understand the tokenizer level at all.
Frank
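The round-trip requirement described above can be checked mechanically. The following is a minimal sketch, not Wave code: `toy_tokenize`, `join_tokens`, and `round_trip_is_identity` are hypothetical names, and the toy tokenizer merely stands in for a real lexer. It shows the property being asked for: concatenating the token values must reproduce the input byte for byte, which only holds if the lexer drops nothing (no whitespace, no backslash/newline pairs).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a lexer (an assumption for illustration, not Wave's lexer).
// Splits the input into identifier/number runs, whitespace runs, and single
// punctuation characters. Every input byte ends up in exactly one token.
std::vector<std::string> toy_tokenize(const std::string& input) {
    auto is_word = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    auto is_space = [](char ch) {
        return std::isspace(static_cast<unsigned char>(ch)) != 0;
    };

    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        const std::size_t start = i;
        if (is_word(input[i])) {
            while (i < input.size() && is_word(input[i])) ++i;
        } else if (is_space(input[i])) {
            while (i < input.size() && is_space(input[i])) ++i;
        } else {
            ++i;  // single punctuation character, including a backslash
        }
        assert(i > start);  // each iteration consumes at least one byte
        tokens.push_back(input.substr(start, i - start));
    }
    return tokens;
}

// Concatenates the token texts, as the modified cpp_lexer example would when
// printing each token value in order.
std::string join_tokens(const std::vector<std::string>& tokens) {
    std::string out;
    for (const std::string& t : tokens) out += t;
    return out;
}

// True if tokenizing and re-emitting reproduces the input byte for byte.
bool round_trip_is_identity(const std::string& input) {
    return join_tokens(toy_tokenize(input)) == input;
}
```

A check of this kind, run over a set of real source files, would flag any place where the lexer under test normalizes or drops input.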
The processing of backslash/EOL character sequences is handled below the tokenizer, before the input is even processed by the lexer. If you want to disable that, just modify the function is_backslash() (boost/libs/wave/src/cpplexer/re2clex/cpp_re.cpp, line 182) to always return 'false'. You might also have to adjust the code starting at line 295 to avoid checking for backslashes there. I'm not sure, though, what consequences this change might have for the rest of the library. If you come up with a general solution controllable by a flag or the like, I'd be happy to accept a patch.
Regards
Hartmut
http://boost-spirit.com
http://stellar.cct.lsu.edu
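To illustrate why the backslash never shows up as a token, here is a toy model of a splicing pass that runs before tokenization, with a flag to switch it off. This is only a sketch under assumptions: `is_continuation` and `splice_lines` are hypothetical names and do not reflect the actual signature or behavior of is_backslash() in cpp_re.cpp, and only a backslash directly followed by '\n' is treated as a continuation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical predicate: is there a line continuation at position i?
// In raw mode it always answers "no", which mirrors the suggestion of
// making the check always return false.
bool is_continuation(const std::string& input, std::size_t i, bool raw_mode) {
    if (raw_mode) return false;
    return input[i] == '\\' && i + 1 < input.size() && input[i + 1] == '\n';
}

// Hypothetical pre-lexer pass: removes backslash/newline pairs so that the
// tokenizer never sees them. With raw_mode set, the input passes through
// unchanged and the backslash stays visible to the tokenizer.
std::string splice_lines(const std::string& input, bool raw_mode) {
    std::string out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (is_continuation(input, i, raw_mode)) {
            ++i;  // step onto the newline; the loop increment then skips it
            continue;
        }
        out.push_back(input[i]);
    }
    assert(out.size() <= input.size());  // splicing can only remove bytes
    return out;
}
```

With `raw_mode` set to false, the characters on either side of a continuation reach the tokenizer as one contiguous run, which is why they fuse into a single token. With `raw_mode` set to true, the bytes are passed on untouched, so a byte-for-byte round trip becomes possible.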
Hi! On 28.11.12 at 13:20, Hartmut Kaiser wrote:
If you want to disable that, just modify the function is_backslash() (boost/libs/wave/src/cpplexer/re2clex/cpp_re.cpp, line 182) to always return 'false'.
Thanks, this worked. Maybe I can turn this into a feature.
Frank
participants (2)
- Frank Birbacher
- Hartmut Kaiser