[wave] bug report for wave
Preprocessing the code
#define $jack int test;
$jack
with any preprocessor built on the Wave library (in this case the
samples/lexed_tokens/lexed_tokens preprocessor) gives the following error:
PP_DEFINE (#369) at test.C ( 1/ 1): >#define<
SPACE (#393) at test.C ( 1/ 8): > <
lexed_tokens:
/home/saebjornsen1/projects/boost/boost/wave/token_ids.hpp:497:
boost::wave::util::flex_string
Andreas Sæbjørnsen wrote:
Preprocessing the code
#define $jack int test;
$jack
with any preprocessor built on the Wave library (in this case the samples/lexed_tokens/lexed_tokens preprocessor) gives the following error:
PP_DEFINE (#369) at test.C ( 1/ 1): >#define<
SPACE (#393) at test.C ( 1/ 8): > <
lexed_tokens: /home/saebjornsen1/projects/boost/boost/wave/token_ids.hpp:497 : boost::wave::util::flex_string
, char*> > boost::wave::get_token_name(boost::wave::token_id): Assertion `id < T_LAST_TOKEN-T_FIRST_TOKEN' failed.
Aborted
Using the preprocessor cpp (from the gcc team) I get (as cpp test.C):
# 1 "test.C"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "test.C"
int test;
I believe the problem is that the lexer does not recognize '$' as a valid character within the name of a macro definition.
According to the Standard the '$' character is _not_ part of the basic source character set (see 2.2.1 [lex.charset]). For this reason it won't get recognized as part of an identifier.
I fixed the lexed_tokens sample (actually the Wave library) so you won't get an assertion anymore, but meaningful output. Now the output is:
PP_DEFINE (#369) at test.cpp ( 1/ 1): >#define<
SPACE (#393) at test.cpp ( 1/ 8): > <
<UnknownToken> (#36 ) at test.cpp ( 1/ 9): >$<
IDENTIFIER (#381) at test.cpp ( 1/10): >jack<
SPACE (#393) at test.cpp ( 1/14): > <
INT (#335) at test.cpp ( 1/15): >int<
SPACE (#393) at test.cpp ( 1/18): > <
IDENTIFIER (#381) at test.cpp ( 1/19): >test<
SEMICOLON (#297) at test.cpp ( 1/23): >;<
NEWLINE (#395) at test.cpp ( 1/24): >\n<
NEWLINE (#395) at test.cpp ( 2/ 1): >\n<
<UnknownToken> (#36 ) at test.cpp ( 3/ 1): >$<
IDENTIFIER (#381) at test.cpp ( 3/ 2): >jack<
NEWLINE (#395) at test.cpp ( 3/ 6): >\n<
BTW: when using the wave driver for the code given above you get:
test.cpp(1): error: ill formed preprocessor directive: #define
What certainly could be done additionally is to add the '$' character to the valid basic source character set to allow identifiers containing a '$', but this weakens the Standards conformance of Wave. Any suggestions?
HTH Regards Hartmut
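A toy tokenizer makes the difference between the two behaviours concrete. The sketch below is not Wave's lexer: `toy_lex`, `Token`, `Kind` and the `allow_dollar` flag are hypothetical names chosen for illustration, and only letters, digits, '_' and '$' are treated specially. With `allow_dollar` off it splits `$jack` into an unknown `$` followed by the identifier `jack`, matching the lexed_tokens output above; with it on, `$jack` becomes a single identifier, the way gcc treats it.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch only; they loosely mirror the
// <UnknownToken> / IDENTIFIER names printed by lexed_tokens.
enum class Kind { Identifier, Unknown, Other };

struct Token {
    Kind kind;
    std::string text;
};

// Characters that may start an identifier. '$' is accepted only when
// allow_dollar is set (the non-strict mode).
static bool is_ident_start(char c, bool allow_dollar) {
    unsigned char u = static_cast<unsigned char>(c);
    return std::isalpha(u) || c == '_' || (allow_dollar && c == '$');
}

// Characters that may continue an identifier: the start set plus digits.
static bool is_ident_continue(char c, bool allow_dollar) {
    return is_ident_start(c, allow_dollar) ||
           std::isdigit(static_cast<unsigned char>(c));
}

// Splits the input into identifiers, unknown '$' characters and single
// "other" characters (spaces, punctuation). Not a real C++ lexer.
std::vector<Token> toy_lex(const std::string& src, bool allow_dollar) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (is_ident_start(c, allow_dollar)) {
            std::size_t start = i;
            while (i < src.size() && is_ident_continue(src[i], allow_dollar))
                ++i;
            out.push_back({Kind::Identifier, src.substr(start, i - start)});
        } else {
            // In strict mode a '$' is not part of any token class we know.
            Kind k = (c == '$') ? Kind::Unknown : Kind::Other;
            out.push_back({k, std::string(1, c)});
            ++i;
        }
    }
    return out;
}
```

For example, `toy_lex("$jack", false)` yields two tokens (`$`, `jack`), while `toy_lex("$jack", true)` yields the single identifier `$jack`.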
I think the best solution would be, as you suggested, to add the '$'
character to the valid basic source character set to allow identifiers
containing a '$'. I therefore tested how different major compilers and
versions of these compilers handle this. The following compilers and
versions allow a '$' character within identifiers (of
macros):
gcc 2.95, 3.1, 3.2, 3.3.3, 3.4.3 and 4.0.2
icc (intel c compiler) 8.0, 8.1
I take this as evidence that there is some traction for allowing '$'
characters within identifiers in the preprocessors of major compilers.
I have not been able to reproduce the error you got with GCC or with any
other compiler. Which compiler and version did you use when you got your
error message?
Thanks
Andreas
On 11/18/05, Hartmut Kaiser
[SNIP]
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Andreas Sæbjørnsen wrote:
I think the best solution would be, as you suggested, to add the '$' character to the valid basic source character set to allow identifiers containing a '$'. I therefore tested how different major compilers and versions of these compilers handle this. The following compilers and versions allow a '$' character within identifiers (of macros):
gcc 2.95, 3.1, 3.2, 3.3.3, 3.4.3 and 4.0.2
icc (intel c compiler) 8.0, 8.1
I take this as evidence that there is some traction for allowing '$' characters within identifiers in the preprocessors of major compilers.
I have not been able to reproduce the error you got with GCC or with any other compiler. Which compiler and version did you use when you got your error message?
I agree with you that most (if not all) existing compilers allow '$' to be used in identifiers. So I decided to include this behaviour in Wave as well, as an option (and I made it the default). To keep the possibility of a strictly Standards-conforming preprocessor, the Wave library can now be configured by defining the BOOST_WAVE_USE_STRICT_LEXER constant during compilation, which reverts to the previous behaviour.
HTH Regards Hartmut
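How a build-time switch like this might be wired up can be sketched as follows. Only the macro name BOOST_WAVE_USE_STRICT_LEXER comes from the message above; the assumption that it is tested by value and defaults to 0 (permissive), and the helper `is_extra_identifier_char`, are illustrative and not taken from Wave's sources, so check the Wave configuration documentation for the actual convention.

```cpp
#include <cassert>

// Assumption for this sketch: the switch is tested by value and defaults
// to 0 (permissive). Building with -DBOOST_WAVE_USE_STRICT_LEXER=1 would
// select the strictly Standards-conforming behaviour.
#if !defined(BOOST_WAVE_USE_STRICT_LEXER)
#define BOOST_WAVE_USE_STRICT_LEXER 0
#endif

// true when '$' should be accepted inside identifiers
constexpr bool dollar_in_identifiers = (BOOST_WAVE_USE_STRICT_LEXER == 0);

// Hypothetical helper a lexer could consult for characters that are
// identifier-legal only as an extension.
constexpr bool is_extra_identifier_char(char c) {
    return dollar_in_identifiers && c == '$';
}
```

Keeping the decision in a single compile-time constant means the strict and permissive lexers differ only in this one predicate.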
Thanks. I look forward to testing it. :)
Regards
Andreas
On 11/19/05, Hartmut Kaiser
[SNIP]
On 11/18/05 3:11 PM, "Hartmut Kaiser"
Andreas Sæbjørnsen wrote: [SNIP]
I believe the problem is that the lexer does not recognize '$' as a valid character within the name of a macro definition.
According to the Standard the '$' character is _not_ part of the basic source character set (see 2.2.1 [lex.charset]). For this reason it won't get recognized as part of an identifier.
Nope. Right answer, wrong reason. An identifier does _not_ have to be made solely of basic source characters (look in section 2.10 of the standard). A universal character name can also be part of an identifier. Not all UCNs are allowed in identifiers; the real problem is that the '$' character is _also_ in the excluded set of UCNs. (The list of allowed UCNs is in Annex E of the standard.) [SNIP]
What certainly could be done additionally is to add the '$' character to the valid basic source character set to allow identifiers containing a '$', but this weakens the Standards conformance of Wave. Any suggestions?
Maybe it can be moved to the valid UCN list instead. This assumes that Wave is currently capable of UCN processing. Either way, you should make this optional, and disabled by default, to allow Standards conformance when needed.
Other posts in this thread suggest that everyone out there accepts '$' as an identifier character. Was that intentional? I've never heard of anyone using '$' within an identifier until now. Maybe we should submit bug reports for everyone on their lexers.
--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
Daryle Walker wrote:
Andreas Sæbjørnsen wrote: [SNIP]
I believe the problem is that the lexer does not recognize '$' as a valid character within the name of a macro definition.
According to the Standard the '$' character is _not_ part of the basic source character set (see 2.2.1 [lex.charset]). For this reason it won't get recognized as part of an identifier.
Nope. Right answer, wrong reason. An identifier does _not_ have to be made solely of basic source characters (look in section 2.10 of the standard). A universal character name can also be part of an identifier. Not all UCNs are allowed in identifiers; the real problem is that the '$' character is _also_ in the excluded set of UCNs. (The list of allowed UCNs is in Annex E of the standard.)
Thanks for clarifying this. But since '$' is in the excluded set of UCNs, it boils down to the same effect in the end (at least for Wave, see below).
[SNIP]
What certainly could be done additionally is to add the '$' character to the valid basic source character set to allow identifiers containing a '$', but this weakens the Standards conformance of Wave. Any suggestions?
Maybe it can be moved to the valid UCN list instead. This assumes that Wave is currently capable of UCN processing.
It is. And it checks for UCN validity, but only as long as these are specified as \uxxxx or \Uxxxxxxxx. (Wave leaves the actual translation into the execution character set to the compiler which processes the preprocessed Wave output; it acts solely on the character level.)
Either way, you should make this optional, and disabled by default, to allow Standards conformance when needed.
I've added the '$' to the basic source character set and it is allowed to be part of an identifier name now. I made this optional (configurable at compile time). Currently it's on by default ('$' is recognised), but this is arguable.
Other posts in this thread suggest that everyone out there accepts '$' as an identifier character. Was that intentional? I've never heard of anyone using '$' within an identifier until now. Maybe we should submit bug reports for everyone on their lexers.
This is probably because '$' is valid in most assembly languages, and the mentioned compilers all support some form of inline assembler. Regards Hartmut
On 11/20/05 8:37 PM, "Hartmut Kaiser"
Daryle Walker wrote:
Hartmut (?) wrote: [SNIP]
What certainly could be done additionally is to add the '$' character to the valid basic source character set to allow identifiers containing a '$', but this weakens the Standards conformance of Wave. Any suggestions?
Maybe it can be moved to the valid UCN list instead. This assumes that Wave is currently capable of UCN processing.
It is. And it checks for UCN validity, but only as long as these are specified as \uxxxx or \Uxxxxxxxx. (Wave leaves the actual translation into the execution character set to the compiler which processes the preprocessed Wave output; it acts solely on the character level.)
This isn't internal-to-execution translation, but source-to-internal instead. Any non-basic characters, even if they have an actual symbol (like '$'), should get resolved as if written in the \u or \U notation. But this depends on whatever character set is used for a platform's text files.
Either way, you should make this optional, and disabled by default, to allow Standards conformance when needed.
I've added the '$' to the basic source character set and it is allowed to be part of an identifier name now. I made this optional (configurable at compile time). Currently it's on by default ('$' is recognised), but this is arguable. [TRUNCATE]
I would suggest keeping '$' as an extended character, but put it (optionally) in the identifier-legal list. That way we minimize the amount of power '$' gets. Also, you do allow \u and \U notation characters to be placed in identifiers, right?
--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
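Checking the spelling of a UCN, the part Hartmut says Wave already validates, can be sketched like this. `parse_ucn` is a hypothetical helper, not Wave code: it recognises \uXXXX and \UXXXXXXXX and returns the code point, so a literal '$' and its spelling \u0024 map to the same value (0x24), which is the normalisation Daryle describes. Deciding which code points may appear in identifiers (the Annex E list mentioned earlier in the thread) is a separate table lookup that is not shown.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Recognises a backslash, then 'u' and 4 hex digits or 'U' and 8 hex
// digits, starting at src[pos]. On success stores the code point in
// `code` and returns the number of characters consumed; returns 0 if no
// well-formed UCN starts at pos. Only the spelling is checked here, not
// whether the code point is allowed in identifiers.
std::size_t parse_ucn(const std::string& src, std::size_t pos, char32_t& code) {
    if (pos + 1 >= src.size() || src[pos] != '\\') return 0;
    std::size_t digits;
    if (src[pos + 1] == 'u') digits = 4;
    else if (src[pos + 1] == 'U') digits = 8;
    else return 0;
    if (pos + 2 + digits > src.size()) return 0;   // too few characters left
    char32_t value = 0;
    for (std::size_t i = 0; i < digits; ++i) {
        unsigned char c = static_cast<unsigned char>(src[pos + 2 + i]);
        if (!std::isxdigit(c)) return 0;           // not a hex digit
        int d = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        value = value * 16 + static_cast<char32_t>(d);
    }
    code = value;
    return 2 + digits;
}
```

Calling `parse_ucn("\\u0024", 0, cp)` returns 6 and sets `cp` to 0x24, the code point of '$'.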
Daryle Walker wrote:
It is. And it checks for UCN validity, but only as long as these are specified as \uxxxx or \Uxxxxxxxx. (Wave leaves the actual translation into the execution character set to the compiler which processes the preprocessed Wave output; it acts solely on the character level.)
This isn't internal-to-execution translation, but source-to-internal instead. Any non-basic characters, even if they have an actual symbol (like '$'), should get resolved as if written in the \u or \U notation. But this depends on whatever character set is used for a platform's text files.
Sure.
Either way, you should make this optional, and disabled by default, to allow Standards conformance when needed.
I've added the '$' to the basic source character set and it is allowed to be part of an identifier name now. I made this optional (configurable at compile time). Currently it's on by default ('$' is recognised), but this is arguable. [TRUNCATE]
I would suggest keeping '$' as an extended character, but put it (optionally) in the identifier-legal list.
Ok, agreed. '$' is allowed only inside identifiers (or makes up an identifier by itself).
That way we minimize the amount of power '$' gets. Also, you do allow \u and \U notation characters to be placed in identifiers, right?
Yes, it's allowed. Regards Hartmut
We've noticed some changed behavior in a module that's using lexical_cast,
having gone from 1.32.0 to 1.33.0. I tracked it down to this change:
1.32.0:
bool operator>>(InputStreamable &output)
{
    return !is_pointer<InputStreamable>::value &&
           stream >> output &&
           (stream >> std::ws).eof();
}
1.33.0:
bool operator>>(InputStreamable &output)
{
    return !is_pointer<InputStreamable>::value &&
           stream >> output &&
           stream.get() ==
#if defined(__GNUC__) && (__GNUC__<3) && defined(BOOST_NO_STD_WSTRING)
               // GCC 2.9x lacks std::char_traits<>::eof().
               // We use BOOST_NO_STD_WSTRING to filter out STLport and libstdc++-v3
               // configurations, which do provide std::char_traits<>::eof().
               EOF;
#else
               std::char_traits
On Nov 18, 2005, at 4:00 PM, john.wismar@autozone.com wrote:
We've noticed some changed behavior in a module that's using lexical_cast, having gone from 1.32.0 to 1.33.0. I tracked it down to this change: [snip]
It turns out that we are passing a string containing a number with a trailing space. The 1.32.0 version of lexical_cast did not mind the trailing space, but the 1.33.0 version throws a bad_lexical_cast exception. I found that if I change this:
stream.get() ==
to this:
(stream >> std::ws).get() ==
my issue goes away, and the workaround for compilers without the eof() function still works.... Would it be possible to add this into 1.34.0, or is this undesirable behavior?
This looks like a reasonable change to me; I hope that someone more involved in lexical_cast<> can give us more information, either rationale for the 1.33.0 behavior or confirmation that this is a bug. If it is a change that needs to be made, we'll put it in 1.33.1. Doug
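The behavioural difference John describes can be reproduced outside lexical_cast with a plain std::istringstream. The sketch below models the 1.32.0 check, the 1.33.0 check and the proposed `(stream >> std::ws).get()` change; the function names are hypothetical, these are not the lexical_cast sources, and `std::char_traits<char>::eof()` stands in for the compiler-specific EOF workaround.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// 1.32.0 style: skip trailing whitespace, then require end of stream.
template <typename T>
bool parse_1_32(const std::string& text, T& out) {
    std::istringstream stream(text);
    return stream >> out && (stream >> std::ws).eof();
}

// 1.33.0 style: the very next character must be EOF, so "42 " fails.
template <typename T>
bool parse_1_33(const std::string& text, T& out) {
    std::istringstream stream(text);
    return stream >> out &&
           stream.get() == std::char_traits<char>::eof();
}

// Proposed change from this thread: skip whitespace, then test get().
template <typename T>
bool parse_proposed(const std::string& text, T& out) {
    std::istringstream stream(text);
    return stream >> out &&
           (stream >> std::ws).get() == std::char_traits<char>::eof();
}
```

With the input "42 ", `parse_1_32` and `parse_proposed` succeed while `parse_1_33` fails; all three still reject trailing non-whitespace such as "42x".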
Doug Gregor
On Nov 18, 2005, at 4:00 PM, john.wismar@autozone.com wrote:
We've noticed some changed behavior in a module that's using lexical_cast, having gone from 1.32.0 to 1.33.0. I tracked it down to this change: [snip]
It turns out that we are passing a string containing a number with a trailing space. The 1.32.0 version of lexical_cast did not mind the trailing space, but the 1.33.0 version throws a bad_lexical_cast exception. I found that if I change this:
stream.get() ==
to this:
(stream >> std::ws).get() ==
my issue goes away, and the workaround for compilers without the eof() function still works.... Would it be possible to add this into 1.34.0, or is this undesirable behavior?
This looks like a reasonable change to me; I hope that someone more involved in lexical_cast<> can give us more information, either rationale for the 1.33.0 behavior or confirmation that this is a bug. If it is a change that needs to be made, we'll put it in 1.33.1.
Looks like the change was not made in the RC.... -------------------------------- John Wismar john.wismar@autozone.com
http://lists.boost.org/Archives/boost/2005/08/91573.php
On 12/5/05, john.wismar@autozone.com
Doug Gregor wrote on 11/18/2005 03:41:30 PM: [SNIP]
Thomas Matelich
Thanks for the pointer. I missed that in my search.... And I agree with your comment in the linked-to email. The list of things I have to do to get Boost to work after each update grows longer and longer.... -------------------------------- John Wismar john.wismar@autozone.com
If I were smart, I'd just implement an overloadable lexical_convert as
has been discussed a few times and be done with it. Not sure why I
put sooo much stock in plain Boost support. Probably out of hope that
someday, lexical_cast will be standardized. Laziness too I suppose
:).
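Thomas only mentions the "overloadable lexical_convert" idea in passing. One way such a customization point could look is a function that takes the target by reference, so users can add plain overloads for their own types. `lexical_convert` below is a hypothetical sketch (the name comes from the post; the signature and behaviour are assumptions), and it keeps the 1.32.0-style tolerance for trailing whitespace.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical generic fallback: parse with operator>>, allow trailing
// whitespace (as 1.32.0's lexical_cast did), reject anything else.
template <typename T>
bool lexical_convert(const std::string& text, T& out) {
    std::istringstream stream(text);
    T value;
    if (!(stream >> value)) return false;          // no value at all
    if (!(stream >> std::ws).eof()) return false;  // junk after the value
    out = value;
    return true;
}

// A non-template overload is preferred over the template for an exact
// match, so conversion can be customised per type without touching the
// generic version. Here: accept a few spellings for bool.
inline bool lexical_convert(const std::string& text, bool& out) {
    if (text == "true" || text == "yes" || text == "1") { out = true; return true; }
    if (text == "false" || text == "no" || text == "0") { out = false; return true; }
    return false;
}
```

Returning a success flag instead of throwing keeps the generic path cheap for callers that expect bad input; the out-parameter is what makes ordinary overloading possible, since C++ cannot overload on return type alone.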
On 12/5/05, john.wismar@autozone.com
Thomas Matelich wrote on 12/05/2005 12:40:34 PM: [SNIP]
participants (6)
- Andreas Sæbjørnsen
- Daryle Walker
- Doug Gregor
- Hartmut Kaiser
- john.wismar@autozone.com
- Thomas Matelich