[wave] bug report for wave

Andreas Sæbjørnsen

18 Nov 2005

6:36 p.m.

When preprocessing the code

    #define $jack int test;
    $jack

with any preprocessor built on the Wave library, in this case the samples/lexed_tokens/lexed_tokens preprocessor, I get the following error:

    PP_DEFINE (#369) at test.C ( 1/ 1): >#define<
    SPACE (#393) at test.C ( 1/ 8): > <
    lexed_tokens: /home/saebjornsen1/projects/boost/boost/wave/token_ids.hpp:497: boost::wave::util::flex_string<char, std::char_traits<char>, std::allocator<char>, boost::wave::util::CowString<boost::wave::util::AllocatorStringStorage<char, std::allocator<char> >, char*> > boost::wave::get_token_name(boost::wave::token_id): Assertion `id < T_LAST_TOKEN-T_FIRST_TOKEN' failed.
    Aborted

Using the preprocessor cpp (from the GCC team), invoked as cpp test.C, I get:

    # 1 "test.C"
    # 1 "<built-in>"
    # 1 "<command line>"
    # 1 "test.C"
    int test;

I believe the problem is that the lexer does not recognize '$' as a valid character within the name of a macro definition.

Thanks
Andreas Saebjoernsen


Hartmut Kaiser

18 Nov

8:11 p.m.

Andreas Sæbjørnsen wrote:

> I believe the problem lies in that the lexer does not recognize the '$' as a valid character within the name of macro definition.

According to the Standard, the '$' character is _not_ part of the basic source character set (see 2.2.1 [lex.charset]). For this reason it won't be recognized as part of an identifier.

I fixed the lexed_tokens sample (actually the Wave library) so you no longer get an assertion, but meaningful output instead. The output is now:

    PP_DEFINE (#369) at test.cpp ( 1/ 1): >#define<
    SPACE (#393) at test.cpp ( 1/ 8): > <
    <UnknownToken> (#36 ) at test.cpp ( 1/ 9): >$<
    IDENTIFIER (#381) at test.cpp ( 1/10): >jack<
    SPACE (#393) at test.cpp ( 1/14): > <
    INT (#335) at test.cpp ( 1/15): >int<
    SPACE (#393) at test.cpp ( 1/18): > <
    IDENTIFIER (#381) at test.cpp ( 1/19): >test<
    SEMICOLON (#297) at test.cpp ( 1/23): >;<
    NEWLINE (#395) at test.cpp ( 1/24): >\n<
    NEWLINE (#395) at test.cpp ( 2/ 1): >\n<
    <UnknownToken> (#36 ) at test.cpp ( 3/ 1): >$<
    IDENTIFIER (#381) at test.cpp ( 3/ 2): >jack<
    NEWLINE (#395) at test.cpp ( 3/ 6): >\n<

BTW: when using the wave driver on the code given above you get:

    test.cpp(1): error: ill formed preprocessor directive: #define

What could certainly be done additionally is to add the '$' character to the valid basic source character set to allow identifiers containing a '$', but this weakens the Standards conformance of Wave. Any suggestions?

HTH
Regards Hartmut
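The change described in this message, reporting a placeholder name instead of asserting when a token id falls outside the named range, might look roughly like the sketch below. It is only an illustration and not Wave's actual code: the enum values, the name table, and the function token_name are hypothetical stand-ins. Only the names T_FIRST_TOKEN and T_LAST_TOKEN and the "<UnknownToken>" text are taken from the thread.

```cpp
#include <string>

// Hypothetical, heavily reduced token id range (not Wave's real values).
enum token_id {
    T_FIRST_TOKEN = 256,
    T_IDENTIFIER = T_FIRST_TOKEN,
    T_INT,
    T_SEMICOLON,
    T_LAST_TOKEN
};

// Returns a printable name for a token id. Ids outside the named range,
// such as a raw character code, get a placeholder instead of tripping
// an assertion.
std::string token_name(int id) {
    static const char* const names[] = {"IDENTIFIER", "INT", "SEMICOLON"};
    const int index = id - T_FIRST_TOKEN;
    if (index < 0 || index >= T_LAST_TOKEN - T_FIRST_TOKEN) {
        return "<UnknownToken>";
    }
    return names[index];
}
```

With this shape, a stray '$' passed as its character code yields a printable name rather than aborting the program.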

Andreas Sæbjørnsen

10:32 p.m.

I think the best solution would be, as you suggested, to add the '$' character to the valid basic source character set to allow identifiers containing a '$'. I therefore tested how different major compilers, and different versions of them, handle this. The following compilers and versions allow a '$' character within identifiers (of macros):

gcc 2.95, 3.1, 3.2, 3.3.3, 3.4.3 and 4.0.2
icc (Intel C compiler) 8.0, 8.1

This seems to corroborate that there is some traction for allowing '$' characters within identifiers in major compiler preprocessors.

I have not been able to reproduce the error you got with GCC or with any other compiler. Which compiler and version did you use when you got your error message?

Thanks
Andreas

Hartmut Kaiser

19 Nov

6:43 p.m.

Andreas Sæbjørnsen wrote:

> I think the best soulution would be like you suggested to add the '$' character to the valid basic source character set to allow identifiers containeing a $. I therefore tested out how different major compilers and versions of these compilers handle this. The following compilers and versions of the compilers allow a '$' character within the identifiers (of macros):
>
> gcc 2.95, 3.1, 3.2, 3.3.3, 3.4.3 and 4.0.2 icc (intel c compiler) 8.0, 8.1
>
> I would corroborate for the fact that it seems like there is some traction for allowing '$' characters within identifiers in major compiler preprocessors.

I agree with you that most (if not all) existing compilers allow '$' to be used in identifiers. So I decided to (optionally) include this behaviour in Wave as well (and I made it the default behaviour). To retain the possibility of a strictly Standards-conforming preprocessor, the Wave library can now be configured by defining the BOOST_WAVE_USE_STRICT_LEXER constant during compilation, which reverts to the previous behaviour.

HTH
Regards Hartmut
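As a rough illustration of how such a compile-time switch could gate '$' in identifiers, here is a minimal sketch. The macro name is the one given in this message; everything else (the helper functions and the assumption that the macro is tested with a plain defined() check) is hypothetical and not taken from Wave's sources.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: may `c` start an identifier?
// Assumption: strict mode is selected simply by defining the macro.
inline bool is_identifier_start(char c) {
    const bool basic = (c == '_') ||
                       (c >= 'a' && c <= 'z') ||
                       (c >= 'A' && c <= 'Z');
#if defined(BOOST_WAVE_USE_STRICT_LEXER)
    return basic;              // strict: '$' is rejected
#else
    return basic || c == '$';  // default: '$' is accepted
#endif
}

// Hypothetical helper: may `c` continue an identifier?
inline bool is_identifier_char(char c) {
    return is_identifier_start(c) || (c >= '0' && c <= '9');
}

// Length of the identifier at the start of `text`, or 0 if there is none.
inline std::size_t identifier_length(const std::string& text) {
    if (text.empty() || !is_identifier_start(text[0])) {
        return 0;
    }
    std::size_t n = 1;
    while (n < text.size() && is_identifier_char(text[n])) {
        ++n;
    }
    return n;
}
```

Compiled without the macro, this sketch treats $jack as a single five-character identifier; compiled with -DBOOST_WAVE_USE_STRICT_LEXER, it finds no identifier at that position.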

Andreas Sæbjørnsen

11:04 p.m.

Thanks. I look forward to testing it. :)

Regards
Andreas

Daryle Walker

21 Nov

12:27 a.m.

On 11/18/05 3:11 PM, "Hartmut Kaiser" <hartmut.kaiser@gmail.com> wrote:

> Andreas Sæbjørnsen wrote:

[SNIP]

>> I believe the problem lies in that the lexer does not recognize the '$' as a valid character within the name of macro definition.
>
> Accordingly to the Standard the '$' character is _not_ part of the basic source character set (see 2.2.1 [lex.charset]). For this reason it won't get recognized as the part of a identifier.

Nope. Right answer, wrong reason. An identifier does _not_ have to be made solely of basic source characters (look in section 2.10 of the standard). A universal character name can also be part of an identifier. Not all UCNs are allowed in identifiers; the real problem is that the '$' character is _also_ in the excluded set of UCNs. (The list of allowed UCNs is in Annex E of the standard.)

[SNIP]

> What certainly could be done additionally is to add the '$' character to the valid basic source character set to allow identifiers conatining a '$', but this weakens the Standards conformance of Wave. Any suggestions?

Maybe it can be moved to the valid UCN list instead. This assumes that Wave is currently capable of UCN processing. Either way, you should make this optional, and disabled by default, to allow Standards conformance when needed.

Other posts in this thread suggest that everyone out there accepts '$' as an identifier character. Was that intentional? I've never heard of anyone using '$' within an identifier until now. Maybe we should submit bug reports for everyone on their lexers.

--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
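To make the point about the allowed-UCN list concrete, here is a small sketch of a range-table lookup. It is not Wave's implementation: the type and function names are hypothetical, and the table holds only two Latin ranges as an illustrative excerpt. The complete list is the one in Annex E of the standard.

```cpp
#include <cstddef>

// Hypothetical range type for code points permitted in identifiers.
struct ucn_range {
    unsigned long first;
    unsigned long last;
};

// Illustrative excerpt only: two Latin ranges. A real table would
// carry every range listed in Annex E.
static const ucn_range allowed_ranges[] = {
    {0x00C0, 0x00D6},
    {0x00D8, 0x00F6},
};

// True if `code_point` lies in one of the ranges of the excerpt above.
bool ucn_allowed_in_identifier(unsigned long code_point) {
    const std::size_t count =
        sizeof(allowed_ranges) / sizeof(allowed_ranges[0]);
    for (std::size_t i = 0; i < count; ++i) {
        if (code_point >= allowed_ranges[i].first &&
            code_point <= allowed_ranges[i].last) {
            return true;
        }
    }
    return false;
}
```

Under this scheme '$' (U+0024) is rejected because it falls in no listed range, while a letter such as U+00E9 is accepted. Accepting '$' through this route would mean adding an extra, non-standard range to the table.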

Hartmut Kaiser

1:37 a.m.

Daryle Walker wrote:

>>> I believe the problem lies in that the lexer does not recognize the '$' as a valid character within the name of macro definition.
>>
>> Accordingly to the Standard the '$' character is _not_ part of the basic source character set (see 2.2.1 [lex.charset]). For this reason it won't get recognized as the part of a identifier.
>
> Nope. Right answer, wrong reason. An identifier does _not_ have to be made solely of basic source characters (look in section 2.10 of the standard). A universal character name can also be part of an identifier. Not all UCNs are allowed in identifiers; the real problem is that the '$' character is _also_ in the excluded set of UCNs. (The list of allowed UCNs is in Annex E of the standard.)

Thanks for clarifying this. But since '$' is in the excluded set of UCNs, it boils down to the same effect in the end (at least for Wave - see below).

>> What certainly could be done additionally is to add the '$' character to the valid basic source character set to allow identifiers conatining a '$', but this weakens the Standards conformance of Wave. Any suggestions?
>
> Maybe it can be moved to the valid UCN list instead. This assumes that Wave is currently capable of UCN processing.

It is. And it checks for UCN validity, but only as long as these are specified as \uxxxx or \Uxxxxxxxx. (Wave leaves the actual translation into the execution character set to the compiler which processes the preprocessed Wave output - it acts solely on the character level.)

> Either way, you should make this optional, and disabled by default, to allow Standards conformance when needed.

I've added the '$' to the basic source character set and it is now allowed to be part of an identifier name. I made this optional (configurable at compile time). Currently it's on by default ('$' is recognised), but this is arguable.

> Other posts in this thread suggest that everyone out there accepts '$' as an identifier character. Was that intentional? I've never heard of anyone using '$' within an identifier until now. Maybe we should submit bug reports for everyone on their lexers.

This is probably done because '$' is valid in most assembler languages and the mentioned compilers all have some inline assembler directives.

Regards Hartmut

Daryle Walker

10:06 p.m.

On 11/20/05 8:37 PM, "Hartmut Kaiser" <hartmut.kaiser@gmail.com> wrote:

> Daryle Walker wrote:
>>> Hartmut (?) wrote:

[SNIP]

>>> What certainly could be done additionally is to add the '$' character to the valid basic source character set to allow identifiers conatining a '$', but this weakens the Standards conformance of Wave. Any suggestions?
>>
>> Maybe it can be moved to the valid UCN list instead. This assumes that Wave is currently capable of UCN processing.
>
> It is. And it checks for UCN validity, but only as long these are specified as \uxxxx or \Uxxxxxxxxx. (Wave leaves the actual translation into the execution character set to the compiler which processes the preprocessed Wave output - it acts solely on the character level).

This isn't internal-to-execution translation, but source-to-internal instead. Any non-basic characters, even if they have an actual symbol (like '$'), should get resolved like the \u or \U notation. But this is dependent on whatever character set is used for a platform's text files.

>> Either way, you should make this optional, and disabled by default, to allow Standards conformance when needed.
>
> I've added the '$' to the basic source character set and it is allowed to be part of an identifier name now. I made this optional (configurable at compile time). Currently its on by default ('$' is recognised), but this arguable.

[TRUNCATE]

I would suggest keeping '$' as an extended character, but put it (optionally) in the identifier-legal list. That way we minimize the amount of power '$' gets. Also, you do allow \u and \U notation characters to be placed in identifiers, right?

--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com

Hartmut Kaiser

11:28 p.m.

Daryle Walker wrote:

>> It is. And it checks for UCN validity, but only as long these are specified as \uxxxx or \Uxxxxxxxxx. (Wave leaves the actual translation into the execution character set to the compiler which processes the preprocessed Wave output - it acts solely on the character level).
>
> This isn't internal-to-execution translation, but source-to-internal instead. Any non-basic characters, even if they have an actual symbol (like '$') should get resolved like the \u or \U notation. But this is dependent on whatever character set is used for a platform's text files.

Sure.

>>> Either way, you should make this optional, and disabled by default, to allow Standards conformance when needed.
>>
>> I've added the '$' to the basic source character set and it is allowed to be part of an identifier name now. I made this optional (configurable at compile time). Currently its on by default ('$' is recognised), but this arguable. [TRUNCATE]
>
> I would suggest keeping '$' as an extended character, but put it (optionally) in the identifier-legal list.

Ok, agreed. '$' is allowed only inside identifiers (or makes up an identifier by itself).

> That way we minimize the amount of power '$' gets. Also, you do allow \u and \U notation characters to be placed in identifiers, right?

Yes, it's allowed.

Regards Hartmut

john.wismar＠autozone.com

18 Nov

9 p.m.

New subject: [lexical_cast][1.33.0] Changed behavior from 1.32.0?

We've noticed some changed behavior in a module that's using lexical_cast, having gone from 1.32.0 to 1.33.0. I tracked it down to this change:

1.32.0:

    bool operator>>(InputStreamable &output)
    {
        return !is_pointer<InputStreamable>::value &&
               stream >> output &&
               (stream >> std::ws).eof();
    }

1.33.0:

    bool operator>>(InputStreamable &output)
    {
        return !is_pointer<InputStreamable>::value &&
               stream >> output &&
               stream.get() ==
    #if defined(__GNUC__) && (__GNUC__<3) && defined(BOOST_NO_STD_WSTRING)
        // GCC 2.9x lacks std::char_traits<>::eof().
        // We use BOOST_NO_STD_WSTRING to filter out STLport and libstdc++-v3
        // configurations, which do provide std::char_traits<>::eof().
                   EOF;
    #else
                   std::char_traits<char_type>::eof();
    #endif
    }

It turns out that we are passing a string containing a number with a trailing space. The 1.32.0 version of lexical_cast did not mind the trailing space, but the 1.33.0 version throws a bad_lexical_cast exception.

I found that if I change this:

    stream.get() ==

to this:

    (stream >> std::ws).get() ==

my issue goes away, and the workaround for compilers without the eof() function still works.

Would it be possible to add this to 1.34.0, or is this undesirable behavior?

--------------------------------
John Wismar
john.wismar@autozone.com
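The difference between the three end-of-input checks can be reproduced outside of lexical_cast with a plain std::istringstream. The sketch below is an approximation under that assumption: the function names are made up for illustration, int stands in for the target type, and lexical_cast's internal stream setup is not reproduced.

```cpp
#include <sstream>
#include <string>

// 1.32.0-style check: skip trailing whitespace, then require end of input.
bool accepts_like_1_32(const std::string& text, int& out) {
    std::istringstream stream(text);
    return (stream >> out) && (stream >> std::ws).eof();
}

// 1.33.0-style check: the very next character must be end of input.
bool accepts_like_1_33(const std::string& text, int& out) {
    std::istringstream stream(text);
    return (stream >> out) &&
           stream.get() == std::char_traits<char>::eof();
}

// Proposed check: skip trailing whitespace, then compare against eof().
bool accepts_like_proposed(const std::string& text, int& out) {
    std::istringstream stream(text);
    return (stream >> out) &&
           (stream >> std::ws).get() == std::char_traits<char>::eof();
}
```

For the input "42 " the first and third functions succeed and the second fails, which matches the behavior change described in this message. All three reject input such as "42 x", where something other than whitespace follows the number.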

Doug Gregor

9:41 p.m.

On Nov 18, 2005, at 4:00 PM, john.wismar@autozone.com wrote:

> We've noticed some changed behavior in a module that's using lexical_cast, having gone from 1.32.0 to 1.33.0. I tracked it down to this change:

[snip]

> It turns out that we are passing a string containing a number with a trailing space. The 1.32.0 version of lexical_cast did not mind the trailing space, but the 1.33.0 version throws a bad_lexical_cast exception. I found that if I change this: stream.get() == to this: (stream >> std::ws).get() ==
>
> that my issue goes away, but the workaround for compilers without the eof() function still works.... Would it be possilbe to add this into 1.34.0, or is this undesirable behavior?

This looks like a reasonable change to me; I hope that someone more involved in lexical_cast<> can give us more information, either a rationale for the 1.33.0 behavior or confirmation that this is a bug. If it is a change that needs to be made, we'll put it in 1.33.1.

Doug

john.wismar＠autozone.com

5 Dec

4:16 p.m.

Doug Gregor <dgregor@cs.indiana.edu> wrote on 11/18/2005 03:41:30 PM:

> This looks like a reasonably change to me; I hope that someone more involved in lexical_cast<> can give us more information, either rationale for the 1.33.0 behavior or confirmation that this is a bug. If it is a change that needs to be made, we'll put it in 1.33.1.

Looks like the change was not made in the RC....

--------------------------------
John Wismar
john.wismar@autozone.com

Thomas Matelich

6:40 p.m.

http://lists.boost.org/Archives/boost/2005/08/91573.php

john.wismar＠autozone.com

7:15 p.m.

Thomas Matelich <matelich@gmail.com> wrote on 12/05/2005 12:40:34 PM:

> http://lists.boost.org/Archives/boost/2005/08/91573.php

Thanks for the pointer. I missed that in my search....

And I agree with your comment in the linked-to email. The list of things I have to do to get Boost to work after each update grows longer and longer....

--------------------------------
John Wismar
john.wismar@autozone.com

Thomas Matelich

9:02 p.m.

If I were smart, I'd just implement an overloadable lexical_convert as has been discussed a few times and be done with it. Not sure why I put sooo much stock in plain Boost support. Probably out of hope that someday, lexical_cast will be standardized. Laziness too I suppose :).
