[preprocessor] check if a token is a keyword (was "BOOST_PP_IS_UNARY()")

Lorenzo Caminiti

13 Aug 2010

2:52 p.m.

Hello all,

I asked this question before but received no answer: Is there any problem in using `BOOST_PP_IS_UNARY()` to check if a token matches a predefined keyword, as indicated by the code below?

The following `IS_PUBLIC(token)` macro expands to 1 if `token` is `public`, and to 0 otherwise:

    #define PUBLIC_public (1)
    #define IS_PUBLIC(token) BOOST_PP_IS_UNARY(BOOST_PP_CAT(PUBLIC_, token))

    IS_PUBLIC(public) // Expand to 1.
    IS_PUBLIC(abc)    // Expand to 0.

This works just fine on both GCC and MSVC. However, `BOOST_PP_IS_UNARY()` is only part of the Boost.Preprocessor private API, for the reasons Paul Mensonides mentioned in http://lists.boost.org/Archives/boost/2004/08/70238.php

...

Paul Mensonides wrote: I know. They (and other types of detection macros) are very useful. Right now, the Borland configuration is used by the IBM preprocessor and the Sun preprocessor also. I'd like to know whether or not those macro definitions work correctly on those preprocessors before I add it. Though those macros aren't part of the public interface, they are part of the internal interface. I.e. they are used internally as if they were a public interface (though they are used carefully on some preprocessors). They are stable, and the worst thing that might happen in the future is that they will be moved to a different directory. Thus, you can use them, but beware of VC and versions of MWCW prior to v9. They in particular have severe problems with expansion order. E.g. VC will even fail this test:

(Boost.Preprocessor, Boost.Typeof, and Boost.Spirit already use `BOOST_PP_IS_UNARY()` internally.)

Thank you very much.

-- Lorenzo
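For trying the idea without Boost at hand, the detection trick can be approximated with a few variadic macros. This is only a minimal sketch under stated assumptions, not the Boost.Preprocessor implementation: `CAT`, `CHECK`, `CHECK_N`, `IS_UNARY_PROBE`, and `IS_UNARY` are hypothetical names, and a conforming C++11 preprocessor is assumed (preprocessors with non-standard expansion order may need extra expansion-forcing steps).

```cpp
// Sketch only: CAT, CHECK, CHECK_N, IS_UNARY_PROBE and IS_UNARY are
// hypothetical stand-ins, not Boost.Preprocessor macros.

// Two-step concatenation so that the arguments are macro-expanded first.
#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b

// CHECK yields the second element of its argument list, or 0 when the
// list has a single element.
#define CHECK(...) CHECK_N(__VA_ARGS__, 0, ~)
#define CHECK_N(x, n, ...) n

// The probe is only invoked when it is followed by a parenthesized
// one-argument group; it then injects a 1 into second position. The
// trailing comma pushes any tokens that follow into later arguments.
#define IS_UNARY_PROBE(arg) ~, 1,
#define IS_UNARY(x) CHECK(IS_UNARY_PROBE x)

// "Keyword" table: PUBLIC_public is the only PUBLIC_* name defined.
#define PUBLIC_public (1)
#define IS_PUBLIC(token) IS_UNARY(CAT(PUBLIC_, token))

static_assert(IS_PUBLIC(public) == 1, "public is recognized");
static_assert(IS_PUBLIC(abc) == 0, "any other identifier is not");
```

The design choice is the same as in the original macro: pasting `PUBLIC_` onto the token either produces a defined macro that expands to `(1)` or an undefined identifier, and the probe tells the two apart.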

Paul Mensonides

16 Aug 2010

7:24 a.m.

On 8/13/2010 7:52 AM, Lorenzo Caminiti wrote:

...

Hello all,

I asked this question before but received no answer: Is there any problem in using `BOOST_PP_IS_UNARY()` to check if a token matches a predefined keyword as indicated by the code below?

The following `IS_PUBLIC(token)` macro expands to 1 if `token` is `public`, to 0 otherwise:

    #define PUBLIC_public (1)
    #define IS_PUBLIC(token) BOOST_PP_IS_UNARY(BOOST_PP_CAT(PUBLIC_, token))

    IS_PUBLIC(public) // Expand to 1.
    IS_PUBLIC(abc)    // Expand to 0.

This works just fine on both GCC and MSVC. However, `BOOST_PP_IS_UNARY()` is only part of Boost.Preprocessor private API because:

IS_PUBLIC(+) results in undefined behavior. More generally, the result of token-pasting must result in a single (preprocessing) token. So, for example, BOOST_PP_CAT(abc, 123) and BOOST_PP_CAT(+, +) are okay, but BOOST_PP_CAT(abc, 123.0) and BOOST_PP_CAT(+, -) are not. Similarly for what you have above which will work provided everything used as an argument is an identifier (there are no "keywords" to the preprocessor) or a pp-number that doesn't contain any decimal points.

Regards,
Paul Mensonides
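The single-token rule can be checked by stringizing the result of a paste. The following is a hedged sketch: `CAT`, `STR`, and `same_text` are hypothetical helpers introduced only for illustration, and the invalid cases are left in comments because their behavior is undefined.

```cpp
// Sketch only: CAT, STR and same_text are hypothetical helpers.

#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b
#define STR(x) STR_I(x)
#define STR_I(x) #x

// Compile-time comparison of two C strings (C++11-style recursion).
constexpr bool same_text(const char* a, const char* b) {
    return *a != *b ? false
         : *a == '\0' ? true
         : same_text(a + 1, b + 1);
}

// identifier ## digits-only pp-number forms one identifier: abc123.
static_assert(same_text(STR(CAT(abc, 123)), "abc123"), "one identifier");

// + ## + forms the single token ++.
static_assert(same_text(STR(CAT(+, +)), "++"), "one operator");

// pp-number ## pp-number forms one pp-number: 123.
static_assert(CAT(1, 23) == 123, "one pp-number");

// Not compiled here because the behavior is undefined:
//   CAT(abc, 123.0)  would have to be the two tokens abc123 and .0
//   CAT(+, -)        would have to be the two tokens + and -
```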

Lorenzo Caminiti

17 Aug 2010

4:21 a.m.

On Mon, Aug 16, 2010 at 3:24 AM, Paul Mensonides <pmenso57@comcast.net> wrote:

...

IS_PUBLIC(+) results in undefined behavior. More generally, the result of token-pasting must result in a single (preprocessing) token. So, for example, BOOST_PP_CAT(abc, 123) and BOOST_PP_CAT(+, +) are okay, but BOOST_PP_CAT(abc, 123.0) and BOOST_PP_CAT(+, -) are not. Similarly for what you have above which will work provided everything used as an argument is an identifier (there are no "keywords" to the preprocessor) or a pp-number that doesn't contain any decimal points.

Yes, I am aware of this "limitation". However, for my application it is not a problem to limit the argument of `IS_PUBLIC()` to pp-identifiers and pp-numbers with no decimal points (if interested, see "MY APPLICATION" below).

1) Out of curiosity, is there a way to implement `IS_PUBLIC()` (perhaps without using `BOOST_PP_CAT()`) so it does not have this limitation? (I could not think of any.)

2) Also, does the expansion of any of the following result in undefined behavior? (I don't think so...)

    IS_PUBLIC(public abc)          // Expand to 1.
    IS_PUBLIC(public::)            // Expand to 1.
    IS_PUBLIC(public(abc, ::))     // Expand to 1.
    IS_PUBLIC(public (abc) (yxz))  // Expand to 1.

(My application relies on some of these expansions to work.)

MY APPLICATION

I am using `IS_PUBLIC()` and similar macros to program the preprocessor to *parse* a Boost.Preprocessor sequence of tokens that represents a function signature. For example:

    class c {
    public:
        void f(int x) const; // Usual function declaration.
    };

    class c {
        PARSE_FUNCTION_DECL( // Equivalent declaration using pp-sequences.
            (public) (void) (f)( (int)(x) ) (const)
        );
    };

The parser macro above can say "the signature sequence starts with `public` so this is a member function" at a preprocessor metaprogramming level and then expand to special code as a library might need to handle member functions. The parser macros can even do some basic syntax error checking -- for example, if `(const)` is specified as cv-qualifier at the end of the signature sequence of a non-member function, the parser macro can check that and expand to a compile-time error like `SYNTAX_ERROR_unexpected_cv_qualifier` (using `BOOST_MPL_ASSERT_MSG()`).

Most of the tokens within C++ function signatures are pp-identifiers such as the words `public`, `void`, `f`, etc. There are some exceptions like `,` to separate function parameters, `<`/`>` for templates, `:` for constructors' member initializers, etc. The grammar of my preprocessor parser macros requires the use of different tokens in these cases. For example, parentheses `(`/`)` are used for templates instead of `<`/`>`:

    template< typename T > f(T x); // Usual.

    PARSE_FUNCTION_DECL( // PP-sequence.
        (template)( (typename)(T) ) (f)( (T)(x) )
    );

(Instead of `(template)(<) (typename) (T) (>) (f)( (T)(x) )`, which would have caused the parser macro to fail when inspecting `(<)` via one of the `IS_XXX()` macros, as per the limitation from using `BOOST_PP_CAT()` mentioned above.)

The grammar of my preprocessor parser macros clearly documents that only pp-identifiers can be passed as tokens of the function signature sequence. Therefore, the "limitation" of `IS_PUBLIC()` indicated above is not a problem for my application.

Thank you very much.

-- Lorenzo
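How a parser macro might inspect the first element of such a signature sequence can be sketched as follows. Everything here is an assumption made for illustration: `SEQ_HEAD`, `IS_MEMBER_DECL`, and the detection helpers are hypothetical names, not the macros of the library described above, and a conforming C++11 preprocessor is assumed.

```cpp
// Sketch only: all macro names below are hypothetical, not taken from
// Boost.Preprocessor or from the parser macros discussed in the thread.

#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b

// Detection of a leading parenthesized one-argument group.
#define CHECK(...) CHECK_N(__VA_ARGS__, 0, ~)
#define CHECK_N(x, n, ...) n
#define IS_UNARY_PROBE(arg) ~, 1,
#define IS_UNARY(x) CHECK(IS_UNARY_PROBE x)

#define PUBLIC_public (1)
#define IS_PUBLIC(token) IS_UNARY(CAT(PUBLIC_, token))

// First element of a pp-sequence: (a)(b)(c) -> a.
// SEQ_HEAD_SPLIT consumes the first parenthesized element and puts a
// comma after it, so the head becomes the first macro argument.
#define SEQ_HEAD(seq) SEQ_HEAD_FIRST(SEQ_HEAD_SPLIT seq)
#define SEQ_HEAD_SPLIT(x) x, ~
#define SEQ_HEAD_FIRST(...) SEQ_HEAD_FIRST_I(__VA_ARGS__)
#define SEQ_HEAD_FIRST_I(first, ...) first

// 1 if the signature sequence starts with (public), 0 otherwise.
#define IS_MEMBER_DECL(seq) IS_PUBLIC(SEQ_HEAD(seq))

static_assert(
    IS_MEMBER_DECL((public) (void) (f)( (int)(x) ) (const)) == 1,
    "sequence starting with (public) is treated as a member function");
static_assert(
    IS_MEMBER_DECL((void) (g)( (int)(x) )) == 0,
    "sequence starting with (void) is not");
```

A real parser would branch on this result (for example with a conditional macro) and continue with the rest of the sequence; the sketch stops at the detection step.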

Paul Mensonides

6:12 a.m.

On 8/16/2010 9:21 PM, Lorenzo Caminiti wrote:

...

Yes, I am aware of this "limitation". However, for my application it is not a problem to limit the argument of `IS_PUBLIC()` to pp-identifiers and pp-numbers with no decimal points (if interested, see "MY APPLICATION" below).

1) Out of curiosity, is there a way to implement `IS_PUBLIC()` (perhaps without using `BOOST_PP_CAT()`) so it does not have this limitation? (I could not think of any.)

The limitation is not BOOST_PP_CAT per se, but token-pasting in general. The "good" part of using BOOST_PP_CAT in combination with BOOST_PP_IS_NULLARY, et al, is that they have been "hacked" together for preprocessors that are broken.

Effectively, the detection macros work by manipulating the operational syntax of macro expansion. For that to work, stuff has to happen (namely, macros being expanded) at roughly the correct time. The basic problem with VC++, for example, is that they don't, so the pp-lib works overtime to attempt to _force_ expansions all over the library. Unfortunately, there is a limit to what can be forced--particularly with more advanced manipulations of the macro expansion process such as those used by Chaos, where there is an analogy to the uncertainty principle (e.g. you cannot force expansion in many contexts without changing the result = you cannot measure particle velocity and position at the same time). Even with those types of manipulations, however, there is no way to do the above without "smashing the particles together and seeing what comes out."

The limitation is caused by the ridiculous limitation that token-pasting arbitrary tokens together where the result is not a single token results in undefined behavior. Even to detect this scenario, the simplest implementation in a preprocessor is to simply juxtapose the characters making up the tokens and re-tokenize them. If there is more than one, issue diagnostic, otherwise insert the single token. A better definition would be simply to insert the resulting sequence of tokens.

...

2) Also, does the expansion of any of the following result in undefined behavior? (I don't think so...)

    IS_PUBLIC(public abc)          // Expand to 1.
    IS_PUBLIC(public::)            // Expand to 1.
    IS_PUBLIC(public(abc, ::))     // Expand to 1.
    IS_PUBLIC(public (abc) (yxz))  // Expand to 1.

(My application relies on some of these expansions to work.)

All of those look fine. Basically, what happens in the following

    #define M(a) id ## a

The appearance of the formal parameter 'a' adjacent to the token-pasting operator affects _which_ actual parameter is substituted. Namely, the version of the actual parameter which has _not_ had macros replaced in it. However, the token-pasting operation doesn't occur until after that substitution, and its operands are only the two _tokens_ immediately adjacent to it. E.g.

    #define A() 123
    #define B(x) x id ## x

    B(A()) => 123 id ## A() => 123 idA()

I.e. the token-pasting operator affects the expansion of the actual parameter (at least in that substitution context), but its operands are only the tokens on either side after that substitution.

Because of that, you're basically getting:

    PREFIX_ ## public abc
    PREFIX_ ## public ::
    PREFIX_ ## public ( abc , :: )
    PREFIX_ ## public ( abc ) ( yxz )

...all of which are okay.
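The substitution rule described above can be observed by stringizing the expansions. This is a small sketch under assumptions: `STR`, `same_ignoring_spaces`, and `PREFIX_PASTE` are hypothetical helpers, and spaces are ignored in the comparison because the exact spacing of a stringized expansion can differ between preprocessors.

```cpp
// Sketch only: STR, same_ignoring_spaces and PREFIX_PASTE are hypothetical.

#define STR(x) STR_I(x)
#define STR_I(x) #x

// Compile-time string comparison that skips spaces on both sides.
constexpr bool same_ignoring_spaces(const char* a, const char* b) {
    return *a == ' ' ? same_ignoring_spaces(a + 1, b)
         : *b == ' ' ? same_ignoring_spaces(a, b + 1)
         : *a != *b ? false
         : *a == '\0' ? true
         : same_ignoring_spaces(a + 1, b + 1);
}

#define A() 123
#define B(x) x id ## x

// The first x receives the fully expanded argument (123). The x next to
// ## receives the unexpanded argument A(), and only its first token, A,
// is pasted onto id.
static_assert(same_ignoring_spaces(STR(B(A())), "123 idA()"),
              "only the token adjacent to ## is pasted");

// Same rule with a multi-token argument: only `public` is pasted.
#define PREFIX_PASTE(a) PREFIX_ ## a
static_assert(same_ignoring_spaces(STR(PREFIX_PASTE(public abc)),
                                   "PREFIX_public abc"),
              "trailing identifier is left alone");
static_assert(same_ignoring_spaces(STR(PREFIX_PASTE(public ( abc , :: ))),
                                   "PREFIX_public(abc, ::)"),
              "trailing parenthesized group is left alone");
```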

...

MY APPLICATION

I am using `IS_PUBLIC()` and similar macros to program the preprocessor to *parse* a Boost.Preprocessor sequence of tokens that represents a function signature. For example:

    class c {
    public:
        void f(int x) const; // Usual function declaration.
    };

    class c {
        PARSE_FUNCTION_DECL( // Equivalent declaration using pp-sequences.
            (public) (void) (f)( (int)(x) ) (const)
        );
    };

What happens with stuff like pointers, or does that not matter for your application? E.g. (public) (void) (f)( (int*)(x) ) (const) ?

...

Thank you very much.

You're welcome. I don't know the ultimate purpose of this encoding, but the encoding itself doesn't look too bad.

Regards,
Paul Mensonides

Lorenzo Caminiti

12:17 p.m.

On Tue, Aug 17, 2010 at 2:12 AM, Paul Mensonides <pmenso57@comcast.net> wrote:

...

The limitation is not BOOST_PP_CAT per se, but token-pasting in general. The "good" part of using BOOST_PP_CAT in combination with BOOST_PP_IS_NULLARY, et al, is that they have been "hacked" together for preprocessors that are broken. Effectively, the detection macros work by

Yes, I understand.

...

manipulating the operational syntax of macro expansion. For that to work, stuff has to happen (namely, macros being expanded) at roughly the correct time. The basic problem with VC++, for example, is that they don't, so the pp-lib works overtime to attempt to _force_ expansions all over the library.

I got my pp-parsers to successfully work under both GCC and MSVC. Especially on MSVC, I also had to "hack" some of the macros to make sure they expand when they are supposed to -- BTW, having a library like Boost.Preprocessor has proven to be immensely useful.

...

Unfortunately, there is a limit to what can be forced--particularly with more advanced manipulations of the macro expansion process such as those used by Chaos, where there is an analogy to the uncertainty principle (e.g. you cannot force expansion in many contexts without changing the result = you cannot measure particle velocity and position at the same time). Even with those types of manipulations, however, there is no way to do the above without "smashing the particles together and seeing what comes out."

That's an interesting analogy :) (I do have an engineering/physics background).

...

The limitation is caused by the ridiculous limitation that token-pasting arbitrary tokens together where the result is not a single token results in undefined behavior. Even to detect this scenario, the simplest implementation in a preprocessor is to simply juxtapose the characters making up the tokens and re-tokenize them. If there is more than one, issue diagnostic, otherwise insert the single token. A better definition would be simply to insert the resulting sequence of tokens.

...
2) Also, does the expansion of any of the following result in undefined behavior? (I don't think so...)

    IS_PUBLIC(public abc)          // Expand to 1.
    IS_PUBLIC(public::)            // Expand to 1.
    IS_PUBLIC(public(abc, ::))     // Expand to 1.
    IS_PUBLIC(public (abc) (yxz))  // Expand to 1.

(My application relies on some of these expansions to work.)

All of those look fine. Basically, what happens in the following

    #define M(a) id ## a

The appearance of the formal parameter 'a' adjacent to the token-pasting operator affects _which_ actual parameter is substituted. Namely, the version of the actual parameter which has _not_ had macros replaced in it. However, the token-pasting operation doesn't occur until after that substitution, and its operands are only the two _tokens_ immediately adjacent to it. E.g.

    #define A() 123
    #define B(x) x id ## x

    B(A()) => 123 id ## A() => 123 idA()

OK, now I understand much better how my `IS_PUBLIC()` macro actually works -- thanks a lot!

...

I.e. the token-pasting operator affects the expansion of the actual parameter (at least in that substitution context), but its operands are only the tokens on either side after that substitution.

Because of that, you're basically getting:

    PREFIX_ ## public abc
    PREFIX_ ## public ::
    PREFIX_ ## public ( abc , :: )
    PREFIX_ ## public ( abc ) ( yxz )

...all of which are okay.

...

What happens with stuff like pointers, or does that not matter for your application? E.g. (public) (void) (f)( (int*)(x) ) (const) ?

My library does not need to detect pointers at the preprocessor metaprogramming level. I can wait until using the compiler at the template metaprogramming level to detect and handle pointers (using Boost.MPL, Boost.TypeTraits, etc). So my pp-parser macros simply have to expand:

    IS_PUBLIC(int*) // Expand to 0.
    IS_INT(int*)    // Expand to 1.

Where I never use the last expansion because I use template metaprogramming to detect and manipulate types. Similarly for references, etc.

(There is actually one exception to this for functions returning `void*` because my pp-parser macros need to detect functions returning `void`. I have implemented a workaround for this case allowing a special syntax within the signature sequence... but that is _very_ specific to my application.)
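The two expansions above follow from the fact that only the first token (`int`) takes part in the paste, while the `*` is simply carried along. A minimal sketch, assuming a conforming C++11 preprocessor; the helper macros and the `INT_int` and `VOID_void` tables are hypothetical names used only for illustration:

```cpp
// Sketch only: all macro names below are hypothetical stand-ins.

#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b

#define CHECK(...) CHECK_N(__VA_ARGS__, 0, ~)
#define CHECK_N(x, n, ...) n
// The trailing comma moves tokens such as * or & into a later argument,
// so they do not disturb the detected value.
#define IS_UNARY_PROBE(arg) ~, 1,
#define IS_UNARY(x) CHECK(IS_UNARY_PROBE x)

#define PUBLIC_public (1)
#define INT_int (1)
#define VOID_void (1)
#define IS_PUBLIC(token) IS_UNARY(CAT(PUBLIC_, token))
#define IS_INT(token) IS_UNARY(CAT(INT_, token))
#define IS_VOID(token) IS_UNARY(CAT(VOID_, token))

// PUBLIC_ ## int * gives PUBLIC_int *, and PUBLIC_int is not defined.
static_assert(IS_PUBLIC(int*) == 0, "int* does not start with public");

// INT_ ## int * gives INT_int *, which expands to (1) *.
static_assert(IS_INT(int*) == 1, "int* starts with int");
static_assert(IS_INT(int&) == 1, "int& starts with int");

// The same rule makes void* look like void at this level, which is
// presumably why a return type of void* needs special treatment.
static_assert(IS_VOID(void*) == 1, "void* starts with void");
```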

...

You're welcome. I don't know the ultimate purpose of this encoding, but the

This encoding, which I am calling "parenthesized syntax" (given the ridiculous amount of parentheses that it requires :) ), is used by my library under construction, "Boost.Contract" https://sourceforge.net/projects/dbcpp/, to implement contract programming for C++ as specified by N1962. For example:

    template<typename T>
    class myvector {
    public:
        CONTRACT_FUNCTION(
        (public) (void) (push_back)( (const T&)(element) ) (copyable)
            (precondition)(
                (size() < max_size())
                // More preconditions here...
            )
            (postcondition)(
                (size() == (CONTRACT_OLDOF(this)->size() + 1))
                // More postconditions here...
            )
        ({
            ... // Original implementation.
        }) )

        ...
    };

Note how I can define new "keywords" like `precondition`, `postcondition`, `copyable`, etc; program `IS_XXX()` macros for those; and use the pp-parser macros to parse them and expand to code that checks these assertions at the right time during execution.

I have also extended the parenthesized syntax to support concepts (interfacing with Boost.ConceptCheck) and named parameters (interfacing with Boost.Parameter). The idea being that contracts + concepts + named parameters fully specify the interface requirements. An example of concepts + contracts:

    CONTRACT_FUNCTION(
    (template)( (typename)(T) )
        (requires)(
            (boost::CopyConstructible<T>)
            (boost::Assignable<T>)
            (Addable<T>)
        )
    (T) (sum)( (T*)(array) (int)(n) (T)(result) )
        (precondition)(
            (array)
            (n > 0)
        )
    ({
        ... // Original implementation.
    }) )

...

encoding itself doesn't look too bad.

In my experience, the parenthesized syntax is OK for this application -- it's not terrible but it's not great either... My programmer's life would be better without this syntax but worse without contracts :)

However, using the preprocessor to parse and generate every function declaration (with a contract) slows down compilation quite a bit... I think I can optimize the code of my macros and the way I am using Boost.Preprocessor, but I am still finishing up the implementation and I am leaving optimizations for later.

BTW, for this optimization it would be useful to assess the computational complexity (maybe in terms of "number of macro expansions"?) of the Boost.Preprocessor macros -- how can I do that?

Thank you.

-- Lorenzo

Wolf Lammen

12:39 p.m.

Hi,

...

BTW, for this optimization it would be useful to assess the computational complexity (maybe in terms of "number of macro expansions"?) of the Boost.Preprocessor macros -- how can I do that?

The C++ plugin of Eclipse has a macro expander that allows you to step through an expansion and gives you the number of replacements needed.

cheers
Wolf Lammen
