
On Tue, Aug 17, 2010 at 2:12 AM, Paul Mensonides <pmenso57@comcast.net> wrote:
On 8/16/2010 9:21 PM, Lorenzo Caminiti wrote:
Yes, I am aware of this "limitation". However, for my application it is not a problem to limit the argument of `IS_PUBLIC()` to pp-identifiers and pp-numbers with no decimal points (if interested, see "MY APPLICATION" below).
1) Out of curiosity, is there a way to implement `IS_PUBLIC()` (perhaps without using `BOOST_PP_CAT()`) so it does not have this limitation? (I could not think of any.)
The limitation is not BOOST_PP_CAT per se, but token-pasting in general. The "good" part of using BOOST_PP_CAT in combination with BOOST_PP_IS_NULLARY, et al, is that they have been "hacked" together for preprocessors that are broken. Effectively, the detection macros work by
Yes, I understand.
manipulating the operational syntax of macro expansion. For that to work, stuff has to happen (namely, macros being expanded) at roughly the correct time. The basic problem with VC++, for example, is that they don't, so the pp-lib works overtime to attempt to _force_ expansions all over the library.
I got my pp-parsers to work successfully under both GCC and MSVC. Especially on MSVC, I also had to "hack" some of the macros to make sure they expand when they are supposed to -- BTW, having a library like Boost.Preprocessor has proven to be immensely useful.
Unfortunately, there is a limit to what can be forced--particularly with more advanced manipulations of the macro expansion process, such as those used by Chaos, where there is an analogy to the uncertainty principle (e.g. you cannot force expansion in many contexts without changing the result = you cannot measure particle velocity and position at the same time). Even with those types of manipulations, however, there is no way to do the above without "smashing the particles together and seeing what comes out."
That's an interesting analogy :) (I do have an engineering/physics background).
The limitation is caused by the ridiculous rule that token-pasting arbitrary tokens together, where the result is not a single token, results in undefined behavior. Even to detect this scenario, the simplest implementation in a preprocessor is simply to juxtapose the characters making up the tokens and re-tokenize them: if the result is more than one token, issue a diagnostic; otherwise, insert the single token. A better definition would be to simply insert the resulting sequence of tokens.
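For illustration, a minimal example of the rule (the `PASTE` macro is just a stand-in for this illustration, not something from the thread):

#define PASTE(a, b) a ## b

PASTE(foo, bar) // OK: "foobar" is a single identifier token.
PASTE(x, 1)     // OK: "x1" is a single identifier token.
PASTE(x, ::)    // Undefined behavior: "x::" is not a single preprocessing token.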
2) Also, does the expansion of any of the following result in undefined behavior? (I don't think so...)
IS_PUBLIC(public abc)         // Expands to 1.
IS_PUBLIC(public::)           // Expands to 1.
IS_PUBLIC(public(abc, ::))    // Expands to 1.
IS_PUBLIC(public (abc) (yxz)) // Expands to 1.
(My application relies on some of these expansions to work.)
All of those look fine. Basically, here is what happens in the following:
#define M(a) id ## a
The appearance of the formal parameter 'a' adjacent to the token-pasting operator affects _which_ actual parameter is substituted. Namely, the version of the actual parameter which has _not_ had macros replaced in it. However, the token-pasting operation doesn't occur until after that substitution, and its operands are only the two _tokens_ immediately adjacent to it. E.g.
#define A() 123
#define B(x) x id ## x
B(A()) => 123 id ## A() => 123 idA()
OK, now I understand much better how my `IS_PUBLIC()` macro actually works -- thanks a lot!
I.e. the token-pasting operator affects the expansion of the actual parameter (at least in that substitution context), but its operands are only the tokens on either side after that substitution.
Because of that, you're basically getting:
PREFIX_ ## public abc
PREFIX_ ## public ::
PREFIX_ ## public ( abc , :: )
PREFIX_ ## public ( abc ) ( yxz )
...all of which are okay.
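(As an aside, here is a minimal sketch of how this kind of detection can be written on a conforming preprocessor with variadic macro support. The helper names `CAT`, `SECOND`, and `IS_PUBLIC_PREFIX_public` are hypothetical, and this is not necessarily how the `IS_PUBLIC()` discussed here is actually implemented:)

#define CAT(a, b) CAT_(a, b)
#define CAT_(a, b) a ## b

// Expands to its second argument (requires variadic macros).
#define SECOND(...) SECOND_(__VA_ARGS__)
#define SECOND_(a, b, ...) b

// Pasting the prefix onto an argument that starts with `public` forms this
// macro, which injects "1" as the second comma-separated element.
#define IS_PUBLIC_PREFIX_public ~, 1,

// The trailing "0, ~" keeps SECOND_ supplied with enough arguments when the
// argument does not start with `public`.
#define IS_PUBLIC(tokens) SECOND(CAT(IS_PUBLIC_PREFIX_, tokens) 0, 0, ~)

IS_PUBLIC(public abc) // 1: the paste hits IS_PUBLIC_PREFIX_public.
IS_PUBLIC(void)       // 0: IS_PUBLIC_PREFIX_void is not a macro.
IS_PUBLIC(int*)       // 0: the paste only involves the `int` token.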
MY APPLICATION
I am using `IS_PUBLIC()` and similar macros to program the preprocessor to *parse* a Boost.Preprocessor sequence of tokens that represents a function signature. For example:
class c {
public:
    void f(int x) const; // Usual function declaration.
};
class c {
    PARSE_FUNCTION_DECL( // Equivalent declaration using pp-sequences.
        (public) (void) (f)( (int)(x) ) (const)
    );
};
What happens with stuff like pointers, or does that not matter for your application? E.g. (public) (void) (f)( (int*)(x) ) (const) ?
My library does not need to detect pointers at the preprocessor metaprogramming level. I can wait until using the compiler at the template metaprogramming level to detect and handle pointers (using Boost.MPL, Boost.TypeTraits, etc). So my pp-parser macros simply have to expand:

IS_PUBLIC(int*) // Expands to 0.
IS_INT(int*)    // Expands to 1.

Where I never actually use the last expansion, because I use template metaprogramming to detect and manipulate types. Similarly for references, etc. (There is actually one exception to this for functions returning `void*`, because my pp-parser macros need to detect functions returning `void`. I have implemented a workaround for this case allowing a special syntax within the signature sequence... but that is _very_ specific to my application.)
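(For example, the kind of check that gets deferred to the compiler might look roughly like the following sketch, using Boost.TypeTraits and BOOST_STATIC_ASSERT rather than whatever the library actually does:)

#include <boost/type_traits/is_pointer.hpp>
#include <boost/static_assert.hpp>

// Pointer-ness is detected at the template metaprogramming level...
BOOST_STATIC_ASSERT((boost::is_pointer<int*>::value));
// ...so the pp-parser never needs to distinguish `int` from `int*`.
BOOST_STATIC_ASSERT((!boost::is_pointer<int>::value));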
The parser macro above can say "the signature sequence starts with `public`, so this is a member function" at the preprocessor metaprogramming level and then expand to whatever special code the library needs to handle member functions. The parser macros can even do some basic syntax error checking -- for example, if `(const)` is specified as a cv-qualifier at the end of the signature sequence of a non-member function, the parser macro can detect that and expand to a compile-time error like `SYNTAX_ERROR_unexpected_cv_qualifier` (using `BOOST_MPL_ASSERT_MSG()`).
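(Roughly, the error expansion could look something like the following sketch; the dummy condition and type list are illustrative only, not the library's actual expansion:)

#include <boost/mpl/assert.hpp>

// What the parser might expand to on an unexpected cv-qualifier: the
// identifier in the middle shows up in the compiler's error message.
BOOST_MPL_ASSERT_MSG(
      false                                  // grammar check failed
    , SYNTAX_ERROR_unexpected_cv_qualifier   // message embedded in the error
    , (int)                                  // dummy type list
);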
Most of the tokens within C++ function signatures are pp-identifiers, such as the words `public`, `void`, `f`, etc. There are some exceptions, like `,` to separate function parameters, `<`/`>` for templates, `:` for constructors' member initializers, etc. The grammar of my preprocessor parser macros requires the use of different tokens in these cases. For example, parentheses `(`/`)` are used for templates instead of `<`/`>`:
template<typename T> void f(T x); // Usual.
PARSE_FUNCTION_DECL( // PP-sequence.
    (template)( (typename)(T) ) (void) (f)( (T)(x) )
);
(Instead of `(template)(<) (typename) (T) (>) (void) (f)( (T)(x) )`, which would have caused the parser macro to fail when inspecting `(<)` via one of the `IS_XXX()` macros, as per the limitation from using `BOOST_PP_CAT()` mentioned above.)
The grammar of my preprocessor parser macros clearly documents that only pp-identifiers can be passed as tokens of the function signature sequence. Therefore, the "limitation" of `IS_PUBLIC()` indicated above is not a problem for my application.
Thank you very much.
You're welcome. I don't know the ultimate purpose of this encoding, but the
This encoding, which I am calling "parenthesized syntax" (given the ridiculous amount of parentheses that it requires :) ), is used by my library under construction "Boost.Contract" https://sourceforge.net/projects/dbcpp/ to implement contract programming for C++ as specified by N1962. For example:

template<typename T>
class myvector {
public:
    CONTRACT_FUNCTION(
        (public) (void) (push_back)( (const T&)(element) ) (copyable)
        (precondition)(
            (size() < max_size())
            // More preconditions here...
        )
        (postcondition)(
            (size() == (CONTRACT_OLDOF(this)->size() + 1))
            // More postconditions here...
        )
    ({
        ... // Original implementation.
    })
    )
    ...
};

Note how I can define new "keywords" like `precondition`, `postcondition`, `copyable`, etc.; program `IS_XXX()` macros for those; and use the pp-parser macros to parse them and expand to code that checks these assertions at the right time during execution.

I have also extended the parenthesized syntax to support concepts (interfacing with Boost.ConceptCheck) and named parameters (interfacing with Boost.Parameter). The idea is that contracts + concepts + named parameters fully specify the interface requirements. An example of concepts + contracts:

CONTRACT_FUNCTION(
    (template)( (typename)(T) )
    (requires)(
        (boost::CopyConstructible<T>)
        (boost::Assignable<T>)
        (Addable<T>)
    )
    (T) (sum)( (T*)(array) (int)(n) (T)(result) )
    (precondition)(
        (array)
        (n > 0)
    )
({
    ... // Original implementation.
})
)
encoding itself doesn't look too bad.
In my experience, the parenthesized syntax is OK for this application -- it's not terrible, but it's not great either... My programmer's life would be better without this syntax, but worse without contracts :) However, using the preprocessor to parse and generate every function declaration (with a contract) slows down compilation quite a bit... I think I can optimize the code of my macros and the way I am using Boost.Preprocessor, but I am still finishing up the implementation and I am leaving optimizations for later.

BTW, for this optimization it would be useful to assess the computational complexity (maybe in terms of "number of macro expansions"?) of the Boost.Preprocessor macros -- how can I do that?

Thank you.
--
Lorenzo