Re: [boost] [preprocessor] "How macro expansion works", comments

17 Feb 2006

      ...
-----Original Message-----
From: boost-bounces@lists.boost.org 
[mailto:boost-bounces@lists.boost.org] On Behalf Of Tobias Schwinger
...
...
It isn't a function-like macro.  In an object-like macro, there is no
stringizing operator.
Fine. Brings me back to the text. This detail is worth mentioning!
"There is no stringizing operator for object-like macros" is 
certainly less clear than "the '#' character in an 
object-like macro does not act like an operator and is passed 
on to the output" (or so).
Okay, I'll note it, but I have to be a little more specific.  To be accurate,
the only # tokens that are stringizing operators are those that exist in the
definition of a function-like macro.  I.e. it is, for all intents and purposes,
replaced by a canonical form at the point of a macro definition.  This
distinguishes it from (e.g.) a # token that is passed as input to a macro--which
is not the stringizing operator.  The same is true for ##, except that it is
also replaced by a canonical form in object-like macro definitions.

(I realize what I just wrote is a mess!)
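
A small sketch of the distinction, for what it's worth.  HASH, XY, STR, XSTR and
check_object_like are made-up names for illustration; the two-level STR/XSTR
idiom is only there so that the result of the expansion can be looked at as a
string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
#define HASH #          /* object-like: # is an ordinary token here          */
#define XY x ## y       /* object-like: ## is still the pasting operator     */

#define STR(x) #x       /* function-like: this # is the stringizing operator */
#define XSTR(x) STR(x)  /* expand the argument first, then stringize it      */

static void check_object_like(void)
{
    assert(strcmp(XSTR(HASH), "#") == 0);   /* the # passed straight through */
    assert(strcmp(XSTR(XY), "xy") == 0);    /* x and y were pasted together  */
}
```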

The token-pasting and stringizing operators are a bit like formal parameters in
the sense that they can be replaced by canonical forms (i.e. a token that is
introduced by the implementation that isn't a normal token--virtual tokens are
similar, but have a different function).  E.g. when the preprocessor comes
across this definition:

    #define MACRO(a, b) token a ## b token

The replacement list stored in the symbol table effectively becomes:

    token <_1> <##> <_2> token

Thus, when the macro is called such as:

    MACRO(b #, # a)

argument substitution yields:

    token b # <##> # a token

(where the # tokens are not the stringizing operator--that would be <#> if it
existed in the macro definition)

token-pasting yields:

    token b ## a token

and that's the result.  The ## in the result is most definitely not the
token-pasting operator, nor are the #'s in the input the stringizing operator.
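
The walkthrough above can be checked with a sketch along these lines.  STR,
XSTR, macro_result and check_macro_result are hypothetical helpers, not part of
the example itself.  The spacing of a stringized result can differ between
preprocessors, so the check only looks for the ## in the text.

```c
#include <assert.h>
#include <string.h>

#define MACRO(a, b) token a ## b token

/* Hypothetical helpers: stringize the result of an expansion. */
#define STR(x) #x
#define XSTR(x) STR(x)

static const char *macro_result(void)
{
    /* MACRO(b #, # a) is expanded first; its result, which contains a
       plain ## token, is then turned into a string literal by STR. */
    return XSTR(MACRO(b #, # a));
}

static void check_macro_result(void)
{
    /* The pasted ## survives as an ordinary token in the result. */
    assert(strstr(macro_result(), "##") != NULL);
}
```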

Of course, if the invocation was like this:

    #define MACRO_2(a, b) MACRO(b #, # a)

Then it would be an error, because then the #'s are stringizing operators, but
the first one isn't applied to a formal parameter.

Does that make any sense?  (Sorry it's late, and I'm in a hurry to go to bed.)
...
...
...
...
At the same time, it is worth noting that the syntax of macro
invocations is dynamic.  That is, a series of macro invocations
cannot be parsed into a syntax tree and then evaluated.
It can only parse one invocation at a time, and its result is
basically appended to the input.
Replacing 'appended' with 'prepended' makes this sentence a 
really usable explanation of what the standard calls "rescanning".
Yes, but I'm not sure that it fits well with the abstract model.  I.e. the
abstract preprocessor is scanning a sequence of preexisting tokens.

A concrete preprocessor will normally do lexical analysis, scanning for macro
expansion, and output in a single pass, and then it is indeed more like
prepending it to the input.  But with the abstract model, the sequence of tokens
that makes up the macro invocation is replaced by the sequence of tokens from
the replacement list (after argument substitution, token-pasting, etc.).  IOW,
it is more like it is mutating a data structure (the sequence of tokens to be
scanned), so there isn't really an input or output so much as a side effect.
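
Whichever model is used to describe it, the observable behavior is the same.  A
minimal sketch, with made-up macros F and G (and made-up function names), of a
result that only becomes an invocation once it sits in front of the tokens that
follow it:

```c
#include <assert.h>

/* Hypothetical macros, for illustration only. */
#define F() G       /* F() leaves just the name G behind */
#define G() 42

static int f_then_g(void)
{
    /* F() is replaced by G; scanning resumes at G, which now sits in
       front of the remaining "()" and so forms a new invocation. */
    return F()();
}

static void check_f_then_g(void)
{
    assert(f_then_g() == 42);
}
```

No syntax tree built up front could have contained the G() invocation, since it
does not exist until F() has been replaced.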

Now that I think about this, I question my use of the term "output" in the
article.  The abstract model is what defines the result of the preprocessor.
Scanning for macro expansion is similar in the abstract model to the early
phases of translation where--in the abstract model--the entire results of phase
1 are fed to phase 2.

Except for the possibly unfortunate use of "output", the article is following
the abstract model.  E.g. "moves on to the next token" instead of "gets the next
token".
...
...
...
The terms 'invocation' and 'function-like macro' are probably much 
worse than wasting some words on how dynamic things really are...
I'm not sure I follow what you're saying here.  Can you rephrase?
After we have done an expansion we have to reset our scanner 
back to the first token in the replacement list.
...and this is precisely where the description of the abstract model is more
difficult because the sequence of tokens from the replacement list might be the
empty sequence.  In which case, the position of the scanner is set to the next
token beyond the invocation.  This can be explained much more simply in the
concrete model.  I have to think about this.

*But*, I get what you're saying.  I need (in one form or another) to not gloss
over "moves on to the next token".  I.e. it is more like "move back to the
beginning of the algorithm starting with the next token".
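
A toy sketch of that "mutate the sequence and start again at the same position"
idea.  Everything here (struct macro, struct token_seq, splice, lookup, scan) is
a hypothetical model and not how any real preprocessor is written.  It handles
object-like macros only and leaves out arguments, ## and #, and the rule that
stops a macro from expanding inside its own expansion, so a self-referential
macro would loop forever.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 32

/* Toy model: a token is a string, and every macro is object-like. */
struct macro {
    const char *name;
    const char *repl[4];    /* replacement list */
    int repl_len;           /* may be 0 (empty replacement list) */
};

struct token_seq {
    const char *tok[MAX_TOKENS];
    int len;
};

/* Replace the single token at pos with the replacement list, in place. */
static void splice(struct token_seq *s, int pos, const struct macro *m)
{
    int delta = m->repl_len - 1;
    assert(s->len + delta <= MAX_TOKENS);
    /* Shift the tokens after the invocation to make (or close) the gap. */
    memmove(&s->tok[pos + m->repl_len], &s->tok[pos + 1],
            (size_t)(s->len - pos - 1) * sizeof s->tok[0]);
    for (int i = 0; i < m->repl_len; i++)
        s->tok[pos + i] = m->repl[i];
    s->len += delta;
}

static const struct macro *lookup(const struct macro *tab, int n,
                                  const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Scan the sequence, expanding macros by mutating it in place. */
static void scan(struct token_seq *s, const struct macro *tab, int n)
{
    int pos = 0;
    while (pos < s->len) {
        const struct macro *m = lookup(tab, n, s->tok[pos]);
        if (m != NULL)
            /* Do not advance: pos now indexes the first replacement
               token, or the token after the invocation if the
               replacement list was empty. */
            splice(s, pos, m);
        else
            pos++;
    }
}
```

With A defined as "B c" and B defined as empty, scanning "x A y" goes through
"x B c y" and ends at "x c y".  The empty replacement list needs no special case
in this model, because the position is simply left where it is.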
...
That's what "rescanning" means to me. "The output is 
prepended to the input" (or so) is probably a better way to 
say it. Anyway, some redundancy to highlight this process is 
a good thing because it explains the dynamic nature of the 
preprocessor's syntax.
Yes.

I need to think about the concrete model vs. the abstract model.  Both are
correct in that they yield the same results.  In other ways, the abstract model
is better suited to the description (e.g. where scanning takes a sequence of
tokens as input, mutates it, and returns the mutated sequence).  This more easily
describes what happens to macro arguments that are scanned for macro expansion.
...
The terms "invocation" and "function-like macro", however, 
are pushing the reader towards viewing macros as functions 
(worse than "rescanning" does IMO).
Well, there is a difference between "function" and "functional hierarchy".
Macros are very much like functions in a lot of ways, but what they "return" is
not the result of "executing" their definitions.  Rather, they return their
definitions (i.e. they return code)--sort of like what physically happens when a
function is inlined.

So, I'm not against viewing macros as functions, I'm against the belief that
they form a functional hierarchy.  I'm not even against viewing it that way when
using macros (provided that you know that ultimately that isn't what is really
happening and recognize the implications of the differences).  I do that myself
when writing code.  E.g. I typically *view* the following:

    #define A() B()

...as A calling B, but I *know* that that is not what is actually happening.
(It takes some serious "getting-used-to" to follow non-trivial code when viewing
it entirely as iterative expansion.)  The important thing is that you know that
the way you're viewing it is only a mental device used to help you grasp what is
happening.
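
One way to see the difference, as a sketch: A's definition is only a list of
tokens, so what A() ends up as depends on what B means at the point where the
result is scanned.  The B definitions and the function names below are made up
for illustration.

```c
#include <assert.h>

#define A() B()     /* B need not exist yet: this is only a token list */
#define B() 7

static int a_first(void) { return A(); }    /* A() -> B() -> 7 */

#undef B
#define B() 8       /* A itself is untouched */

static int a_second(void) { return A(); }   /* A() -> B() -> 8 */

static void check_a(void)
{
    assert(a_first() == 7);
    assert(a_second() == 8);
}
```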

Regards,
Paul Mensonides