
Paul Mensonides wrote:
The process of macro expansion is best viewed as the process of scanning for macro expansion (rather than the process of a single macro expansion alone). When the preprocessor encounters a sequence of preprocessing tokens and whitespace separations that needs to be scanned for macro expansion, it has to perform a number of steps. These steps are examined in this document in detail.
Strike that paragraph. It uses terms not yet defined and doesn't say much more than the title (assuming it's still "how macro expansion works").
The paragraph might not be in the right spot, but what the paragraph says is important. The process of macro expansion includes more than just expanding a single macro invocation. Rather, it is the process of scanning for macro expansions, and the whole thing is defined that way.
It becomes clear there is more to macro expansion than expanding a single macro and that multiple steps are required when reading the text... The paragraph seems to try an introduction but does a bad job, IMO.
The reader probably has no idea what "painted" means at this point. Indicate the forward-declaration by "see below" or something like that.
I do in the very next sentence.
Yeah, but with too much text, IMO.
[Locations]
There are several points where the preprocessor must scan a sequence of tokens looking for macro invocations to expand. The most obvious of these is between preprocessing directives (subject to conditional compilation). For example,
I had to read this sentence multiple times to make sense of it...
What part was difficult to follow? It seems pretty straightforward to me (but then, I know what I'm looking for).
"Between preprocessing directives" -- what?! Sure, it is correct. But it's written too much from the viewpoint of the preprocessor rather than from where your reader is. <snip>
in undefined behavior. For example,
#define MACRO(x) x
MACRO( #include "file.h" )
Indicate more clearly that this code is not OK.
The next sentence says that it is undefined behavior. I'm not sure how to make it more clear than that.
An obvious sourcecode comment (e.g. in red).
[Blue Paint]
If the current token is an identifier that refers to a macro, the preprocessor must check to see if the token is painted. If it is painted, it outputs the token and moves on to the next.
When an identifier token is painted, it means that the preprocessor will not attempt to expand it as a macro (which is why it outputs it and moves on). In other words, the token itself is flagged as disabled, and it behaves like an identifier that does not correspond to a macro. This disabled flag is commonly referred to as "blue paint," and if the disabled flag is set on a particular token, that token is called "painted." (The means by which an identifier token can become painted is described below.)
Remove redundancy in the two paragraphs above.
I believe I was unclear, here: The redundancy isn't the problem (redundancy is actually a good thing in documentation, when used right) but too much redundancy in one spot...
In the running example, the current token is the identifier OBJECT, which _does_ correspond to a macro name. It is not painted, however, so the preprocessor moves on to the next step.
[Disabling Contexts]
If the current token is an identifier token that corresponds to a macro name, and the token is _not_ painted, the preprocessor must check to see if a disabling context that corresponds to the macro referred to by the identifier is active. If a corresponding disabling context is active, the preprocessor paints the identifier token, outputs it, and moves on to the next token.
A "disabling context" corresponds to a specific macro and exists over a range of tokens during a single scan. If an identifier that refers to a macro is found inside a disabling context that corresponds to the same macro, it is painted. Disabling contexts apply to macros themselves over a given geographic sequence of tokens, while blue paint applies to particular identifier tokens. The former causes the latter, and the latter is what prevents "recursion" in macro expansion. (The means by which a disabling context comes into existence is discussed below.)
In the running example, the current token is still the identifier OBJECT. It is not painted, and there is no active disabling context that would cause it to be painted. Therefore, the preprocessor moves on to the next step.
The introductions of these terms feel structurally too abrupt to me. Introduce these terms along the way, continuing with the example.
They appear at the first point where their definition must appear.
I believe it's useful to sustain it. <snip>
from the replacement list.
+ X OBJECT F() +
  |__________|
  OBJECT disabling context (DC)
<-- explain here what a disabling context is, and then what blue paint is
Do you mean that they should be defined here for the first time, or that they should be defined here again (but maybe with less detail)?
I meant: introduce the terms here.
function-like macro has no formal parameters, and therefore any use of the stringizing operator is automatically an error.) The result of token-pasting in F's replacement list is
It's not clear to me why the stringizing operator leads to an error rather than producing a '#' character. Probably too much of a side note, anyway.
I don't know the rationale for why it is the way it is.
In this case, "therefore" is a bit strange...
[Interleaved Invocations]
It is important to note that disabling contexts only exist during a _single_ scan. Moreover, when scanning passes the end of a disabling context, that disabling context no longer exists. In other words, the output of a scan results only in tokens and whitespace separations. Some of those tokens might be painted (and they remain painted), but disabling contexts are not part of the result of scanning.
(If they were, there would be no need for blue paint.)
Misses (at least) a reference to 16.3.4-1 (the wording "with the remaining tokens of the source" (or so) is quite nice there, so consider using something similar).
I have to clarify: I'm missing a hint (in the text not the examples) that tokens from outside the replacement list can form a macro invocation together with expansion output. The sentence from 16.3.4-1 is actually quite good.
I believe I wouldn't really understand what you are talking about here without knowing that part of the standard. "A single scan" -- the concept of rescanning was introduced too peripherally to make much sense to someone unfamiliar with the topic.
This all comes back to the beginning--the process is scanning a sequence of tokens for macros to expand (i.e. the first paragraph that you said I should strike). This entire process is recursively applied to arguments to macros (without being an operand...) and thus this entire scan for macros to expand can be applied to the same sequence of tokens more than once. It is vitally important that disabling contexts don't continue to exist beyond the scan in which they were created, but that blue paint does. As I mentioned, there would be no need for blue paint--what the standard calls "non-replaced macro name preprocessing tokens"--if the disabling contexts weren't transient.
Now for the "rescanning" part: You don't have to introduce that term. Anyway I wouldn't have figured out what "a _single_ scan" was supposed to mean without knowing it, so it feels to me here is something missing.
I'm pretty sure that I don't use the term "rescanning" anywhere in the whole article (yep, I checked, and I don't).
"Rescanning" comes from the standard, of course. I fought my way through chapter 16 because I wanted to know how things work before you posted this article.
In C++, if any argument is empty or contains only whitespace separations, the behavior is undefined. In C, an empty argument is allowed, but gets special treatment. (That special treatment is described below.)
It requires at least C99, right? If so, say it (it's likely there are C compilers that don't support that version of the language).
As far as I am concerned, the 1999 C standard defines what C is until it is replaced by a newer standard. Likewise, the 1998 standard defines what C++ is until it is replaced by a newer standard. I.e. an unqualified C implies C99, and an unqualified C++ implies C++98. If I wished to reference C90, I'd say C90. Luckily, I don't wish to reference C90 because I don't want to maintain an article that attempts to be backward compatible with all revisions of a language. This is precisely why I should have a note above that variadic macros are not part of C++, BTW, even though I know they will be part of C++0x.
The previous version of the language is still widely used and taught so disambiguation makes some sense, IMO.
Furthermore, deficiencies of compilers (like not implementing the language as it is currently defined or not doing what they should during this process) are not a subject of this article.
OTOH, it wouldn't hurt to mention in the "Conventions" section that, at the time of this writing, C is C99 and C++ is C++98.
It adds clutter no one wants to read -- adding the version number still seems the better solution to me ;-).
[Virtual Tokens]
BTW, I would've probably called them "control tokens" in analogy to "control characters" -- "virtual" has that dynamic-polymorphism association, especially to C++ programmers with non-native English...
I hope it's of some use.
Definitely. Thanks again.
You're welcome -- it's my way to thank you for your support. -- Tobias