
Hello,

yesterday I spent the day with my first application of the Boost Preprocessor library: given two sequences A and B (according to the Boost Preprocessor datastructure definition), I want to create a nested switch statement, the outer switch running through the constants in A, the inner switch running through B. Piece of cake: just apply BOOST_PP_SEQ_FOR_EACH two times. It didn't work, because the preprocessor, when evaluating a macro M, will suspend the definition of M during the evaluation (thus "no recursion"). Unfortunately, I didn't know this, so I had to find it out the hard way.

First of all I think here the documentation of the library doesn't reach its full potential yet: I tried first to read through "Topics", but found all examples contrived and complicated (couldn't find a single plain example), and thus I skipped that and went to Reference, where I found BOOST_PP_SEQ_FOR_EACH, making some still mysterious remarks about "The next available BOOST_PP_FOR repetition." etc. But at various places it somehow says that "reentry problems", whatever this means, are solved now. Only when looking it up in the rationale of C99 did I find a clear statement about the issue (while the C++ standard somehow supposes you already know the issue). So a clear statement about the limitations of the looping constructions *at each place* would help, I would guess, quite a few people trying to use this library. (I believe documentation must be written so that it can be entered in many ways, and not assuming you are reading it from the start to the end --- redundancy is needed.)

Second of all it seems the current facilities in the library should be enhanced (though I don't understand how FOR and FOR_r are supposed to be used): BOOST_PP_SEQ_FOR_EACH is such a natural and easy thing that its use should be promoted, while FOR and its derivatives look ugly. But it seems that the library favours FOR, and seems to envisage a main loop using FOR, and only at the second level using FOR_EACH.

My solution was to convert the second sequence B into a list, and use BOOST_PP_LIST_FOR_EACH for the inner loop. But it seems to me that this works by pure chance, and with any change in the implementation of BOOST_PP_SEQ_FOR_EACH or BOOST_PP_LIST_FOR_EACH (introducing a common macro) this solution will break. According to what I understand from the documentation, the natural solution, offering for example BOOST_PP_SEQ_FOR_EACH_2, which uses macros guaranteed not to occur in BOOST_PP_SEQ_FOR_EACH, has been deprecated in favour of a more complicated solution. But it seems this more complicated solution works only for one kind of loop, and not for others, and the documentation seems so arcane that I don't have a clue how it could work (the typical phenomenon that the documentation wants to show off by presenting only complicated examples, which show all sorts of templates, etc., and this without saying what actually should be achieved --- good code solves one problem at a time, and good documentation discusses one problem at a time).

So if the library offers a natural way of nesting BOOST_PP_SEQ_FOR_EACH, then documenting this (directly, without solving other problems) would be good, or otherwise adding BOOST_PP_SEQ_FOR_EACH_2 would help some users (actually, just the presence of this second form would announce the underlying problem). Sometimes less general but simpler solutions are preferable over more general but complicated solutions.

Oliver

P.S. The situation with the preprocessor is complicated by the general silence on this issue in the books. And it is quite amazing how the library manages to squeeze out nearly a general-purpose machine (but not with an infinite tape :-)) from the preprocessor machinery, which seems to have been castrated by the designers on purpose.
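[A minimal sketch of the problem Oliver describes, added for illustration. The sequence and macro names are hypothetical; only BOOST_PP_SEQ_FOR_EACH and its macro(r, data, elem) signature come from the library. The inner use conflicts with the outer one because both rely on the same non-replicated implementation macros (as discussed later in the thread), so the inner loop is not expanded as intended:

#include <boost/preprocessor/seq/for_each.hpp>

#define SEQ_A (red)(green)
#define SEQ_B (1)(2)

#define INNER(r, outer, inner) /* case for (outer, inner) */
#define OUTER(r, data, outer) BOOST_PP_SEQ_FOR_EACH(INNER, outer, SEQ_B)

BOOST_PP_SEQ_FOR_EACH(OUTER, ~, SEQ_A)  // the inner SEQ_FOR_EACH is left unexpanded
]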

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Oliver Kullmann
First of all I think here the documentation of the library doesn't reach its full potential yet: I tried first to read through "Topics", but found all examples contrived and complicated (couldn't find a single plain example),
What is a "plain example"?
and thus I skipped that and went to Reference, where I found BOOST_PP_SEQ_FOR_EACH, making some still mysterious remarks about "The next available BOOST_PP_FOR repetition." etc. But at various places it somehow says that "reentry problems" whatever this means are solved now.
There is an article in the topics section about this (titled "Reentrancy").
Only when looking it up in the rationale of C99 I found a clear statement about the issue (while the C++ standard somehow supposes you know already the issue).
It is pretty clear in both standards (6.10.3.4/2 in C and 16.3.4/2 in C++).
So a clear statement about the limitations of the looping constructions *at each place* would help I would guess quite a few people trying to use this library.
It is unnecessary clutter. It is not the documentation's responsibility to constantly tell you the way that the language works. Unless the documentation says otherwise, it should be assumed that a macro cannot expand inside itself. That is simply the way that the language works. The library makes it *appear* that a few macros are recursive, but then it explicitly says that the macro can be used as if it was recursive.
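[A one-line illustration of that rule, added for concreteness:

#define F() F()
F()  // expands to F() exactly once; the F produced by the expansion lies
     // inside F's own expansion, so it is never expanded again
]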
(I believe documentation must be written so that it can be entered in many ways, and not assuming you are reading it from the start to the end --- redundancy is needed.)
I disagree on all counts. It is impractical to make every page of the documentation a complete discourse on both how the preprocessor works and how preprocessor metaprogramming works. Beyond that, redundancy in both code and documentation is a maintenance nightmare.
Second of all it seems the current facilities in the library should be enhanced (though I don't understand how FOR and FOR_r are supposed to be used): BOOST_PP_SEQ_FOR_EACH is such a natural and easy thing that its use should be promoted, while FOR and its derivatives look ugly.
Hmm. FOR is far more general. SEQ_FOR_EACH can be implemented with FOR, but not vice versa.
But it seems that the library favours FOR, and seems to envisage a main loop using FOR, and only at the second level using FOR_EACH.
The library doesn't favor it, it simply is more general purpose. In order to allow the kind of reentrance that FOR allows, hundreds of macros are required for *each* higher-order algorithm. SEQ_FOR_EACH is only one of many.
My solution was to convert the second sequence B into a list, and use BOOST_PP_LIST_FOR_EACH for the inner loop. But it seems to me that this works by pure chance, and with any change in the implementation of BOOST_PP_SEQ_FOR_EACH or BOOST_PP_LIST_FOR_EACH (introducing a common macro) this solution will break.
It will only break if SEQ_FOR_EACH or LIST_FOR_EACH arbitrarily decides to use the other--which won't happen. I design the algorithms in the library, and I'm well aware of vertical dependencies and their implications.
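[For concreteness, a sketch of the workaround being discussed: the outer loop iterates a sequence, the inner loop iterates the same data written as a Boost.PP list, so the two algorithms share no interface macros. The names are illustrative; BOOST_PP_SEQ_FOR_EACH, BOOST_PP_LIST_FOR_EACH, and BOOST_PP_NIL are the library's, everything else is hypothetical:

#include <boost/preprocessor/seq/for_each.hpp>
#include <boost/preprocessor/list/for_each.hpp>

#define SEQ_A (red)(green)
#define LIST_B (1, (2, BOOST_PP_NIL))   // B written directly as a list

#define INNER(r, outer, inner) /* case for (outer, inner) */
#define OUTER(r, data, outer) BOOST_PP_LIST_FOR_EACH(INNER, outer, LIST_B)

BOOST_PP_SEQ_FOR_EACH(OUTER, ~, SEQ_A)

As noted above, this keeps working only as long as neither algorithm is reimplemented in terms of the other.]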
According to what I understand from the documentation, the natural solution, offering for example BOOST_PP_SEQ_FOR_EACH_2, which uses macros guaranteed not to occur in BOOST_PP_SEQ_FOR_EACH, has been deprecated in favour of a more complicated solution.
You don't understand the implications of what you're saying. In order to do that, I'd need to define hundreds of macros to implement SEQ_FOR_EACH alone. It isn't as simple as just replicating the interface macro a few times. Instead, I'd have to reimplement the entire algorithm from scratch--intentionally avoiding the use of any other higher-order algorithm.
But it seems this more complicated solution works only for one kind of loop, and not for others, and the documentation
I'm not sure what you're referring to by "more complicated solution". If you specify what kind of thing you're referring to, I'll expound.
seems so arcane that I don't have a clue how it could work (the typical phenomenon that the documentation wants to show off by presenting only complicated examples, which show all sorts of templates, etc., and this without saying what actually should be achieved --- good code solves one problem at a time, and good documentation discusses one problem at a time).
The documentation has many examples, but you call them either complicated or contrived. What exactly do you want? This kind of stuff doesn't boil down to _simple_ use cases. Rather, typical use cases are parts of overarching designs that are complex--which is precisely the reason that preprocessor metaprogramming enters the picture at all. Furthermore, I have a responsibility (as does any library author) not to promote naive or poor uses of the library--which is almost always the case with examples that implement something simple. So, I can do two things--I can provide simple contrived examples that demonstrate what various things do, and I can provide complex examples that implement real world scenarios. The documentation does both already.
So if the library offers a natural way of nesting BOOST_PP_SEQ_FOR_EACH, then documenting this (directly, without solving other problems) would be good, or otherwise adding BOOST_PP_SEQ_FOR_EACH_2 would help some users (actually, just the presence of this second form would announce the underlying problem).
The library 1) shouldn't have to announce this problem and 2) already does announce this problem--it just doesn't do it hundreds of times in hundreds of different places. If you are writing code in the language defined by the preprocessor, you should *at least* know the basics of that language. The library is not the language, it is merely a systematic set of reusable constructs written in that language. The documentation's only responsibility is to document itself.
Sometimes less general but simpler solutions are preferable over more general but complicated solutions.
Again, I'm not sure what you mean by "more complicated solutions".

-----

I'm trying very hard not to be harsh, but what you suggest has far-reaching implications that you don't understand. Regarding examples in the documentation, I don't think there is a way to please you--see posts by me in this thread: http://thread.gmane.org/gmane.comp.lib.boost.user/16132

-----

Regarding the technical implications: (Please make the effort to follow this.)

As it stands now, the library emulates recursion by replicating macros. For example, a WHILE loop can be implemented as follows:

#define WHILE_1(pred, op, state) \
    IF(pred(2, state), WHILE_2, state TUPLE_EAT(3))(pred, op, op(2, state)) \
    /**/
#define WHILE_2(pred, op, state) \
    IF(pred(3, state), WHILE_3, state TUPLE_EAT(3))(pred, op, op(3, state)) \
    /**/
// etc. (i.e. hundreds...)

A predicate is passed that determines when the algorithm should stop, and an operation is used to iterate the state on each step of the algorithm. There are a couple of things here that must be understood.

First, each WHILE_xyz represents both an entry point into the algorithm (i.e. an interface to the algorithm) *and* a single algorithmic step taken by the algorithm. So, WHILE_2 does not mean "separate nested WHILE loop" or "second looping dimension". Instead, it means "the step after WHILE_1".

Now, when the algorithm (meaning the entire collection of macros) invokes the predicate and the operation, it passes along the number of the next available WHILE_xyz macro. This value is called the "state" of the algorithm. (Note that this is not the 'state' parameter. That is the (arbitrary) state that a particular *use* of the algorithm is iterating.) The predicate and the operation can use this algorithm state to reenter the algorithm before the algorithm itself actually continues into that macro. As the algorithm continues, the algorithm state values passed to the predicate and the operation change (i.e. they increase). This is important. They do not remain fixed, which means that the predicate, for example, cannot just arbitrarily use WHILE_2. It has to use the WHILE_xyz indicated by the algorithm state value passed to it.

Obviously it is possible to provide several totally distinct implementations of the entire loop, but that implies several hundred macros per loop, instead of sharing a single set of several hundred macros for all loop uses. Say you did this anyway, and you get WHILE_A, WHILE_B, WHILE_C (and so on, each with their own WHILE_A_1, WHILE_A_2, etc.). This doesn't really buy you anything, because you'll still ultimately end up passing A, B, C (etc.) as a reentry point. E.g. suppose you write some macro that uses WHILE_A, and then you change another macro so that it uses this macro, but this other macro was already using WHILE_A. Fine, you just change this other macro to use WHILE_B. But then, there is another macro that uses this second macro, and this third macro is already using WHILE_B, so you have to change that to use WHILE_C. This is a significant bubbling of implementation detail, and it is *way* worse when all three macros are written by different people. Ultimately, you arrive at the conclusion that it doesn't matter *which* WHILE the first macro uses, so you parameterize the first macro with the WHILE to be used. Similarly with the second (and when the second calls the first, it uses the next WHILE from the one it is currently using). Similarly again with the third.
So, instead of having the most nested use of WHILE be WHILE_A, you have the outermost use be WHILE_A, and each inner use adjusts, so that the first nested use uses WHILE_B, the next nested use uses WHILE_C, and so on. This is the only way to do it that scales, and this is ultimately no different than just having one WHILE with one set of implementation macros. This is exactly (minus a lot of workarounds) the way that the library emulates recursion (particularly: in FOR).

The problem comes when you write a higher-order algorithm on top of another higher-order algorithm. For example, SEQ_FOR_EACH on top of FOR. (Higher-order algorithms are fundamentally different because the implementation of the algorithm cannot completely control what macros are called inside of them.) It doesn't take hundreds of macros to implement SEQ_FOR_EACH because it can reuse the more general purpose algorithmic steps provided by FOR. In order to make SEQ_FOR_EACH reentrant, you need, at minimum, a distinct interface macro (i.e. SEQ_FOR_EACH_1, SEQ_FOR_EACH_2, etc.) and a distinct macro to be repeated by FOR for each possible entry point into SEQ_FOR_EACH. That isn't too bad in terms of number of macros, but you introduce yet another algorithm state by doing so. If each higher-order algorithm did so, the user would be inundated by the states of various algorithms. E.g. just to invoke SEQ_FOR_EACH, you'd need to supply both the state of SEQ_FOR_EACH and the state of FOR. Worse, if another algorithm (with multiple entry points) uses SEQ_FOR_EACH, you'd need the state of that algorithm, the state of SEQ_FOR_EACH, and the state of FOR to use it.

You resolve this in one of two ways: 1) you simply don't provide reentrance in higher-order algorithms that are implemented using other higher-order algorithms, or 2) you reimplement the algorithm from scratch to avoid using other higher-order algorithms. The former is what the library does; the latter requires hundreds of macros to implement each higher-order algorithm.

-----

All of this is the status quo of the library. There are two other models that are superior in terms of ease of use and avoidance of these types of issues. The first of these, I use in a different (far more powerful) preprocessor library. The problem with this model is that it isn't portable to broken preprocessors (which are heavily used by users of Boost--e.g. VC++). The second alternative model revolves around automatically deducing all algorithm states whenever they are needed. IOW, completely hiding them from users altogether. The problem with this model is that it isn't portable because of efficiency concerns. Some preprocessors would handle it without any significant problem. Others would slow to a crawl--e.g. EDG-based preprocessors.

Of these two, the first is much more extensible. It allows for the definition of new algorithms without *any* macro replication. The second would still be better (as far as ease-of-use is concerned) than what the library provides now, but there is no way that I could get away with it. The second can be improved beyond what I've described here (e.g. by separating the notions of "entry-point" and "algorithmic step"--200+ loop iterations may be necessary, 200+ nested loops is massive overkill), but it all boils down to the same thing--always deducing the state when needed. Unfortunately, I cannot do either of these in Boost because Boost cannot simply drop support for a compiler just because it has a broken or slow preprocessor (of all things).
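[To make the state-passing concrete, a small sketch in the same pseudo-notation as the WHILE_n macros above; CAT and the INNER_* macros are hypothetical helpers, not Boost.PP names:

#define PRIMITIVE_CAT(a, b) a ## b
#define CAT(a, b) PRIMITIVE_CAT(a, b)

// WHILE_1 invokes op(2, state): the 2 names the next available entry
// point, so an operation that itself needs a loop must reenter through
// CAT(WHILE_, d) rather than through a hard-coded WHILE_2.
#define OP(d, state) CAT(WHILE_, d)(INNER_PRED, INNER_OP, state)
]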
P.S. The situation with the preprocessor is complicated by the general silence on this issue in the books. And it is quite amazing how the library manages to squeeze out nearly a general-purpose machine (but not with an infinite tape :-)) from the preprocessor machinery, which seems to have been castrated by the designers on purpose.
Incidentally (and this isn't exactly relevant to this conversation), the model used by my other library is capable of using a set of macros exponentially, such that (e.g.) the "end of the tape" is trillions of iterations away--effectively removing it as a limitation. Given unlimited system resources (like memory), any algorithm that needed that many steps would literally take centuries to execute even on the fastest preprocessor. IOW, it removes the limitation for all intents and purposes.

Also, attached is a draft of a document that I'm writing (for both libraries) that describes the macro expansion process in detail. You might find it useful. If you or others read this, feedback is welcome. I'm _not_ interested in feedback related to formatting or layout. I want to avoid an argument about an issue that 1) nobody will ever agree on and 2) I don't think is as important as it is sometimes made out to be. I _am_ interested in feedback relating to structure and content.

Regards, Paul Mensonides

Paul Mensonides wrote:
I'm trying very hard not to be harsh, but what you suggest has far-reaching implications that you don't understand.
There should be some sort of disclaimer in the documentation of Boost.PP that mentions that the design is most unfortunately influenced by the bad quality of some widely used preprocessors. Otherwise there will be false conclusions that pp-metaprogramming is either ugly by definition or that there is something wrong with the design of the library... Here are my comments on the text. Note that I use imperative for the sake of simplicity.
The process of macro expansion is best viewed as the process of scanning for macro expansion (rather than the process of a single macro expansion alone). When the preprocessor encounters a sequence of preprocessing tokens and whitespace separations that needs to be scanned for macro expansion, it has to perform a number of steps. These steps are examined in this document in detail.
Strike that paragraph. It uses terms not yet defined and doesn't say much more than the title (assuming it's still "how macro expansion works").
Conventions
Several conventions are used in this document for the sake of simplicity. First, except in contexts where the distinction matters, this document uses the terms "token" and "preprocessing token" interchangeably. Where the distinction does matter, this document will explicitly differentiate between them by using the terms "regular token" and "preprocessing token." From a technical standpoint, the primary difference between the two token types relates to permissiveness. Preprocessing tokens are much more permissive than regular tokens. Ignoring classifications (e.g. keyword,
How can tokens be permissive? Their definition is probably more permissive (if you can say that).
identifier), all regular tokens are trivially converted to preprocessing tokens. Not all preprocessing tokens can be converted to regular tokens. For example, 0.0.0.0 is a single, valid preprocessing token, but it cannot be converted to a regular token. Thus, preprocessing tokens can be thought of as a superset of regular tokens.
Second, when this document uses the phrase "sequence of tokens" (i.e. "sequence of preprocessing tokens"), the possibility of whitespace separations is implied. In other words, "sequence of tokens" really means "sequence of tokens and whitespace separations." Both the C and C++ standards do this as well, though they don't explicitly say they are doing it. Rather, it is implied by the phases of translation and by certain parts of the text such as §16.3.2/2 of the C++ standard and §6.10.3.2/2 of the C standard.
Third, this document uses the term "scanning" to mean "scanning for macro expansion." There are other kinds of scanning (in the general sense) that are performed during the macro expansion process. For example, a replacement list is scanned for instances of formal parameters to substitute. Other kinds of scanning besides scanning for macro expansion are referred to explicitly.
Lastly, this document uses a graphical notation to illustrate various parts of the process in examples. For clarity, all whitespace separations are ignored in these diagrams. Given a sequence of tokens, the current location of the scanner is indicated with a circumflex (^). For example,
+ + + ^
A range of tokens is indicated by using a frame such as:
+ + + |^ | |_____| | LABEL
These frames are labeled with the meaning of the frame or the sequence of tokens bounded by the frame. An identifier token that has been painted is
The reader probably has no idea what "painted" means at this point. Indicate the forward-declaration by "see below" or something like that.
marked with an apostrophe (').
A' token ^
This document will discuss what it means for an identifier token to be "painted." It also doesn't use character literals ('a') to avoid ambiguity. It is important to note that the apostrophe is not literally added to the sequence of tokens. Rather, it is only a notation that indicates a property of the identifier to which it is "attached." _________________________________________________________________
Locations
There are several points where the preprocessor must scan a sequence of tokens looking for macro invocations to expand. The most obvious of these is between preprocessing directives (subject to conditional compilation). For example,
I had to read this sentence multiple times before it made sense to me... VVVVV
#include "file.h"
// ...
#define A()
// ...
#undef A
Both of the sections containing comments must be scanned for macro expansion.
The preprocessor must also scan a few other sequences of tokens. These include the expression that controls an #if or #elif directive. Subject to certain conditions, the operands of the #include directive and the #line directive are also scanned for macro expansion.
The places where a sequence of tokens is scanned for macro expansion in a source file are not really the subject of this document. As such, the remainder of this document concentrates on just sequences of tokens that must be scanned, though sometimes the #define directive is used.
^^^^^ simplify this section by using negative logic (in other words: enumerate the contexts where /no/ macro expansion is done).
Before continuing, however, note that an argument to a macro that contains what could be a preprocessing directive results in undefined behavior. For example,
#define MACRO(x) x
MACRO( #include "file.h" )
Indicate more clearly that this code is not OK.
This is undefined behavior because it gives a preprocessor implementation latitude. Specifically, it allows for a preprocessor to parse an entire file into directives and sequences of tokens between directives prior to semantic evaluation. It also allows for a preprocessor to semantically evaluate the result of preprocessing in a single pass.
Put the implementation details in parentheses or use a footnote.
_________________________________________________________________
Scanning
Consider the following macro definitions and the subsequent sequence of tokens to be scanned:
#define OBJECT OBJ ## ECT F()
#define F() G(G(G))
#define G(x) x
+ X OBJECT +
These definitions and the subsequent sequence of tokens will be used as a running example in describing the steps of the expansion process.
Given a sequence of tokens to be scanned, the preprocessor successively looks at each token looking for identifiers. If a given token is not an identifier, the preprocessor outputs the token and moves on to the next.
+ X OBJECT + ^
+ X OBJECT + ^
VVVVV
The term "output" is being used generally. Sometimes it means that the preprocessor is finished with the token and that the token can be handed off to the underlying language parser or output to a text stream (in the case of a compiler's preprocess-only mode). Other times, it means that it is output to a buffer that is later reused by the preprocessor for another purpose. In all cases, however, it does mean that _this_ scan for macro expansion is finished with the token and that the token results from _this_ scan.
^^^^^ shouldn't this be part of the "Conventions" section?
If the token _is_ an identifier, the preprocessor must check to see if the identifier corresponds to the name of a macro that is currently defined. If it is not, the preprocessor outputs the token and moves on to the next. In the running example, the current token is the identifier X, which does not correspond to a macro name.
+ X OBJECT + ^
+ X OBJECT + ^ _________________________________________________________________
"Blue Paint"
If the current token is an identifier that refers to a macro, the preprocessor must check to see if the token is painted. If it is painted, it outputs the token and moves on to the next.
When an identifier token is painted, it means that the preprocessor will not attempt to expand it as a macro (which is why it outputs it and moves on). In other words, the token itself is flagged as disabled, and it behaves like an identifier that does not correspond to a macro. This disabled flag is commonly referred to as "blue paint," and if the disabled flag is set on a particular token, that token is called "painted." (The means by which an identifier token can become painted is described below.)
Remove redundancy in the two paragraphs above. I like the "behaves like an identifier that does not correspond to a macro name"-part.
In the running example, the current token is the identifier OBJECT, which _does_ correspond to a macro name. It is not painted, however, so the preprocessor moves on to the next step. _________________________________________________________________
Disabling Contexts
If the current token is an identifier token that corresponds to a macro name, and the token is _not_ painted, the preprocessor must check to see if a disabling context that corresponds to the macro referred to by the identifier is active. If a corresponding disabling context is active, the preprocessor paints the identifier token, outputs it, and moves on to the next token.
A "disabling context" corresponds to a specific macro and exists over a range of tokens during a single scan. If an identifier that refers to a macro is found inside a disabling context that corresponds to the same macro, it is painted.
Disabling contexts apply to macros themselves over a given geographic sequence of tokens, while blue paint applies to particular identifier tokens. The former causes the latter, and the latter is what prevents "recursion" in macro expansion. (The means by which a disabling context comes into existence is discussed below.)
In the running example, the current token is still the identifier OBJECT. It is not painted, and there is no active disabling context that would cause it to be painted. Therefore, the preprocessor moves on to the next step.
The introduction of these terms feels structurally too abrupt to me. Introduce these terms along the way, continuing with the example. ______________________________________________________________
Object-like Macro Invocations
If an identifier token that corresponds to a macro name is not painted, and there is no active disabling context that would cause it to be painted, the preprocessor must check to see what type of macro is being referenced--object-like or function-like. If the macro is defined as an object-like macro, which is the case with OBJECT in the running example, the identifier alone forms an invocation of the macro, and the preprocessor begins the replacement process of that invocation.
+ X OBJECT + |^ | |______| | OBJECT invocation (INV)
For an object-like macro, the first step in the replacement process is the retrieval of the replacement list of the macro from the symbol table. In the running example, the OBJECT macro's replacement list is
OBJ ## ECT F()
The preprocessor then performs any token-pasting operations in the replacement list. (Note that there is no stringizing operator in object-like macros.) The result of token-pasting in OBJECT's replacement list is
OBJECT F()
After all token-pasting has been performed, the resulting sequence of tokens is used to replace the invocation of the macro. At the same time, a disabling context that corresponds to the macro being replaced is activated. This disabling context surrounds the tokens that came from the replacement list.
+ X OBJECT F() + | | |__________| | OBJECT disabling context (DC)
<-- explain what a disabling context is, and then what blue paint is, here
Finally, scanning resumes beginning with the first token that came from the replacement list (or the next token after the invocation if the replacement list was empty). In the running example, scanning resumes at OBJECT.
+ X OBJECT F() + |^ | |__________| | OBJECT DC
Note that this OBJECT identifier is a _different_ identifier token than the one that formed an invocation of the OBJECT macro. It came from the replacement list of the OBJECT macro (indirectly by way of token-pasting).
<snip>
Nullary Invocations
If scanning finds an identifier (followed by a left-parenthesis) that refers to a nullary function-like macro (as is the case with F in the running example), the preprocessor must find the corresponding right-parenthesis. The preprocessor _must_ find the right-parenthesis. If it cannot (e.g. it finds EOF instead), the result is an "incomplete invocation" error.
While it does so, it must ensure that there are no tokens between the left-parenthesis and the right-parenthesis. Specifically, between the two parentheses, there must be either one or more whitespace separations or nothing at all. If any tokens are present between them, the result is a "too many arguments" error.
Try to remove some detail.
The identifier and the left- and right-parentheses form an invocation of the macro, and the preprocessor begins the replacement process of that invocation.
+ X OBJECT' F() + | |^ || | |___|| | | | | F INV |____________| | OBJECT DC
As with an object-like macro, the first step in the replacement process is the retrieval of the replacement list of the macro from the symbol table. In the running example, the F macro's replacement list is
G(G(G))
The preprocessor then performs any token-pasting operations in the replacement list. (Note that a nullary macro is a function-like macro, so the stringizing operator exists. However, the stringizing operator must be applied to an instance of a formal parameter in the replacement list. A
<-- add missing "nullary"
function-like macro has no formal parameters, and therefore any use of the stringizing operator is automatically an error.) The result of token-pasting
It's not clear to me why the stringizing operator leads to an error rather than a '#' character. Probably too much of a sidenote, anyway.
in F's replacement list is
G(G(G))
In this case, there are no token-pasting operations to perform, so the result is a no-op.
The sequence of tokens resulting from token-pasting is used to replace the invocation of the macro. At the same time, a disabling context that corresponds to the macro being replaced is activated. This disabling context surrounds the tokens that came from the replacement list.
+ X OBJECT' G(G(G)) + | | || | |_______|| | | | | F DC | |________________| | OBJECT DC
Finally, scanning resumes beginning with the first token that came from the replacement list (or the next token after the invocation if the replacement list was empty).
+ X OBJECT' G(G(G)) + | |^ || | |_______|| | | | | F DC | |________________| | OBJECT DC
Nullary function-like macro invocations are nearly identical to object-like macro invocations. The primary difference is that an invocation of a function-like macro requires multiple tokens. _________________________________________________________________
Interleaved Invocations
It is important to note that disabling contexts only exist during a single scan. Moreover, when scanning passes the end of a disabling context, that disabling context no longer exists. In other words, the output of a scan results only in tokens and whitespace separations. Some of those tokens might be painted (and they remain painted), but disabling contexts are not part of the result of scanning. (If they were, there would be no need for blue paint.)
Misses (at least) a reference to 16.3.4-1 (the wording "with the remaining tokens of the source" (or so) is quite nice there, so consider using something similar). I believe I wouldn't really understand what you are talking about here without knowing that part of the standard. "A single scan" -- the concept of rescanning was introduced too peripherally to make much sense to someone unfamiliar with the topic.
In the diagrams used in this document, the tokens that have been output by a scan are left in the diagram to provide context for the reader, such as:
<snip>
Note that interleaved invocations do not allow for infinite expansion. More tokens must physically be present after the replacement list to complete an interleaved invocation, and this sequence of tokens is ultimately limited to the finite sequence of tokens contained in the source file. _________________________________________________________________
The above part seems very good to me.
Non-nullary Invocations
If scanning finds an identifier (followed by a left-parenthesis) that refers to a non-nullary function-like macro (as is the case with G in the running example), the preprocessor must find the corresponding right-parenthesis. The preprocessor must find the right-parenthesis. If it cannot (e.g. it finds EOF instead), the result is an "incomplete invocation" error.
While it does so, it must delineate the actual arguments between the left- and right-parentheses. This delineation process is a distinct step that separates each argument into a separate argument before any other processing of the arguments is done. Each argument is separated by a comma token (,), but commas between matched pairs of left- and right-parentheses do not separate arguments, and the right-parenthesis of such matched pairs is not used as the right-parenthesis that terminates the argument list. For example, in
MACRO(a, (b, c), d)
the argument list to MACRO is delineated into three arguments.
After the arguments have been delineated, the preprocessor compares the number of actual arguments to the arity of the macro. If there are more or fewer, the result is a "too many arguments" or "too few arguments" error. (If the macro is variadic, the number of arguments must be at least one greater than the number of named formal parameters. This implies that a macro defined as:
#define V(x, y, ...) // ...
Mention that variadic macros are not a C++ feature.
has a minimum arity of three--not two.)
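[An illustration of the arity rule just stated, added for concreteness (in C99 terms):

V(a, b, c)  // minimal valid invocation: three arguments
V(a, b)     // error: only two arguments for two named parameters plus ...
]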
In C++, if any argument is empty or contains only whitespace separations, the behavior is undefined. In C, an empty argument is allowed, but gets special treatment. (That special treatment is described below.)
It requires at least C99, right? If so, say it (it's likely there are C compilers that don't support that version of the language). <snip>
+ X OBJECT' G(G(G)) + | ||^ ||| | ||_______||| | | | || | | G INV | | |_________|| | | | | F DC | |__________________| | OBJECT DC
// recursive scan of argument #1 to G:
G(G) |^ | |____| | G INV _ _ ______ _ _ | F DC _ _ ______ _ _ | OBJECT DC
// recursive scan of argument #1 to G:
G ^ _ _ _ _ _ | F DC _ _ _ _ _ | OBJECT DC
G ^ _ _ _ _ _ | F DC _ _ _ _ _ | OBJECT DC
// recursive scan results in: G
G |^| |_| | G DC _ _ ___ _ _ | F DC _ _ ___ _ _ | OBJECT DC
G' ^ _ _ ___ _ _ | F DC _ _ ___ _ _ | OBJECT DC
// recursive scan results in: G'
+ X OBJECT' G' + | ||^ ||| | ||__||| | | | || | | G DC | |____|| | | | | F DC |_____________| | OBJECT DC
+ X OBJECT' G' + ^
+ X OBJECT' G' + ^
Thus, the running example finally results in
+ X OBJECT' G' +
Note that the disabling context that is created when a macro invocation is replaced is not created until after all arguments that need to be scanned for macro expansion have been. This is why both G's expand in the example.
You should repeat the "painting rules" here. You already said that paint applies to single tokens but it's a good idea to more explicitly point out to the reader that no parentheses are needed for G to end up painted.
_________________________________________________________________
Arguments to Interleaved Invocations
Because arguments not used as operands of token-pasting or stringizing are scanned separately but within any active disabling contexts, the possibility exists for an argument (or part of an argument) to be scanned in inside a disabling context that no longer exists after the macro
An "in" too much?
invocation has been replaced. Consider,
#define A(m) m( B(f)
#define B(x) A(x)
#define C(x) [ x ]
A(C) ) *
The following diagrams the expansion of this sequence of tokens:
A(C) ) * |^ | |____| | A INV
// recursive scan of argument #1 to A results in:
C( B(f) ) * ||^ | | ||_______|_| | | | | C INV |________| | A DC
// recursive scan of argument #1 to C:
B(f) |^ || |____|| | | B INV _ _ ______| | A DC
// recursive scan of argument #1 to B results in: f
A(f) |^ || |____|| | | B DC _ _ ______| | A DC
A'(f) | ^ || |_____|| | | B DC _ _ _______| | A DC
// recursive scan results in: A'(f)
[ A'(f) ] * |^ | |_________| | C DC
The result of scanning for expansion is
[ A'(f) ] *
Note that the argument to C was scanned inside the disabling context corresponding to A, but that disabling context is no longer active when scanning resumes after the replacement of C's invocation. (Scenarios such as this one are tricky, and preprocessors rarely handle this kind of thing correctly.)
Point out explicitly that A remains painted.
_________________________________________________________________
Virtual Tokens
[...]
Interesting! Is it implemented somewhere? BTW, I'm done with the comments so far. I believe there is more that can be done to enhance the structure. Anyway, all in all it reads quite nicely. I hope it's of some use.
© Copyright Paul Mensonides 2003-2006
Regards, Tobias

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tobias Schwinger
There should be some sort of disclaimer in the documentation of Boost.PP that mentions that the design is most unfortunately influenced by the bad quality of some widely used preprocessors. Otherwise there will be false conclusions that pp-metaprogramming is either ugly by definition or that there is something wrong with the design of the library...
Well, the design is certainly influenced by typical preprocessors. However, the design is not completely inelegant. The fundamental problems with the design of the library as is are more related to a lack of an extensibility model. Even with preprocessors the way they are, I still could hide all of the algorithm states completely, and that would make everything more usable, but it would also make everything slower (to varying degrees on various preprocessors). I'll address your comments about the article in a separate post. I re-rendered the entire article as plain text from the XML source, mailed it to myself to make it quote correctly, replied by inserting your comments. Thus, the entire article will be in my reply, all of your quotes, and my responses to them. Regards, Paul Mensonides

Paul Mensonides wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tobias Schwinger
There should be some sort of disclaimer in the documentation of Boost.PP that mentions that the design is most unfortunately influenced by the bad quality of some widely used preprocessors. Otherwise there will be false conclusions that pp-metaprogramming is either ugly by definition or that there is something wrong with the design of the library...
Well, the design is certainly influenced by typical preprocessors. However, the design is not completely inelegant. The fundamental problems with the design of the library as is are more related to a lack of an extensibility model. Even with preprocessors the way they are, I still could hide all of the algorithm states completely, and that would make everything more usable, but it would also make everything slower (to varying degrees on various preprocessors).
I said neither "Boost.PP's design is entirely inelegant" nor "the next dimension parameters suck". The parts that are inelegant IMO are the vertical dependencies and coupling of algorithms and data structures. If I understood you correctly, so far, it isn't possible to change it without narrowing portability.
I'll address your comments about the article in a separate post. I re-rendered the entire article as plain text from the XML source, mailed it to myself to make it quote correctly, replied by inserting your comments. Thus, the entire article will be in my reply, all of your quotes, and my responses to them.
They're just some instructions based on what I /believe/ would make the text more understandable. However, I'm not entirely unfamiliar with the topic, and someone who is would probably make a better proof-reader than I do. BTW, there were some people organizing themselves as "documentation helpers" on this list recently (see http://tinyurl.com/783k9). Regards, Tobias

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tobias Schwinger
Here are my comments on the text. Note that I use imperative for the sake of simplicity.
Okay. Thanks for your comments.
The process of macro expansion is best viewed as the process of scanning for macro expansion (rather than the process of a single macro expansion alone). When the preprocessor encounters a sequence of preprocessing tokens and whitespace separations that needs to be scanned for macro expansion, it has to perform a number of steps. These steps are examined in this document in detail.
Strike that paragraph. It uses terms not yet defined and doesn't say much more than the title (assuming it's still "how macro expansion works").
The paragraph might not be in the right spot, but what the paragraph says is important. The process of macro expansion includes more than just expanding a single macro invocation. Rather, it is the process of scanning for macro expansions, and the whole thing is defined that way.
[Conventions]
Several conventions are used in this document for the sake of simplicity. First, except in contexts where the distinction matters, this document uses the terms "token" and "preprocessing token" interchangeably. Where the distinction does matter, this document will explicitly differentiate between them by using the terms "regular token" and "preprocessing token." From a technical standpoint, the primary difference between the two token types relates to permissiveness. Preprocessing tokens are much more permissive than regular tokens.
How can tokens be permissive? Their definition is probably more permissive (if you can say that).
Yes, I'll rewrite this. Their definitions are more permissive because they don't have the semantic constraints that regular tokens do.
Ignoring classifications (e.g. keyword, identifier), all regular tokens are trivially converted to preprocessing tokens. Not all preprocessing tokens can be converted to regular tokens. For example, 0.0.0.0 is a single, valid preprocessing token, but it cannot be converted to a regular token. Thus, preprocessing tokens can be thought of as a superset of regular tokens.
Second, when this document uses the phrase "sequence of tokens" (i.e. "sequence of preprocessing tokens"), the possibility of whitespace separations is implied. In other words, "sequence of tokens" really means "sequence of tokens and whitespace separations." Both the C and C++ standards do this as well, though they don't explicitly say they are doing it. Rather, it is implied by the phases of translation and by certain parts of the text such as §16.3.2/2 of the C++ standard and §6.10.3.2/2 of the C standard.
Third, this document uses the term "scanning" to mean "scanning for macro expansion." There are other kinds of scanning (in the general sense) that are performed during the macro expansion process. For example, a replacement list is scanned for instances of formal parameters to substitute. Other kinds of scanning besides scanning for macro expansion are referred to explicitly.
Lastly, this document uses a graphical notation to illustrate various parts of the process in examples. For clarity, all whitespace separations are ignored in these diagrams. Given a sequence of tokens, the current location of the scanner is indicated with a circumflex (^). For example,
+ + + ^
A range of tokens is indicated by using a frame such as:
+ + + |^ | |_____| | LABEL
These frames are labeled with the meaning of the frame or the sequence of tokens bounded by the frame. An identifier token that has been painted is marked with an apostrophe (').
The reader probably has no idea what "painted" means at this point. Indicate the forward-declaration by "see below" or something like that.
I do in the very next sentence.
A' token ^
This document will discuss what it means for an identifier token to be "painted." It also doesn't use character literals ('a') to avoid ambiguity. It is important to note that the apostrophe is not literally added to the sequence of tokens. Rather, it is only a notation that indicates a property of the identifier to which it is "attached."
[Locations]
There are several points where the preprocessor must scan a sequence of tokens looking for macro invocations to expand. The most obvious of these is between preprocessing directives (subject to conditional compilation). For example,
I had to read this sentence multiple times before it made sense to me...
What part was difficult to follow? It seems pretty straightforward to me (but then, I know what I'm looking for).
#include "file.h"
// ...
#define A()
// ...
#undef A
Both of the sections containing comments must be scanned for macro expansion.
The preprocessor must also scan a few other sequences of tokens. These include the expression that controls an #if or #elif directive. Subject to certain conditions, the operands of the #include directive and the #line directive are also scanned for macro expansion.
The places where a sequence of tokens is scanned for macro expansion in a source file are not really the subject of this document. As such, the remainder of this document concentrates on just sequences of tokens that must be scanned, though sometimes the #define directive is used.
^^^^^ simplify this section by using negative logic (in other words: enumerate the contexts where /no/ macro expansion is done).
!! I don't think this is a good idea. A "List O' Negatives" almost invariably forgets some things--e.g. the SFINAE "List O' Negatives".
Before continuing, however, note that an argument to a macro that contains what could be a preprocessing directive results in undefined behavior. For example,
#define MACRO(x) x
MACRO( #include "file.h" )
Indicate more clearly that this code is not OK.
The next sentence says that it is undefined behavior. I'm not sure how to make it more clear than that.
This is undefined behavior because it gives a preprocessor implementation latitude. Specifically, it allows for a preprocessor to parse an entire file into directives and sequences of tokens between directives prior to semantic evaluation. It also allows for a preprocessor to semantically evaluate the result of preprocessing in a single pass.
Put the implementation details in parentheses or use a footnote.
First, this article is about how macro expansion works, and is therefore very implementation-oriented. Second, this isn't really implementation detail so much as rationale for the undefined behavior classification for this particular case. I considered striking this entire section because the article is not about what the preprocessor does as a whole--just macro expansion.
[Scanning]
Consider the following macro definitions and the subsequent sequence of tokens to be scanned:
#define OBJECT OBJ ## ECT F()
#define F() G(G(G))
#define G(x) x
+ X OBJECT +
These definitions and the subsequent sequence of tokens will be used as a running example in describing the steps of the expansion process.
Given a sequence of tokens to be scanned, the preprocessor successively looks at each token looking for identifiers. If a given token is not an identifier, the preprocessor outputs the token and moves on to the next.
+ X OBJECT + ^
+ X OBJECT + ^
The term "output" is being used generally. Sometimes it means that the preprocessor is finished with the token and that the token can be handed off to the underlying language parser or output to a text stream (in the case of a compiler's preprocess-only mode). Other times, it means that it is output to a buffer that is later reused by the preprocessor for another purpose. In all cases, however, it does mean that _this_ scan for macro expansion is finished with the token and that the token results from _this_ scan.
^^^^^ shouldn't this be part of the "Conventions" section?
I'm not sure. This is the first point where it is used, and so I describe how it is being used here. I think that if I try to define it in the "Conventions" section, I'll end up having to pull a bunch of the surrounding context up with it.
If the token _is_ an identifier, the preprocessor must check to see if the identifier corresponds to the name of a macro that is currently defined. If it is not, the preprocessor outputs the token and moves on to the next. In the running example, the current token is the identifier X, which does not correspond to a macro name.
+ X OBJECT + ^
+ X OBJECT + ^
[Blue Paint]
If the current token is an identifier that refers to a macro, the preprocessor must check to see if the token is painted. If it is painted, it outputs the token and moves on to the next.
When an identifier token is painted, it means that the preprocessor will not attempt to expand it as a macro (which is why it outputs it and moves on). In other words, the token itself is flagged as disabled, and it behaves like an identifier that does not correspond to a macro. This disabled flag is commonly referred to as "blue paint," and if the disabled flag is set on a particular token, that token is called "painted." (The means by which an identifier token can become painted is described below.)
Remove redundancy in the two paragraphs above.
It might be redundant, but it is a result of the underlying double-structure that I'm trying to maintain over most of the article. When I first wrote the article (after being asked to do so by several people from Boost a few weeks ago), I wrote it so it was very algorithmic. E.g. if <predicate> then <A> else <B>. After writing it this way, I decided that it was too much like a technical specification, so I decided that I needed a concrete (if contrived) running example. Then, instead of having a pure algorithm structure, I could describe the processing of this example from start to finish. The example that I decided on was designed specifically to address most of the major points about the macro expansion process and to introduce complexity as gradually as possible. For example, the example deals with an object-like macro invocation first, then a nullary macro invocation, then a non-nullary macro invocation. Each of these adds significant complexity to the process. However, I also didn't want to lose the technical specification completely. The net result is that you get paragraphs like the above. The first is a technical "if <predicate> then <A> else <B>". The second is commentary on the predicate used in the first. All in all, this was the most difficult part. I'm trying to maintain two structures at the same time because I think that both are necessary.
I like the "behaves like an identifier that does not correspond to a macro name"-part.
Okay.
In the running example, the current token is the identifier OBJECT, which _does_ correspond to a macro name. It is not painted, however, so the preprocessor moves on to the next step.
[Disabling Contexts]
If the current token is an identifier token that corresponds to a macro name, and the token is _not_ painted, the preprocessor must check to see if a disabling context that corresponds to the macro referred to by the identifier is active. If a corresponding disabling context is active, the preprocessor paints the identifier token, outputs it, and moves on to the next token.
A "disabling context" corresponds to a specific macro and exists over a range of tokens during a single scan. If an identifier that refers to a macro is found inside a disabling context that corresponds to the same macro, it is painted.
Disabling contexts apply to macros themselves over a given geographic sequence of tokens, while blue paint applies to particular identifier tokens. The former causes the latter, and the latter is what prevents "recursion" in macro expansion. (The means by which a disabling context comes into existence is discussed below.)
In the running example, the current token is still the identifier OBJECT. It is not painted, and there is no active disabling context that would cause it to be painted. Therefore, the preprocessor moves on to the next step.
The introduction of these terms feels structurally too abrupt to me. Introduce these terms along the way, continuing with the example.
They appear at the first point where their definition must appear. The preprocessor must perform these steps at this specific time. This again is an occurrence of the double-structure. The algorithm is iterative and loops back to this same situation but where tokens might be painted or disabling contexts might be active. Thus, the article contains a degree of iteration also. I don't know how to avoid this without discarding one structure or the other.
[Object-like Macro Invocations]
If an identifier token that corresponds to a macro name is not painted, and there is no active disabling context that would cause it to be painted, the preprocessor must check to see what type of macro is being referenced--object-like or function-like. If the macro is defined as an object-like macro, which is the case with OBJECT in the running example, the identifier alone forms an invocation of the macro, and the preprocessor begins the replacement process of that invocation.
+ X OBJECT + |^ | |______| | OBJECT invocation (INV)
For an object-like macro, the first step in the replacement process is the retrieval of the replacement list of the macro from the symbol table. In the running example, the OBJECT macro's replacement list is
OBJ ## ECT F()
The preprocessor then performs any token-pasting operations in the replacement list. (Note that there is no stringizing operator in object-like macros.) The result of token-pasting in OBJECT's replacement list is
OBJECT F()
After all token-pasting has been performed, the resulting sequence of tokens is used to replace the invocation of the macro. At the same time, a disabling context that corresponds to the macro being replaced is activated. This disabling context surrounds the tokens that came from the replacement list.
+ X OBJECT F() + | | |__________| | OBJECT disabling context (DC)
<-- explain what a disabling context is, and then what blue paint is, here
Do you mean that they should be defined here for the first time, or that they should be defined here again (but maybe with less detail)?
Finally, scanning resumes beginning with the first token that came from the replacement list (or the next token after the invocation if the replacement list was empty). In the running example, scanning resumes at OBJECT.
+ X OBJECT F() +
   |^         |
   |__________|
         |
     OBJECT DC
Note that this OBJECT identifier is a _different_ identifier token than the one that formed an invocation of the OBJECT macro. It came from the replacement list of the OBJECT macro (indirectly by way of token-pasting).
[Iterative Replacement vs. Functional Hierarchy]
Before continuing, it is important to note that a replacement list is not scanned for macro expansion prior to the replacement of an invocation. For example,
#define A B
#define B 123
A
It is tempting to believe that the top-level scan calls A, A calls B, B returns 123 to A, and A returns 123 to the top-level. In other words, it is tempting to believe that the invocations form a functional hierarchy similar to the procedural programming model that is at the root of most programming paradigms. That is _not_ the way that macro expansion works, however. Instead, the top-level scan replaces A with B and resumes. It then replaces B with 123 and resumes.
This iterative replacement model may seem foreign, but it has a strong similarity to inlined functions. For example,
inline int g() { return 123; }
inline int f() { return g(); }
const int x = f();
Instead of calling f, the call of f is effectively replaced with the body of f. The call of g contained therein is effectively replaced with the body of g. Inlined functions are not quite that simple, of course. They operate under the rules of lexical scope, for example, which macro expansion does not. Nevertheless, inlined functions have a similarity to the iterative replacement model of macro expansion, and one can view macro expansion as a sort of "continual inlining."
[Resumption of Scanning]
After the OBJECT invocation has been replaced in the running example, scanning resumes on the first token that came from the replacement list.
+ X OBJECT F() +
   |^         |
   |__________|
         |
     OBJECT DC
This time, however, when the preprocessor looks at the OBJECT token, a disabling context that corresponds to OBJECT is active. Because of that, this particular OBJECT token is painted, and the preprocessor moves on to the next token.
+ X OBJECT' F() +
   |        ^  |
   |___________|
         |
     OBJECT DC
[Function-like Macro Invocations]
If an identifier token that corresponds to a macro name is not painted, and there is no active disabling context that would cause it to be painted, the preprocessor must check to see what type of macro is being referenced--object-like or function-like. If the macro is function-like, things are a little more complex than with object-like macros. The process is similar, but several more steps are performed by the preprocessor.
After finding that an identifier token refers to a function-like macro, the preprocessor must look ahead to the next token in the sequence of tokens to see if it is a left-parenthesis. If it is not, the preprocessor outputs the identifier and moves on to the next token (i.e. the one that was not a left-parenthesis).
If the next token after the identifier _is_ a left-parenthesis, the preprocessor must check the arity of the macro referenced by the identifier. In the running example, the preprocessor has found the F identifier and the ( token immediately following it (as the next token; whitespace between the identifier and the left-parenthesis is ignored). The preprocessor sees that F is defined as a nullary macro--that is, it takes no arguments.
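A short illustration of the lookahead (H is just an illustrative macro, not part of the running example):

#define H() 1

H       // no left-parenthesis follows, so this is not an invocation; H is output as-is
H ()    // an invocation; the whitespace before the left-parenthesis is ignored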
[Nullary Invocations]
If scanning finds an identifier (followed by a left-parenthesis) that refers to a nullary function-like macro (as is the case with F in the running example), the preprocessor must find the corresponding right-parenthesis. The preprocessor _must_ find the right-parenthesis. If it cannot (e.g. it finds EOF instead), the result is an "incomplete invocation" error.
While it does so, it must ensure that there are no tokens between the left-parenthesis and the right-parenthesis. Specifically, between the two parentheses, there must be either one or more whitespace separations or nothing at all. If any tokens are present between them, the result is a "too many arguments" error.
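For example (N is an illustrative nullary macro, not part of the running example):

#define N() 0

N()     // ok: nothing between the parentheses
N( )    // ok: only a whitespace separation between them
N(+)    // error: "too many arguments" -- a token appears between the parentheses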
The identifier and the left- and right-parentheses form an invocation of the macro, and the preprocessor begins the replacement process of that invocation.
+ X OBJECT' F() +
   |       |^  ||
   |       |___||
   |         |  |
   |      F INV |
   |____________|
         |
     OBJECT DC
As with an object-like macro, the first step in the replacement process is the retrieval of the replacement list of the macro from the symbol table. In the running example, the F macro's replacement list is
G(G(G))
The preprocessor then performs any token-pasting operations in the replacement list. (Note that a nullary macro is a function-like macro, so the stringizing operator exists. However, the stringizing operator must be applied to an instance of a formal parameter in the replacement list. A
<-- add missing "nullary"
Thanks. There are a few other typos as well.
function-like macro has no formal parameters, and therefore any use of the stringizing operator is automatically an error.) The result of token-pasting in F's replacement list is
It's not clear to me why the stringizing operator leads to an error rather than a '#' character. Probably too much of a sidenote, anyway.
I don't know the rationale for why it is the way it is. It simply is that way. An occurrence of # in the replacement list of a macro definition is the stringizing operator, yet the stringizing operator has the semantic constraint that it must be applied to an instance of a formal parameter. I mention it because it is an automatic error (at the time of definition) and thus doesn't have to be dealt with here. I.e. I'm making it clear that I'm not skipping a step that otherwise has to be performed for a function-like macro.
G(G(G))
In this case, there are no token-pasting operations to perform, so the result is a no-op.
The sequence of tokens resulting from token-pasting is used to replace the invocation of the macro. At the same time, a disabling context that corresponds to the macro being replaced is activated. This disabling context surrounds the tokens that came from the replacement list.
+ X OBJECT' G(G(G)) +
   |       |       ||
   |       |_______||
   |           |    |
   |         F DC   |
   |________________|
         |
     OBJECT DC
Finally, scanning resumes beginning with the first token that came from the replacement list (or the next token after the invocation if the replacement list was empty).
+ X OBJECT' G(G(G)) +
   |       |^      ||
   |       |_______||
   |           |    |
   |         F DC   |
   |________________|
         |
     OBJECT DC
Nullary function-like macro invocations are nearly identical to object-like macro invocations. The primary difference is that an invocation of a function-like macro requires multiple tokens.
[Interleaved Invocations]
It is important to note that disabling contexts only exist during a _single_ scan. Moreover, when scanning passes the end of a disabling context, that disabling context no longer exists. In other words, the output of a scan results only in tokens and whitespace separations. Some of those tokens might be painted (and they remain painted), but disabling contexts are not part of the result of scanning. (If they were, there would be no need for blue paint.)
Misses (at least) a reference to 16.3.4-1 (the wording "with the remaining tokens of the source" (or so) is quite nice there, so consider using something similar).
I'm not so sure. The reason that I even need to write this article is because the standard uses terminology (like "rescanning") in overly general ways. Likewise, the standard doesn't make it clear that it is describing the process of scanning for macro expansion as opposed to just macro expansion. That section of the standard has confused many people--implementors included--so I'm intentionally trying to avoid using the terminology that the standard uses because it is misleading to some people. "Rescanning" is a great example of that. It doesn't mean re-(scanning the sequence of tokens for more macros to replace), it means re-scanning the sequence of tokens again, where this scan is scanning for more macros to replace. There are other types of scans (scanning for concatenations to perform, formal parameter instances to substitute, removal of placemarkers in C99) that happen to this particular sequence of tokens. Before this point, this particular sequence of tokens was never scanned for macro replacement with the exception of those tokens that came from an argument to the macro (without being an operand...). So, the terms "scanning" and "rescanning" are being used in a very general way despite the latter being used as a heading. However, after you already know what the process does (because of an article like this one, for example), the standard makes a lot more sense. (Another thing that I'm not doing is bringing up the original pseudo-code algorithm whose behavior was the basis for the text in the standard.)
I believe I wouldn't really understand what you are talking about here without knowing that part of the standard. "A single scan" -- the concept of rescanning was introduced too peripherally to make much sense to someone unfamiliar with the topic.
This all comes back to the beginning--the process is scanning a sequence of tokens for macros to expand (i.e. the first paragraph that you said I should strike). This entire process is recursively applied to arguments to macros (without being an operand...) and thus this entire scan for macros to expand can be applied to the same sequence of tokens more than once. It is vitally important that disabling contexts don't continue to exist beyond the scan in which they were created, but that blue paint does. As I mentioned, there would be no need for blue paint--what the standard calls "non-replaced macro name preprocessing tokens"--if the disabling contexts weren't transient. I'm pretty sure that I don't use the term "rescanning" anywhere in the whole article (yep, I checked, and I don't).
In the diagrams used in this document, the tokens that have been output by a scan are left in the diagram to provide context for the reader, such as:
 + + +
     ^
Here, scanning is looking at the third + token. This particular scan is done with the two preceding tokens. Thus, if a disabling context existed around these tokens...
 + + +
|^    |
|_____|
   |
   DC
...the disabling context effectively "closes" as the scan moves on from token to token because the disabling context is not part of the result of the scan.
 + + +
  |^  |
  |___|
    |
    DC
 + + +
    |^|
    |_|
     |
     DC
 + + +
      ^
Because function-like macro invocations require multiple tokens, it is possible for such an invocation to span the end of a replacement list. For example,
#define A() B
#define B() 123
A()()
The A() invocation is replaced with B, and a B invocation is formed using the tokens that follow the A invocation.
 A() ()
|^  |
|___|
  |
A INV
  B ()
||^|  |
||_|__|
  |  |
  | B INV
  |
 A DC
Thus, the invocation of B is geographically _interleaved_ with the disabling context corresponding to A. When the invocation of B gets replaced by 123 and scanning resumes at 123, the disabling context corresponding to A _no longer exists_.
In order for an "outer" disabling context to remain active, invocations must be nested (rather than interleaved) within that disabling context. For example,
#define C() D()
#define D() 123
C()
Here, the D invocation will be nested within the disabling context corresponding to C.
 C()
|^  |
|___|
  |
C INV
  D()
||^  ||
||___||
|  |  |
| D INV
|_____|
   |
 C DC
  123
||^  ||
||___||
|  |  |
| D DC
|_____|
   |
 C DC
Disabling contexts (or lack thereof) have no effect on the result of either of these two examples. However, that isn't always the case.
#define A() B
#define B() A
A()()()
Here, the result is ultimately an unpainted B token--because no A or B token ever becomes painted.
 A() ()()
|^  |
|___|
  |
A INV
  B () ()
||^|  |
||_|__|
  |  |
  | B INV
  |
 A DC
  A ()
||^|  |
||_|__|
  |  |
  | A INV
  |
 B DC
 B
|^|
|_|
 |
A DC
Things are different in the following example:
#define C() D()
#define D() C()
C()
Here, the result is ultimately C'().
 C()
|^  |
|___|
  |
C INV
  D()
||^  ||
||___||
|  |  |
| D INV
|_____|
   |
 C DC
  C()
||^  ||
||___||
|  |  |
| D DC
|_____|
   |
 C DC
  C'()
|| ^  ||
||____||
|   |  |
|  D DC
|______|
    |
  C DC
Note that interleaved invocations do _not_ allow for infinite expansion. More tokens must physically be present after the replacement list to complete an interleaved invocation, and this sequence of tokens is ultimately limited to the finite sequence of tokens contained in the source file.
The above part seems very good to me.
Okay.
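Summarizing the two contrasting examples above as plain source, with the results as stated in the text:

#define A() B
#define B() A

A()()()     // interleaved invocations; result: B (never painted)

#define C() D()
#define D() C()

C()         // nested invocations; result: C'() -- C is painted and can never expand again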
[Non-nullary Invocations]
If scanning finds an identifier (followed by a left-parenthesis) that refers to a non-nullary function-like macro (as is the case with G in the running example) the preprocessor must find the corresponding right-parenthesis. The preprocessor _must_ find the right-parenthesis. If it cannot (e.g. it finds EOF instead), the result is an "incomplete invocation" error.
While it does so, it must delineate the actual arguments between the left- and right-parentheses. This delineation is a distinct step that divides the argument list into separate arguments before any other processing of the arguments is done. Arguments are separated by comma tokens (,), but commas between matched pairs of left- and right-parentheses do not separate arguments, and the right-parenthesis of such matched pairs is not used as the right-parenthesis that terminates the argument list. For example, in
MACRO(a, (b, c), d)
the argument list to MACRO is delineated into three arguments.
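That is,

MACRO(a, (b, c), d)
// argument #1:  a
// argument #2:  (b, c)   -- the comma inside the matched parentheses does not separate arguments
// argument #3:  d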
After the arguments have been delineated, the preprocessor compares the number of actual arguments to the arity of the macro. If there are more or fewer, then the result is a "too many arguments" or "too few arguments" error. (If the macro is variadic, the number of arguments must be at least one greater than the number of named formal parameters. This implies that a macro defined as:
#define V(x, y, ...) // ...
Mention that variadic macros are not a C++ feature.
Yes, good idea. This was an oversight that I should have caught. Note, however, that variadic macros and placemarkers are all but guaranteed to be part of the next C++ standard.
has a minimum arity of three--not two.)
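To illustrate (using the C99 variadic macro V from above; as noted, this is not C++98):

V(1, 2)         // error: "too few arguments" -- the minimum arity is three
V(1, 2, 3)      // ok
V(1, 2, 3, 4)   // ok: 3 and 4 both belong to the variadic argument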
In C++, if any argument is empty or contains only whitespace separations, the behavior is undefined. In C, an empty argument is allowed, but gets special treatment. (That special treatment is described below.)
It requires at least C99, right? If so, say it (it's likely there are C compilers that don't support that version of the language).
As far as I am concerned, the 1999 C standard defines what C is until it is replaced by a newer standard. Likewise, the 1998 standard defines what C++ is until it is replaced by a newer standard. I.e. an unqualified C implies C99, and unqualified C++ implies C++98. If I wished to reference C90, I'd say C90. Luckily, I don't wish to reference C90 because I don't want to maintain an article that attempts to be backward compatible with all revisions of a language. This is precisely why I should have a note above that variadic macros are not part of C++, BTW, even though I know they will be part of C++0x. Furthermore, deficiencies of compilers (like not implementing the language as it is currently defined or not doing what they should during this process) are not a subject of this article. OTOH, it wouldn't hurt to mention in the "Conventions" section that, at the time of this writing, C is C99 and C++ is C++98.
Assuming that the number of actual arguments is correct, the identifier, the left- and right-parentheses, and all of the tokens between them form an invocation of the macro, and the preprocessor begins the replacement process of that invocation.
+ X OBJECT' G(G(G)) +
   |      ||^      |||
   |      ||_______|||
   |          |     ||
   |        G INV   ||
   |      |_________||
   |           |     |
   |         F DC    |
   |_________________|
            |
        OBJECT DC
Here, the preprocessor has delineated the arguments to G, which results in the correct number of arguments for G (which is one, because G is unary). That argument is
G(G)
As with both object-like and nullary function-like invocations, the first step in the replacement process of a function-like macro invocation is the retrieval of the replacement list of the macro from the symbol table. In the running example, the G macro's replacement list is
x
which is an instance of the formal parameter.
After retrieval of the replacement list, the preprocessor replaces all instances of formal parameters in the replacement list with actual arguments. However, it does this in a variety of ways depending on how an instance of the formal parameter is used in the replacement list. If a particular instance of a formal parameter in the replacement list is an operand of either the token-pasting operator or the stringizing operator, the actual argument is substituted in place of the formal parameter instance. (In C99, if the argument is empty or contains only whitespace separations, a placemarker is substituted instead.) If a particular instance of a formal parameter is _not_ an operand of either the token-pasting operator or the stringizing operator, the actual argument is scanned for macro expansion, and the result is substituted in place of the formal parameter instance. (The scanning of an argument is the only true instance of recursion in the macro expansion process.) Though it is scanned independently, any active disabling contexts still apply.
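A small illustration of the two kinds of substitution (all three macros here are illustrative, not part of the running example):

#define ONE 1
#define EXPANDED(x) x     // x is not an operand of # or ##: the argument is scanned first
#define LITERAL(x) #x     // x is an operand of #: the argument is substituted as written

EXPANDED(ONE)   // -> 1
LITERAL(ONE)    // -> "ONE"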
Note that an argument is scanned for macro expansion if and _only_ if at least one instance of a corresponding formal parameter exists in the replacement list without being an operand of one of the operators. Thus, in
#define EAT(x)
#define BINARY(a, b) // ...
EAT( BINARY() )
no error should occur--even though there would be an error if the argument to EAT was scanned for macro expansion.
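For contrast, a hypothetical macro USE that does use its parameter would trigger the error:

#define EAT(x)
#define USE(x) x
#define BINARY(a, b) // ...

EAT( BINARY() )   // ok: the argument is never scanned, so BINARY is never invoked
USE( BINARY() )   // error: the argument is scanned, and BINARY is invoked with the wrong number of arguments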
After all instances of formal parameters in the replacement list have been substituted, the preprocessor performs all token-pasting and stringizing operations in the resulting sequence of tokens. Note that the precedence between the token-pasting and stringizing operators is unspecified, as is the associativity of the token-pasting operator. However, the operands of the token-pasting operator are single tokens (those immediately to the right and left of it). The operand of the stringizing operator is the entire argument. (Recall that it must be applied to an instance of a formal parameter.) Thus, even though the order of evaluation between token-pasting and stringizing is unspecified, one can think of a use of the stringizing operator as being replaced with a third, "stringized" version of the actual argument. (Otherwise, the preprocessor would have to mark the extent of the substituted argument in some way.) Of course, the order of evaluation is unspecified, and therefore code should avoid relying on any particular order.
It is important to note that, even though the token-pasting operator operates on exactly two tokens (rather than entire arguments), it does affect entire arguments in the sense that the token-pasting operator prevents the entire argument from being scanned for macro expansion prior to substitution.
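For example (JOIN is illustrative):

#define JOIN(a, b) 1 a ## b 2

JOIN(x y, z w)   // -> 1 x yz w 2
                 // only y and z, the tokens adjacent to ##, are pasted; the rest of each
                 // argument is substituted untouched (and, because of the ##, unexpanded)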
In C99, placemarkers are used as temporary pseudo-tokens so that token-pasting and stringizing work "correctly." For example,
#define M(x) 1 x ## 3
M()
Here, the argument for x is empty, so a placemarker is substituted in place of the x in the replacement list.
1 <placemarker> ## 3
This causes the token-pasting operator to operate on 3 and the placemarker (instead of 3 and 1). Token-pasting of a token to a placemarker results in the token. Token-pasting of two placemarkers results in a placemarker. Stringizing of a placemarker results in the empty string:
#define STR(x) #x
STR() // ""
After all token-pasting and stringizing operations have been performed, all remaining placemarkers are removed, and the resulting sequence of tokens is used to replace the invocation of the macro. At the same time, a disabling context that corresponds to the macro being replaced is activated. This disabling context surrounds the tokens that came from the replacement list.
In the running example, the x formal parameter instance in G's replacement list is not an operand of either the token-pasting or stringizing operator. Therefore, the actual argument corresponding to x is scanned for macro expansion separately, but within any active disabling contexts. In this example, during the recursive scan of G's argument, another invocation of G is found, and its argument is recursively scanned as well.
+ X OBJECT' G(G(G)) +
   |      ||^      |||
   |      ||_______|||
   |          |     ||
   |        G INV   ||
   |      |_________||
   |           |     |
   |         F DC    |
   |_________________|
            |
        OBJECT DC
// recursive scan of argument #1 to G:
 G(G)
|^   |
|____|
   |
 G INV
_ _ ______ _ _
      |
    F DC
_ _ ______ _ _
      |
  OBJECT DC
// recursive scan of argument #1 to G:
G
^
_ _ _ _ _
    |
  F DC
_ _ _ _ _
    |
 OBJECT DC
G
 ^
_ _ _ _ _
    |
  F DC
_ _ _ _ _
    |
 OBJECT DC
// recursive scan results in: G
 G
|^|
|_|
 |
G DC
_ _ ___ _ _
     |
   F DC
_ _ ___ _ _
     |
 OBJECT DC
 G'
 ^
_ _ ___ _ _
     |
   F DC
_ _ ___ _ _
     |
 OBJECT DC
// recursive scan results in: G'
+ X OBJECT' G' +
   |      ||^ |||
   |      ||__|||
   |        |  ||
   |      G DC ||
   |      |____||
   |         |  |
   |       F DC |
   |____________|
         |
     OBJECT DC
+ X OBJECT' G' +
               ^
+ X OBJECT' G' +
                 ^
Thus, the running example finally results in
+ X OBJECT' G' +
Note that the disabling context that is created when a macro invocation is replaced is not created until _after_ all arguments that need to be scanned for macro expansion have been scanned. This is why both G's expand in the example.
You should repeat the "painting rules" here. You already said that paint applies to single tokens but it's a good idea to more explicitly point out to the reader that no parentheses are needed for G to end up painted.
I thought that I already did that (it is definitely worth noting). Perhaps a "recall that..." back-reference?
[Arguments to Interleaved Invocations]
Because arguments not used as operands of token-pasting or stringizing are scanned separately but within any active disabling contexts, the possibility exists for an argument (or part of an argument) to be scanned in inside a disabling
An "in" too much?
Yes. I found a few mispeligs too.
context that no longer exists after the macro invocation has been replaced. Consider,
#define A(m) m( B(f)
#define B(x) A(x)
#define C(x) [ x ]
A(C) ) *
The following diagrams the expansion of this sequence of tokens:
 A(C) ) *
|^   |
|____|
   |
 A INV
// recursive scan of argument #1 to A results in:
  C( B(f) ) *
||^      | |
||_______|_|
 |   |
 | C INV
 |________|
     |
   A DC
// recursive scan of argument #1 to C:
 B(f)
|^   ||
|____||
   |
 B INV
_ _ __|
   |
 A DC
// recursive scan of argument #1 to B results in: f
 A(f)
|^   ||
|____||
   |
 B DC
_ _ __|
   |
 A DC
 A'(f)
| ^   ||
|_____||
    |
  B DC
_ _ ___|
    |
  A DC
// recursive scan results in: A'(f)
 [ A'(f) ] *
|^        |
|_________|
     |
   C DC
The result of scanning for expansion is
[ A'(f) ] *
Note that the argument to C was scanned inside the disabling context corresponding to A, but that disabling context is no longer active when scanning resumes after the replacement of C's invocation. (Scenarios such as this one are tricky, and preprocessors rarely handle this kind of thing correctly.)
Point out explicitly that A remains painted.
Okay.
[Virtual Tokens]
In the implementation of a scanning routine, the single most design-affecting part of this whole process is the handling of the disabling contexts. This section discusses a possible way in which they can be implemented. Recall that disabling contexts can be viewed as closing contexts because they are only active during a single scan and because scanning never moves backwards. Because of this property, disabling contexts can be implemented using flags in the symbol table and virtual tokens. Instead of maintaining expensive data structures similar to the above diagrams, when a macro invocation is replaced, the preprocessor can flag the macro in the symbol table as "disabled" and insert a virtual token immediately following the sequence of tokens used to replace an invocation. When scanned, this virtual token clears the flag in the symbol table and is removed. Using @A to indicate a virtual token that clears the disabled flag on A in the symbol table...
#define A() A()
 A() A() *
|^  |
|___|
  |
A INV
// set disabled flag on A in symbol table
A() @A A() *
^
A'() @A A() *
     ^
// clear disabled flag on A
A'() A() *
    |^  |
    |___|
      |
    A INV
// set disabled flag on A
A'() A() @A *
     ^
A'() A'() @A *
          ^
// clear disabled flag on A
A'() A'() *
          ^
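To make the strategy concrete, here is a deliberately tiny sketch in C++ of the flag-plus-virtual-token idea traced above. It is limited to object-like macros (so none of the argument handling discussed next), the names are mine rather than anything from a real preprocessor, and the apostrophe marks blue paint as in the rest of this article:

#include <deque>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Macro {
    std::vector<std::string> replacement;  // replacement list as a sequence of tokens
    bool disabled;                         // the "disabled" flag in the symbol table
};

int main() {
    std::map<std::string, Macro> table;
    table["A"] = { { "A" }, false };       // #define A A   (object-like, for brevity)

    std::deque<std::string> input = { "A", "*", "A" };
    std::vector<std::string> output;

    while (!input.empty()) {
        std::string tok = input.front();
        input.pop_front();

        if (tok.size() > 1 && tok[0] == '@') {    // virtual token: clear the flag and vanish
            table[tok.substr(1)].disabled = false;
            continue;
        }
        auto it = table.find(tok);
        if (it == table.end()) {                  // not a macro name: just output it
            output.push_back(tok);
            continue;
        }
        if (it->second.disabled) {                // inside its own disabling context: paint it
            output.push_back(tok + "'");
            continue;
        }
        // Replace the invocation: set the flag, then push the replacement list back onto
        // the input, followed by a virtual token that will clear the flag when scanned.
        it->second.disabled = true;
        input.push_front("@" + tok);
        for (auto r = it->second.replacement.rbegin(); r != it->second.replacement.rend(); ++r)
            input.push_front(*r);
    }

    for (const std::string& t : output)
        std::cout << t << ' ';
    std::cout << '\n';                            // prints: A' * A'
}

Run on the input A * A with #define A A, it prints A' * A': each A is replaced once, and the A coming from the replacement list is painted inside its own context, which is the standard-mandated result (the apostrophe is only notation for "non-replaceable").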
This implementation strategy is fairly straightforward (and it produces the correct results). It does, however, get a little more complex when parameters are involved. For any given formal parameter, how it is handled during substitution is a property of the formal parameter itself. If the parameter is unused, no variations of the parameter are required. If the parameter is used without being an operand of the token-pasting or stringizing operator, a scanned variation is required. If the parameter is used as an operand of either of the two operators, the actual argument itself is required. These last two variations are not mutually exclusive. For a given formal parameter, both may be required, as in
#define F(x) #x x
Which variations are required for a particular formal parameter is known at the point of definition of the macro.
When a macro invocation is expanded, the preprocessor processes each argument in the order they are passed (rather than the order in which they appear in the replacement list). This ensures a virtual token that exists inside one argument is processed at the correct point.
If a formal parameter is not used in the replacement list of a macro, the argument will be discarded. However, before it is discarded, the preprocessor must execute any virtual tokens that it contains. For example,
#define EAT(x) x
#define M() EAT(1
 M() 2)
|^  |
|___|
  |
M INV
// set disabled flag on M
 EAT(1 @M 2)
|^          |
|___________|
      |
  EAT INV
Here, even though the argument to EAT is discarded, the virtual token @M must be executed. Otherwise, the disabled flag in the symbol table will never be cleared.
If only a scanned variation of a formal parameter is required, the preprocessor can simply recursively scan the actual argument. That recursive scan will automatically handle the virtual tokens correctly.
If only the actual argument itself is required, the preprocessor must iterate through the tokens contained in the argument to paint tokens and execute virtual tokens. It must paint all identifier tokens that correspond to macros that are flagged in the symbol table as disabled and execute all virtual tokens. For example, if the argument to a macro is
A @A A
and the actual argument itself is required for argument substitution, the sequence of tokens used for substitution must be
A' A
If both a scanned variation and an actual argument variation are required, the preprocessor has to handle both of the above scenarios. This situation is a little more complex, because it has to handle both scenarios correctly with regard to painting and virtual tokens. The simplest strategy is to do the actual argument case first, but to keep track of all virtual tokens executed therein. After the actual argument variation is generated, the preprocessor can undo all of the state changes caused by the execution of those virtual tokens. Then, the scanned variation can be generated in the same way as above--simply by recursively scanning it for macro expansion. It is important that the actual argument variation is handled first, because during the recursive scanning of the scanned variation, any number of disabled flags could be set and later cleared by virtual tokens. At the start of processing either type of argument, the state must be the same for both cases, and because virtual tokens may be generated and executed during the recursive scan, it would be very difficult to keep track of which ones originally existed in the argument.
Interesting! Is it implemented somewhere?
I implemented it in pseudo-code prior to writing the article. As I mentioned above, I wrote the first draft of the article according to the algorithm, and I wanted a more concrete pseudo-code algorithm for me to reference when "translating" it into text. BTW, the pseudo-code algorithm that I wrote was almost a complete implementation (including pedantically correct handling of whitespace). It wasn't abstract at all. I call it pseudo-code because it is using a bunch of infrastructure that is used but isn't there (such as the lexer, a slew of functions like 'is_defined', etc.). There are a variety of things that would have to change in converting the pseudo-code algorithm to real code (there are a variety of opportunities to reduce dynamic allocation, for example, that would be necessary in real code).
BTW. I'm done with the comments, so far.
I really do appreciate your viewpoint even if it might seem that I'm arguing. Hopefully my responses will give you an idea of what I was thinking when I was writing it and what goals I was trying to achieve. I think a lot of your comments about the structure either directly relate to or are fallout from trying to have both technical specification + commentary and following the running example.
I believe there is more that can be done to enhance the structure. Anyway, all in all it reads quite nice.
Thanks. More than anything else I want to make sure that the result is understandable and useful. I don't expect it to be perfect. I'm not a writer (and I don't like writing)--I'm just writing it to help others (and, indirectly, to help myself). So, I want to do a good enough job that it will be useful and clear enough to be understandable. IOW, my aim is not to produce the ideal example of technical writing, but to get the concepts across.
I hope it's of any use.
Definitely. Thanks again. Regards, Paul Mensonides

Paul Mensonides wrote:
The process of macro expansion is best viewed as the process of scanning for macro expansion (rather than the process of a single macro expansion alone). When the preprocessor encounters a sequence of
preprocessing tokens
and whitespace separations that needs to be scanned for macro expansion, it has to perform a number of steps. These steps are examined in this document in detail.
Strike that paragraph. It uses terms not yet defined and doesn't say much more than the title (assuming it's still "how macro expansion works").
The paragraph might not be in the right spot, but what the paragraph says is important. The process of macro expansion includes more than just expanding a single macro invocation. Rather, it is the process of scanning for macro expansions, and the whole thing is defined that way.
When reading the text, it becomes clear that there is more to macro expansion than expanding a single macro and that multiple steps are required... The paragraph seems to try an introduction but does a bad job, IMO.
The reader probably has no idea what "painted" means at this point. Indicate the forward-declaration by "see below" or something like that.
I do in the very next sentence.
Yeah, but with too much text, IMO.
[Locations]
There are several points where the preprocessor must scan a sequence of tokens looking for macro invocations to expand. The most obvious of these is between preprocessing directives (subject to conditional compilation). For example,
I had to read this sentence multiple times to make sense of it...
What part was difficult to follow? It seems pretty straightforward to me (but then, I know what I'm looking for).
"Between preprocesing directives" -- what?! Sure, it is correct. But it's too much from the viewpoint of the preprocessor than from where your reader is at. <snip>
in undefined
behavior. For example,
#define MACRO(x) x
MACRO( #include "file.h" )
Indicate more clearly that this code is not OK.
The next sentence says that it is undefined behavior. I'm not sure how to make it more clear than that.
An obvious sourcecode comment (e.g. in red).
[Blue Paint]
If the current token is an identifier that refers to a macro, the preprocessor must check to see if the token is painted. If it is painted, it outputs the token and moves on to the next.
When an identifier token is painted, it means that the preprocessor will not attempt to expand it as a macro (which is why it
outputs it
and moves on). In other words, the token itself is flagged as disabled, and it behaves like an identifier that does not
corresponds
to a macro. This disabled flag is commonly referred to as "blue paint," and if the disabled flag is set on a particular token, that token is called "painted." (The means by which an identifier token can become painted is described below.)
Remove redundancy in the two paragraphs above.
I believe I was unclear, here: The redundancy isn't the problem (redundancy is actually a good thing in documentation, when used right) but too much redundancy in one spot...
In the running example, the current token is the identifier OBJECT, which _does_ correspond to a macro name. It is not
painted, however,
so the preprocessor moves on to the next step.
[Disabling Contexts]
If the current token is an identifier token that corresponds to a macro name, and the token is _not_ painted, the preprocessor must check to see if a disabling context that corresponds to the macro referred to by the identifier is active. If a corresponding disabling context is active, the preprocessor paints the identifier token, outputs it, and moves on to the next token.
A "disabling context" corresponds to a specific macro and exists over a range of tokens during a single scan. If an identifier that refers to a macro is found inside a disabling context that corresponds to the same macro, it is painted. Disabling contexts apply to macros themselves over a given geographic sequence of tokens, while blue paint applies to particular identifier tokens. The former causes the latter, and the latter is what prevents "recursion" in macro expansion. (The means by which a disabling cotnext comes into existence is discussed below.)
In the running example, the current token is still the identifier OBJECT. It is not painted, and there is no active disabling context that would cause it to be painted. Therefore, the preprocessor moves on to the next step.
The introductions of these terms feels structurally too aprupt to me. Introduce these terms along the way, continuing with the example.
They appear at the first point where their definition must appear.
I believe it's useful to sustain it. <snip>
from the replacement list.
+ X OBJECT F() + | | |__________| | OBJECT disabling context (DC)
<-- explain what a disabling context and then what blue paint is is here
Do you mean that they should be defined here for the first time, or that they should be defined here again (but maybe with less detail)?
I meant: introduce the terms here.
function-like macro has no formal parameters, and therefore any use of the stringizing operator is automatically an error.) The result of token-pasting in F's replacement list is
It's not clear to me why the stringizing operator leads to an error rather than a '#' character. Probably too much of a sidenote, anyway.
I don't know the rationale for why it is the way it is.
In this case, "therefore" is a bit strange...
[Interleaved Invocations]
It is important to note that disabling contexts only exist during a _single_ scan. Moreover, when scanning passes the end of a disabling context, that disabling context no longer exists. In other words, the output of a scan results only in tokens and whitespace separations. Some of those tokens might be painted (and they remain painted), but disabling contexts are not part of the result of scanning.
(If they were, there would be no need for blue paint.)
Misses (at least) a reference to 16.3.4-1 (the wording "with the remaining tokens of the source" (or so) is quite nice there, so consider using something similar).
I have to clarify: I'm missing a hint (in the text not the examples) that tokens from outside the replacement list can form a macro invocation together with expansion output. The sentence from 16.3.4-1 is actually quite good.
I believe I wouldn't really understand what you are talking about here without knowing that part of the standard. "A single scan" -- the concept of rescanning was introduced too periphicially to make much sense to someone unfamiliar with the topic.
This all comes back to the beginning--the process is scanning a sequence of tokens for macros to expand (i.e. the first paragraph that you said I should strike). This entire process is recursively applied to arguments to macros (without begin an operand...) and thus this entire scan for macros to expand can be applied to the same sequence of tokens more than once. It is vitally important that disabling contexts don't continue to exist beyond the scan in which they were created, but that blue paint does. As I mentioned, there would be no need for blue paint--what the standard calls "non-replaced macro name preprocessing tokens"--if the disabling contexts weren't transient.
Now for the "rescanning" part: You don't have to introduce that term. Anyway I wouldn't have figured out what "a _single_ scan" was supposed to mean without knowing it, so it feels to me here is something missing.
I'm pretty sure that I don't use the term "rescanning" anywhere in the whole article (yep, I checked, and I don't).
"Rescanning" comes from the standard, of course. I bit myself through chapter 16 because I wanted to know how things work before you posted this article.
In C++, if any argument is empty or contains only whitespace separations, the behavior is undefined. In C, an empty argument is allowed, but gets special treatment. (That special treatment is described below.)
It requires at least C99, right? If so, say it (it's likely there are C compilers that don't support that version of the language).
As far as I am concerned, the 1999 C standard defines what C is until it is replaced by a newer standard. Likewise, 1998 standard defines what C++ is until it is replaced by a newer standard. I.e. an unqualified C implies C99, and unqualified C++ implies C++98. If I wished to reference C90, I'd say C90. Luckily, I don't wish to reference C90 because I don't want to maintain an article that attempts to be backward compatible with all revisions of a language. This is precisely why I should have a note above that variadic macros are not part of C++, BTW, even though I know they will be part of C++0x.
The previous version of the language is still widely used and taught so disambiguation makes some sense, IMO.
Furthermore, deficiencies of compilers (like not implementing the language as it is currently defined or not doing what they should during this process) is not a subject of this article.
OTOH, it wouldn't hurt to mention in the "Conventions" section that, at the time of this writing, C is C99 and C++ is C++98.
It adds clutter no one wants to read -- adding the version number still seems the better solution to me ;-).
[Virtual Tokens]
BTW. I would've probably called them "control tokens" in analogy to "control characters" -- "virtual" has that dynamic-polymorphism-association, especially to C++ programmers with non-native English...
I hope it's of any use.
Definitely. Thanks again.
You're welcome -- it's my way to thank you for your support. -- Tobias

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tobias Schwinger
The paragraph might not be in the right spot, but what the paragraph says is important. The process of macro expansion includes more than just expanding a single macro invocation. Rather, it is the process of scanning for macro expansions, and the whole thing is defined that way.
It becomes clear there is more to macro expansion than expanding a single macro and that multiple steps are required when reading the text...
Okay, but I think that I still need something at the beginning to clarify what the article is about.
The paragraph seems to try an introduction but does a bad job, IMO.
That's fair. I'll rework it.
The reader probably has no idea what "painted" means at this point. Indicate the forward-declaration by "see below" or something like that.
I do in the very next sentence.
Yeah, but with too much text, IMO.
Okay. I was trying to summarize the notational conventions that were used throughout the document. I suppose that I could avoid the forward references here, and just introduce the notations when I introduce the concepts.
of tokens looking for macro invocations to expand. The most obvious of these is between preprocessing directives (subject to conditional compilation). For example,
I had to read this sentence multiple times for me to make sense...
What part was difficult to follow? It seems pretty straightforward to me (but then, I know what I'm looking for).
"Between preprocesing directives" -- what?!
Sure, it is correct. But it's too much from the viewpoint of the preprocessor than from where your reader is at.
Well, given that this article is not about the preprocessor at a whole, I think it is safe to assume that readers are familiar with directives--I'm not even describing #define directives in this article. I'm really not sure how else to describe the block of tokens between directives. Except for unused conditional compilation blocks, all tokens between directives are scanned for macro expansion.
<snip>
in undefined
behavior. For example,
#define MACRO(x) x
MACRO( #include "file.h" )
Indicate more clearly that this code is not OK.
The next sentence says that it is undefined behavior. I'm not sure how to make it more clear than that.
An obvious sourcecode comment (e.g. in red).
I'll add a comment, such as:

MACRO(
    #include "file.h"   // undefined behavior
)
[Blue Paint]
If the current token is an identifier that refers to a macro, the preprocessor must check to see if the token is painted. If it is painted, it outputs the token and moves on to the next.
When an identifier token is painted, it means that the preprocessor will not attempt to expand it as a macro (which is why it
outputs it
and moves on). In other words, the token itself is flagged as disabled, and it behaves like an identifier that does not
corresponds
to a macro. This disabled flag is commonly referred to as "blue paint," and if the disabled flag is set on a particular token, that token is called "painted." (The means by which an identifier token can become painted is described below.)
Remove redundancy in the two paragraphs above.
I believe I was unclear, here:
The redundancy isn't the problem (redundancy is actually a good thing in documentation, when used right) but too much redundancy in one spot...
Well, I got complaints before about glossing over blue paint, so I'm trying to be explicit. At the same time, I'm trying to maintain the algorithm's point-of-view. How about:

If the current token is an identifier that refers to a macro, the preprocessor must check to see if it is painted. If an identifier token is painted, it means that the preprocessor will not attempt to expand it as a macro. In other words, the token itself is flagged as disabled and behaves like an identifier that does not correspond to a macro. This disabled flag is commonly referred to as "blue paint", and thus a token that is marked as disabled is called "painted." (The means by which an identifier token can become painted is described below.) If the current token is painted, the preprocessor outputs the token and moves on to the next.
function-like macro has no formal parameters, and therefore any use of the stringizing operator is automatically an error.) The result of token-pasting in F's replacement list is
It's not clear to me why the stringizing operator leads to an error rather than a '#' character. Probably too much of a sidenote, anyway.
I don't know the rationale for why it is the way it is.
In this case, "therefore" is a bit strange...
I don't know. It is fairly cause-and-effect. If you use the stringizing operator in a nullary function-like macro, it is an error. For the same reason, if you use the stringizing operator in a non-nullary function-like macro without applying it to an instance of a formal parameter, it is an error.
[Interleaved Invocations]
It is important to note that disabling contexts only exist during a _single_ scan. Moreover, when scanning passes the end of a disabling context, that disabling context no longer exists. In other words, the output of a scan results only in tokens and whitespace separations. Some of those tokens might be painted (and they remain painted), but disabling contexts are not part of the result of scanning.
(If they were, there would be no need for blue paint.)
Misses (at least) a reference to 16.3.4-1 (the wording "with the remaining tokens of the source" (or so) is quite nice there, so consider using something similar).
I have to clarify: I'm missing a hint (in the text not the examples) that tokens from outside the replacement list can form a macro invocation together with expansion output. The sentence from 16.3.4-1 is actually quite good.
I think that any need for this (i.e. such a hint) is symptomatic of trying to view macros like functions (e.g. F calls G, G returns 1 to F, F returns 1 to whatever called F). I'm in favor of doing anything that I can do to eradicate this preconception. At the same time, it is worth noting that the syntax of macro invocations is dynamic. That is, a series of macro invocations cannot be parsed into a syntax tree and then evaluated. It can only parse one invocation at a time, and its result is basically appended to the input.
Now for the "rescanning" part: You don't have to introduce that term. Anyway I wouldn't have figured out what "a _single_ scan" was supposed to mean without knowing it, so it feels to me here is something missing.
This is the same kind of problem as forward references to (e.g.) blue paint, except that it is way worse. The article as a whole describes a single scan for macro expansion (because that is all there is unless another is introduced by non-arbitrary means: the entire process is recursively applied to some arguments).
I'm pretty sure that I don't use the term "rescanning" anywhere in the whole article (yep, I checked, and I don't).
"Rescanning" comes from the standard, of course. I bit myself through chapter 16 because I wanted to know how things work before you posted this article.
It is an ill-chosen term, IMO.
The previous version of the language is still widely used and taught so disambiguation makes some sense, IMO.
Okay, noted.
[Virtual Tokens]
BTW. I would've probably called them "control tokens" in analogy to "control characters" -- "virtual" has that dynamic-polymorphism-association, especially to C++ programmers with non-native English...
It is a *somewhat* common term that has been around for a while. In a way, they aren't that different from polymorphism. Regular tokens (e.g. ++), special tokens (i.e. canonical forms), and virtual tokens are all concrete forms of a more general concept of token. Regards, Paul Mensonides

Paul Mensonides wrote:
I was trying to summarize the notational conventions that were used throughout the document. I suppose that I could avoid the forward references here, and just introduce the notations when I introduce the concepts.
Probably better (I knew your notation already when starting proof-reading, so what you really need is a virgin reader).
Except for unused conditional compilation blocks, all tokens between directives are scanned for macro expansion.
Straight enough.
Well, I got complaints before about glossing over blue paint, so I'm trying to be explicit. At the same time, I'm trying to maintain the algorithm's point-of-view. How about:
If the current token is an identifier that refers to a macro, the preprocessor must check to see if it is painted. If an identifier token is painted, it means that the preprocessor will not attempt to expand it as a macro. In other words, the token itself is flagged as disabled and behaves like an identifier that does not correspond to a macro. This disabled flag is commonly referred to as "blue paint", and thus a token that is marked as disabled is called "painted." (The means by which an identifier token can become painted is describd below.) If the current token is painted, the preprocessor outputs the token and moves on to the next.
The paragraph has a *much* better flow now, IMO.
function-like macro has no formal parameters, and therefore any use of the stringizing operator is automatically an error.) The result of token-pasting in F's replacement list is
It's not clear to me why the stringizing operator leads to an error rather than a '#' character. Probably too much of a sidenote, anyway.
I don't know the rationale for why it is the way it is.
In this case, "therefore" is a bit strange...
I don't know. It is fairly cause-and-effect. If you use the stringizing operator in a nullary function-like macro, it is an error. For the same reason, if you use the stringizing operator in a non-nullary function-like macro without applying it to an instance of a formal parameter, it is an error.
No strong opinion, here. But while we're at it -- what about

# define CHAOS_IP_HASH_I #

?
[Interleaved Invocations]
It is important to note that disabling contexts only exist during a _single_ scan. Moreover, when scanning passes the end of a disabling context, that disabling context no longer exists. In other words, the output of a scan results only in tokens and whitespace separations. Some of those tokens might be painted (and they remain painted), but disabling contexts are not part of the result of scanning.
(If they were, there would be no need for blue paint.)
Misses (at least) a reference to 16.3.4-1 (the wording "with the remaining tokens of the source" (or so) is quite nice there, so consider using something similar).
I have to clarify: I'm missing a hint (in the text not the examples) that tokens from outside the replacement list can form a macro invocation together with expansion output. The sentence from 16.3.4-1 is actually quite good.
I think that any need for this (i.e. such a hint) is symptomatic of trying to view macros like functions (e.g. F calls G, G returns 1 to F, F returns 1 to whatever called F). I'm in favor of doing anything that I can do to eradicate this preconception.
I disagree. If you don't want people to think of macros as functions it's important to point out the differences as clearly as possible. It's quite easy to come up with a simple and stupid text replacement mechanism that does *not* have this behaviour without wasting a single thought on functions.
At the same time, it is worth noting that the syntax of macro invocations is dynamic. That is, a series of macro invocations cannot be parsed into a syntax tree and then evaluated. It can only parse one invocation at a time, and its result is basically appended to the input.
The terms 'invocation' and 'function-like macro' are probably much worse than wasting some words on how dynamic things really are... Regards, Tobias

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tobias Schwinger
If the current token is an identifier that refers to a macro, the preprocessor must check to see if it is painted. If an identifier token is painted, it means that the preprocessor will not attempt to expand it as a macro. In other words, the token itself is flagged as disabled and behaves like an identifier that does not correspond to a macro. This disabled flag is commonly referred to as "blue paint", and thus a token that is marked as disabled is called "painted." (The means by which an identifier token can become painted is describd below.) If the current token is painted, the preprocessor outputs the token and moves on to the next.
The paragraph has a *much* better flow now, IMO.
Okay.
I don't know. It is fairly cause-and-effect. If you use the stringizing operator in a nullary function-like macro, it is an error. For the same reason, if you use the stringizing operator in a non-nullary function-like macro without applying it to an instance of a formal parameter, it is an error.
No strong opinion, here. But while we're at it -- what about
# define CHAOS_IP_HASH_I #
?
It isn't a function-like macro. In an object-like macro, there is no stringizing operator. That is why it is defined as:

#define CHAOS_PP_HASH() CHAOS_IP_HASH_I
#define CHAOS_IP_HASH_I #

instead of:

#define CHAOS_PP_HASH() # // should be an error
I have to clarify: I'm missing a hint (in the text not the examples) that tokens from outside the replacement list can form a macro invocation together with expansion output. The sentence from 16.3.4-1 is actually quite good.
I think that any need for this (i.e. such a hint) is symptomatic of trying to view macros like functions (e.g. F calls G, G returns 1 to F, F returns 1 to whatever called F). I'm in favor of doing anything that I can do to eradicate this preconception.
I disagree.
If you don't want people to think of macros as functions it's important to point out the differences as clear as possible.
There is a section on that alone. There is also a section on interleaved invocations--which is about macro invocations that span the end of replacement list.
It's quite easy to come up with a simple and stupid text replacement mechanism that does *not* have this behaviour without wasting a single thought on functions.
Yes. However, in such a language, whether or not "macro invocations" form a functional hierarchy is irrelevant because the result is the same.
At the same time, it is worth noting that the syntax of macro invocations is dynamic. That is, a series of macro invocations cannot be parsed into a syntax tree and then evaluated. It can only parse one invocation at a time, and its result is basically appended to the input.
The terms 'invocation' and 'function-like macro' are probably much worse than wasting some words on how dynamic things really are...
I'm not sure I follow what you're saying here. Can you rephrase? Regards, Paul Mensonides

Paul Mensonides wrote:
I don't know. It is fairly cause-and-effect. If you use the stringizing operator in a nullary function-like macro, it
is an error.
For the same reason, if you use the stringizing operator in a non-nullary function-like macro without applying it to an
instance of a formal parameter, it is an error.
No strong opinion, here. But while we're at it -- what about
# define CHAOS_IP_HASH_I #
?
It isn't a function-like macro. In an object-like macro, there is no stringizing operator.
Fine. Brings me back to the text. This detail is worth mentioning! "There is no stringizing operator for object-like macros" is certainly less clear than "the '#' character in an object-like macro does not act like an operator and is passed on to the output" (or so).
I have to clarify: I'm missing a hint (in the text not the examples) that tokens from outside the replacement list can form a macro invocation together with expansion output. The sentence from 16.3.4-1 is actually quite good.
I think that any need for this (i.e. such a hint) is symptomatic of trying to view macros like functions (e.g. F calls G, G returns 1 to F, F returns 1 to whatever called F). I'm in favor of doing anything that I can do to eradicate this preconception.
I disagree.
If you don't want people to think of macros as functions it's important to point out the differences as clear as possible.
There is a section on that alone. There is also a section on interleaved invocations--which is about macro invocations that span the end of replacement list.
It's quite easy to come up with a simple and stupid text replacement mechanism that does *not* have this behaviour without wasting a single thought on functions.
Yes. However, in such a language, whether or not "macro invocations" form a functional hierarchy or not is irrelevant because the result is the same.
Fair enough ;-).
At the same time, it is worth noting that the syntax of macro invocations is dynamic. That is, a series of macro invocations cannot be parsed into a syntax tree and then evaluated. It can only parse one invocation at a time, and its result is basically appended to the input.
Replacing 'appended' with 'prepended' makes this sentence a really usable explanation of what the standard calls "rescanning".
The terms 'invocation' and 'function-like macro' are probably much worse than wasting some words on how dynamic things really are...
I'm not sure I follow what you're saying here. Can you rephrase?
After we have done an expansion we have to reset our scanner back to the first token in the replacement list. That's what "rescanning" means to me. "The output is prepended to the input" (or so) is probably a better way to say it. Anyway, some redundancy to highlight this process is a good thing because it explains the dynamic nature of the preprocessor's syntax. The terms "invocation" and "function-like macro", however, are pushing the reader towards viewing macros as functions (worse than "rescanning" does IMO). Regards, Tobias

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tobias Schwinger
It isn't a function-like macro. In an object-like macro, there is no stringizing operator.
Fine. Brings me back to the text. This detail is worth mentioning!
"There is no stringizing operator for object-like macros" is certainly less clear than "the '#' character in an object-like macro does not act like an operator and is passed on to the output" (or so).
Okay, I'll note it, but I have to be a little more specific. To be accurate, the only # tokens that are stringizing operators are those that exist in the definition of a function-like macro. I.e. it is, for all intents and purposes, replaced by a canonical form at the point of a macro definition. This distinguishes it from (e.g.) a # token that is passed as input to a macro--which is not the stringizing operator. The same is true for ##, except that it is also replaced by a canonical form in object-like macro definitions. (I realize what I just wrote is a mess!) The token-pasting and stringizing operators are a bit like formal parameters in the sense that they can be replaced by canonical forms (i.e. a token that is introduced by the implementation that isn't a normal token--virtual tokens are similar, but have a different function). E.g. when the preprocessor comes across this definition:

#define MACRO(a, b) token a ## b token

the replacement list stored in the symbol table effectively becomes:

token <_1> <##> <_2> token

Thus, when the macro is called such as:

MACRO(b #, # a)

argument substitution yields:

token b # <##> # a token

(where the # tokens are not the stringizing operator--that would be <#> if it existed in the macro definition), and token-pasting yields:

token b ## a token

and that's the result. The ## in the result is most definitely not the token-pasting operator, nor are the #'s in the input the stringizing operator. Of course, if the invocation were written inside another macro definition, like this:

#define MACRO_2(a, b) MACRO(b #, # a)

then it would be an error, because then the #'s are stringizing operators, but the first one isn't applied to a formal parameter. Does that make any sense? (Sorry, it's late, and I'm in a hurry to go to bed.)
At the same time, it is worth noting that the syntax of macro invocations is dynamic. That is, a series of macro invocations cannot be parsed into a syntax tree and then evaluated. It can only parse one invocation at a time, and its result is basically appended to the input.
Replacing 'appended' with 'prepended' makes this sentence a really usable explanation of what the standard calls "rescanning".
Yes, but I'm not sure that it fits well with the abstract model. I.e. the abstract preprocessor is scanning a sequence of preexisting tokens. A concrete preprocessor will normally do lexical analysis, scanning for macro expansion, and output in a single pass, and then it is indeed more like prepending it to the input. But with the abstract model, the sequence of tokens that makes up the macro invocation is replaced by the sequence of tokens from the replacement list (after argument substitution, token-pasting, etc.). IOW, it is more like it is mutating a data structure (the sequence of tokens to be scanned), so there isn't really an input or output so much as a side effect. Now that I think about this, I question my use of the term "output" in the article. The abstract model is what defines the result of the preprocessor. Scanning for macro expansion is similar in the abstract model to the early phases of translation where--in the abstract model--the entire results of phase 1 are fed to phase 2. Except for the possibly unfortunate use of "output", the article is following the abstract model. E.g. "moves on to the next token" instead of "gets the next token".
The terms 'invocation' and 'function-like macro' are probably much worse than wasting some words on how dynamic things really are...
I'm not sure I follow what you're saying here. Can you rephrase?
After we have done an expansion we have to reset our scanner back to the first token in the replacement list.
...and this is precisely where the description of the abstract model is more difficult because the sequence of tokens from the replacement list might be the empty sequence. In which case, the position of the scanner is set to the next token beyond the invocation. This can be explained a lot simpler in the concrete model. I have to think about this. *But*, I get what you're saying. I need (in one form or another) to not glance over "moves on to the next token". I.e. it is more like "move back to the beginning of the algorithm starting with the next token".
That's what "rescanning" means to me. "The output is prepended to the input" (or so) is probably a better way to say it. Anyway, some redundancy to highlight this process is a good thing because it explains the dynamic nature of the preprocessor's syntax.
Yes. I need to think about the concrete model vs. the abstract model. Both are correct in that they yield the same results. In other ways, the abstract model is better suited to the description (e.g. where scanning takes a sequence of tokens as input, mutates it, and returns the mutated sequence). This more easily describes what happens to macro arguments that are scanned for macro expansion.
The terms "invocation" and "function-like macro", however, are pushing the reader towards viewing macros as functions (worse than "rescanning" does IMO).
Well, there is a difference between "function" and "functional hierarchy". Macros are very much like functions in a lot of ways, but what they "return" is not the result of "executing" their definitions. Rather, they return their definitions (i.e. the return code)--sort of like what physically happens when a function is inlined. So, I'm not against viewing macros as functions, I'm against the belief that they form a functional hierarchy. I'm not even against viewing it that way when using macros (provided that you know that ultimately that isn't what is really happening and recognize the implications of the differences). I do that myself when writing code. E.g. I typically *view* the following:

#define A() B()

...as A calling B, but I *know* that that is not what is actually happening. (It takes some serious "getting-used-to" to follow non-trivial code when viewing it entirely as iterative expansion.) The important thing is that you know that the way you're viewing it is only a mental device used to help you grasp what is happening. Regards, Paul Mensonides
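The point that macros "return" their definitions rather than the result of executing them is easiest to see with the no-recursion rule itself; a minimal sketch (made-up name):

#define LOOP() LOOP()
LOOP()   // expands exactly once; the name LOOP is disabled while its own replacement
         // is rescanned, so the result is the plain tokens LOOP () -- no infinite "call chain"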

On Tue, Feb 14, 2006 at 04:58:20AM -0800, Paul Mensonides wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Oliver Kullmann
First of all I think here the documentation of the library doesn't reach its full potential yet: I tried first to read through "Topics", but found all examples contrived and complicated (couldn't find a single plain example),
What is a "plain example"?
An example which shows one thing, without any additional fuss, so that it is immediately clear what the example is doing, and how it is achieved. If I enter the current documentation, then first I get to the short introduction page with a link to the chapter from the mpl book, which is based on earlier chapter in the book, and thus cannot be used as an introduction. So then I continue to Topics (the next item on the list), where the natural choice seems to look for "Motivation": But here again there is no plain example, but we need to know at this point what the "is_function<> metafunction in Boost" is (no attempt is made to explain this thing). So the solution presented down the page doesn't motivate anything for anybody who is not familiar with all of Boost (it is not even said where we could find this function in Boost, i.e., to what sub-library it belongs). Such examples I call "non-plain". A plain example would consider for example the problem of how to create an enumeration out of a variable list of identifiers. The point is here that the example is *simple*, that is, atomic --- not much reference to anything else is needed. I believe with the style of non-plain examples (which is quite common) completely unnecessary hurdles are created.
and thus I skipped that and went to Reference, where I found BOOST_PP_SEQ_FOR_EACH, making some still mysterious remarks about "The next available BOOST_PP_FOR repetition." etc. But at various places it somehow says that "reentry problems" whatever this means are solved now.
There is an article in the topics section about this (titled "Reentrancy").
How does the reader of the documentation get there? After I found that the Motivation does not motivate much for me, I went on to "Known problems" --- alright, I understand that, but it doesn't give me much. So well, going to techniques : Again the very first example doesn't say what it is supposed to do, and so it continues; all examples are part of some discussion within an expert round of users (who don't bother to even spell out the problems they are discussing). So let's skip techniques. Next comes "Incompatibilities". It speaks about the previous Boost release 1.28, so it seems not of relevance anymore (especially since I'm a new user). So let's skip this, going to "reentrancy". From the headline it's not clear what could be meant by it (if you don't know it). Alright, the first few paragraphs there name the problem. But actually not explicitly: it doesn't say that C or C++ in version so-and-so has this "feature", but it speaks of "the preprocessor", which might mean a specific compiler, some special features or this very library itself. It also speaks of recursion, which in the usual sense of the word (that is, in mathematics) is not what is really meant. I want to make the point here that as a documentation author one has similar responsibilities to a car driver: One must anticipate problems. Readers always have a certain special angle on how they view it, and one must add many hints and redundancies to help them become aware of the incongruence of their point of view with the author's point of view. My main assumption was that C and C++ respect the programmer, and thus do not disallow potentially "dangerous constructions". So in my mind I glanced over the problem with "recursion", and basically thought that this is no problem for C and C++ (the whole process might go wrong, but we are in control). Coming from this point of view, I then see "recursion", guess they mean something different, and moreover I have already seen somewhere that "reentrancy" (whatever this means) is solved by the new version of the library anyway, so I don't need to bother (not to forget that when I come to this point in the documentation, then I already "learned" that apparently most parts of the documentation have to be skipped). For the reader there are always many unknowns to be solved (what is the language definition here? what is the role of this library? what is the role of the version?), and usually the resulting system of equations doesn't allow a solution (so one needs an iterative approach, reading through it again and again, until it finally seems to converge). But the documentation writer can offer great help here by saying as much as possible and as precisely as possible (the problem here is a fundamental C/C++ problem, given by the language definition; the library offers specific ways around it; and the version number needs to be brought up to the current version --- otherwise there is always this cloud of uncertainty, how much this all is really relevant).
Only when looking it up in the rationale of C99 I found a clear statement about the issue (while the C++ standard somehow supposes you know already the issue).
It is pretty clear in both standards (6.10.3.4/2 in C and 16.3.4/2 in C++).
I'm not aware of *any* book on C or C++ discussing this issue. I have quite a collection of them, read (or have read) CUJ, CVu and Overload, but the preprocessor seems to be a taboo. This I think one should keep in mind for the documentation. Alright, that leaves the standard(s). I worked through all examples there, but none of them shows the "recursion problem" (at least in the C++ standard). Sure, finally one gets to the point where you understand what they are talking about, but it takes (too) long. You start reading this chapter: "A preprocessing directive consists of a sequence of preprocessing tokens. The first token in the sequence is a # preprocessing token that is either the first character in the source file (optionally after white space containing no newline characters) or that follows white space containing at least one newline character. The last token in the sequence is the first newline character that follows the first token in the sequence." And so on. Among all of that you have to find the right few sentences.
So a clear statement about the limitations of the looping constructions *at each place* would help I would guess quite a few people trying to use this library.
It is unnecessary clutter. It is not the documentation's responsibility to constantly tell you the way that the language works. Unless the documentation says otherwise, it should be assumed that a macro cannot expand inside itself. That is simply the way that the language works.
As I said, where in the literature on C or C++ (books and articles) you find anything on this?
The library makes it *appear* that a few macros are recursive, but then it explicitly says that the macro can be used as if it was recursive.
(I believe documentation must be written so that it can be entered in many ways, and not assuming you are reading it from the start to the end --- redundancy is needed.)
I disagree on all counts. It is impractical to make every page of the documentation a complete discourse on both how the preprocessor works and how preprocessor metaprogramming works. Beyond that, redundancy in both code and documentation is a maintenance nightmare.
First, text read by humans is different from text read by computers. Due to the inherent imprecision of natural language, and the necessity for the reader to anticipate meaning (a well-known psychological fact: without anticipation there is no understanding), the reader constantly goes in wrong directions, and only redundancies (saying important things at many places (with different words)) can help him. And second, if the documentation were organised like software, this might actually help: Every file(!) (not just every translation unit) is supposed to include for example <algorithm> again and again --- accordingly one would at least expect at every reference for some construction (corresponding to a source code file) a *link* to the relevant general sources here ("including" what needs to be known to understand the reference).
Second of all it seems the current facilities in the library should be enhanced (though I don't understand how FOR and FOR_r is supposed to be used): BOOST_PP_SEQ_FOR_EACH is such a natural and easy thing, the use of it should be promoted, while FOR and its derivates look ugly.
Hmm. FOR is far more general. SEQ_FOR_EACH can be implemented with FOR, but not vice versa.
sure --- in other words, FOR_EACH is easier to use!
But it seems that the library favours FOR, and seems to envisage a main loop using FOR, and only at the second level using FOR_EACH.
The library doesn't favor it, it simply is more general purpose. In order to allow the kind of reentrance that FOR allows, hundreds of macros are required for *each* higher-order algorithm. SEQ_FOR_EACH is only one of many.
My solution was to convert the second sequence B into a list, and use BOOST_PP_LIST_FOR_EACH for the inner loop. But it seems to me that this works is pure chance, and with any change in the implementation of BOOST_PP_SEQ_FOR_EACH or BOOST_PP_LIST_FOR_EACH (introducing a common macro) this solution will break.
It will only break if SEQ_FOR_EACH or LIST_FOR_EACH arbitrarily decides to use the other--which won't happen. I design the algorithms in the library, and I'm well aware of vertical dependencies and their implications.
I was referring to the documentation, which seems not to speak about this issue.
According to what I understand from the documentation, the natural solution, offering for example BOOST_PP_SEQ_FOR_EACH_2, which uses macros guaranteed not to occur in BOOST_PP_SEQ_FOR_EACH, has been deprecated in favour of a more complicated solution.
You don't understand the implications of what you're saying.
sure
In order to do that, I'd need to define hundreds of macros to implement SEQ_FOR_EACH alone. It isn't as simple as just replicating the interface macro a few times. Instead, I'd have to reimplement the entire algorithm from scratch--intentionally avoiding the use of any other higher-order algorithm.
the whole thing looks quite complicated; why not add a FAQ page to the documentation (I was looking for one, but didn't find one), answering for example my above request in this way? I think it would have helped me.
But it seems this more complicated solution works only for one kind of loop, and not for others, and the documentation
I'm not sure what you're referring to by "more complicated solution". If you specify what kind of thing you're referring to, I'll expound.
The documentation says: "In particular, the library must provide reentrance for BOOST_PP_FOR, BOOST_PP_REPEAT, and BOOST_PP_WHILE. There are two mechanisms that are used to accomplish this: state parameters (like the above concatenation example) and automatic recursion." Thus it seems only for these loops you have a special mechanism for reentrancy, and this special, "generic" mechanism is what I mean by "more complicated": for the user it's easy to use BOOST_PP_SEQ_FOR_EACH and BOOST_PP_SEQ_FOR_EACH_2, but it's complicated to use the generic method.
seems so arcane that I don't have a clue how it could work (the typical phenomenon, that the documentation wants to show off by presenting only complicated examples, which show all sorts of templates, etc., and this without saying what actually should by achieved --- good code solves one problem at a time, and good documentation discusses one problem at a time).
The documentation has many examples, but you call them either complicated or contrived.
Let's go to the example section: First we have "array_arithmetic.c": It says "This example implements over 2200 functions". Sounds impressive, but is not an example (sounds more like an extension of the library). Among those six examples only "catch_builtin.cpp" seems to be close to an example (which is something you look at and you understand). In the reference section there are actually simple examples, but as far as I can see always only one example: I believe a systematic run-through of the elementary cases would be very helpful (always staying simple, just handling simple sequences, but with example 5 for BOOST_PP_SEQ_FOR_EACH, for instance, one would show a nested loop ("without added meaning", that is, no template stuff etc.)).
What exactly do you want? This kind of stuff doesn't boil down to _simple_ use cases.
My understanding is that use cases are not very helpful. What we need is *understanding*. In the best of all worlds, the library would present a complete mathematical definition, and the reader would understand it --- that's it. Now in reality nothing is defined about C or C++ (the standard is not without reason often referred to as "the bible"), and if it were, then (given the current state of specification) it would take us years to digest this. So one has to use a mixture of general explanations and educating examples. Let's make an example. Say we want to introduce addition of integers. The general theory is too complicated, so we are using examples. To make a little parody of the preprocessor-documentation style, there the example would read as follows:
----------------------------
Consider a graph G (finite, no parallel edges, no loops; generalisation to include these things as an exercise). We have sum_{v in V(G)} deg(v) = 2 |E(G)|.
----------------------------
??? What does this want to say to us? What are "graphs" etc. ??? But shouldn't one know all these things!! And it's a very nice fundamental property of graphs, just *using* addition in two ways. And here come other examples: What about projective geometries ... (all showing nice examples of addition, and it will enlighten the reader enormously to think through all of them). Alright, here now comes what I would consider good examples:
0 + 0 = 0
0 + 1 = 1
0 + 2 = 2
0 + 3 = 3
understood?
1 + 0 = 1
1 + 1 = 2
1 + 2 = 3
understood?
2 + 0 = 2
2 + 2 = 4
2 + 5 = 7
understood?
5 + 7 = 12
100 * 2 = 200
sum_{i=1}^n i = n (n+1) /2.
Hope you get what I mean.
Rather, typical use cases are parts of overarching designs that are complex--which is precisely the reason that preprocessor metaprogramming enters the picture at all.
As I said, I don't want use cases, I just want to use the preprocessing library (for my own purposes).
Furthermore, I have a responsibility (as does any library author) not to promote naive or poor uses of the library--which is almost always the case with examples that implement something simple.
So the documentation is the guard, meant to be a kind of torture, and only those worthy souls who get through shall use the library?
So, I can do two things--I can provide simple contrived examples that demonstrate what various things do, and I can provide complex examples that implement real world scenarios. The documentation does both already.
Speaking for myself, I never ever found any interest in those "real world scenarios" (for this library and others), since their "real world" is not mine. Of course, others might disagree, and having a healthy mix of simple and more complicated examples sounds like a good thing.
So if the library offers a natural way of nesting BOOST_PP_SEQ_FOR_EACH, then documenting this (directly, without solving other problems) would be good, or otherwise adding BOOST_PP_SEQ_FOR_EACH_2 would help some users (actually, just the presence of this second form would announce the underlying problem).
The library 1) shouldn't have to announce this problem
I meant the documentation
and 2) already does announce this problem--it just doesn't do it hundreds of times in hundreds of different places. If you are writing code in the language defined by the preprocessor, you should *at least* know the basics of that language. The library is not the language, it is merely a systematic set of reusable constructs written in that language. The documentation's only responsibility is to document itself.
yes, but neither you are an angel (so that you could give a precise definition of the whole thing) nor am I (so that I could read it); instead we are human beings nd mst fnd r wy thrgh ntcptn (there are nicer examples; but hopefully this works a bit: just looking at it I hope you thought huh, but knowing that I left out the consonants it shouldn't be too hard; I have seen a very nice series of examples in this style, where you were guided to read more and more strange stuff, and at the end you were presented with a plain English sentence --- and you couldn't read it anymore!).
-----
I'm trying very hard not be harsh, but what you suggest has far-reaching implications that you don't understand.
sure (you mean not fully)
Regarding examples in the documentation, I don't think there is a way to please you
perhaps there are ways: I believe actually in small steps, massaging something reasonable here and there so that it gets better. If some of the above thoughts would lead to some additions here and there (a few remarks, hints) in the documentation, that would mean progress.
--see posts by me in this thread:
In there you have this thing about "use cases". So in the above example about addition I forget the "use cases" (to which the user "can relate"). Perhaps it goes something as follows (recall the purpose is to explain addition): Assume you have apples and bananas (if you don't like them, then please use strawberries and cherries; if you don't like them then I can't help you). Imagine a big bowl of apples, and a smaller pot of bananas (that's how I organise them). They are placed in a tidy kitchen (to enter virtual worlds) etc etc etc Finally: 5 apples plus 7 bananas gives 11 something. So I'm not such a great fan of use cases.
-----
Regarding the technical implications:
(Please make the effort to follow this.)
As it stands now, the library emulates recursion by replicating macros. For example, a WHILE loop can be implemented such as:
#define WHILE_1(pred, op, state) \
    IF(pred(2, state), WHILE_2, state TUPLE_EAT(3))(pred, op, op(2, state)) \
    /**/
#define WHILE_2(pred, op, state) \
    IF(pred(3, state), WHILE_3, state TUPLE_EAT(3))(pred, op, op(3, state)) \
    /**/
// etc. (i.e. hundreds...)
A predicate is passed that determines when the algorithm should stop, and an operation is used to iterate the state on each step of the algorithm.
There are a couple of things here must be understood. First, each WHILE_xyz represents both an entry point into the algorithm (i.e. an interface to the algorithm) *and* a single algorithmic step taken by the algorithm. So, WHILE_2 does not mean "separate nested WHILE loop" or "second looping dimension". Instead, it means "the step after WHILE_1".
Now, when the algorithm (meaning the entire collection of macros) invokes the predicate and the operation, it passes along the number of the next available WHILE_xyz macro. This value is called the "state" of the algorithm. (Note that this is not the 'state' parameter. That is the (arbitrary) state that a particular *use* of the algorithm is iterating.) The predicate and the operation can use this algorithm state to reenter the algorithm, but before the algorithm itself actually continues into that macro. As the algorithm continues, the algorithm state values passed to the predicate and the operation change (i.e. they increase). This is important. They do not remain fixed, which means that the predicate, for example, cannot just arbitrarily use WHILE_2. It has to use the WHILE_xyz indicated by the algorithm state value passed to it.
Obviously it is possible to provide several totally distinct implementations of the entire loop, but that implies several hundred macros per loop, instead of sharing a single set of several hundred macros for all loop uses. Say you did this anyway, and you get WHILE_A, WHILE_B, WHILE_C (and so on, each with their own WHILE_A_1, WHILE_A_2, etc.). This doesn't really buy you anything, because you'll still ultimately end up passing A, B, C (etc.) as a reentry point.
E.g. suppose you write some macro that uses WHILE_A, and then you change another macro so that it uses this macro, but this other macro was already using WHILE_A. Fine, you just change this other macro to use WHILE_B. But then, there is another macro that uses this second macro, and this third macro is already using WHILE_B, so you have to change that to use WHILE_C. This is a significant bubbling of implementation detail, and it is *way* worse when all three macros are written by different people. Ultimately, you arrive at the conclusion that it doesn't matter *which* WHILE the first macro uses, so you parameterize the first macro with the WHILE to be used. Similarly with the second (and when the second calls the first, it uses the next WHILE from the one it is currently using). Similarly again with the third. So, instead of having the most nested use of WHILE be WHILE_A, you have the outermost use be WHILE_A, and each inner use adjusts, so that the first nested use uses WHILE_B, the next nested use uses WHILE_C, and so on. This is the only way to do it that scales, and this is ultimately no different than just having one WHILE with one set of implementation macros.
This is exactly (minus a lot of workarounds) the way that the library emulates recursion (particularly: in FOR). The problem comes when you write a higher-order algorithm on top of another higher-order algorithm. For example, SEQ_FOR_EACH on top of FOR. (Higher-order algorithms are fundamentally different because the implementation of the algorithm cannot completely control what macros are called inside of them.) It doesn't take hundreds of macros to implement SEQ_FOR_EACH because it can reuse the more general purpose algorithmic steps provided by FOR. In order to make SEQ_FOR_EACH reentrant, you need, at minimum, a distinct interface macro (i.e. SEQ_FOR_EACH_1, SEQ_FOR_EACH_2, etc.) and a distinct macro to be repeated by FOR for each possible entry point into SEQ_FOR_EACH.
That isn't too bad in terms of number of macros, but you introduce yet another algorithm state by doing so. If each higher-order algorithm did so, the user would be inundated by the states of various algorithms. E.g. just to invoke SEQ_FOR_EACH, you'd need to supply both the state of SEQ_FOR_EACH and the state of FOR. Worse, if another algorithm (with multiple entry points) uses SEQ_FOR_EACH, you'd need the state of that algorithm, the state of SEQ_FOR_EACH, and the state of FOR to use it. You resolve this in one of two ways: 1) you simply don't provide reentrance in higher-order algorithms that are implemented using other higher-order algorithms, or 2) you reimplement the algorithm from scratch to avoid using other higher-order algorithms. The former is what the library does, the latter requires hundreds of macros to implement each higher-order algorithm.
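For what it's worth, a minimal sketch (made-up names, assuming a reasonably conforming preprocessor) of the workaround quoted earlier in the thread -- the outer loop uses SEQ_FOR_EACH, the inner loop runs over a hand-written pp-list with LIST_FOR_EACH, so the two loops do not go through the same interface macro:

#include <boost/preprocessor/cat.hpp>
#include <boost/preprocessor/seq/for_each.hpp>
#include <boost/preprocessor/list/for_each.hpp>

#define OUTER_SEQ (x)(y)
#define INNER_LIST (1, (2, BOOST_PP_NIL))   // a pp-list written out by hand

#define INNER(r, outer, elem) case BOOST_PP_CAT(outer, elem): break;
#define OUTER(r, data, elem) BOOST_PP_LIST_FOR_EACH(INNER, elem, INNER_LIST)

BOOST_PP_SEQ_FOR_EACH(OUTER, ~, OUTER_SEQ)
// roughly: case x1: break; case x2: break; case y1: break; case y2: break;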
Thanks for the explanation! (By the way, to the reader: There are always dragons out there requesting to minimise the use of quotes --- but in the above case it would destroy readability.) I can understand the general point of view. But this general point of view might not necessarily be the point of a user (who in general will just occasionally use the library). Perhaps it might be a good idea to add the above explanation somewhere to the documentation? So if I understand it correctly, you currently have three independent looping constructs BOOST_PP_FOR, BOOST_PP_REPEAT, BOOST_PP_WHILE, while all other loops are implemented in terms of these? Then it would make sense to concentrate on the documentation of these three macros, giving more examples (showing also how to simulate nesting), and emphasising for e.g. FOR_EACH its special character. As far as I can see from the material provided on more powerful pp-libraries, the Boost pp-library won't be further developed, but at some time replaced? I hope nevertheless that some additions to the existing documentation might be possible. Keep up the good work! Oliver

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Oliver Kullmann
What is a "plain example"?
An example which shows one thing, without any additional fuss, so that it is immediately clear what the example is doing, and how it is achieved.
Those are the ones that you (and I) call contrived. There are many of those.
If I enter the current documentation, then first I get to the short introduction page with a link to the chapter from the mpl book, which is based on earlier chapter in the book, and thus cannot be used as an introduction.
It isn't really based on an earlier chapter.
So then I continue to Topics (the next item on the list), where the natural choice seems to look for "Motivation": But here again there is no plain example, but we need to know at this point what the "is_function<> metafunction in Boost" is (no attempt is made to explain this thing). So the solution presented down the page doesn't motivate anything for anybody who is not familiar with all of Boost (it is not even said where we could find this function in Boost, i.e., to what sub-library it belongs).
Where does the documentation refer to an "is_function<> metafunction in Boost"? (The documentation uses some similar things as examples.) Incidentally, I didn't write the first three of the documents in the "Topics" section, but I did write the rest of them.
Such examples I call "non-plain". A plain example would consider for example the problem of how to create an enumeration out of a variable list of identifiers. The point is here that the example is *simple*, that is, atomic --- not much reference to anything else is needed.
And I'd call that a bad example that promotes a bad use of the library. Taking this one apart, you don't gain anything by defining a simple enum like this. Instead, you might gain something if you define not only the enum, but also surrounding code that makes the enum more useful than a built in enum (e.g. I/O support, etc.). But when you do that, the example is suddenly much more complicated. As I said before, this kind of thing does not boil down to simple use cases--which is exactly what this kind of example is. Instead, when the abstractions provided are brought all the way down to concrete use cases, those use cases are either bad uses or are complex. This is almost invariably true. If it wasn't, you wouldn't need preprocessor metaprogramming to do it.
and thus I skipped that and went to Reference, where I found BOOST_PP_SEQ_FOR_EACH, making some still mysterious remarks about "The next available BOOST_PP_FOR repetition." etc. But at various places it somehow says that "reentry problems" whatever this means are solved now.
There is an article in the topics section about this (titled "Reentrancy").
How does the reader of the documentation get there? After I found that the Motivation does not motivate much for me, I went on to "Known problems" --- alright, I understand that, but it doesn't give me much. So well, going to techniques : Again the very first example doesn't say what it is supposed to do, and so it continues; all examples are part of some discussion within an expert round of users (who don't bother to even spell out the problems they are discussing).
Expert users are the primary targets of the documentation. This is a generative _metaprogramming_ library. There are all kinds of things that you can do with metaprogramming--one of them is write code that is total crap. The target of the library is advanced programmers, because only advanced programmers have accumulated the design experience necessary to evaluate the tradeoffs involved. Furthermore, many Boost authors can readily identify with the underlying problem that something like 'is_function' takes in stride (and what it really is an example of). That's dealing with variable arity--a very common area in which the library is used.
So let's skip techniques. Next comes "Incompatibilities". It speaks about the previous Boost release 1.28, so it seems not of relevance anymore (especially since I'm a new user).
No, that one isn't relevant to you.
So let's skip this, going to "reentrancy". From the headline it's not clear what could be meant by it (if you don't know it). Alright, the first few paragraphs there name the problem. But actually not explicitly: it doesn't say that C or C++ in version so-and-so has this "feature", but it speaks of "the preprocessor", which might mean a specific compiler, some special features or this very library itself.
The preprocessor is defined by the standard, and that is what the documentation is referring to. Many real world preprocessors approximate it closely, and some are horribly broken.
It also speaks of recursion, which in the usual sense of the word (that is, in mathematics) is not what is really meant.
I'm not sure I follow this. "Recursion", as I'm using the term in the documentation and as it is commonly used in computer science, is the ability for an interface to use itself.
I want to make the point here that as a documentation author one has similar responsibilities to a car driver: One must anticipate problems. Readers always have a certain special angle on how they view it, and one must add many hints and redundancies to help them become aware of the incongruence of their point of view with the author's point of view.
I don't disagree with that, but I also can't accommodate every possible point of view.
My main assumption was that C and C++ respect the programmer, and thus do not disallow potentially "dangerous constructions". So in my mind I glanced over the problem with "recursion", and basically thought that this is no problem for C and C++ (the whole process might go wrong, but we are in control). Coming from this point of view, I then see "recursion", guess they mean something different, and moreover I have already seen somewhere that "reentrancy" (whatever this means) is solved by the new version of the library anyway,
The documentation doesn't say that *anywhere*.
so I don't need to bother (not to forget that when I come to this point in the documentation, then I already "learned" that apparently most parts of the documentation have to be skipped).
Or read with an attempt to understand what they are saying...
Only when looking it up in the rationale of C99 I found a clear statement about the issue (while the C++ standard somehow supposes you know already the issue).
It is pretty clear in both standards (6.10.3.4/2 in C and 16.3.4/2 in C++).
I'm not aware of *any* book on C or C++ discussing this issue. I have quite a collection of them, read (or have read) CUJ, CVu and Overload, but the preprocessor seems to be a taboo. This I think one should keep in mind for the documentation.
So, what you're saying is: 1) there is no 20+ years of existing literature as there is for C and C++, so 2) this library should provide the equivalent of 20+ years of literature. The bottom line is that the responsibility of the documentation stops after documenting itself. Anything beyond that is a freebie--the documentation is going above and beyond the call, so-to-speak. I'm not against that, but I'm hearing nothing but "everything is poorly done".
Alright, that leaves the standard(s). I worked through all examples there, but none of them shows the "recursion problem" (at least in the C++ standard). Sure, finally one gets to the point where you understand what they are talking about, but it takes (too) long. You start reading this chapter
I agree that the standard is not elucidating. I disagree that it is my _responsibility_ (via the documentation) to elucidate it, regardless of how useful it might be to do so. Frankly, the lack of recursion is a fundamental aspect of the way macro expansion works. If you don't know that (and I don't mean all the details down to the subatomic level), you probably aren't *ready* to use this library.
And so on. Among all of that you have to find the right few sentences.
Yep. That's because it is a technical specification, not an explanation. Yes, it is difficult to follow and get all the details. Yes, the elucidation that you want would be useful. No, it is _not_ my responsibility to do so--even though I'm already in the process of doing exactly that for macro expansion. On top of all this, I'm doing it in my spare time for free. Just like I'm spending hours on this conversation (and putting off other things that I'm working on).
First, text read by humans is different from text read by computers. Due to the inherent imprecision of natural language, and the necessity for the reader to anticipate meaning (a well-known psychological fact: without anticipation there is no understanding), the reader constantly goes in wrong directions, and only redundancies (saying important things at many places (with different words)) can help him.
Technical documentation needs to be concise. I shouldn't have to say the same thing twice. If the reader doesn't understand something later, they can go back and re-read it.
And second, if the documentation were organised like software, this might actually help: Every file(!) (not just every translation unit) is supposed to include for example <algorithm> again and again --- accordingly one would at least expect at every reference for some construction (corresponding to a source code file) a *link* to the relevant general sources here ("including" what needs to be known to understand the reference).
I agree with this--as long as it is about library-defined issues. E.g. I don't have a problem with providing cross-references to material on algorithm states. I do have a problem providing cross-references to basic preprocessor functionality every time it might possibly be relevant (which is just about everywhere).
Second of all it seems the current facilities in the library should be enhanced (though I don't understand how FOR and FOR_r is supposed to be used): BOOST_PP_SEQ_FOR_EACH is such a natural and easy thing, the use of it should be promoted, while FOR and its derivates look ugly.
Hmm. FOR is far more general. SEQ_FOR_EACH can be implemented with FOR, but not vice versa.
sure --- in other words, FOR_EACH is easier to use!
Sure, as are a whole lot of other things that are just as easy to use as FOR_EACH. IOW, it is nothing special; it is one of many, etc., etc..
It will only break if SEQ_FOR_EACH or LIST_FOR_EACH arbitrarily decides to use the other--which won't happen. I design the algorithms in the library, and I'm well aware of vertical dependencies and their implications.
I was referring to the documentation, which seems not to speak about this issue.
The documentation says that certain constructs are reentrant. It doesn't say for every non-reentrant macro that it is non-reentrant--because that is the status quo, the "normal" situation which is not defined by the library and thus is not the responsibility of the library to document.
According to what I understand from the documentation, the natural solution, offering for example BOOST_PP_SEQ_FOR_EACH_2, which uses macros guaranteed not to occur in BOOST_PP_SEQ_FOR_EACH, has been deprecated in favour of a more complicated solution.
You don't understand the implications of what you're saying.
sure
I'm not saying that to be derogatory, but I am saying that out of frustration because you immediately started pronouncing what should be and what I should do without knowing any of the implications of your pronouncements nor any of the rationale behind the design of the library or what is and isn't in the documentation. You didn't ask *why* something is the way it is. Instead, you ran into problems, and then said that everything should be better (so that you wouldn't have run into problems). I'm also not belittling the problems that you ran into. Preprocessor metaprogramming can be tricky, is very foreign to typical programming models, and has a steep learning curve. Sure, the documentation isn't perfect, but sometimes learning curves are inherently steep.
In order to do that, I'd need to define hundreds of macros to implement SEQ_FOR_EACH alone. It isn't as simple as just replicating the interface macro a few times. Instead, I'd have to reimplement the entire algorithm from scratch--intentionally avoiding the use of any other higher-order algorithm.
the whole thing looks quite complicated; why not add a FAQ page to the documentation (I was looking for one, but didn't find one), answering for example my above request in this way? I think it would have helped me.
Because I don't want to implement the library in the documentation. These are implementation details and if I introduced those into the documentation, the documentation would be orders of magnitude more complex.
I'm not sure what you're referring to by "more complicated solution". If you specify what kind of thing you're referring to, I'll expound.
The documentation says
In particular, the library must provide reentrance for BOOST_PP_FOR, BOOST_PP_REPEAT, and BOOST_PP_WHILE. There are two mechanisms that are used to accomplish this: state parameters (like the above concatenation example) and automatic recursion.
Thus it seems only for these loops you have a special mechanism for reentrancy, and this special, "generic" mechanism is what I mean by "more complicated": for the user it's easy to use BOOST_PP_SEQ_FOR_EACH and BOOST_PP_SEQ_FOR_EACH_2, but it's complicated to use the generic method.
Except that you're wrong. You run into massive dependency nightmares if you do that. It isn't simpler--it creates problems that are very difficult for users to track down.
The documentation has many examples, but you call them either complicated or contrived.
Let's go to the example section:
First we have "array_arithmetic.c": It says "This example implements over 2200 functions". Sounds impressive, but is not an example (sounds more like an extension of the library).
No, it's an example. The library is a code generator, not a library of generated code.
Among those six examples only "catch_builtin.cpp" seems to be close to an example (which is something you look at and you understand).
I disagree with your definition of an example. Any code that uses an interface is an example of using that interface. Whether or not it is a good example is another issue. However, the library includes many very simple examples--such as the example for SEQ_FOR_EACH:

#define SEQ (w)(x)(y)(z)
#define MACRO(r, data, elem) BOOST_PP_CAT(elem, data)
BOOST_PP_SEQ_FOR_EACH(MACRO, _, SEQ) // w_ x_ y_ z_

That is a simple example that clearly shows what SEQ_FOR_EACH does without focusing on an overarching (complicated) problem to be solved. The library is littered with such examples.
In the reference section there are actually simple examples, but as far as I can see always only one example: I believe a systematic run-through of the elementary cases would be very helpful (always staying simple, just handling simple sequences, but with example 5 for BOOST_PP_SEQ_FOR_EACH, for instance, one would show a nested loop ("without added meaning", that is, no template stuff etc.)).
Why would I show a nested loop in an example for SEQ_FOR_EACH when SEQ_FOR_EACH cannot be called from the repeated macro invoked by SEQ_FOR_EACH?
What exactly do you want? This kind of stuff doesn't boil down to _simple_ use cases.
My understanding is that use cases are not very helpful. What we need is *understanding*.
Yes.
??? What does this want to say to us? What are "graphs" etc. ??? But shouldn't one know all these things!! And it's a very nice fundamental property of graphs, just *using* addition in two ways.
The difference between this mock example and the library's examples is significant. The templates (or anything else) that appear in the examples aren't what the examples are about. They are *output* of the example--not part of the functionality of the example.
sum_{i=1}^n i = n (n+1) /2.
Hope you get what I mean.
I get what you mean, but I think the simple examples are already there. I just don't think you invested much effort. It basically seems that if you saw "template", you just immediately ignored the example.
Rather, typical use cases are parts of overarching designs that are complex--which is precisely the reason that preprocessor metaprogramming enters the picture at all.
As I said, I don't want use cases, I just want to use the preprocessing library (for my own purposes).
Okay, but your example of using the library to generate an enum is a use-case. Sure, it's a simple one, but it is only simple because it is naive to the point of being destructive.
Furthermore, I have a responsibility (as does any library author) not to promote naive or poor uses of the library--which is almost always the case with examples that implement something simple.
So the documentation is the guard, meant to be a kind of torture, and only those worthy souls who get through shall use the library?
This actually makes me smile. No, the documentation is not meant to be that, but the learning curve is inherently steep, and such learning requires the learner to expend a certain amount of effort--moreso than when learning something simpler. What I said was true. I have a responsibility not to promote naive or poor uses of the library. The user that is learning the library has not yet attained the degree of aptitude required to separate the way something is achieved (the point of the example) from what is being achieved (the medium of the example).
Speaking for myself, I never ever found any interest in those "real world scenarios" (for this library and others), since their "real world" is not mine. Of course, others might disagree, and having a healthy mix of simple and more complicated examples sounds like a good thing.
It is a good thing. What you seem to be suggesting, as far as examples are concerned, is somewhere in the middle. Something that I would call a trivial use case. But for the reasons that I already mentioned, such examples can be dangerous. Thus, I tried to either make them full-fledged and concrete or utterly simple and contrived. (There were a few times that I got bored of writing ultra-simple examples, so I made them a little more elaborate, but those are almost always in the interfaces targeted towards more advanced users.)
The library 1) shouldn't have to announce this problem
I meant the documentation
As did I.
and 2) already does announce this problem--it just doesn't do it hundreds of times in hundreds of different places. If you are writing code in the language defined by the preprocessor, you should *at least* know the basics of that language. The library is not the language, it is merely a systematic set of reusable constructs written in that language. The documentation's only responsibility is to document itself.
yes, but neither you are an angel (so that you could give a precise definition of the whole thing) nor am I (so that I could read it); instead we are human beings nd mst fnd r wy thrgh ntcptn (there are nicer examples; but hopefully this works a bit: just looking at it I hope you thought huh, but knowing that I left out the consonants it shouldn't be too hard; I have seen ^^^^^^^^^^ vowels?
a very nice series of examples in this style, where you were guided to read more and more strange stuff, and at the end you were presented with a plain English sentence --- and you couldn't read it anymore!).
You're right that documentation has to be more than just a formal definition. However, the point remains that it isn't the job of the documentation of a library written in language X to document language X.
Regarding examples in the documentation, I don't think there is a way to please you
perhaps there are ways: I believe actually in small steps, massaging something reasonable here and there so that it gets better. If some of the above thoughts would lead to some additions here and there (a few remarks, hints) in the documentation, that would mean progress.
Out of curiosity--do you believe there are certain things that have inherently steep learning curves? What you're suggesting here is basically that I flatten the learning curve by gradually introducing complexity. My argument is that there are times that complexity cannot be avoided or compartmentalized into small, easily understood, chunks. There are certain steps along the road that are transactional--all or nothing.
Regarding the technical implications:
I can understand the general point of view. But this general point of view might not necessarily be the point of a user (who in general will just occasionally use the library).
Perhaps it might be a good idea to add the above explanation somewhere to the documentation?
IOW, add implementation details to the documentation? As far as the reentry stuff is concerned, all that is necessary for users to know is that macros cannot be used inside themselves (i.e. common knowledge). The places where that rule is broken (meaning that the library makes it look like it is breaking the rule) are explicitly noted. Users need not ever deal with state parameters--they only need to know that macros will not expand inside themselves. Thus, the macro passed to (e.g.) SEQ_FOR_EACH, that is invoked by SEQ_FOR_EACH, cannot use SEQ_FOR_EACH. Otherwise, the library makes it work automatically. You aren't required to use SEQ_FOR_EACH_R--or any interface that requires you to mess around with an algorithm state. Instead, you can ignore it completely.
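A minimal sketch (made-up names) of the one constraint that remains -- the macro handed to SEQ_FOR_EACH cannot itself use SEQ_FOR_EACH:

#include <boost/preprocessor/seq/for_each.hpp>

#define INNER(r, data, elem) elem
#define OUTER(r, data, elem) BOOST_PP_SEQ_FOR_EACH(INNER, ~, elem)

BOOST_PP_SEQ_FOR_EACH(OUTER, ~, ((a)(b))((c)(d)))
// this does not work as intended: SEQ_FOR_EACH cannot expand inside its own expansion,
// so the inner loops are never carried out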
So if I understand it correctly, you currently have three independent looping constructs BOOST_PP_FOR, BOOST_PP_REPEAT, BOOST_PP_WHILE, while all other loops are implemented in terms of these?
There are actually more than that, but the library makes it appear that there are only a few algorithmic states (e.g. 'r' for FOR, 'z' for REPEAT, and 'd' for WHILE), and therefore that there are only a few algorithms implemented this way. It used to be the case that, because SEQ_FOR_EACH was implemented in terms of FOR, you couldn't use SEQ_FOR_EACH inside FOR. You *had* to use SEQ_FOR_EACH_R. I removed that kind of limitation. You cannot use SEQ_FOR_EACH inside itself, but you need not deal with the algorithm states (like 'R') at all if you don't want to. This makes the library a great deal easier to use.
Then it would make sense to concentrate on the documentation of these three macros, giving more examples (showing also how to simulate nesting), and emphasising for e.g. FOR_EACH its special character.
It isn't FOR_EACH that is special, it is the ones that are explicitly designed to be reentrant that are special.
As far as I can see from the material provided on more powerful pp-libraries, the Boost pp-library won't be further developed, but at some time replaced?
Not necessarily. However, it is near the limit of what I can do with it, either because of portability issues or efficiency concerns with various preprocessors. I have tried adding various things to it, but it is extremely difficult to keep the library stable on some very popular compilers. Things are improving. As far as broken preprocessors are concerned, there are only a few remaining roadblocks--the most significant being VC++. As far as efficiency is concerned, the major roadblock has always been EDG-based preprocessors (notoriously slow). I haven't seen proof of this, but *supposedly* the EDG preprocessor has been modified to use memory much more efficiently during macro expansion, and the previous way that it was using memory was *supposedly* the source of the major slowdown. (This is in an EDG version that is not yet available in any other compiler.) The speed issue is not as big an issue as the utter broken-ness of VC++ in particular. When and if VC++ and the other two broken preprocessors, Sun and IBM, fix their preprocessors, the library won't be replaced so much as heavily refactored. Some other refactoring is due anyway in preparation for variadics. As an immediate example, SEQ_FOR_EACH's interface needs to change from: SEQ_FOR_EACH(macro, data, seq) to: SEQ_FOR_EACH(macro, seq, data) in preparation for it becoming: SEQ_FOR_EACH(macro, seq, ...) Of course, when and if that happens--which depends on the vendors--a lot of the limitations that you've run into will be gone. SEQ_FOR_EACH would be just as reentrant as anything else. Regards, Paul Mensonides

On Tue, Feb 14, 2006 at 04:58:20AM -0800, Paul Mensonides wrote:
If I enter the current documentation, then first I get to the short introduction page with a link to the chapter from the mpl book, which is based on earlier chapter in the book, and thus cannot be used as an introduction.
[snip] Completely & utterly agree. I only learnt how to use BOOST_PP by reading the source, and that can't be right.

Simon Carter <simon.carter@ttplabtech.com> writes:
On Tue, Feb 14, 2006 at 04:58:20AM -0800, Paul Mensonides wrote:
If I enter the current documentation, then first I get to the short introduction page with a link to the chapter from the mpl book, which is based on earlier chapter in the book, and thus cannot be used as an introduction.
[snip]
Completely & utterly agree. I only learnt how to use BOOST_PP by reading the source, and that can't be right.
Knowing what the code to be generated means has nothing to do with knowing how to generate it. Did you really try to understand that tutorial, or did you just give up when it seemed to refer back to earlier material you hadn't read? -- Dave Abrahams Boost Consulting www.boost-consulting.com

On Fri, Feb 17, 2006 at 01:44:52PM -0500, David Abrahams wrote:
Simon Carter <simon.carter@ttplabtech.com> writes:
On Tue, Feb 14, 2006 at 04:58:20AM -0800, Paul Mensonides wrote:
If I enter the current documentation, then first I get to the short introduction page with a link to the chapter from the mpl book, which is based on earlier chapter in the book, and thus cannot be used as an introduction.
[snip]
Completely & utterly agree. I only learnt how to use BOOST_PP by reading the source, and that can't be right.
Knowing what the code to be generated means has nothing to do with knowing how to generate it. Did you really try to understand that tutorial, or did you just give up when it seemed to refer back to earlier material you hadn't read?
I can only speak for myself: just seeing a reference to something obviously unrelated puts me off; I'm not a student anymore, and don't need to be held by the hand and taken through a "learning experience", first showing two wrong solutions, then two half-solutions, then a 3/4-solution, and promising to see the whole solution somewhere later in the book --- where actually I'm not interested in the solution, but I need precise and complete technical information to the point *for something else*. I actually have read the MPL book (in parts), and I quickly browsed through that chapter on Boost PP, but it didn't speak about *my* problem (that problem of nesting loops; and related that problem of "recursion").

It is understandable, but nevertheless wrong in my opinion, that authors of documentation assume the reader will read the whole thing, from beginning to end, with devotion (and time). Typically I have a concrete problem, and now just need the right tool to solve it. Thus I don't want to understand for example the whole Boost PP library, but I just want to create a double-nested loop with simple content (just as an example). Giving me general advice about good programming style and how to use the library in a good way, not to use it for the wrong purposes, might at some time interest me, but this should go into some special section (actually, it would be good to have this general discussion for a library; but as I said, I believe it should go to its own place, with appropriate links to it).

This purpose, getting the right information "zack zack" (right away), is in my understanding best served by avoiding creating additional problems like introducing the preprocessor library by means of an example which first needs to be understood by itself, and where furthermore understanding this use case won't help me one iota with understanding how to use the preprocessor library.

A final word: I know it from myself (not regarding documentation, but articles) that we tend to make a big fuss about our products, how they relate to so many other things, and how much a proper understanding of our product will help the reader. So we tend to obscure that what we are actually doing is quite simple --- and can be used quite simply. Oliver

Oliver Kullmann <O.Kullmann@swansea.ac.uk> writes:
On Fri, Feb 17, 2006 at 01:44:52PM -0500, David Abrahams wrote:
Simon Carter <simon.carter@ttplabtech.com> writes:
On Tue, Feb 14, 2006 at 04:58:20AM -0800, Paul Mensonides wrote:
If I enter the current documentation, then first I get to the short introduction page with a link to the chapter from the mpl book, which is based on an earlier chapter in the book, and thus cannot be used as an introduction.
[snip]
Completely & utterly agree. I only learnt how to use BOOST_PP by reading the source, and that can't be right.
Knowing what the code to be generated means has nothing to do with knowing how to generate it. Did you really try to understand that tutorial, or did you just give up when it seemed to refer back to earlier material you hadn't read?
I can only speak for myself: just seeing a reference to something obviously unrelated puts me off;
Well, for a PP tutorial to be useful there needs to be something to generate -- which will necessarily be unrelated to the topic at hand: how to use the PP library. Why is the example used there worse than any other?
I'm not a student anymore, and don't need to be held by the hand and taken through a "learning experience", first showing two wrong solutions, then two half-solutions, then a 3/4-solution, and promising to see the whole solution somewhere later in the book
Are you claiming that the PP tutorial does that? If so, please point out where, because AFAICT it only shows correct code. If not, my good man, what *are* you talking about, and what relevance does it have to our PP tutorial? And if you don't want a "learning experience" you probably shouldn't be reading tutorials in the first place.
--- where actually I'm not interested in the solution, but I need precise and complete technical information to the point *for something else*.
I don't know what you mean but it sounds like you're describing the PP reference docs; they do that job pretty well.
I actually have read the MPL book (in parts), and I quickly browsed through that chapter on Boost PP, but it didn't speak about *my* problem (that problem of nesting loops; and related that problem of "recursion").
Nope, that's why that "chapter" is actually an appendix. One could write a whole separate book on PP metaprogramming, and I hope someday Paul will do that.
It is understandable, but nevertheless wrong in my opinion, that authors of documentation assume the reader will read the whole thing, from beginning to end, with devotion (and time). Typically I have a concrete problem, and now just need the right tool to solve it. Thus I don't want to understand for example the whole Boost PP library, but I just want to create a double-nested loop with simple content (just as an example). Giving me general advice about good programming style and how to use the library in a good way, not to use it for the wrong purposes, might at some time interest me, but this should go into some special section (actually, it would be good to have this general discussion for a library; but as I said, I believe it should go to its own place, with appropriate links to it).
Several people (including me) have had this discussion with Paul many times. Although I disagree with him about this, I respect Paul's point of view in the matter. I suggest, unless you can find some new angle to the argument, that nothing new will come of repeating what others before you have tried and failed at.
This purpose, getting the right information "zack zack", is in my understanding best served by avoiding creating additional problems like introducing the preprocessor library by means of an example which first needs to be understood by itself,
Which example is that?
and where furthermore understanding this use case won't help me one iota with understanding how to use the preprocessor library.
I can't imagine what sort of example you would say didn't need to be "understood by itself" if you feel you had to understand the one used in the tutorial appendix by itself. What does "by itself" mean anyway?
A final word: I know it from myself (not regarding documentation, but articles) that we tend to make a big fuss about our products, how they relate to so many other things, and how much a proper understanding of our product will help the reader. So we tend to obscure that what we are actually doing is quite simple --- and can be used quite simply.
For most people, the PP lib gives the illusion of being quite simple and familiar, but then there are issues like this recursion thing that you ran into, which don't map neatly onto the ordinary programming paradigms we're used to. I think those phenomena are harder to explain simply. -- Dave Abrahams Boost Consulting www.boost-consulting.com
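The "recursion thing" mentioned here is a property of the preprocessor itself rather than of the library; a tiny, Boost-independent illustration of the rule that a macro is never re-expanded inside its own expansion:

    #define FOO() FOO()

    FOO()
    // expands exactly once, to FOO(); the FOO inside the replacement is not
    // expanded again, because a macro name is disabled ("painted") while the
    // result of its own expansion is being rescanned. There is no infinite
    // recursion: the leftover FOO() simply stays in the output and will not
    // be expanded later either.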

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David Abrahams
It is understandable, but nevertheless wrong in my opinion, that authors of documentation assume the reader will read the whole thing, from beginning to end, with devotion (and time). Typically I have a concrete problem, and now just need the right tool to solve it. Thus I don't want to understand for example the whole Boost PP library, but I just want to create a double-nested loop with simple content (just as an example). Giving me general advice about good programming style and how to use the library in a good way, not to use it for the wrong purposes, might at some time interest me, but this should go into some special section (actually, it would be good to have this general discussion for a library; but as I said, I believe it should go to its own place, with appropriate links to it).
Several people (including me) have had this discussion with Paul many times. Although I disagree with him about this, I respect Paul's point of view in the matter. I suggest, unless you can find some new angle to the argument, that nothing new will come of repeating what others before you have tried and failed at.
It isn't that I don't believe that this is a good thing; it is that I don't believe that it is _possible_ without much more writing than those who ask for it believe. It would, for example, make the documentation for the pp-lib much larger than all of the rest of the documentation for Boost combined. I don't think that it is legitimate for users to expect to have all their problems solved by the library without putting some effort into learning the language in which it is written. The other side of the coin is that I don't think it is legitimate for users to use some parts of the library without putting in some effort to learn the conventions used by the library (the primary one being how it handles recursion from a client interface point of view). I don't think, for example, that it is a long jump from "I tried to use SEQ_FOR_EACH inside itself, and it didn't work" to the topical article about reentrancy.

Most of the time I get these complaints, it is because the documentation doesn't start with a particular user's problem and provide a direct (or near direct) solution. Even attempting to do that is like trying to produce a comprehensive list of uses for looping constructs in a programming language. Then I get complaints that the examples are too contrived--in essence, that they don't produce a reusable result (i.e. a use case). Then I get complaints that the examples are too complex. I just don't think there is any way that I can win, and I get discouraged every time I try because nothing is ever good enough.

Preprocessor metaprogramming has an inherently steep learning curve. I can't just make that disappear by using better words. No matter what I do, there will be someone who will complain. The root of such complaints is always (or nearly always) that I don't somehow magically make the learning curve disappear--that I don't provide some kind of tutorial that takes a user from absolutely basic concepts to advanced idioms, only with small progressive steps. I don't think that's possible, and if it is, I certainly don't know _how_ to do it (especially not without 500+ pages of prose).
A final word: I know it from myself (not regarding documentation, but articles) that we tend to make a big fuss about our products, how they relate to so many other things, and how much a proper understanding of our product will help the reader. So we tend to obscure that what we are actually doing is quite simple --- and can be used quite simply.
For most people, the PP lib gives the illusion of being quite simple and familiar, but then there are issues like this recursion thing that you ran into, which don't map neatly onto the ordinary programming paradigms we're used to. I think those phenomena are harder to explain simply.
That's true. However...

1) The lack of recursion (or infinite replacement) in macro expansion is a pretty fundamental concept that users should already know *long* before they start throwing preprocessor metaprogramming at a problem. If the user doesn't know that, then the user *probably* (and I'm not pointing a finger at you, Oliver) doesn't know the rest of C++ (including modern C++ idioms) well enough to really know whether preprocessor metaprogramming is necessary or better than the alternatives. Preprocessor metaprogramming is used mostly (by far) by people that have already run smack into situations where it was necessary after exhausting all other possibilities (i.e. they couldn't do it any other "good" way). That is not to say that the use of preprocessor metaprogramming should always be a last resort. But, like anything else, it should be used only when it is "better" than all other alternatives. The problem is that it takes a significant degree of familiarity with the other alternatives (like templates and template metaprogramming) to make that comparison--which is why the pp-lib docs don't shy away from templates.

2) IMO, the "Reentrancy" article in the documentation does a pretty good job of explaining the library's conventions in this regard. Regards, Paul Mensonides
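To make the reentrancy point concrete, the failure mode under discussion looks roughly like the following sketch; the ELEM and ROW macros and the data are invented for illustration, and the behaviour described is that of BOOST_PP_SEQ_FOR_EACH as it stood at the time of this thread:

    #include <boost/preprocessor/seq/for_each.hpp>

    #define ELEM(r, data, x) x,
    #define ROW(r, data, seq) { BOOST_PP_SEQ_FOR_EACH(ELEM, ~, seq) },

    // Naive nesting: while the outer invocation is being expanded, part of
    // the machinery behind BOOST_PP_SEQ_FOR_EACH is disabled, so the inner
    // invocation inside ROW does not expand as intended and is left (partly)
    // unexpanded in the output instead of producing { a, b, } { c, d, }.
    BOOST_PP_SEQ_FOR_EACH(ROW, ~, ((a)(b))((c)(d)))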

Oliver Kullmann wrote:
Given two sequences A and B (according to the Boost Preprocessor datastructure definition), I want to create a nested switch statement, the outer switch running through the constants in A, the inner loops running through B.
If you don't need to do it all within a macro, use file iteration: the generated code is much easier to debug, human-readable, and it's generated faster. Regards, Tobias
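For readers who have not seen the technique, a minimal sketch of what file iteration looks like may help; the file name generate.hpp and the handle_ declarations are invented purely for illustration:

    // generate.hpp -- a hypothetical self-iterating header
    #if !BOOST_PP_IS_ITERATING

    #  ifndef GENERATE_HPP
    #  define GENERATE_HPP

    #  include <boost/preprocessor/cat.hpp>
    #  include <boost/preprocessor/iteration/iterate.hpp>

       // re-include this same file once for each value 0, 1, 2 of BOOST_PP_ITERATION()
    #  define BOOST_PP_ITERATION_PARAMS_1 (3, (0, 2, "generate.hpp"))
    #  include BOOST_PP_ITERATE()

    #  endif // GENERATE_HPP

    #else // BOOST_PP_IS_ITERATING: this section is processed once per iteration value

    void BOOST_PP_CAT(handle_, BOOST_PP_ITERATION())();
    // yields: void handle_0(); void handle_1(); void handle_2();

    #endif

Because each repetition is produced by an ordinary pass over the file rather than by one enormous macro expansion, the generated declarations show up as readable lines in the preprocessed output, which is what makes the result comparatively easy to debug.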

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tobias Schwinger
Oliver Kullmann wrote:
Given two sequences A and B (according to the Boost Preprocessor datastructure definition), I want to create a nested switch statement, the outer switch running through the constants in A, the inner loops running through B.
If you don't need to do it all within a macro, use file iteration: the generated code is much easier to debug, human-readable, and it's generated faster.
More accurately: it's consistently generated fast. Using macro expansion alone can be faster on some compilers and much slower on others. Tobias is correct, though, that the resulting code is much easier to debug without running the result through a formatter. The downside is that the scaffolding tends to take over a file. Regards, Paul Mensonides

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Tobias Schwinger Sent: Thursday, February 16, 2006 3:16 PM To: boost@lists.boost.org Subject: Re: [boost] Preprocessor: Nested loops
Paul Mensonides wrote:
Using macro expansion alone can be faster on some compilers
...such as?
The point is that file iteration is not *always* faster. The point at which file iteration becomes faster (typically) varies from preprocessor to preprocessor. E.g.

    #include <boost/preprocessor/arithmetic/inc.hpp>
    #include <boost/preprocessor/cat.hpp>
    #include <boost/preprocessor/repetition/repeat.hpp>

    #define M1(z, n, _) \
        BOOST_PP_REPEAT(BOOST_PP_INC(n), M2, ~) \
        /**/
    #define M2(z, n, _) n

    BOOST_PP_REPEAT(100, M1, ~)

is often faster than the "equivalent" with file iteration. On gcc, for example, it is more than 10 times faster than if both inner and outer loops are done with file iteration. If the outer loop is done with file iteration, but the inner with REPEAT, it is slightly slower than just using macros. If the count is 200 instead of 100, then the outer file iteration + inner REPEAT is faster. And so on. It isn't simply that file iteration is faster. What it ultimately comes down to is how disabling contexts are implemented and how much effect they have on performance when there are a lot of them. The depth of file iteration, OTOH, doesn't make a significant difference (if it makes any). E.g. an inner iteration is just as fast as an outer iteration, but iteration itself carries its own overhead--it uses macros itself + lexical analysis and directive handling by the preprocessor. There is usually a point at which the complexity of active disabling contexts of macro expansion outweighs the (comparatively high) constant factor of file iteration. Where that point is depends heavily on how a particular preprocessor's performance is affected by disabling contexts. So, macro expansion can be faster than file iteration, but file iteration is more consistently "fast enough" across compilers--because its overheads are relatively constant (i.e. not a function of "depth"). Regards, Paul Mensonides
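As a worked check of what that pattern generates (my own illustration, using a smaller count than the one in the post):

    // BOOST_PP_REPEAT(3, M1, ~) calls M1 with n = 0, 1, 2; each M1 runs an
    // inner BOOST_PP_REPEAT of length n + 1 whose body M2 simply emits n, so
    //
    //   BOOST_PP_REPEAT(3, M1, ~)   expands to   0 0 1 0 1 2
    //
    // With the count of 100 used above, the output is the corresponding
    // 5050-token triangle, which is what makes it a useful timing test.
    BOOST_PP_REPEAT(3, M1, ~)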
participants (6)
- David Abrahams
- Hartmut Kaiser
- Oliver Kullmann
- Paul Mensonides
- Simon Carter
- Tobias Schwinger