[preprocessor] Patch for stringize.hpp

The file boost/preprocessor/stringize.hpp contains a trivial bug, for which I have attached a proposed patch. The problem is that there are three incorrect usages of the ## operator. The proposed fix is to simply remove the ## operators, as they are unnecessary.

The ## operator joins two tokens and makes a new single token. But all three usages of ## in this file attempt to join an identifier with '(', which does not result in a single token. The action taken by most C/C++ translators in this situation is to rescan the invalid text back into two tokens, which makes the incorrect ## usage innocuous in most situations. However, the behavior is undefined by the C++ and C standards. With our code coverage tool, this improper ## usage results in a warning.

*** stringize.hpp.orig  Fri Feb 17 10:16:41 2006
--- stringize.hpp       Fri Feb 17 10:18:36 2006
***************
*** 20,30 ****
  #
  # if BOOST_PP_CONFIG_FLAGS() & BOOST_PP_CONFIG_MSVC()
  # define BOOST_PP_STRINGIZE(text) BOOST_PP_STRINGIZE_A((text))
! # define BOOST_PP_STRINGIZE_A(arg) BOOST_PP_STRINGIZE_B ## (arg)
! # define BOOST_PP_STRINGIZE_B(arg) BOOST_PP_STRINGIZE_I ## arg
  # elif BOOST_PP_CONFIG_FLAGS() & BOOST_PP_CONFIG_MWCC()
  # define BOOST_PP_STRINGIZE(text) BOOST_PP_STRINGIZE_OO((text))
! # define BOOST_PP_STRINGIZE_OO(par) BOOST_PP_STRINGIZE_I ## par
  # else
  # define BOOST_PP_STRINGIZE(text) BOOST_PP_STRINGIZE_I(text)
  # endif
--- 20,30 ----
  #
  # if BOOST_PP_CONFIG_FLAGS() & BOOST_PP_CONFIG_MSVC()
  # define BOOST_PP_STRINGIZE(text) BOOST_PP_STRINGIZE_A((text))
! # define BOOST_PP_STRINGIZE_A(arg) BOOST_PP_STRINGIZE_B(arg)
! # define BOOST_PP_STRINGIZE_B(arg) BOOST_PP_STRINGIZE_I arg
  # elif BOOST_PP_CONFIG_FLAGS() & BOOST_PP_CONFIG_MWCC()
  # define BOOST_PP_STRINGIZE(text) BOOST_PP_STRINGIZE_OO((text))
! # define BOOST_PP_STRINGIZE_OO(par) BOOST_PP_STRINGIZE_I par
  # else
  # define BOOST_PP_STRINGIZE(text) BOOST_PP_STRINGIZE_I(text)
  # endif
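To make the distinction in the post concrete, here is a minimal sketch (not code from Boost) as a conforming preprocessor such as GCC's would handle it. It shows ## legitimately fusing two identifiers into one new identifier, and a macro being invoked indirectly just by placing its name next to a parenthesized argument list, which is all the patched lines rely on. STR, XSTR, CAT, APPLY, TWICE, GREETING_EN and same_tokens are hypothetical names introduced only for this illustration; nothing here says how VC++ behaves.

```c
#include <assert.h>

/* Hypothetical helpers for this sketch (not Boost names). */
#define STR(x) #x        /* stringize the argument exactly as written */
#define XSTR(x) STR(x)   /* macro-expand the argument first, then stringize */

/* Compare two strings while ignoring spaces, so the checks do not depend on
   how a given preprocessor spaces the tokens it stringizes. */
static int same_tokens(const char *a, const char *b)
{
    for (;;) {
        while (*a == ' ') ++a;
        while (*b == ' ') ++b;
        if (*a != *b) return 0;
        if (*a == '\0') return 1;
        ++a;
        ++b;
    }
}

/* Well-defined ##: GREETING_ and EN fuse into the single identifier
   GREETING_EN, which the rescan then expands to hello. */
#define CAT(a, b) a ## b
#define GREETING_EN hello

/* Indirect invocation needs no pasting: when "f par" is rescanned, the
   macro name is followed by '(' and the call happens. */
#define APPLY(f, par) f par
#define TWICE(x) x x

/* By contrast, CAT(TWICE, (w)) would paste TWICE and '(' into the invalid
   token "TWICE(": undefined behavior, which GCC reports as an error. */

static void check_paste_vs_juxtaposition(void)
{
    assert(same_tokens(XSTR(CAT(GREETING_, EN)), "hello"));
    assert(same_tokens(XSTR(APPLY(TWICE, (w))), "w w"));
}
```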

"Steve Cornett" <boost@bullseye.com> writes:
The file boost/preprocessor/stringize.hpp contains a trivial bug, for which I have attached a proposed patch. The problem is that there are three incorrect usages of the ## operator. The proposed fix is to simply remove the ## operators, as they are unnecessary.
The ## operator joins two tokens and makes a new single token. But all three usages of ## in this file attempt to join an identifier with '(', which does not result in a single token. The action taken by most C/C++ translators in this situation is to rescan the invalid text back into two tokens, which makes the incorrect ## usage innocuous in most situations. However, the behavior is undefined by the C++ and C standards. With our code coverage tool, this improper ## usage results in a warning.
Does your tool account for illegal code that may be necessary to work around buggy C++ [preprocessor] implementations?

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

Yes, we have made our code coverage bug-compatible with most issues in many compilers. The only compiler bugs that we do not replicate are either too convoluted to reverse engineer, or are just plain crazy.

"Steve Cornett" <boost@bullseye.com> writes:
Yes, we have made our code coverage bug-compatible with most issues in many compilers. The only compiler bugs that we do not replicate are either too convoluted to reverse engineer, or are just plain crazy.
Then it shouldn't be complaining about the code you're referring to... unless in your opinion it falls into the "just plain crazy" category.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

There is a different bug in Microsoft C++ that is too difficult for us to reverse engineer. The warning about improper ## usage helps us diagnose when that other problem occurs. Without our warning, the situation is very difficult for us to diagnose. We have found this problem to be common in code written for Microsoft C++, so the warning saves us a lot of time. I did not want to complicate the issue with this bug report because I was not expecting any resistance. The ## usage in stringize.hpp is wrong, and there is no disadvantage to fixing it. But if you want all the details, see www.bullseye.com/help/trouble_microsoftPaste.html. The problem in stringize.hpp is described in CAUSE #1. The real reason we need to issue the warning is CAUSE #2.

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Steve Cornett
The file boost/preprocessor/stringize.hpp contains a trivial bug, for which I have attached a proposed patch. The problem is that there are three incorrect usages of the ## operator. The proposed fix is to simply remove the ## operators, as they are unnecessary.
The ## operator joins two tokens and makes a new single token. But all three usages of ## in this file attempt to join an identifier with '(', which does not result in a single token. The action taken by most C/C++ translators in this situation is to rescan the invalid text back into two tokens, which makes the incorrect ## usage innocuous in most situations. However, the behavior is undefined by the C++ and C standards. With our code coverage tool, this improper ## usage results in a warning.
Similar things are done all over the library, not just in 'stringize.hpp'. I'm well aware that the behavior of such use is undefined. However, they are necessary workarounds for preprocessors that are broken in other ways. You will never see it occur anywhere in the library if the library is preprocessed in its strict configuration, for example. Suffice it to say, they cannot be removed.

Regards,
Paul Mensonides

Perhaps I was not clear, plus the ## operator is a trickster. Let's try this: you can prove to yourself that the ## operators are having no effect (and are therefore unnecessary) with this sample program below. Compile and run it both with and without the patch. You will see no change in the behavior.

#include <stdio.h>
#include "stringize.hpp"

int main()
{
    puts(BOOST_PP_STRINGIZE(x));
    return 0;
}

So beyond being unnecessary, let me again attempt to convince you that the ## operators are also wrong and therefore undesirable. First let me explain my motivation. Our tool issues a warning when our end user compiles this file with our code coverage tool together with Microsoft C++. The reason for the warning is a long story, but we need it. Other than the warning, our behavior with the ## operator is exactly the same as Microsoft C++. Sometimes end users turn on the option that says all warnings are errors, and then their build fails. So we don't want to bother the end user with something they have no control over.

Anyway, let's look closely at this line:

# define BOOST_PP_STRINGIZE_A(arg) BOOST_PP_STRINGIZE_B ## (arg)

In BOOST_PP_STRINGIZE_A there is an attempt to join "BOOST_PP_STRINGIZE_B" with "(" that would result in a single token "BOOST_PP_STRINGIZE_B(". That does not make sense; these are clearly two separate tokens, an identifier and a '('. The ## is not really accomplishing anything here. The only effect of ## here is to make the behavior of this macro undefined by the C and C++ standards.

The problem is just a little harder to see in BOOST_PP_STRINGIZE_B:

# define BOOST_PP_STRINGIZE_B(arg) BOOST_PP_STRINGIZE_I ## arg

In this case, the argument "arg" is always passed in from BOOST_PP_STRINGIZE and it always begins with "(". But after that, it is the same story as before: you cannot join an identifier with a '('. So I hope you see more clearly what is happening now.
By the way, when I compile the whole boost library with our tool, I see the warning about ## producing invalid results only here in stringize.hpp.

Regards,
Steve Cornett
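For readers without Boost at hand, the sketch below is a self-contained stand-in for the sample program: the MSVC branch of stringize.hpp as it reads after the proposed patch, with the macros renamed to a hypothetical MY_STRINGIZE family. It assumes BOOST_PP_STRINGIZE_I is a plain #text stringizer, and ANSWER and same_tokens are likewise hypothetical. On a conforming preprocessor the argument is still macro-expanded before it is stringized, so the patched chain loses nothing there; as the replies point out, though, a test this simple says nothing about VC++'s expansion-order bugs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical renamed copy of the MSVC branch of stringize.hpp as it reads
   after the proposed patch: no ## anywhere. */
#define MY_STRINGIZE(text) MY_STRINGIZE_A((text))
#define MY_STRINGIZE_A(arg) MY_STRINGIZE_B(arg)
#define MY_STRINGIZE_B(arg) MY_STRINGIZE_I arg
#define MY_STRINGIZE_I(text) #text   /* assumed equivalent of BOOST_PP_STRINGIZE_I */

#define ANSWER 42   /* hypothetical macro, used to show that arguments are expanded */

/* Compare two strings while ignoring spaces. */
static int same_tokens(const char *a, const char *b)
{
    for (;;) {
        while (*a == ' ') ++a;
        while (*b == ' ') ++b;
        if (*a != *b) return 0;
        if (*a == '\0') return 1;
        ++a;
        ++b;
    }
}

/* Body of the sample program, minus main(). */
static void print_stringized_x(void)
{
    puts(MY_STRINGIZE(x));   /* prints x */
}
```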

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Steve Cornett
Perhaps I was not clear, plus the ## operator is a trickster. Let's try this: you can prove to yourself that the ## operators are having no effect (and are therefore unnecessary) with this sample program below. Compile and run it both with and without the patch. You will see no change in the behavior.
#include <stdio.h>
#include "stringize.hpp"

int main()
{
    puts(BOOST_PP_STRINGIZE(x));
    return 0;
}
Perhaps *I* was not clear. This doesn't prove anything. You don't understand what the use of ## is a workaround for in that code. It isn't a workaround for any kind of stringizing issue. It is a workaround for Microsoft's utterly broken macro expansion routine that does expansion at (seemingly) arbitrary times (AFAICT, it is supposed to be some sort of optimization). The purpose of the use of ## is to _induce_ MS's preprocessor to finish expanding the argument. In a simple example like the above, you won't see the difference. If the argument to STRINGIZE is a complex macro expansion, you might (and you might not).
So beyond being unnecessary, let me again attempt to convince you that the ## operators are also wrong and therefore undesirable.
As I said before, I know that it results in undefined behavior. It is desirable nonetheless, as a workaround for other things. If I didn't need the workaround, it wouldn't be there.
First let me explain my motivation. Our tool issues a warning when our end user compiles this file with our code coverage tool together with Microsoft C++. The reason for the warning is a long story, but we need it.
And the reason for the workaround is a long story--but we need it.
Other than the warning, our behavior with the ## operator is exactly the same as Microsoft C++. Sometimes end users turn on the option that says all warnings are errors, and then their build fails. So we don't want to bother the end user with something they have no control over.
It cannot be helped. Those token-pasting operators must be there in the configuration for Microsoft and the configuration for Metrowerks prior to version 9.
Anyway, let's look closely at this line:
# define BOOST_PP_STRINGIZE_A(arg) BOOST_PP_STRINGIZE_B ## (arg)
In BOOST_PP_STRINGIZE_A there is an attempt to join "BOOST_PP_STRINGIZE_B" with "(" that would result in a single token "BOOST_PP_STRINGIZE_B(". That does not make sense, this is clearly two separate tokens, an identifier and a '('. The ## is not really accomplishing anything here. The only effect of ## here is to make this macro behavior undefined by the C and C++ standards.
No. The effect is that it induces the MS preprocessor to (usually) finish the expansion of 'arg' (which it should have done long before this point). You don't need to point out how it is undefined behavior--I already know--really.
The problem is just a little harder to see in BOOST_PP_STRINGIZE_B:
# define BOOST_PP_STRINGIZE_B(arg) BOOST_PP_STRINGIZE_I ## arg
In this case, the argument "arg" is always passed in from BOOST_PP_STRINGIZE and it always begins with "(". But after that, it is the same story as before, you cannot join an identifier with a '('.
Again. I know.
So I hope you see more clearly what is happening now.
I saw clearly (and so did Dave) exactly what you were talking about in your first post. The problem with the VC++ preprocessor is far deeper than the bug you referenced via the URL that you posted. All kinds of things are broken--the most significant of which is that the order of expansion is all screwed up. Let me give you a concrete example:

#define IM p, q
#define A(im) B(im)
#define B(x, y) x + y

A(IM)

This expansion should result in p + q. The _single_ argument to A should be expanded on entry to A, the 'im' in the replacement list of A should be substituted with the result of that expansion (p, q), and _then_ B should be invoked as B(p, q). Instead, MS's preprocessor doesn't expand 'im' where it should, gives a warning about "not enough actual parameters for macro 'B'", and results in:

p, q +

Now, if the workaround is present:

#define IM p, q
#define A(im) B((im))
#define B(par) C ## par
#define C(x, y) x + y

A(IM) // p + q

The concatenation, while undefined behavior, induces the expansion of 'im' before it is passed to C, and yields the correct results. Note that there is no other way (AFAIK) to induce the expansion. Extra macros don't work:

#define IM p, q
#define A(im) B(im)
#define B(im) C(im)
#define C(im) D(im)
#define D(im) E(im)
#define E(x, y) x + y

A(IM)

No matter how many macros it goes through, the preprocessor will not expand the argument until it _thinks_ that it needs to. The problem is that the places where it _thinks_ that it needs to do not include all of the situations where it actually _does_ need to (for this and a variety of other circumstances). The use of the token-pasting operator makes it _think_ that it needs to. This is only a trivial example that shows only one of the many ways that the flawed algorithm can cause problems. If I don't do it, the library becomes horribly unstable on VC++ and Metrowerks (< v9)--to the point of being nearly unusable for anything even close to complex.
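For reference, a sketch of what a conforming preprocessor (GCC's, for example) does with the first example: IM is expanded while A's argument is collected, so the rescan really does invoke B with two arguments and yields p + q. STR, XSTR and same_tokens are hypothetical helpers. The ## version cannot be shown this way (GCC rejects that paste), and the longer A-to-E chain is deliberately left out: on a conforming preprocessor IM has already become p, q by the time it reaches the single-parameter B(im), so that chain is rejected for passing two arguments to a one-parameter macro, which is exactly the early expansion VC++ fails to perform.

```c
#include <assert.h>

#define STR(x) #x        /* hypothetical helper: stringize as written */
#define XSTR(x) STR(x)   /* hypothetical helper: expand, then stringize */

/* Compare two strings while ignoring spaces. */
static int same_tokens(const char *a, const char *b)
{
    for (;;) {
        while (*a == ' ') ++a;
        while (*b == ' ') ++b;
        if (*a != *b) return 0;
        if (*a == '\0') return 1;
        ++a;
        ++b;
    }
}

/* Paul's first example, unchanged. */
#define IM p, q
#define A(im) B(im)
#define B(x, y) x + y

static void check_order_of_expansion(void)
{
    /* IM becomes p, q while A's argument is collected, so the rescan of
       B(im) sees B(p, q): two arguments, as B expects. */
    assert(same_tokens(XSTR(A(IM)), "p + q"));
}
```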
By the way, when I compile the whole boost library with our tool, I see the warning about ## producing invalid results only here in stringize.hpp.
It is used in so many places it is ridiculous (e.g. 'cat.hpp', 'control/iif.hpp', etc., etc.). It is not even *close* to the only place that it is used. In any case, I'm sorry, but those workarounds cannot be removed. I don't know what that means for your tool--though I understand the situation that you're dealing with. Perhaps you'll have to cause the tool to recognize Boost headers and disable the warnings. But, until MS fixes their preprocessor, I absolutely *must* have those inducers in place.

Regards,
Paul Mensonides

Thanks for the detailed answer. I now see more of the overall purpose of the macros in stringize.hpp, but your 2nd example does not in fact require the ## operator to get the behavior you want with Microsoft C++. If you delete the ## from macro B, leaving everything else in place, there is no difference in the behavior; you still get p + q. I tried this with Microsoft Visual C++ 5 (v11) and Microsoft Visual C++ 2005 (v14):

#define IM p, q
#define A(im) B((im))
#define B(par) C par // no ## here
#define C(x, y) x + y

A(IM) // p + q

So despite your impressive knowledge of Microsoft's preprocessor bugs, I'm not really convinced the ## is needed in stringize.hpp, at least with Microsoft C++. I see similar ## usage in cat.hpp and iif.hpp, but that code seems limited to Metrowerks. I'm really only interested in Microsoft C++. stringize.hpp was the only place where ## was used to produce an invalid token when I built Boost with Microsoft C++ ("bjam -sTOOLS=vc-8_0"). There may be other places, but those wheels didn't squeak for me.
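Steve's ##-free variant can be checked the same way on a conforming preprocessor. The sketch below (STR, XSTR and same_tokens are again hypothetical helpers) confirms only that a conforming preprocessor also yields p + q, because the paren-wrapped argument is expanded to (p, q) before "C par" is rescanned. Per Paul's reply, VC++ reaches the same result through a different, extra-scan bug, so a compiler behaving correctly on this tiny case is not evidence about VC++.

```c
#include <assert.h>

#define STR(x) #x        /* hypothetical helper: stringize as written */
#define XSTR(x) STR(x)   /* hypothetical helper: expand, then stringize */

/* Compare two strings while ignoring spaces. */
static int same_tokens(const char *a, const char *b)
{
    for (;;) {
        while (*a == ' ') ++a;
        while (*b == ' ') ++b;
        if (*a != *b) return 0;
        if (*a == '\0') return 1;
        ++a;
        ++b;
    }
}

/* Steve's variant of the workaround. */
#define IM p, q
#define A(im) B((im))
#define B(par) C par   /* no ##: C is invoked when "C (p, q)" is rescanned */
#define C(x, y) x + y

static void check_paren_wrapping(void)
{
    assert(same_tokens(XSTR(A(IM)), "p + q"));
}
```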

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Steve Cornett
Thanks for the detailed answer. I now see more of the overall purpose of the macros in stringize.hpp, but your 2nd example does not in fact require the ## operator to get the behavior you want with Microsoft C++. If you delete the ## out of macro B, leaving everything else in place, there is no difference in the behavior; you still get p + q. I tried this with Microsoft Visual C++ 5 (v11) and Microsoft Visual C++ 2005 (v14):
#define IM p, q
#define A(im) B((im))
#define B(par) C par // no ## here
#define C(x, y) x + y

A(IM) // p + q
So despite your impressive knowledge of Microsoft's preprocessor bugs, I'm not really convinced the ## is needed in stringize.hpp, at least with Microsoft C++.
This happens to work here because of a different bug. Consider:

#define EMPTY
#define A() 123

A EMPTY () // expands to A (), as it should.

#define B() A EMPTY ()

B() // expands to 123, as it shouldn't.

What is happening here is that the MS preprocessor is doing an extra scan for expansion when it shouldn't. The first scan for expansion yields:

A ()

But then an extra scan (that shouldn't be there) is being applied that causes A() to expand. The same thing is happening above: it is not expanding 'im' when it should, but instead picks up that expansion during the first scan, resulting in:

C (p, q)

Then, it is going back and expanding C during the extra scan (that shouldn't be there). So, it is working in this trivial example for the wrong reason, and this other bug doesn't (completely) undo the effect of the first bug in more complicated scenarios. What happens is that you get a build-up of things that aren't yet done that are dependent on each other. Most of the time the workarounds in the library pick all of these up. The problem is that there is no way (AFAIK) to force the preprocessor to expand things when it should. At most, it is coaxing it to do it.
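The EMPTY deferral can be reproduced on a conforming preprocessor, where the tokens produced by an expansion get exactly one rescan. In the sketch below, EXPAND is a hypothetical helper (not part of the original example) that supplies the one additional scan a conforming preprocessor needs before A is invoked; per Paul's description, VC++ applies such a scan on its own, which is why B() yields 123 there. STR, XSTR and same_tokens are hypothetical helpers as well.

```c
#include <assert.h>

#define STR(x) #x        /* hypothetical helper: stringize as written */
#define XSTR(x) STR(x)   /* hypothetical helper: expand, then stringize */

/* Compare two strings while ignoring spaces. */
static int same_tokens(const char *a, const char *b)
{
    for (;;) {
        while (*a == ' ') ++a;
        while (*b == ' ') ++b;
        if (*a != *b) return 0;
        if (*a == '\0') return 1;
        ++a;
        ++b;
    }
}

#define EMPTY
#define A() 123
#define B() A EMPTY ()

/* Hypothetical helper: the argument is fully expanded first, and the
   substituted result is then rescanned once more. */
#define EXPAND(x) x

static void check_deferred_call(void)
{
    /* A is followed by EMPTY, not '(', so it is not invoked. */
    assert(same_tokens(XSTR(A EMPTY ()), "A ()"));
    /* Same result from inside B: B's replacement gets a single rescan. */
    assert(same_tokens(XSTR(B()), "A ()"));
    /* One extra scan lets the leftover A () expand. */
    assert(same_tokens(XSTR(EXPAND(B())), "123"));
}
```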
I see the similar ## usage in cat.hpp and iif.hpp, but that code seems limited to Metrowerks. I'm really only interested in Microsoft C++. In stringize.hpp was the only place ## was used to produce an invalid token when I built boost with Microsoft C++ ("bjam -sTOOLS=vc-8_0"). There may be others places, but those wheels didn't squeak for me.
The VC++ configuration doesn't use it to the degree that I was thinking it did (confusing the workarounds used for MS and for MW). However, if you look at the revision history of "stringize.hpp", it was modified to fix a specific instance of the problem I've been referring to:

BOOST_PP_STRINGIZE( ( BOOST_PP_SEQ_ENUM((x)(y)(z)) ) )

If STRINGIZE is defined in the canonical way (the way that should work):

#define STRINGIZE(x) PRIMITIVE_STRINGIZE(x)
#define PRIMITIVE_STRINGIZE(x) #x

STRINGIZE( ( BOOST_PP_SEQ_ENUM((x)(y)(z)) ) )

...this fails--which was the reason for the workaround at this particular point. Modifying STRINGIZE to:

#define STRINGIZE(x) STRINGIZE_I((x))
#define STRINGIZE_I(par) STRINGIZE_II par
#define STRINGIZE_II(x) #x

This works for this particular example (which--if I remember correctly--was the motivating reason for the change). I'll change "stringize.hpp", but if people start having problems, I'll have to roll it back. That may or may not happen (the pp-lib is not 100% stable on VC++ anyway because of these exact issues), and stringizing is not all that common in preprocessor metaprogramming.

You have to understand that these kinds of workarounds are not simply a local application with a local effect, as you seem to think. I.e. you can't tell if it is effective by just testing STRINGIZE in isolation. It has to be tested in a much more combinatorial way. For example, these trivial scenarios make it appear that a simple uniform application of the workaround (with or without ##) will cause everything to work correctly. That, however and unfortunately, is not the case. An example of the type of build-up that occurs is (trying to keep it simple):

#define E()
#define A() B E E E()()()()
#define B() 123

A() // B E ()()

Note that we're talking about VC++ here. The result in this example *should* be:

B E E()()()

...but the extra scan that VC++ is applying is causing the extra EMPTY expansion. However, that's just an aside, not the point.
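A sketch comparing the two STRINGIZE definitions quoted above on a conforming preprocessor. BOOST_PP_SEQ_ENUM((x)(y)(z)) is replaced by a hypothetical ENUM3 macro that likewise yields x, y, z, and the workaround family is renamed WRAPPED_STRINGIZE* (also hypothetical) so both versions can coexist. On such a preprocessor the two produce the same string, consistent with the point that the difference only shows up on preprocessors like VC++.

```c
#include <assert.h>

#define ENUM3(a, b, c) a, b, c   /* hypothetical stand-in for BOOST_PP_SEQ_ENUM((x)(y)(z)) */

/* Canonical definition. */
#define STRINGIZE(x) PRIMITIVE_STRINGIZE(x)
#define PRIMITIVE_STRINGIZE(x) #x

/* Paren-wrapping workaround, renamed so both definitions can coexist. */
#define WRAPPED_STRINGIZE(x) WRAPPED_STRINGIZE_I((x))
#define WRAPPED_STRINGIZE_I(par) WRAPPED_STRINGIZE_II par
#define WRAPPED_STRINGIZE_II(x) #x

/* Compare two strings while ignoring spaces. */
static int same_tokens(const char *a, const char *b)
{
    for (;;) {
        while (*a == ' ') ++a;
        while (*b == ' ') ++b;
        if (*a != *b) return 0;
        if (*a == '\0') return 1;
        ++a;
        ++b;
    }
}

static void check_stringize_variants(void)
{
    /* The outer parentheses keep the commas produced by ENUM3 inside a
       single macro argument at every step. */
    assert(same_tokens(STRINGIZE( ( ENUM3(x, y, z) ) ), "(x, y, z)"));
    assert(same_tokens(WRAPPED_STRINGIZE( ( ENUM3(x, y, z) ) ), "(x, y, z)"));
}
```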
The E's (the EMPTYs) represent build-up that accrues over the course of a complex macro expansion. With some effort, we can induce it to expand B in this particular case:

#define DO(x) DO_I((x))
#define DO_I(par) DO_II par
#define DO_II(x) x

DO( A() ) // 123

However, the amount of build-up is variable--depending on how an argument is constructed. E.g.

#define C() B E E E E()()()()()

DO(C()) // B ()

Now the extra coaxing applied by DO is not enough. Technically speaking, it shouldn't be enough, but neither should the build-up exist. The point of all this is that you can't take a trivial example and say that it proves that it works. It doesn't prove anything--you have to show stability with large-scale examples. (BTW, building Boost is not an effective test of whether the pp-lib works. The libraries that need building don't use even close to all of the library, so it isn't an adequate test. You should at least include the header-only libraries.)

Regards,
Paul Mensonides
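The build-up can also be traced on a conforming preprocessor, where each scan removes exactly one deferred E. In the sketch below, E, A, B, C and the DO family are copied from the examples, while STR, XSTR and same_tokens are hypothetical helpers. A() alone leaves B E E ()()(), the "should" result given above; DO's three extra scans are enough to clear A()'s remaining build-up and reach 123, but C() carries one more E and leaves B () behind. Wrapping in a second DO (a hypothetical extension, not from the thread) supplies the missing scans, which illustrates the point that no fixed amount of coaxing covers every amount of build-up.

```c
#include <assert.h>

#define STR(x) #x        /* hypothetical helper: stringize as written */
#define XSTR(x) STR(x)   /* hypothetical helper: expand, then stringize */

/* Compare two strings while ignoring spaces. */
static int same_tokens(const char *a, const char *b)
{
    for (;;) {
        while (*a == ' ') ++a;
        while (*b == ' ') ++b;
        if (*a != *b) return 0;
        if (*a == '\0') return 1;
        ++a;
        ++b;
    }
}

/* Copied from the examples: each E followed by () is deferred build-up. */
#define E()
#define A() B E E E()()()()
#define B() 123
#define C() B E E E E()()()()()

#define DO(x) DO_I((x))
#define DO_I(par) DO_II par
#define DO_II(x) x

static void check_build_up(void)
{
    /* A single scan removes only the innermost E. */
    assert(same_tokens(XSTR(A()), "B E E ()()()"));
    /* DO adds three scans: two clear the remaining Es, one invokes B. */
    assert(same_tokens(XSTR(DO(A())), "123"));
    /* C() has one more E, so B is left unexpanded. */
    assert(same_tokens(XSTR(DO(C())), "B ()"));
}
```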

"Paul Mensonides" <pmenso57@comcast.net> writes:
(BTW, building Boost is not an effective test of whether the pp-lib works. The libraries that need building don't use even close to all of the library, so it isn't an adequate test. You should at least include the header-only libraries.)
Wouldn't running all the Boost tests, making the change, running them again, and looking for differences do a pretty good job of it?

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David Abrahams
(BTW, building Boost is not an effective test of whether the pp-lib works. The libraries that need building don't use even close to all of the library, so it isn't an adequate test. You should at least include the header-only libraries.)
Wouldn't running all the Boost tests, making the change, running them again, and looking for differences do a pretty good job of it?
In some ways, yes. Running the pp-lib regressions alone is insufficient--because I can't test all possible combinations on VC++. Running the entire set of Boost tests makes it a lot more likely to pick up a flaw. Every once in a while, a user has a problem with some combination of primitives (such as STRINGIZE + SEQ_ENUM), which basically amounts to a place where I haven't eliminated or accounted for the build-up in time (i.e. before the structural result is needed), so then I have to figure out a way to do that. So, the regression tests may pass, and that is an indication of usability, but it is not an indication of complete stability (on VC++ and Metrowerks < 9). The bugs in other preprocessors (even the significant ones) have a local effect--i.e. if you do XYZ, it will always work or always fail. So, running all of the regression tests might flush out a "combinatorial" failure related to this change (i.e. substituting one workaround for another in STRINGIZE), and it might not; it may also be that the alternate workaround happens to work just as well as the previous one.

Regards,
Paul Mensonides
participants (3)
- David Abrahams
- Paul Mensonides
- Steve Cornett