
-----Original Message----- From: Chris Uzdavinis (via boost-bounces@lists.boost.org)
Not if editors can see through macro expansions, but I was referring to re-indenting by hand.
True, but that would require an editor to also be a project manager, or at least to be able to parse and understand (correctly) a makefile.
Basically it would have to be given a set of #include paths and a set of predefined macros. For this purpose, it needs nothing more than that, but also nothing less.
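For example (a hypothetical sketch), even a trivial macro can expand differently depending on a -D switch, so the editor has to know the predefined macros to re-indent through it correctly:

#include <cstdio>

// Hypothetical: the expansion of TRACE depends on whether NDEBUG was
// defined on the command line (e.g. -DNDEBUG).
#ifdef NDEBUG
#  define TRACE(msg) ((void)0)
#else
#  define TRACE(msg) std::fprintf(stderr, "%s\n", (msg))
#endif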
Otherwise, it wouldn't know which preprocessor macros are or aren't defined from the command line, wouldn't know the include paths, etc. That is hard enough to implement, but even harder to make fast enough to be acceptable.
I don't think that last is necessarily true. Preprocessors can be quite fast, even with complex metaprogramming somewhere in an included header. It also doesn't have to do it constantly.
I don't think it is an onerous task to manually split a line that is too long.
Well, I was thinking that if a variable name gets longer, it would likely be changed in all places it's used, and for that a search/replace may be applied. Then a followup indentation might be more convenient.
I agree that it is more convenient (provided that it does what you want), but I don't think it's terribly hard to format code by hand as you go along--even in the face of maintenance like you're referring to above.
True, but re-indenting is a convenient way to get all this done at once en masse, if you so prefer. (Well, it seems you don't prefer, but I do.)
I don't prefer it because I haven't seen an editor that can do a good enough job with code that I write--lack of macro expansion being one of the major reasons.
That's true, but you might be reading it to verify what the generator is doing.
You're right. I've written a few code generators and spent a lot of time making them format generated code cleanly, for precisely that reason. It also makes it easier to figure out precisely what's going on or how to use the code if you sometimes take a peek.
Yes, but in this case, the formatting doesn't have to be perfect.
By using the word "logically", I meant that "X becomes Y" can quite reasonably be described as being Y.
I understand that, and that's what I'm disagreeing with. There is a different degree of indirection that is important.
Ok, I'm willing to listen and try to be open minded. What is it that I'm missing? Please consider this simple example:
void log(char const * msg, char const * file, int lineno);
#define LOG(X) log((X), __FILE__, __LINE__)
I have a few questions for you, because I am kind of stubborn, but if I feel convinced that I'm wrong, I'm willing to change.
1) Do you think that the LOG() macro should contain a trailing semicolon?
No, unless it is supposed to expand to an entire statement. That could well be the case, however, because of sequence points.
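To illustrate the classic pitfall (a hypothetical sketch): if LOG did carry its own trailing semicolon, writing it the natural way would break a brace-less if/else:

// Hypothetical: suppose the macro were defined with a trailing semicolon:
//   #define LOG(X) log((X), __FILE__, __LINE__);
if (error)
    LOG("failed");   // expands to log(...); ; -- two statements under the 'if'
else                 // error: the 'else' no longer matches the 'if'
    recover();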
2) I'm pretty sure you'll say #1, but which of the following examples do you prefer?
int foo() {
    LOG("NO SEMICOLON HERE?")          // 1
    stuff();
    LOG("OR does this seem better?");  // 2
    more_stuff();
}
The first one, because I see the second one as having an unnatural syntax that is alien to the preprocessor. I.e. it is an attempted interleaving of the syntax of the preprocessor with the syntax of the underlying language. More fundamentally, I see preprocessing as a top-to-bottom execution path that writes the source code of a program. I don't see something like...

int foo() {
    f();
    MACRO(); // with or without the semicolon
             // --doesn't matter for the point I'm making
    g();
}

...as calling 'f', then doing whatever 'MACRO' does, then calling 'g'. I see the preprocessor effect as a separate transformation that includes *all* of the source code and results in *all* of the transformed source code. In other words, I see all of the above translated to...

int foo() {
    f();
    h(); // or whatever
    g();
}

...and then I look at what it does at runtime (or at compile time, after preprocessing).

As an example, this is a bit of code that I wrote recently for an arbitrary-precision natural class I needed. It is using a Duff's Device (slightly modified because there is no zero-iterations case):

template<class T, T radix>
bool operator<(natural<T, radix> x, natural<T, radix> y) {
    unsigned long a = x.size(), b = y.size();
    if (a == b) {
        typename natural<T, radix>::const_iterator
            i = x.begin(), j = y.begin();
        unsigned long n = (a + 7) / 8;
        #define _ \
            if (*i != *j) { \
                return *i < *j; \
            } \
            ++i, ++j; \
            /**/
        switch (a % 8) {
            case 0: do { _
            case 7:      _
            case 6:      _
            case 5:      _
            case 4:      _
            case 3:      _
            case 2:      _
            case 1:      _
            } while (--n);
        }
        #undef _
        return false;
    }
    else {
        return a < b;
    }
}

Now, when I look at this, I don't see the macro '_' as a statement (or a placeholder for a statement); I see it as a substitution for a common token pattern to be placed wherever I need to put it. I.e. it's a shorthand for me writing it eight times and making the Duff's Device less explicit. My perspective is entirely that this substitution takes place before the underlying syntax even exists, which is the only way that the perception can be generalized across all possible macros. Say that I decide that I want to generalize the entire Duff's Device itself--it is, after all, a pattern of code--and then apply it here (a sketch of what I mean is at the end of this answer). In that case, macros would write the entire thing, and it gets further and further away from something that can be viewed as a syntactic entity of the underlying language. The thing is, I don't see a difference between the first way and the second way as far as macros are concerned. I.e. macros are pure code generators, nothing more and nothing less, from the simplest cases to the most complex. This may be a bad example, and I'm probably not explaining my point of view clearly at all. I guess my point of view fundamentally revolves around my conception of macros being totally distinct from the underlying language, in terms of how they work, what that work is, and _when_ they do it.
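To make that concrete, here is a minimal hypothetical sketch of generalizing the device itself (the DUFF_8 name and its restrictions are illustrative only, not code from the actual class):

// Hypothetical sketch: unroll 'stmt' eight times over 'count' iterations.
// 'count' must be greater than zero (no zero-iterations case, as above),
// and 'stmt' must not contain a top-level comma.
#define DUFF_8(count, stmt) \
    { \
        unsigned long duff_n_ = ((count) + 7) / 8; \
        switch ((count) % 8) { \
            case 0: do { stmt \
            case 7:      stmt \
            case 6:      stmt \
            case 5:      stmt \
            case 4:      stmt \
            case 3:      stmt \
            case 2:      stmt \
            case 1:      stmt \
            } while (--duff_n_); \
        } \
    } \
    /**/

// used roughly as:
//   DUFF_8(a, if (*i != *j) return *i < *j; ++i; ++j;)

The point being that the device itself is now just a pattern of tokens that the preprocessor writes; there is no sense in which DUFF_8 "is" a statement of the underlying language.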
3) Do you see harm if the macro does not contain the trailing semicolon? If so, what harm is it? Specifically, how does usage #2 above reinforce bad understanding of macros? (Or am I misconstruing the nature of your objection?)
It isn't a misunderstanding of macros per se, it's a perspective difference. The unit of work that macros perform is some code generation. The unit of work that a function call does is a runtime effect. (Likewise, the unit of work that a structure declaration does is on, for example, the symbol table.) I.e. the difference between 'max' and 'MAX' isn't multiple evaluation, it's that 'max' returns the greater of its arguments while 'MAX' returns some parameterized code. I don't think there's a misunderstanding that the macro is doing a token substitution, but I think that many people view them as doing the same thing "for all intents and purposes". This merging of different types of work is what leads to subtle errors. Obviously, 'max' is an old-hat example, but it is one that should never have been an issue, because people never should have mentally merged two distinct types of work. Most other macro-related problems stem from this kind of blurring. So, the harm that it does is to purposely help maintain that illusion.
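For reference, the old-hat example in code form (a minimal sketch):

#define MAX(a, b) ((a) > (b) ? (a) : (b))

int i = 0, j = 5;
int k = MAX(++i, j); // expands to ((++i) > (j) ? (++i) : (j))
                     // whether '++i' is evaluated once or twice is a
                     // property of the generated code; 'MAX' itself only
                     // returned some parameterized code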
4) What is the important degree of indirection that I would be missing by talking about the usage of the LOG macro as a call to the log function? I understand that I could call it "a macro that expands to a call to the log function" but I don't think that anyone would be more enlightened by that. Other than being more detailed about what's really happening, it's not clear to me what important mistake is being made that bothers you.
The mistake isn't necessarily in this particular example; it's in the mentality that it fosters, which leads to problems in less simplistic scenarios.
When you throw extra parentheses after it and do other things to change how the preprocessor expands macros, then it blurs things. But for the typical, standard usages of macros, this indistinction is not a problem.
I totally disagree. In fact (if you're referring to something like full-scale preprocessor metaprogramming), I'd go so far as to say the opposite. In that case, any indistinction is far less of a problem than it is with "standard usages of macros". Preprocessor metaprogramming is so obviously different from normal code that it hardly matters. It is the normal, relatively simple uses that can cause the most damage.
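For concreteness, one simple example of parentheses changing whether a macro expands at all (a hypothetical sketch, and possibly not the exact trick meant above):

int min(int a, int b);                      // a real function...
#define min(a, b) ((a) < (b) ? (a) : (b))   // ...and a macro with the same name

int x = min(1, 2);   // the function-like macro expands here
int y = (min)(1, 2); // '(' does not immediately follow the name 'min',
                     // so the macro does not expand; this calls the function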
Could you apply this to my LOG macro above, to put it into a concrete example of what's so bad?
As above, it isn't this in isolation. If the effect were only on how this macro is defined and how it is used correctly, it would hardly matter. It is more like the broader scenario of how macros in general are defined and how they are used correctly. I.e. there is a universal way that macros behave that is a property of all macros, rather than "this macro has these properties" and "that macro has those properties". When you blur the distinction between macros and underlying language elements, you start getting issues like "this macro is like a function call except x, y, and z". That is starting with the wrong thing and adding caveats to it, rather than starting with the right thing and adding allowances to it.

Regards,
Paul Mensonides