
On 4/25/2012 5:01 AM, Paul Mensonides wrote:
However, it is also not *nearly* as complex as the core language, and therefore not nearly as difficult to do correctly. In fact, in the last 24 hours I implemented a macro expansion algorithm that, AFAICT, works flawlessly, even in all the corner cases. (Grain of salt: I haven't tested it very well, and it doesn't have a lexer or parser attached to it, so it is not a "preprocessor". It just takes a sequence of preprocessing tokens and a set of existing macro definitions and performs the macro replacement therein.)
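A minimal sketch of that input/output shape may help make the rescanning loop concrete. It assumes object-like macros only (no function-like macros, #, or ##). Token, MacroTable, and expand are hypothetical names, not taken from the attached code; context changes are modeled with virtual "context end" tokens and blue paint with a per-token flag, in the spirit of the approach described in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type for this sketch (not the one in the attached 1.cpp).
struct Token {
    std::string text;
    bool painted = false;      // blue paint: this identifier may never be replaced again
    bool context_end = false;  // virtual token: marks the end of a macro's replacement list

    explicit Token(std::string t, bool end = false)
        : text(std::move(t)), context_end(end) {}
};

// Assumption: object-like macros only, name -> unexpanded replacement list.
using MacroTable = std::map<std::string, std::vector<std::string>>;

// Stream-editor style rescanning: an invocation is replaced in place by its
// replacement list plus a virtual "context end" token, and scanning resumes
// at the first token of that replacement.
std::vector<Token> expand(std::vector<Token> stream, const MacroTable& macros) {
    std::set<std::string> active;  // macros whose replacement lists are still being scanned
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < stream.size()) {
        Token tok = stream[i];
        if (tok.context_end) {  // context change: the macro is enabled again
            active.erase(tok.text);
            ++i;
            continue;
        }
        auto def = macros.find(tok.text);
        if (def == macros.end() || tok.painted) {  // not a macro, or already painted
            out.push_back(tok);
            ++i;
            continue;
        }
        if (active.count(tok.text) != 0) {  // name found inside its own expansion
            tok.painted = true;             // paint it blue so it never expands later
            out.push_back(tok);
            ++i;
            continue;
        }
        // Replace the name (in the stream) by its replacement list, unexpanded,
        // followed by the virtual token that will re-enable the macro.
        std::vector<Token> repl;
        for (const std::string& s : def->second) repl.push_back(Token(s));
        repl.push_back(Token(tok.text, /*end=*/true));
        stream.erase(stream.begin() + i);
        stream.insert(stream.begin() + i, repl.begin(), repl.end());
        active.insert(tok.text);  // do not advance i: resume at the first replacement token
    }
    return out;
}
```

With #define X X + 1, calling expand on X yields X + 1 with the inner X painted, and calling expand again on that result leaves it unchanged. That is why paint has to travel with the token rather than live only in the active set: a result may be rescanned later, as when a pre-expanded argument is substituted into a replacement list. Likewise, with #define A B and #define B A, A expands to a single painted A.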
Implementation of the above attached. There is probably lots of room for improvement and optimization. Apologies for the mega function (in particular, I didn't feel like refactoring it) and for the (non-portable) shell color-coding (I wanted to see blue paint). I built it with g++ 4.7.0, but I believe 4.6+ should also work:

    $ g++ -std=c++11 -I $CHAOS_ROOT 1.cpp

If this weren't just a toy, I'd probably replace the symbol table with a trie (radix tree) populated during lexical analysis, especially given how many common prefixes you get in C++. I'd also avoid the various string comparisons and just compare iterators, which would be faster and improve locality. Also, for output to a tty (i.e. preprocess only), there needs to be a state machine that judiciously inserts whitespace to prevent erroneous re-tokenization by a later tool. Such generated whitespace can no longer affect the semantics of the program (whereas during macro replacement it can, thanks to stringizing).

As one can see from this code, a recursive call to the macro replacement scanner occurs in only one spot: when preparing an actual argument for a formal parameter that is used in the replacement list "in the open" (i.e. not as an operand of # or ##). Aside from that, the blue paint, and context changes (implemented here via virtual tokens), this is a classical stream editor. Macro invocations are found in the stream, replaced (in the stream) by their replacement lists (without macro replacement), and scanning resumes at the first token of the replacement list.

The input is:

    #define O 0 +X ## Y A
    #define A() A() B
    #define B() B() A
    #define C(x, y) x ## y
    #define D(...) D D ## __VA_ARGS__ __VA_ARGS__ #__VA_ARGS__
    #define ID(...) __VA_ARGS__
    #define P(p, x) p ## x(P

    O()
    A()()()
    C(C,)
    D(D(0, 1))
    ID ( 1 )
    P(,ID)(1,2),P(1,2)))

Since there is no lexer/parser, the above is entered manually in the code.
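The single recursion point can be illustrated by classifying each parameter occurrence in a replacement list: an operand of # is stringized from the unexpanded argument, an operand of ## is pasted unexpanded, and only the remaining ("in the open") occurrences need the argument to be macro-replaced first. This is a hypothetical sketch (Subst, classify, and needs_pre_expansion are not from the attached code); it assumes # and ## are spelled exactly that way (no %: digraphs) and that the parameter list includes __VA_ARGS__ for variadic macros.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// How a token of a replacement list is substituted.
enum class Subst {
    None,       // not a parameter
    Stringize,  // operand of #: spelling of the unexpanded argument
    Raw,        // operand of ##: unexpanded argument
    Expanded    // "in the open": fully macro-replaced argument (the recursive scan)
};

// Hypothetical helper: classify every parameter occurrence in `repl`.
std::vector<Subst> classify(const std::vector<std::string>& repl,
                            const std::vector<std::string>& params) {
    std::vector<Subst> kinds(repl.size(), Subst::None);
    for (std::size_t i = 0; i < repl.size(); ++i) {
        if (std::find(params.begin(), params.end(), repl[i]) == params.end()) continue;
        const bool after_hash = i > 0 && repl[i - 1] == "#";
        const bool after_paste = i > 0 && repl[i - 1] == "##";
        const bool before_paste = i + 1 < repl.size() && repl[i + 1] == "##";
        if (after_hash)
            kinds[i] = Subst::Stringize;
        else if (after_paste || before_paste)
            kinds[i] = Subst::Raw;
        else
            kinds[i] = Subst::Expanded;
    }
    return kinds;
}

// The scanner only needs to recurse on an argument if its parameter occurs in the open.
bool needs_pre_expansion(const std::vector<std::string>& repl,
                         const std::vector<Subst>& kinds, const std::string& param) {
    for (std::size_t i = 0; i < repl.size(); ++i)
        if (repl[i] == param && kinds[i] == Subst::Expanded) return true;
    return false;
}
```

For D in the input above, the first __VA_ARGS__ is pasted raw (giving DD(0, 1)), the second is in the open (giving D D0, 1 0, 1 "0, 1"), and the third is stringized (giving "D(0, 1)"). For P, both p and x are operands of ##, so neither argument is ever scanned on its own.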
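The whitespace-insertion step for preprocess-only output can be approximated, under simplifying assumptions, by a pairwise check on the spellings of adjacent tokens instead of a full state machine. needs_space and its helpers are hypothetical names. The rules cover identifiers, pp-numbers (including exponent signs), multi-character punctuators, comment openers, trigraph question marks, and literal encoding prefixes and user-defined-literal suffixes; they deliberately err toward inserting a space and ignore UCNs and non-ASCII identifiers.

```cpp
#include <cassert>
#include <cctype>
#include <string>

namespace {
bool is_word(char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; }
bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool is_quote(char c) { return c == '"' || c == '\''; }
// Characters that can start or continue a multi-character punctuator or a
// comment opener; '?' is included because of trigraphs (C++11/14).
bool is_op(char c) { return std::string("+-*/%<>=!&|^:#.?").find(c) != std::string::npos; }
}  // namespace

// Hypothetical helper: should a space be emitted between two adjacent tokens
// so that a later tool re-lexes them as the same two tokens?
// Conservative by design: it may request spaces that are not strictly needed.
bool needs_space(const std::string& prev, const std::string& next) {
    if (prev.empty() || next.empty()) return false;
    const char a = prev.back();
    const char b = next.front();
    const bool prev_number =
        is_digit(prev[0]) || (prev.size() > 1 && prev[0] == '.' && is_digit(prev[1]));
    if (is_word(a) && is_word(b)) return true;  // X Y  -> XY
    if (is_op(a) && is_op(b)) return true;      // + +  -> ++,  / *  -> comment
    if (a == '.' && is_digit(b)) return true;   // . 5  -> .5
    if (prev_number && b == '.') return true;   // 1 .  -> 1.
    if (prev_number && (a == 'e' || a == 'E' || a == 'p' || a == 'P') &&
        (b == '+' || b == '-'))
        return true;                            // 1e + -> 1e+ (one pp-number)
    if ((is_word(a) && is_quote(b)) || (is_quote(a) && is_word(b)))
        return true;                            // L "s" -> L"s",  "s" _x -> "s"_x
    return false;
}
```

An output routine would call needs_space(previous, current) before writing each token and emit a single space when it returns true; since this runs after macro replacement, the extra whitespace cannot change any stringized result.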
The output is:

    0 <space> + XY <space> A ( ) <space> B <newline>
    A ( ) <space> B ( ) <space> A ( ) <space> B <newline>
    C <newline>
    D <space> DD ( 0 , <space> 1 ) <space> D <space> D0 , <space> 1 <space> 0 , <space> 1 <space> "0, 1" <space> "D(0, 1)" <newline>
    <space> <tab> 1 <space> <newline>
    P ( 1 , 2 ) , 12 ( P ) <newline>

which is correct.

g++ outputs:

    0 +XY A() B
    A() B() A() B
    C
    D DD(0, 1) D D0, 1 0, 1 "0, 1" "D(0, 1)"
    1
    P(1,2),12(P)

which is correct.

cl outputs:

    0 +XY A() B
    A() B() A() B
    C
    D DD D0, 1 0, 1 "0, 1" D D0, 1 0, 1 "0, 1" "D(0, 1)"
    1
    12(P,12(P)

where the 4th and 6th lines are wrong.

wave outputs:

    0 +XY A() B
    A() B() A() B
    C
    D DD(0, 1) D D0, 1 0, 1 "0, 1" "D(0, 1)"
    1
    error: improperly terminated macro invocation or replacement-list terminates in partial macro expansion (not supported yet): missing ')'

Regards,
Paul Mensonides