
I carefully said it was "my definition" because I'm aware there are multiple different interpretations. Why do you consider your definition to be the 'right' one? :)
The definition that commonly occurs in the refactoring literature is "preserving external behavior" (or was it exterior, or outward-facing? I don't remember exactly). In practice it amounts to: any transformation that preserves the successful execution of a test program.
I would also point out that ensuring a program is correct requires it to be preprocessed and parsed first. Something as simple as renaming a function - a well-known refactoring - requires none of that.
Actually, you're wrong. C and C++ require token pasting and escaped-newline splicing to happen - and both can certainly occur in identifiers. So unless you're willing to break some correct code, hacks like running sed over the source only sometimes work... and be careful of scoping issues, particularly when macros can expand into {'s :)
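To make that concrete, here is a small made-up example (the names do_work, PASTE, BEGIN_LOCKED and so on are invented purely for illustration) of what a textual rename has to cope with:

#include <stdio.h>

#define PASTE(a, b) a##b

/* Token pasting: the identifier "do_work" never appears literally in the
   source below, so a plain-text rename of do_work misses this definition. */
int PASTE(do_, work)(void) { return 42; }

/* Macros can expand into unbalanced braces, which defeats naive scope
   tracking on the raw text (the lock operations are just placeholders). */
#define BEGIN_LOCKED { /* acquire lock */
#define END_LOCKED   /* release lock */ }

int main(void) {
    /* Escaped-newline splicing: this is also a call to do_work, but a
       simple text search for "do_work" will not find it. */
    int y = do_\
work();
    printf("%d\n", y);
    return 0;
}

A sed-style rename of do_work misses both the pasted definition and the spliced call, and the brace-opening macro confuses any brace matching done on the unpreprocessed text.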
Yeah, but you're still talking about transformations on structured text. Having a lexically correct program is a pretty far cry from having an actually correct program - which I agree is important. You may also have cases where you want to operate directly on macros without expanding them - or on header inclusions without performing the inclusion.
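For example, something like this (the macro name and value are invented for illustration) is the kind of thing I mean:

/* A tool reorganizing #includes needs to see the directive itself; after
   preprocessing the directive is gone and only the header's expanded
   contents remain. */
#include <stddef.h>

/* Likewise, renaming BUFFER_SIZE means rewriting the #define and its
   uses - not the literal 64 that expansion leaves behind. */
#define BUFFER_SIZE 64

char buf[BUFFER_SIZE];
size_t used = 0;

A tool that only ever sees the fully preprocessed output can't express either of those edits.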
The complexity of the refactoring determines how much information is needed. Whether you actually need a fully correct AST all the time - I doubt it.
Certainly; it depends on the transformation.
The software engineering research literature is actually a good place to see how people are trying to deal with these problems (that is, if you can still find people working with C++ - most researchers prefer Java these days). One conclusion from that work is that there's a distinct difference between a compiler and what sometimes gets called a reverse engineering parser: a tradeoff between correctness and robustness, that is, the ability to work with more code and under different conditions - like in an editor, or in the absence of a correct build.

I guess I'm trying to say that there is a broad class of operations on source code that requires a holistic view of the text, rather than a fully preprocessed and lexed view, plus the ability to build a partial AST on top of it. I think it will be interesting to see whether llvm/clang is capable of addressing both approaches to source code analysis.

Andrew Sutton
asutton@cs.kent.edu