
I carefully said it was "my definition" because I'm aware there are multiple different interpretations. Why do you consider your definition to be the 'right' one? :)
The definition that commonly occurs in the refactoring literature is "preserving external behavior" (or was it exterior, or outward-facing? I don't remember exactly). In practice it amounts to: any transformation that preserves the successful execution of a test program.
I would also point out that ensuring a program is correct requires it to be preprocessed and parsed first. Something as simple as renaming a function - a well-known refactoring - requires none of that.
Actually, you're wrong. C and C++ require token pasting and escaped-newline splicing to happen - and both can certainly occur in identifiers. So unless you're willing to break some correct code, hacks like running sed over the source only sometimes work... and be careful of scoping issues, particularly when macros can expand into {'s :)
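To make that concrete, here is a small made-up example (the names do_work, PASTE, BEGIN_LOCKED and so on are invented purely for illustration) of what a textual rename has to cope with:

#include <stdio.h>

#define PASTE(a, b) a##b

/* Token pasting: the identifier "do_work" never appears literally in the
   source below, so a plain-text rename of do_work misses this definition. */
int PASTE(do_, work)(void) { return 42; }

/* Macros can expand into unbalanced braces, which defeats naive scope
   tracking on the raw text (the lock operations are just placeholders). */
#define BEGIN_LOCKED { /* acquire lock */
#define END_LOCKED   /* release lock */ }

int main(void) {
    /* Escaped-newline splicing: this is also a call to do_work, but a
       simple text search for "do_work" will not find it. */
    int y = do_\
work();
    printf("%d\n", y);
    return 0;
}

A sed-style rename of do_work misses both the pasted definition and the spliced call, and the brace-opening macro confuses any brace matching done on the unpreprocessed text.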
Yeah, but you're still talking about transformations on structured text. Having a lexically correct program is a pretty far cry from having an actually correct program - which I agree is important. You may also have cases where you want to operate directly on macros without expanding them - or on header inclusions without performing the inclusion.
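For example, something like this (the macro name and value are invented for illustration) is the kind of thing I mean:

/* A tool reorganizing #includes needs to see the directive itself; after
   preprocessing the directive is gone and only the header's expanded
   contents remain. */
#include <stddef.h>

/* Likewise, renaming BUFFER_SIZE means rewriting the #define and its
   uses - not the literal 64 that expansion leaves behind. */
#define BUFFER_SIZE 64

char buf[BUFFER_SIZE];
size_t used = 0;

A tool that only ever sees the fully preprocessed output can't express either of those edits.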
The complexity of the refactoring determines how much information is needed. Whether you actually need a fully correct AST all the time - I doubt it.
Certainly; it depends on the transformation.
The software engineering research literature is actually a good place to see how people are trying to deal with these problems (that is, if you can still find people working with C++ - most researchers prefer Java these days). One conclusion from that work is that there's a distinct difference between a compiler and what sometimes gets called a reverse engineering parser: a tradeoff between correctness and robustness, that is, the ability to work with more code and under different conditions - like in an editor, or in the absence of a correct build.

I guess I'm trying to say that there is a broad class of operations on source code that requires a holistic view of the text, rather than a fully preprocessed and lexed view, plus the ability to build a partial AST on top of it. I think it will be interesting to see whether llvm/clang is capable of addressing both approaches to source code analysis.

Andrew Sutton
asutton@cs.kent.edu