
On Sat, 2007-09-01 at 13:02 -0700, Chris Lattner wrote:
No, you're technically correct. Some semantic analysis is certainly required to parse C++, so you can't completely drop semantic analysis and still parse.
Isn't "some" a huge understatement? I mean, c'mon, you need to do overload resolution! Just evaluate boost::detail::is_incrementable<X>::value for some X, for example.
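To make that concrete, here is a small sketch (not taken from Boost; `Pick`, `f` and `member` are made-up names) where the parser cannot tell a declaration from an expression until overload resolution has picked an `f` and the matching specialization has been looked up. It assumes a C++11 compiler, only so that `decltype` can report what `x` ended up being.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: 'member' is a type in one specialization
// and a constant in the other.
template <std::size_t N> struct Pick;
template <> struct Pick<1> { typedef int member; };
template <> struct Pick<2> { static const int member = 3; };

// Two overloads whose return types have different sizes. They are only
// used inside sizeof, so no definitions are needed.
char f(int);            // sizeof(f(0))   == 1
char (&f(double))[2];   // sizeof(f(0.0)) == 2

int x = 5;

bool first_is_declaration() {
    // f(int) is chosen, Pick<1>::member is a type, so this statement
    // declares a local 'int *x' that shadows the global x.
    Pick<sizeof(f(0))>::member * x;
    return std::is_pointer<decltype(x)>::value;
}

bool second_is_declaration() {
    // f(double) is chosen, Pick<2>::member is a constant, so this statement
    // is a multiplication whose result is discarded; x is still the global int.
    Pick<sizeof(f(0.0))>::member * x;
    return std::is_pointer<decltype(x)>::value;
}
```

The two statements are token-for-token the same shape; only the argument to `f` differs. A compiler will likely warn about the unused variable and the discarded expression, which is expected here.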
C++ is clearly more complicated than C. The minimal amount of semantic processing for C++ will probably include scoping, namespace, class and function processing (where in C you just need to track typedefs + scoping). However, you don't need to track function bodies and a lot of other things if you don't want to.
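For the C side of that comparison, a rough sketch of "typedefs + scoping" might look like the following. It is written in C++ for convenience, and `ScopeStack`, `NameKind` and `classify_star_statement` are hypothetical names, not anything from an existing parser. The only question it answers is whether a statement shaped like `T * x;` starts with a typedef name.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

enum class NameKind { Unknown, Typedef, Object };
enum class StmtKind { Declaration, Expression };

// Hypothetical scope tracker: one name table per open scope.
class ScopeStack {
    std::vector<std::unordered_map<std::string, NameKind>> scopes_;
public:
    ScopeStack() { scopes_.emplace_back(); }          // file scope
    void push() { scopes_.emplace_back(); }           // entering '{'
    void pop() { if (scopes_.size() > 1) scopes_.pop_back(); }  // leaving '}'
    void declare(const std::string& name, NameKind kind) {
        scopes_.back()[name] = kind;
    }
    // The innermost declaration wins, as in C's name lookup.
    NameKind lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return NameKind::Unknown;
    }
};

// Classify a statement of the form "first * something;".
StmtKind classify_star_statement(const ScopeStack& scopes,
                                 const std::string& first) {
    return scopes.lookup(first) == NameKind::Typedef
               ? StmtKind::Declaration
               : StmtKind::Expression;
}
```

A typedef `T` at file scope makes `T * x;` a declaration; an inner block that declares a variable named `T` turns the same tokens into a multiplication until that block is popped.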
As Dave noted, it also includes template instantiation and overload resolution. It's a phenomenal amount of work to write a full C++ parser, because you need nearly everything that a compiler needs. Once you have that, "minimal" semantic analysis can still be very useful. That minimal analysis still includes most of the capabilities of a compiler (yes, template instantiation and overloading have to be there to be 100% correct), but it can still avoid instantiations of function templates, instantiations of class templates without specializations, code generation, and many of the other semantic analysis tasks. So while an AST-producing C++ parser won't have much less code than a full C++ compiler, it will execute far less of that code. You need template instantiation and overload resolution, but only in very limited cases.
As Doug mentioned, the most important point of the design space we are in is to keep the syntax and semantics partitioned from each other. This makes it easier to understand either of the two and enforces a clear and well-defined interface boundary between the two. Having both a minimal semantics implementation and a full AST-building semantic analysis module is more useful as verification that the interfaces are correct than anything else.
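One way to picture that boundary, purely as a sketch: the parser talks to an abstract callback interface, and the minimal and AST-building semantics are two implementations of it. Every name below (`Actions`, `MinimalActions`, `AstActions`, `parse_decl`) is hypothetical, and the "parser" is a stand-in that only understands two pre-split token shapes.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical interface: the parser reports what it recognized and asks
// the one question it cannot answer from syntax alone.
struct Actions {
    virtual ~Actions() = default;
    virtual bool is_type_name(const std::string& name) const = 0;
    virtual void on_typedef(const std::string& name) = 0;
    virtual void on_variable(const std::string& type,
                             const std::string& name) = 0;
};

// Minimal semantics: remember typedef names, discard everything else.
struct MinimalActions : Actions {
    std::set<std::string> typedefs;
    bool is_type_name(const std::string& n) const override {
        return typedefs.count(n) != 0;
    }
    void on_typedef(const std::string& n) override { typedefs.insert(n); }
    void on_variable(const std::string&, const std::string&) override {}
};

// AST-building semantics: same bookkeeping, plus a record per declaration.
struct VarDecl { std::string type, name; };
struct AstActions : MinimalActions {
    std::vector<VarDecl> decls;
    void on_variable(const std::string& t, const std::string& n) override {
        decls.push_back({t, n});
    }
};

// Stand-in parser for two shapes only: {"typedef", NAME} and {TYPE, NAME}.
// It never inspects which Actions implementation it was given.
bool parse_decl(const std::vector<std::string>& toks, Actions& actions) {
    if (toks.size() != 2) return false;
    if (toks[0] == "typedef") { actions.on_typedef(toks[1]); return true; }
    if (!actions.is_type_name(toks[0])) return false;  // not a declaration
    actions.on_variable(toks[0], toks[1]);
    return true;
}
```

Running the same input through both implementations and getting the same accept/reject answers is the kind of interface check described above; only `AstActions` ends up holding any nodes.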
It's also extremely useful for anyone who wants to manipulate the ASTs. The reason GCC is so darned hard to work with (aside from the crusty C code and ambiguous data structures) is that there is no separate API for manipulating the AST. The parsing is intertwined with the semantic analysis, so if you want to go through and build a new tree *without* parsing code for that tree, things can get ugly.

- Doug
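As an illustration of what "build a new tree without parsing code for that tree" could look like when the AST has its own API, here is a toy sketch. The node classes and the `lit`/`bin` factory functions are invented for this example and do not correspond to GCC or any other compiler.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST with no dependency on any parser.
struct Expr {
    virtual ~Expr() = default;
    virtual std::string print() const = 0;
};

struct IntLit : Expr {
    int value;
    explicit IntLit(int v) : value(v) {}
    std::string print() const override { return std::to_string(value); }
};

struct BinOp : Expr {
    char op;
    std::unique_ptr<Expr> lhs, rhs;
    BinOp(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    std::string print() const override {
        return "(" + lhs->print() + " " + op + " " + rhs->print() + ")";
    }
};

// Factory functions: the whole "tree-building API" of this sketch.
std::unique_ptr<Expr> lit(int v) {
    return std::unique_ptr<Expr>(new IntLit(v));
}
std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l,
                          std::unique_ptr<Expr> r) {
    return std::unique_ptr<Expr>(new BinOp(op, std::move(l), std::move(r)));
}
```

A tool can then write `bin('+', lit(1), bin('*', lit(2), lit(3)))` directly, without generating source text and feeding it back through the parser.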