
No, you're technically correct. Some semantic analysis is certainly required to parse C++, so you can't completely drop semantic analysis and still parse.
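The canonical instance (the names here are hypothetical):

    T * x;   // if T names a type, this declares x as a pointer to T;
             // if T names a value, it's a multiply whose result is dropped

The token stream is identical either way; only the semantic classification of T decides the parse.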
Isn't "some" a huge understatement? I mean, c'mon, you need to do overload resolution! Just evaluate boost::detail::is_incrementable<X>::value for some X, for example.
C++ is clearly more complicated than C. The minimal amount of semantic processing for C++ will probably include scoping, namespace, class and function processing (where in C you just need to track typedefs + scoping). However, you don't need to track function bodies and a lot of other things if you don't want to.
And the parser will certainly need to call into the semantic analysis module to figure out whether a particular name is a type, a value, a template, etc... just like a C parser needs to consult a symbol table to figure out whether a name is a typedef name or something else.
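Concretely (all names hypothetical): in C the classic case is cast-versus-multiply, and in C++ even deciding what '<' means requires asking sema about the name:

    typedef int T;
    int f(int *p) { return (T)*p; }        // cast: *p converted to T
    int g(int T, int p) { return (T)*p; }  // multiply: T times p

    struct A1 { template <int N> int f(int x) { return x + N; } };
    struct A2 { int f; };
    int h1(A1 a, int b) { return a.f<0>(b); } // '<' opens template args
    int h2(A2 a, int b) { return a.f<0>(b); } // parsed as ((a.f < 0) > b)

The same tokens parse completely differently depending on what the names turn out to be.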
Yeah, only more so. At one point he said of the parser, "we don't do constant folding," but clearly you need to do that to decide whether a name is a type or not.
foo<3*5>::x * y;
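That is, you can't classify foo<3*5>::x without folding 3*5 to 15 and finding the matching specialization. A contrived setup (hypothetical declarations) that makes the line above a declaration:

    template <int N> struct foo { enum { x = 0 }; };    // x is a value
    template <>      struct foo<15> { typedef int x; }; // x is a type

    int y0 = 2;
    void demo() {
      foo<3*5>::x * y;          // folds to foo<15>: declares y as int*
      int z = foo<2*7>::x * y0; // folds to foo<14>: multiplies 0 by y0
      (void)y; (void)z;
    }

Change the constant and the very same statement flips from declaration to expression.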
It seems to me that for C++ with templates, during parsing you have to do all the semantic analysis that isn't code generation -- and that's a lot.
No one is debating that you have to track the right things. For integer constant expressions, you can clearly track the value as you parse, regardless of whether you are building an AST or not. You do have to do a minimal amount of semantic analysis to do this. In C, the closest examples are things like "case 1+4/(someenumval):". In the AST, we actually do represent the fully expanded form (which is useful for some clients of the AST) and compute the i-c-e value on demand.

As Doug mentioned, the most important point of the design space we are in is to keep the syntax and semantics partitioned from each other. This makes it easier to understand either of the two and enforces a clear and well-defined interface boundary between them. Having both a minimal semantics implementation and a full AST-building semantic analysis module is more useful as verification that the interfaces are correct than anything else.
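(For the flavor of it, a toy sketch of on-demand folding over an expression tree; the node types are made up, not clang's actual AST API:)

    struct Expr {
      enum Kind { IntLiteral, Add, Div } kind;
      long value;             // set when kind == IntLiteral
      const Expr *lhs, *rhs;  // set for the binary operators
    };

    // Fold "1+4/(someenumval)" on demand by walking the tree; in this
    // toy model enum references were already lowered to IntLiteral.
    long evalICE(const Expr *e) {
      switch (e->kind) {
      case Expr::IntLiteral: return e->value;
      case Expr::Add:        return evalICE(e->lhs) + evalICE(e->rhs);
      case Expr::Div:        return evalICE(e->lhs) / evalICE(e->rhs);
      }
      return 0; // not reached
    }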
What this probably means is that the "minimal" semantic analysis for C++ is a whole lot more heavyweight than the minimal semantic analysis for C. But you still get some benefit from separating out the semantics from the parser, because there are many semantic bits that you *can* ignore if you only want an (unchecked) parse tree.
What, other than code generation?
It has often seemed to me that it might make sense to parse C++ nondeterministically, just to avoid some of these issues. The number of real instances of ambiguity is probably pretty small.
Without more context I'm not sure what you mean by non-deterministic. Obviously (hopefully) all parsers are deterministic :). There are at least three different ways of parsing C++ fuzzily:

1. Use a doxygen-style "fuzzy" parser with a set of heuristics. This is needed if you want to try to parse files without processing headers, but has obvious significant limitations.

2. Parse and track the "minimal" set of semantic information needed to parse correctly. This gives you a correct parse tree and reduces the amount of semantic information you need to keep around (memory use is lower than with a full sema implementation), but for C++ you end up doing a lot of stuff anyway.

3. Parse a superset of the language and either resolve the ambiguity later or not. The problem with this is that it is significantly less efficient in both time and space than using semantic information to direct the parser. However, it can provide a nice separation between the parser and the semantic analyzer (a toy sketch of this representation follows below).

-Chris
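(To illustrate option 3, a toy sketch, with made-up types, of keeping an unresolved parse in the tree until sema can prune it:)

    struct Stmt { virtual ~Stmt() {} };

    // Both readings of "T * x;" are kept; semantic analysis later
    // keeps whichever one type-checks and drops the other.
    struct AmbiguousStmt : Stmt {
      Stmt *asDeclaration; // the "declare x as pointer-to-T" reading
      Stmt *asExpression;  // the "multiply T by x" reading
    };

The space cost mentioned above comes from materializing both readings for every ambiguous region.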