
No, you're technically correct. Some semantic analysis is certainly required to parse C++, so you can't completely drop semantic analysis and still parse.
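The canonical instance (the names here are hypothetical):

    T * x;   // if T names a type, this declares x as a pointer to T;
             // if T names a value, it's a multiply whose result is dropped

The token stream is identical either way; only the semantic classification of T decides the parse.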
Isn't "some" a huge understatement? I mean, c'mon, you need to do overload resolution! Just evaluate boost::detail::is_incrementable<X>::value for some X, for example.
C++ is clearly more complicated than C. The minimal amount of semantic processing for C++ will probably include scoping, namespace, class and function processing (where in C you just need to track typedefs + scoping). However, you don't need to track function bodies and a lot of other things if you don't want to.
And the parser will certainly need to call into the semantic analysis module to figure out whether a particular name is a type, a value, a template, etc... just like a C parser needs to consult a symbol table to figure out whether a name is a typedef name or something else.
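Concretely (all names hypothetical): in C the classic case is cast-versus-multiply, and in C++ even deciding what '<' means requires asking sema about the name:

    typedef int T;
    int f(int *p) { return (T)*p; }        // cast: *p converted to T
    int g(int T, int p) { return (T)*p; }  // multiply: T times p

    struct A1 { template <int N> int f(int x) { return x + N; } };
    struct A2 { int f; };
    int h1(A1 a, int b) { return a.f<0>(b); } // '<' opens template args
    int h2(A2 a, int b) { return a.f<0>(b); } // parsed as ((a.f < 0) > b)

The same tokens parse completely differently depending on what the names turn out to be.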
Yeah, only more so. At one point he said of the parser, "we don't do constant folding," but clearly you need to do that to decide whether a name is a type or not.
foo<3*5>::x * y;
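That is, you can't classify foo<3*5>::x without folding 3*5 to 15 and finding the matching specialization. A contrived setup (hypothetical declarations) that makes the line above a declaration:

    template <int N> struct foo { enum { x = 0 }; };    // x is a value
    template <>      struct foo<15> { typedef int x; }; // x is a type

    int y0 = 2;
    void demo() {
      foo<3*5>::x * y;          // folds to foo<15>: declares y as int*
      int z = foo<2*7>::x * y0; // folds to foo<14>: multiplies 0 by y0
      (void)y; (void)z;
    }

Change the constant and the very same statement flips from declaration to expression.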
It seems to me that for C++ with templates, during parsing you have to do all the semantic analysis that isn't code generation -- and that's a lot.
No one is debating that you have to track the right things. For integer constant expressions, you can clearly track the value as you parse, regardless of whether you are building an AST or not. You do have to do a minimal amount of semantic analysis to do this. In C, the closest examples are things like "case 1+4/(someenumval):". In the AST, we actually do represent the fully expanded form (which is useful for some clients of the AST) and compute the i-c-e value on demand.

As Doug mentioned, the most important point of the design space we are in is to keep the syntax and semantics partitioned from each other. This makes it easier to understand either of the two and enforces a clear and well-defined interface boundary between them. Having both a minimal semantics implementation and a full AST-building semantic analysis module is more useful as verification that the interfaces are correct than anything else.
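(For the flavor of it, a toy sketch of on-demand folding over an expression tree; the node types are made up, not clang's actual AST API:)

    struct Expr {
      enum Kind { IntLiteral, Add, Div } kind;
      long value;             // set when kind == IntLiteral
      const Expr *lhs, *rhs;  // set for the binary operators
    };

    // Fold "1+4/(someenumval)" on demand by walking the tree; in this
    // toy model enum references were already lowered to IntLiteral.
    long evalICE(const Expr *e) {
      switch (e->kind) {
      case Expr::IntLiteral: return e->value;
      case Expr::Add:        return evalICE(e->lhs) + evalICE(e->rhs);
      case Expr::Div:        return evalICE(e->lhs) / evalICE(e->rhs);
      }
      return 0; // not reached
    }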
What this probably means is that the "minimal" semantic analysis for C++ is a whole lot more heavyweight than the minimal semantic analysis for C. But you still get some benefit from separating out the semantics from the parser, because there are many semantic bits that you *can* ignore if you only want an (unchecked) parse tree.
What, other than code generation?
It has often seemed to me that it might make sense to parse C++ nondeterministically, just to avoid some of these issues. The number of real instances of ambiguity is probably pretty small.
Without more context I'm not sure what you mean by non-deterministic. Obviously (hopefully) all parsers are deterministic :). There are at least three different ways of parsing C++ fuzzily:

1. Use a doxygen-style "fuzzy" parser with a set of heuristics. This is needed if you want to try to parse files without processing headers, but has obvious significant limitations.

2. Parse and track the "minimal" set of semantic information needed to parse correctly. This gives you a correct parse tree and reduces the amount of semantic information you need to keep around (memory use is lower than with a full sema implementation), but for C++ you end up doing a lot of stuff anyway.

3. Parse a superset of the language and either resolve the ambiguity later or not. The problem with this is that it is significantly less efficient in both time and space than using semantic information to direct the parser. However, it can provide a nice separation between the parser and the semantic analyzer (a toy sketch of this representation follows below).

-Chris
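(To illustrate option 3, a toy sketch, with made-up types, of keeping an unresolved parse in the tree until sema can prune it:)

    struct Stmt { virtual ~Stmt() {} };

    // Both readings of "T * x;" are kept; semantic analysis later
    // keeps whichever one type-checks and drops the other.
    struct AmbiguousStmt : Stmt {
      Stmt *asDeclaration; // the "declare x as pointer-to-T" reading
      Stmt *asExpression;  // the "multiply T by x" reading
    };

The space cost mentioned above comes from materializing both readings for every ambiguous region.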