[boost] Re: Interest in a Recursive Descent Parsing Library

25 Dec 2004

      Joel de Guzman wrote:
...
David Abrahams wrote:
...
...
...
Is first/follow really linked to determinism?  It seems to me that you
can use first/follow sets to prune unproductive searches in a
nondeterministic parse, too (?)
Right. First-follow makes a parser predictive and thus deterministic,
Okay, either I'm "right" and first-follow can be used to prune
unproductive parses in a nondeterministic parse *or* using first-follow
makes a parser deterministic.  It can't be both :)
Using first-follow makes a parser deterministic, it can't be both.
An LL(1) parser, for instance, doesn't do backtracking.
Yes, but LL(1) is not synonymous with "recursive descent with first/follow."
...
It always
has one and only one choice when faced with alternatives. It's
roughly like the switch (deterministic) versus the cascading ifs
(non-deterministic).
However... (see below)
...
As I understand "nondeterministic" it just means you're doing
backtracking when a parse gets into trouble, unlike in an NFA where
you're conceptually moving along all viable paths in parallel.  I don't
see why the use of first/follow should preclude backtracking.  You can
always define ambiguous grammars with dynamic failure conditions.
...
typically with one symbol lookahead (LL(1)). A strictly predictive
parser is a lot more restrictive in the grammars that one can use,
however. There's no way to make it parse adhoc grammars like the
C pre-processor or, ahem, c++. What I am inclined to do is to mix
determinism with classic spirit's non-determinism. Exactly the
recipe that you are alluding to.
I'm not sure if it is or not -- I'm just getting more confuseder. ;^)
Classic LL(1) does not have the "dynamic failure conditions" that
you mention.
Yes, I know that.  But then, LL(1) is not synonymous with "recursive
descent with first/follow."
...
NLL(1) grammars cannot have ambiguity. If the grammar
is ambiguous, first follow computation will fail (first follow
conflicts).
Hum?  The algorithm for computing the first and follow sets of a symbol
don't have an "otherwise the computation fails" branch, and the
definitions of these sets are simple:

  first(a) = the set of tokens that can begin strings derived from a.
  follow(N) = the set of tokens that can follow N in grammatical sequences

You can use this information to eliminate hopeless branches of a parse
and avoid wasting effort backtracking without requiring the grammar to
be LL(1) or LL(2) or... even to be unambiguous at all.
...
I'm developing a scheme where you can mix both predictive LL(1) with
classic spirit's non-determinism. The scheme involves using a
predictive front, backed up by a non-predictive fall-back step when
ambiguity arises. It's roughly like combining the switch with
cascading-ifs.
Maybe we're talking about the same thing.
...
It's still too early if such a scheme will hold water. AFAIK, no one
has done it yet.
If you're talking about what I'm describing, I don't think there's
anything particularly radical about it, and I'd be very surprised if it
hadn't been done before.
...
An alternative scheme is to let the user switch
strategies (possibly using spirit directives).
Another scheme, used by PCCTS, is to make the parser predictive
by default, and allow the user to use syntactic and semantic
predicates with arbitrary lookahead on specific points in
the grammar.
Eewww, I smell a kludge ;-)

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com