[boost] Re: Interest in a Recursive Descent Parsing Library

27 Dec 2004

      Larry Evans wrote:
...
On 12/25/2004 11:43 PM, Joel de Guzman wrote:
...
...
The problem is that each alternative can potentially have
different types. A production such as:
Could you explain what this means.  I'm guessing it means
each alternate would have a different synthesized attribut
calculated for it.
Hi Larry!

Ehm... no, the alternatives themselves have different types.
Consider the grammar:

    r = ch_p('a') | ch_p('b') >> ch_p('c');

the type of the left alternative is chlit<char>. The type of
the right alternative is sequence<chlit<char>, chlit<char> >.
...
...
a | b | c;
...
Wouldn't the most "natural" attribute for this be a
"named" variant which map contain the same type for
two or more different "names"?  IOW, instead of the
variant currently in boost, the sum type in:
sandbox/boost/indexed_types/sum.hpp
A variant. Yes, that's the plan. But that's a different matter.
Anyway. Now that you've mentioned it. Yes, a named
variant would be nice!
...
would represent the result of parsing the alternative expression:
a | b | c
and the names could be 1,2,3 (all unsigned) or possibly names,
such as elements of an enumeration:
enum alt_abc{ alt_a, alt_b, alt_c};
Very nice!
...
...
where A, B and C are the types of a, b and c. So, now, when
creating the parser, you'll have to make a table of alternates
tuple<A, B, C> which dispatches properly based on information
known only at runtime: 1) the current input character and
Of course, I understand how this must be only known at runtime.
...
2) the first follow sets of A, B and C.
Now this could be calculated at compile time if there were no
cycles in the productions.  For example, the simplest case would
be:
A = 'a' A | ''
i.e. a sequence of a's.  The cycle, or recursion, is caused by
the A on the rhs.  Of course, we know the solution in this
simple case, but other's would, I guess, couldn't be solved or
would be too difficult to solve. (I'm really guessing just based
on my inability to figure out how).  The point is, some could be
That adds another complication. But then my point was that:

1) the dynamic nature of spirit parsers makes it impossible
    to compute the first/follow sets at compile time.

2) You use this first/follow information to
     A) manipulate and build a parse table in the form of a
        tuple with heterogeneous types (not an array or vector
        of homogeneous types) and
     B) make a prediction of which alternative to choose in a
        tuple<alternatives...>; so basically, you'll have a
        runtime variable N (index) based on the prediction, but
        being in a tuple, choosing the right alternative requires
        a static constant N (index) ( i.e. get<N>(alt)
        instead of alt[N] ).

The impedance mismatch between compile time and runtime is what
makes this so darned difficult.
...
while other's might not be; hence, a more accurate statement
would be:
  2) some of the first and follow sets of A, B, and C could not be
  calculated at compile time or would be "too difficult" to
  calculate at compile time.
Yes!

Regards,
-- 
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net