Re: [boost] Re: Interest in a Recursive Descent Parsing Library

26 Dec 2004


      ----- Original Message ----- 
From: "Joel de Guzman" <joel@boost-consulting.com>
To: <boost@lists.boost.org>
Sent: Saturday, December 25, 2004 6:43 PM
Subject: [boost] Re: Interest in a Recursive Descent Parsing Library
...
christopher diggins wrote:
...
----- Original Message ----- From: "Joel de Guzman" 
<joel@boost-consulting.com>
To: <boost@lists.boost.org>
Sent: Friday, December 24, 2004 6:21 PM
Subject: [boost] Re: Interest in a Recursive Descent Parsing Library
...
christopher diggins wrote:
...
struct StartRule {
 template<typename Elem_T>
 static bool Accept(ParserInputStream<Elem_T>& in) {
   switch (in.GetChar()) {
     case 'a' : return match<RuleA>(in);
     case 'b' : return match<RuleB>(in);
     case 'c' : return match<RuleC>(in);
     case 'd' : return match<RuleD>(in);
     default : return false;
   }
 }
}
This approach requires the programmer to figure out the FIRST(N) table 
by hand. I have found that there are typically only a couple of 
performance bottlenecks where lookahead is actually needed, and that 
generating these tables by hand to be easy. Perhaps other's experience 
is different.
I would be curious how to express the above grammars in Spirit, with 
and without the hand-rolled lookahead rule.
See: http://www.boost.org/libs/spirit/doc/switch_parser.html.
I think this does a good job of illustrating the difference between 
Spirit and YARD. Spirit would require the following code, for the grammar 
production written above using YARD:
rule<> rule_overall =
           switch_p
           [
               case_p<'a'>(parser_a),
               case_p<'b'>(parser_b),
               case_p<'c'>(parser_b),
               case_p<'d'>(parser_b),
               default_p(parser_default)
           ]
       ;
With Spirit there is an arbitrary upper bound on the number of case 
statements, which defaults to 3(!?). I would think that it is safe to say 
that the YARD parser is easier to use, and has fewer arbitrary 
constraints *BUT* does not provide dynamic rule expressions, and is more 
verbose.
Is this balance of features in a CFG parsing library something that 
interests anyone in the Boost community?
If comparison to your *hand-coded* switch statement is your rationale,
then that's unfair. You can also write a hand-coded single token
switch parser using spirit's functor parser:
http://www.boost.org/libs/spirit/doc/functor_parser.html
I didn't mean to make an unfair comparison, I simply used the resource you 
pointed me to (see above). I looked at the functor parser documentation and 
I honestly can't figure out how to rewrite the above example. Perhaps 
someone else could do it. That single comparison was only a small part of 
the basis for my rationale.

Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com