
Hartmut Kaiser wrote:
Dave Handley wrote:
The grammar for Spirit was (in a slightly cut-down form):
keyword = str_p( "Group" ) | str_p( "Separator" ) | //etc.; comment = lexeme_d[ ch_p( '#' ) >> * ( ~chset_p( "\n\r" ) ) >> chset_p( "\n\r" ) ]; stringLiteral = lexeme_d[ ch_p( '\"' ) >> * ( ~chset_p( "\"\n\r" ) ) >> chset_p( "\"\n\r" ) ]; word = lexeme_d[ ( alpha_p | ch_p( '_' ) ) >> * ( alnum_p | ch_p( '_' ) ) ]; floatNum = real_p; vrml = *( keyword | comment | stringLiteral | word | floatNum );
I've cut down the keywords because there are over 60 of them. I would be interested to know if there are any obvious ways to optimise the Spirit parser.
At least you could have used the symbol parser (look here: http://www.boost.org/libs/spirit/doc/symbols.html), which is a deterministic parser especially well suited to keyword matching. I'm pretty sure that this alone would speed up your test case a lot, because your keyword rule above (if it really contains 60 alternatives) is a _very_ inefficient way to recognise keywords that are known in advance.
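A minimal sketch of how that might look with Spirit classic's symbols<> parser. The header paths, the helper function, and the three sample keywords are assumptions for illustration; only a fraction of the 60+ keywords is shown:

    #include <boost/spirit/core.hpp>
    #include <boost/spirit/symbols.hpp>

    using namespace boost::spirit;

    // symbols<> matches its entries deterministically (internally a
    // ternary search tree), so lookup cost does not grow linearly with
    // the number of keywords the way a long chain of | alternatives does.
    symbols<> vrmlKeywords;

    void init_keywords()
    {
        // Populate once at startup; the full VRML keyword set goes here.
        vrmlKeywords = "Group", "Separator", "Transform";
    }

    // The 60-alternative rule then reduces to:
    //     keyword = vrmlKeywords;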
Yes. I'd be very interested to know the results when using the symbol parser. A 60-alternative rule will definitely be slow! Some more things to note (a sketch applying them follows below):

* Why use lexeme_d at the lexer stage?
* The comment rule can be rewritten as: '#' >> *(anychar_p - eol_p) >> eol_p;
* stringLiteral can be rewritten similarly.
* word can be rewritten using chsets.

One thing for sure is that the grammar is not optimized.

Cheers,
--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net
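For concreteness, a sketch applying Joel's suggestions. It is untested and makes some assumptions: Spirit classic header paths, the grammar being run at the character level (no skip parser) so that lexeme_d can be dropped per the first point, and vrmlKeywords being the symbol parser from the sketch above:

    #include <boost/spirit/core.hpp>
    #include <boost/spirit/symbols.hpp>
    #include <boost/spirit/utility/chset.hpp>

    using namespace boost::spirit;

    symbols<> vrmlKeywords;  // populated as in the previous sketch

    rule<> keyword, comment, stringLiteral, word, floatNum, vrml;

    void define_grammar()
    {
        // Deterministic keyword lookup instead of 60 alternatives.
        keyword       = vrmlKeywords;

        // '#' up to and including the end of line.
        comment       = '#' >> *(anychar_p - eol_p) >> eol_p;

        // Same shape as comment, delimited by double quotes.
        stringLiteral = '"' >> *(anychar_p - chset_p("\"\n\r")) >> '"';

        // Character sets instead of alternatives of single-char parsers.
        word          = chset_p("a-zA-Z_") >> *chset_p("a-zA-Z0-9_");

        floatNum      = real_p;

        vrml          = *(keyword | comment | stringLiteral | word | floatNum);
    }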