
Dave Handley wrote:
The grammar for Spirit was (in a slightly cut-down form):
    keyword       = str_p( "Group" ) | str_p( "Separator" ); // | ...etc. (60+ keywords elided)
    comment       = lexeme_d[ ch_p( '#' ) >> *( ~chset_p( "\n\r" ) ) >> chset_p( "\n\r" ) ];    // '#' to end of line
    stringLiteral = lexeme_d[ ch_p( '\"' ) >> *( ~chset_p( "\"\n\r" ) ) >> chset_p( "\"\n\r" ) ];
    word          = lexeme_d[ ( alpha_p | ch_p( '_' ) ) >> *( alnum_p | ch_p( '_' ) ) ];        // identifiers
    floatNum      = real_p;
    vrml          = *( keyword | comment | stringLiteral | word | floatNum );
I've cut down the keywords because there are over 60 of them. I would be interested to know if there are any obvious ways to optimise the Spirit parser.
At least you could have used the symbol parser (see http://www.boost.org/libs/spirit/doc/symbols.html), which is a deterministic parser especially suited to keyword matching. I'm pretty sure that this alone would speed up your test case a lot, because your keyword rule above (if it contains 60 alternatives) is a _very_ inefficient way to recognise keywords that are known in advance.
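For illustration, a minimal sketch of what that might look like with classic Spirit (1.x) headers; the initKeywords() helper and the "Transform" entry are placeholders, since the full 60-keyword table isn't shown in the thread:

    #include <boost/spirit/core.hpp>
    #include <boost/spirit/symbols.hpp>

    using namespace boost::spirit;

    // The table is built once up front; matching then walks a trie
    // character by character instead of trying 60 alternatives in turn.
    symbols<> vrmlKeywords;

    void initKeywords()
    {
        vrmlKeywords = "Group", "Separator", "Transform"; // ...etc.
    }

    // In the grammar, the table is itself a parser:
    //     keyword = vrmlKeywords;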
1) An extensible, object-oriented approach to writing the entire system. This can be very useful if you want to handle something like the parsing of a filename in the lexer: you can simply write a new token type that splits the incoming filename into path, extension, name, etc. (see the sketch after this list). This can massively simplify the production of the final parser, allowing you to deal with grammar issues at that stage.
2) DFAs can be created at run-time, or eventually at compile-time.
3) The code is considerably less obfuscated than the code produced by flex. Don't get me wrong, I like flex a lot, but the pre-processor directives and look-up tables in the generated code are pretty unreadable IMHO.
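A rough idea of point 1 in code; since the lexer itself isn't shown in the thread, the Token/FilenameToken names and the scan() interface below are purely hypothetical:

    #include <string>

    // Hypothetical token interface; the real lexer's API is not shown here.
    struct Token
    {
        virtual ~Token() {}
        virtual bool scan( const std::string& text ) = 0;
    };

    // A custom token type that splits a matched filename into its parts,
    // so the grammar stage never has to deal with filename structure.
    struct FilenameToken : Token
    {
        std::string path, name, extension;

        bool scan( const std::string& text )
        {
            const std::string::size_type slash = text.rfind( '/' );
            path = ( slash == std::string::npos ) ? "" : text.substr( 0, slash + 1 );

            const std::string file = ( slash == std::string::npos )
                                     ? text : text.substr( slash + 1 );
            const std::string::size_type dot = file.rfind( '.' );
            name      = ( dot == std::string::npos ) ? file : file.substr( 0, dot );
            extension = ( dot == std::string::npos ) ? ""   : file.substr( dot + 1 );
            return !file.empty();
        }
    };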
Is there any documentation available?

Regards
Hartmut