[boost] Interest in a fast lexical analyser compatible with Spirit

27 Dec 2004

Christopher Diggins wrote:
...
Would you please post the grammar productions you used for Spirit? Would you
consider testing the yard parser as well, http://yard-parser.sf.net ? Are
there other advantages of your tool over Flex, other than providing a nice
interface to Spirit?

The grammar for Spirit was (in a slightly cut-down form):

keyword =
 str_p( "Group" ) |
 str_p( "Separator" ); // etc. (remaining keywords omitted)
comment =
 lexeme_d[
  ch_p( '#' ) >> * ( ~chset_p( "\n\r" ) ) >> chset_p( "\n\r" )
 ];
stringLiteral =
 lexeme_d[
  ch_p( '\"' ) >> * ( ~chset_p( "\"\n\r" ) ) >> chset_p( "\"\n\r" )
 ];
word =
 lexeme_d[
  ( alpha_p | ch_p( '_' ) ) >>
  * ( alnum_p | ch_p( '_' ) )
 ];
floatNum =
 real_p;
vrml = *( keyword | comment | stringLiteral | word | floatNum );

I've cut down the keywords because there are over 60 of them.  I would be 
interested to know if there are any obvious ways to optimise the Spirit 
parser.

Regarding writing a comparative test for YARD, I will certainly consider it, 
although I have no experience of using YARD.  I've written quite a lot of 
flex/bison parsers, written a number of Spirit-based parsers, and hand-coded 
a number in my time, but I would need to spend a little time learning the 
YARD framework first.

There are a number of advantages over Flex (including a nice interface to 
Spirit).  The key ones are:

1)    An extensible object-oriented approach to writing the entire system. 
This can be very useful if you want to handle something like, say, parsing 
a filename in the lexer.  You can simply write a new token type that will 
split the incoming filename into path, extension, name, etc.  This can 
massively simplify the production of the final parser, allowing you to deal 
with grammar issues at that stage.
2)    DFAs can be created at run-time or, eventually, at compile-time.
3)    The code is considerably less obfuscated than the code produced by 
flex.  Don't get me wrong, I like flex a lot, but the pre-processor 
directives and look-up tables in the generated code are pretty unreadable 
IMHO.
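
To illustrate point 1, here is a minimal sketch of what such a token type
might look like.  The Token base class, FilenameToken, and their member
names are hypothetical, made up for this example; they are not the actual
interface of the lexer discussed here, which is not shown in this post.

```cpp
#include <string>

// Hypothetical base class for tokens. The lexer's real interface is not
// shown in this thread, so every name here is an assumption.
class Token {
public:
    explicit Token(const std::string& text) : text_(text) {}
    virtual ~Token() {}
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

// Hypothetical token type that splits a filename lexeme into its parts
// as soon as the lexer has matched it, so the parser never sees raw text.
class FilenameToken : public Token {
public:
    explicit FilenameToken(const std::string& lexeme) : Token(lexeme) {
        // Everything up to and including the last '/' or '\' is the path.
        const std::string::size_type slash = lexeme.find_last_of("/\\");
        const std::string::size_type start =
            (slash == std::string::npos) ? 0 : slash + 1;
        path_ = lexeme.substr(0, start);

        // The extension follows the last '.' in the final component,
        // unless that '.' is its first character (e.g. ".profile").
        const std::string file = lexeme.substr(start);
        const std::string::size_type dot = file.rfind('.');
        if (dot == std::string::npos || dot == 0) {
            name_ = file;
        } else {
            name_ = file.substr(0, dot);
            extension_ = file.substr(dot + 1);
        }
    }
    const std::string& path() const { return path_; }
    const std::string& name() const { return name_; }
    const std::string& extension() const { return extension_; }
private:
    std::string path_;
    std::string name_;
    std::string extension_;
};
```

With this sketch, FilenameToken("models/house.wrl") reports the path
"models/", the name "house" and the extension "wrl", so the grammar can
work with those parts directly instead of re-parsing the string.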

Dave Handley