
Dave Handley wrote:
The grammar for Spirit was (in a slightly cut-down form):
    keyword       = str_p( "Group" ) | str_p( "Separator" ); // | ...etc. (60+ keywords elided)
    comment       = lexeme_d[ ch_p( '#' ) >> *( ~chset_p( "\n\r" ) ) >> chset_p( "\n\r" ) ];    // '#' to end of line
    stringLiteral = lexeme_d[ ch_p( '\"' ) >> *( ~chset_p( "\"\n\r" ) ) >> chset_p( "\"\n\r" ) ];
    word          = lexeme_d[ ( alpha_p | ch_p( '_' ) ) >> *( alnum_p | ch_p( '_' ) ) ];        // identifiers
    floatNum      = real_p;
    vrml          = *( keyword | comment | stringLiteral | word | floatNum );
I've cut down the keywords because there are over 60 of them. I would be interested to know if there are any obvious ways to optimise the Spirit parser.
At least you could have used the symbol parser (see http://www.boost.org/libs/spirit/doc/symbols.html), which is a deterministic parser especially suited to keyword matching. I'm pretty sure that this alone would speed up your test case a lot, because your keyword rule above (if it contains 60 alternatives) is a _very_ inefficient way to recognise keywords that are known in advance.
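For illustration, a minimal sketch of what that might look like with classic Spirit (1.x) headers; the initKeywords() helper and the "Transform" entry are placeholders, since the full 60-keyword table isn't shown in the thread:

    #include <boost/spirit/core.hpp>
    #include <boost/spirit/symbols.hpp>

    using namespace boost::spirit;

    // The table is built once up front; matching then walks a trie
    // character by character instead of trying 60 alternatives in turn.
    symbols<> vrmlKeywords;

    void initKeywords()
    {
        vrmlKeywords = "Group", "Separator", "Transform"; // ...etc.
    }

    // In the grammar, the table is itself a parser:
    //     keyword = vrmlKeywords;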
1) An extensible, object-oriented approach to writing the entire system. This can be very useful if you want to handle something like the parsing of a filename in the lexer: you can simply write a new token type that splits the incoming filename into path, extension, name, etc. (see the sketch after this list). This can massively simplify the production of the final parser, allowing you to deal with grammar issues at that stage.
2) DFAs can be created at run-time, or eventually at compile-time.
3) The code is considerably less obfuscated than the code produced by flex. Don't get me wrong, I like flex a lot, but the pre-processor directives and look-up tables in the generated code are pretty unreadable IMHO.
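A rough idea of point 1 in code; since the lexer itself isn't shown in the thread, the Token/FilenameToken names and the scan() interface below are purely hypothetical:

    #include <string>

    // Hypothetical token interface; the real lexer's API is not shown here.
    struct Token
    {
        virtual ~Token() {}
        virtual bool scan( const std::string& text ) = 0;
    };

    // A custom token type that splits a matched filename into its parts,
    // so the grammar stage never has to deal with filename structure.
    struct FilenameToken : Token
    {
        std::string path, name, extension;

        bool scan( const std::string& text )
        {
            const std::string::size_type slash = text.rfind( '/' );
            path = ( slash == std::string::npos ) ? "" : text.substr( 0, slash + 1 );

            const std::string file = ( slash == std::string::npos )
                                     ? text : text.substr( slash + 1 );
            const std::string::size_type dot = file.rfind( '.' );
            name      = ( dot == std::string::npos ) ? file : file.substr( 0, dot );
            extension = ( dot == std::string::npos ) ? ""   : file.substr( dot + 1 );
            return !file.empty();
        }
    };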
Is there any documentation available?

Regards
Hartmut