
Hartmut Kaiser wrote:
Dave Handley wrote:
The grammar for Spirit was (in a slightly cut-down form):
keyword = str_p( "Group" ) | str_p( "Separator" ) | //etc.; comment = lexeme_d[ ch_p( '#' ) >> * ( ~chset_p( "\n\r" ) ) >> chset_p( "\n\r" ) ]; stringLiteral = lexeme_d[ ch_p( '\"' ) >> * ( ~chset_p( "\"\n\r" ) ) >> chset_p( "\"\n\r" ) ]; word = lexeme_d[ ( alpha_p | ch_p( '_' ) ) >> * ( alnum_p | ch_p( '_' ) ) ]; floatNum = real_p; vrml = *( keyword | comment | stringLiteral | word | floatNum );
I've cut down the keywords because there are over 60 of them. I would be interested to know if there are any obvious ways to optimise the Spirit parser.
At least you could have used the symbol parser (look here: http://www.boost.org/libs/spirit/doc/symbols.html), which is a deterministic parser especially well suited to keyword matching. I'm pretty sure that this alone would speed up your test case a lot, because your keyword rule above (if it really contains 60 alternatives) is a _very_ inefficient way to recognise keywords that are known in advance.
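A minimal sketch of how that might look with Spirit classic's symbols<> parser. The header paths, the helper function, and the three sample keywords are assumptions for illustration; only a fraction of the 60+ keywords is shown:

    #include <boost/spirit/core.hpp>
    #include <boost/spirit/symbols.hpp>

    using namespace boost::spirit;

    // symbols<> matches its entries deterministically (internally a
    // ternary search tree), so lookup cost does not grow linearly with
    // the number of keywords the way a long chain of | alternatives does.
    symbols<> vrmlKeywords;

    void init_keywords()
    {
        // Populate once at startup; the full VRML keyword set goes here.
        vrmlKeywords = "Group", "Separator", "Transform";
    }

    // The 60-alternative rule then reduces to:
    //     keyword = vrmlKeywords;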
Yes. I'd be very interested to know the results when using the symbol parser. A 60-alternative rule will definitely be slow! Some more things to note (a sketch applying them follows below):

* Why use lexeme_d at the lexer stage?
* The comment rule can be rewritten as: '#' >> *(anychar_p - eol_p) >> eol_p;
* stringLiteral can be rewritten similarly.
* word can be rewritten using chsets.

One thing for sure is that the grammar is not optimized.

Cheers,
--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net
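For concreteness, a sketch applying Joel's suggestions. It is untested and makes some assumptions: Spirit classic header paths, the grammar being run at the character level (no skip parser) so that lexeme_d can be dropped per the first point, and vrmlKeywords being the symbol parser from the sketch above:

    #include <boost/spirit/core.hpp>
    #include <boost/spirit/symbols.hpp>
    #include <boost/spirit/utility/chset.hpp>

    using namespace boost::spirit;

    symbols<> vrmlKeywords;  // populated as in the previous sketch

    rule<> keyword, comment, stringLiteral, word, floatNum, vrml;

    void define_grammar()
    {
        // Deterministic keyword lookup instead of 60 alternatives.
        keyword       = vrmlKeywords;

        // '#' up to and including the end of line.
        comment       = '#' >> *(anychar_p - eol_p) >> eol_p;

        // Same shape as comment, delimited by double quotes.
        stringLiteral = '"' >> *(anychar_p - chset_p("\"\n\r")) >> '"';

        // Character sets instead of alternatives of single-char parsers.
        word          = chset_p("a-zA-Z_") >> *chset_p("a-zA-Z0-9_");

        floatNum      = real_p;

        vrml          = *(keyword | comment | stringLiteral | word | floatNum);
    }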