Interest in a fast lexical analyser compatible with Spirit

Christopher Diggins wrote:
Would you please post the grammar productions you used for Spirit? Would you consider testing the yard parser as well, http://yard-parser.sf.net ? Are there other advantages of your tool over Flex, other than providing a nice interface to Spirit?
The grammar for Spirit was (in a slightly cut down form):
    keyword = str_p( "Group" ) | str_p( "Separator" ) | //etc.;
    comment = lexeme_d[ ch_p( '#' ) >> * ( ~chset_p( "\n\r" ) ) >> chset_p( "\n\r" ) ];
    stringLiteral = lexeme_d[ ch_p( '\"' ) >> * ( ~chset_p( "\"\n\r" ) ) >> chset_p( "\"\n\r" ) ];
    word = lexeme_d[ ( alpha_p | ch_p( '_' ) ) >> * ( alnum_p | ch_p( '_' ) ) ];
    floatNum = real_p;
    vrml = *( keyword | comment | stringLiteral | word | floatNum );
I've cut down the keywords because there are over 60 of them. I would be interested to know if there were any obvious ways to optimise the Spirit parser.
Regarding writing a comparative test for YARD, I will certainly consider it, although I have no experience of using YARD. I've written quite a lot of flex/bison parsers, written a number of Spirit-based parsers, and hand-coded a number in my time, but I would need to spend a little time learning the YARD framework first.
There are a number of advantages over Flex (including a nice interface to Spirit). The key ones are:
1) An extensible, object-oriented approach to writing the entire system. This can be very useful if you want to handle something like, say, the parsing of a filename in the lexer. You can simply write a new token type that will split the incoming filename into path, extension, name, etc. This can massively simplify the production of a final parser, allowing you to deal with grammar issues at that stage.
2) DFAs can be created at run-time, or eventually compile-time.
3) The code is considerably less obfuscated than the code produced by flex. Don't get me wrong, I like flex a lot, but the pre-processor directives and look-up tables in the generated code are pretty unreadable IMHO.
Dave Handley
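Point 1 can be made concrete with a small sketch. This is not code from the proposed library: `Token`, `FilenameToken`, and their members are hypothetical names invented for illustration, and the splitting rules (the last `/` or `\` ends the path, the last `.` starts the extension) are an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class for lexer tokens; not part of any real library.
struct Token {
    explicit Token(const std::string& raw) : text(raw) {}
    virtual ~Token() {}
    std::string text;  // the raw characters matched by the lexer
};

// Hypothetical token type that splits a matched filename into its parts,
// so the parser never has to look inside the string.
struct FilenameToken : Token {
    explicit FilenameToken(const std::string& matched) : Token(matched) {
        // Everything up to and including the last separator is the path.
        const std::string::size_type slash = matched.find_last_of("/\\");
        const std::string::size_type base_start =
            (slash == std::string::npos) ? 0 : slash + 1;
        path = matched.substr(0, base_start);

        // The extension starts after the last '.' in the base name, if any.
        // A leading dot (as in ".profile") is treated as part of the name.
        const std::string base = matched.substr(base_start);
        const std::string::size_type dot = base.find_last_of('.');
        if (dot == std::string::npos || dot == 0) {
            name = base;
        } else {
            name = base.substr(0, dot);
            extension = base.substr(dot + 1);
        }
        // Sanity check: the pieces never contain more than the input.
        assert(path.size() + name.size() + extension.size() <= matched.size());
    }

    std::string path;       // e.g. "models/"
    std::string name;       // e.g. "teapot"
    std::string extension;  // e.g. "wrl" (without the dot)
};
```

With a token like this, a grammar rule that expects a filename would receive `path`, `name`, and `extension` already separated, which is the kind of simplification described in point 1.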

----- Original Message -----
From: "Dave Handley" <dave@dah.me.uk>
To: <boost@lists.boost.org>
Sent: Monday, December 27, 2004 12:39 PM
Subject: [boost] Interest in a fast lexical analyser compatible with Spirit
Regarding writing a comparative test for YARD, I will certainly consider it, although I have no experience of using YARD. I've written quite a lot of flex/bison parsers, written a number of Spirit-based parsers, and hand-coded a number in my time, but I would need to spend a little time learning the YARD framework first.
Hi David,
I will write the grammar for you. I am just finishing up version 2.0 of YARD, and then I will post the equivalent YARD grammar.
Thanks.
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com

Dave Handley wrote:
The grammar for Spirit was (in a slightly cut down form):
    keyword = str_p( "Group" ) | str_p( "Separator" ) | //etc.;
    comment = lexeme_d[ ch_p( '#' ) >> * ( ~chset_p( "\n\r" ) ) >> chset_p( "\n\r" ) ];
    stringLiteral = lexeme_d[ ch_p( '\"' ) >> * ( ~chset_p( "\"\n\r" ) ) >> chset_p( "\"\n\r" ) ];
    word = lexeme_d[ ( alpha_p | ch_p( '_' ) ) >> * ( alnum_p | ch_p( '_' ) ) ];
    floatNum = real_p;
    vrml = *( keyword | comment | stringLiteral | word | floatNum );
I've cut down the keywords because there are over 60 of them. I would be interested to know if there were any obvious ways to optimise the Spirit parser.
At least you could have used the symbol parser (look here: http://www.boost.org/libs/spirit/doc/symbols.html), which is a deterministic parser especially suited to keyword matching. I'm pretty sure that this alone would speed up your test case a lot, because your keyword rule above (if it contains 60 alternatives) is a _very_ inefficient way to recognise keywords that are known in advance.
1) An extensible, object-oriented approach to writing the entire system. This can be very useful if you want to handle something like, say, the parsing of a filename in the lexer. You can simply write a new token type that will split the incoming filename into path, extension, name, etc. This can massively simplify the production of a final parser, allowing you to deal with grammar issues at that stage.
2) DFAs can be created at run-time, or eventually compile-time.
3) The code is considerably less obfuscated than the code produced by flex. Don't get me wrong, I like flex a lot, but the pre-processor directives and look-up tables in the generated code are pretty unreadable IMHO.
Is there any documentation available?
Regards
Hartmut
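To illustrate why a deterministic matcher helps with a large keyword set, the sketch below contrasts the two strategies in plain standard C++. It is not Spirit's `symbols` implementation and does not use the Spirit API; `match_by_alternatives` and `KeywordTrie` are hypothetical names. The first function tries each keyword in turn, much as a chain of `str_p` alternatives does, while the trie follows one transition per input character no matter how many keywords are registered.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Strategy 1: try every keyword in order, like a long chain of alternatives.
// Returns the length of the first keyword that prefixes `input`, or 0.
// The work grows with the number of keywords.
std::size_t match_by_alternatives(const std::vector<std::string>& keywords,
                                  const std::string& input) {
    for (std::size_t i = 0; i < keywords.size(); ++i) {
        const std::string& k = keywords[i];
        if (input.compare(0, k.size(), k) == 0) return k.size();
    }
    return 0;
}

// Strategy 2: a trie walks the input once, one node per character.
class KeywordTrie {
public:
    KeywordTrie() : nodes_(1) {}  // node 0 is the root

    void add(const std::string& keyword) {
        assert(!keyword.empty());
        std::size_t node = 0;
        for (std::size_t i = 0; i < keyword.size(); ++i) {
            std::map<char, std::size_t>::iterator it =
                nodes_[node].next.find(keyword[i]);
            if (it == nodes_[node].next.end()) {
                // No edge for this character yet: create a new node.
                const std::size_t created = nodes_.size();
                nodes_.push_back(Node());
                nodes_[node].next[keyword[i]] = created;
                node = created;
            } else {
                node = it->second;
            }
        }
        nodes_[node].accepting = true;
    }

    // Length of the longest keyword that prefixes `input`, or 0 if none.
    std::size_t match(const std::string& input) const {
        std::size_t node = 0, longest = 0;
        for (std::size_t i = 0; i < input.size(); ++i) {
            std::map<char, std::size_t>::const_iterator it =
                nodes_[node].next.find(input[i]);
            if (it == nodes_[node].next.end()) break;
            node = it->second;
            if (nodes_[node].accepting) longest = i + 1;
        }
        return longest;
    }

private:
    struct Node {
        Node() : accepting(false) {}
        std::map<char, std::size_t> next;  // outgoing edges by character
        bool accepting;                    // true if a keyword ends here
    };
    std::vector<Node> nodes_;
};
```

Neither function checks for a word boundary after the keyword, so a real lexer would still need to decide whether `Groups` is a keyword followed by `s` or a single word. The sketch only shows where the per-keyword cost goes away.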

Hartmut Kaiser wrote:
Dave Handley wrote:
The grammar for Spirit was (in a slightly cut down form):
    keyword = str_p( "Group" ) | str_p( "Separator" ) | //etc.;
    comment = lexeme_d[ ch_p( '#' ) >> * ( ~chset_p( "\n\r" ) ) >> chset_p( "\n\r" ) ];
    stringLiteral = lexeme_d[ ch_p( '\"' ) >> * ( ~chset_p( "\"\n\r" ) ) >> chset_p( "\"\n\r" ) ];
    word = lexeme_d[ ( alpha_p | ch_p( '_' ) ) >> * ( alnum_p | ch_p( '_' ) ) ];
    floatNum = real_p;
    vrml = *( keyword | comment | stringLiteral | word | floatNum );
I've cut down the keywords because there are over 60 of them. I would be interested to know if there were any obvious ways to optimise the Spirit parser.
At least you could have used the symbol parser (look here: http://www.boost.org/libs/spirit/doc/symbols.html), which is a deterministic parser especially suited to keyword matching. I'm pretty sure that this alone would speed up your test case a lot, because your keyword rule above (if it contains 60 alternatives) is a _very_ inefficient way to recognise keywords that are known in advance.
Yes. I'd be very interested to know the results when using the symbol parser. A 60-alternative rule will definitely be slow! Some more things to note:
* why use lexeme_d at the lexer stage?
* the comment rule can be rewritten as: '#' >> *(anychar_p - eol_p) >> eol_p;
* stringLiteral can be rewritten similarly.
* word can be rewritten using chsets.
One thing for sure is that the grammar is not optimized.
Cheers,
--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net
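The idea behind rewriting `word` with character sets can be sketched without Spirit: precompute one membership table for the first character and one for the remaining characters, so that each input character costs a single table lookup instead of a test of two alternatives. This is a plain C++ illustration, not Spirit's `chset` implementation; `CharSet` and `match_word` are hypothetical names, and ASCII letters and digits are assumed rather than the locale-dependent behaviour of `alpha_p` and `alnum_p`.

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// A set of characters stored as one bit per possible unsigned char value.
class CharSet {
public:
    // Adds every character in the inclusive range [first, last].
    CharSet& add_range(char first, char last) {
        const int lo = static_cast<unsigned char>(first);
        const int hi = static_cast<unsigned char>(last);
        assert(lo <= hi);
        for (int c = lo; c <= hi; ++c) {
            bits_.set(static_cast<std::size_t>(c));
        }
        return *this;
    }

    CharSet& add(char c) { return add_range(c, c); }

    // Membership test: a single bit lookup.
    bool test(char c) const {
        return bits_.test(static_cast<unsigned char>(c));
    }

private:
    std::bitset<UCHAR_MAX + 1> bits_;
};

// Returns the length of the identifier at the start of `input`, or 0.
// Mirrors the shape of: (alpha | '_') >> *(alnum | '_'), ASCII only.
std::size_t match_word(const std::string& input) {
    static const CharSet first =
        CharSet().add_range('a', 'z').add_range('A', 'Z').add('_');
    static const CharSet rest = CharSet(first).add_range('0', '9');

    if (input.empty() || !first.test(input[0])) return 0;
    std::size_t n = 1;
    while (n < input.size() && rest.test(input[n])) ++n;
    return n;
}
```

The two tables are built once and reused on every call, which is the property that makes a character-set formulation attractive inside a tight lexing loop.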
participants (4)
- christopher diggins
- Dave Handley
- Hartmut Kaiser
- Joel de Guzman