
I just wrote a quick-and-dirty comparison of YARD and Spirit, and as a toy C++ tokenizer YARD runs roughly 10x faster. I know, Joel, I said I wouldn't do any comparisons, but I couldn't resist, what with Dave's claim to be outperforming Spirit by 50x!

YARD's speed advantage comes from the fact that it generates the parser at compile time rather than at run time. Clearly I am not using an optimized Spirit grammar; I opted instead to implement both grammars in a naive, straightforward manner. Here is the full Spirit grammar I used:

    single_comment_p = str_p("//") >> *(~ch_p('\n')) >> ~ch_p('\n');
    full_comment_p = str_p("/*") >> anychar_p - str_p("*/");
    comment_p = single_comment_p | full_comment_p;
    ws = +(space_p | comment_p);
    escape_char_p = ch_p('\\') >> anychar_p;
    string_literal_p = ch_p('"') >> *(escape_char_p | ~ch_p('"')) >> ch_p('"');
    char_literal_p = ch_p('\'') >> (escape_char_p | ~ch_p('\'')) >> ch_p('\'');
    ident_p = (alpha_p | ch_p('_')) >> +(alnum_p | ch_p('_'));
    number_p = real_p;
    cpp_token = ws | char_literal_p | string_literal_p | number_p | ident_p[&inc_counter];
    tokens = *(cpp_token | anychar_p);

I would appreciate any suggestions on how to improve the Spirit grammar.

The YARD grammar is far more verbose, so here is only a small snippet:

    struct MatchBeginFullComment : public re_and< MatchChar<'/'>, MatchChar<'*'> > { };
    struct MatchEndFullComment : public re_and< MatchChar<'*'>, MatchChar<'/'> > { };
    struct MatchFullComment : public re_and< MatchBeginFullComment, MatchEndFullComment > { };
    struct MatchComment : public re_or< MatchSingleLineComment, MatchFullComment > { };

Anyway, you get the picture: YARD is verbose but quite fast. I will include the full source in the next YARD release.

Christopher Diggins
http://sourceforge.net/projects/yard-parser
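To illustrate what generating the parser at compile time means, here is a minimal, self-contained sketch. It assumes that YARD-style rules are types exposing a static `Match` function. The names `MatchChar`, `re_and`, and `re_or` are borrowed from the snippet above, but their bodies are assumptions for illustration, not YARD's actual source. `MatchNotChar`, `re_star`, `re_until`, `MatchLength`, and this definition of `MatchSingleLineComment` are hypothetical helpers introduced here. One further point: read as a plain sequence, `re_and< MatchBeginFullComment, MatchEndFullComment >` would match only the empty comment `/**/`, so the sketch uses the hypothetical `re_until` to consume the comment body.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical YARD-style rules (an illustration, not YARD's actual source).
// Contract: on success, Match advances p past the matched text and returns
// true; on failure, it returns false and leaves p unchanged.

// Matches exactly the character C.
template <char C>
struct MatchChar {
    static bool Match(const char*& p, const char* end) {
        if (p != end && *p == C) { ++p; return true; }
        return false;
    }
};

// Hypothetical helper: matches any single character except C.
template <char C>
struct MatchNotChar {
    static bool Match(const char*& p, const char* end) {
        if (p != end && *p != C) { ++p; return true; }
        return false;
    }
};

// Sequence: A followed by B; backtracks to the start if either fails.
template <typename A, typename B>
struct re_and {
    static bool Match(const char*& p, const char* end) {
        const char* start = p;
        if (A::Match(p, end) && B::Match(p, end)) return true;
        p = start;
        return false;
    }
};

// Ordered choice: try A first, then B.
template <typename A, typename B>
struct re_or {
    static bool Match(const char*& p, const char* end) {
        return A::Match(p, end) || B::Match(p, end);
    }
};

// Hypothetical helper: zero or more repetitions of T (T must not match empty).
template <typename T>
struct re_star {
    static bool Match(const char*& p, const char* end) {
        while (T::Match(p, end)) {}
        return true;
    }
};

// Hypothetical helper: skip characters until T matches, then consume T.
template <typename T>
struct re_until {
    static bool Match(const char*& p, const char* end) {
        const char* start = p;
        for (; p != end; ++p) {
            if (T::Match(p, end)) return true;
        }
        p = start;
        return false;
    }
};

// Comment rules modeled on the post; the body of a full comment is consumed.
struct MatchBeginFullComment : public re_and< MatchChar<'/'>, MatchChar<'*'> > { };
struct MatchEndFullComment : public re_and< MatchChar<'*'>, MatchChar<'/'> > { };
struct MatchFullComment : public re_and< MatchBeginFullComment, re_until<MatchEndFullComment> > { };
// Assumed definition: "//" up to, but not including, the newline.
struct MatchSingleLineComment
    : public re_and< re_and< MatchChar<'/'>, MatchChar<'/'> >, re_star< MatchNotChar<'\n'> > > { };
struct MatchComment : public re_or< MatchSingleLineComment, MatchFullComment > { };

// Hypothetical helper: number of characters Rule matches at the start of text, or -1.
template <typename Rule>
long MatchLength(const char* text) {
    const char* p = text;
    const char* end = text + std::strlen(text);
    return Rule::Match(p, end) ? static_cast<long>(p - text) : -1;
}
```

For example, `MatchLength<MatchComment>("/* a */x")` returns 7, and `MatchLength<MatchComment>("/* unterminated")` returns -1. Because the whole grammar is encoded in types, there is no parser object to build or traverse at run time, and every `Match` call is resolved statically; whether the compiler actually inlines the nested calls depends on the compiler and optimization settings.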
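For contrast with the compile-time approach described in the post, a parser generated at run time can be sketched as a tree of heap-allocated nodes linked through a virtual `Match`. This is a simplified, hypothetical illustration of run-time composition in general; the names `Parser`, `Char`, `Seq`, `Alt`, and `MakeCommentOpener` are invented here, and it is not Spirit's actual implementation. Each step goes through a pointer and a virtual call, which a compiler generally cannot inline as readily as the static calls in a type-based grammar.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical run-time-composed parser (illustration only, not Spirit's code).
// Contract: on success, advance p and return true; on failure, leave p unchanged.
struct Parser {
    virtual ~Parser() = default;
    virtual bool Match(const char*& p, const char* end) const = 0;
};

using ParserPtr = std::unique_ptr<Parser>;

// Matches exactly one character, chosen when the object is constructed.
struct Char : Parser {
    explicit Char(char c) : c_(c) {}
    bool Match(const char*& p, const char* end) const override {
        if (p != end && *p == c_) { ++p; return true; }
        return false;
    }
    char c_;
};

// Sequence of two sub-parsers linked at run time; backtracks on failure.
struct Seq : Parser {
    Seq(ParserPtr a, ParserPtr b) : a_(std::move(a)), b_(std::move(b)) {}
    bool Match(const char*& p, const char* end) const override {
        const char* start = p;
        if (a_->Match(p, end) && b_->Match(p, end)) return true;
        p = start;
        return false;
    }
    ParserPtr a_, b_;
};

// Ordered choice between two sub-parsers linked at run time.
struct Alt : Parser {
    Alt(ParserPtr a, ParserPtr b) : a_(std::move(a)), b_(std::move(b)) {}
    bool Match(const char*& p, const char* end) const override {
        return a_->Match(p, end) || b_->Match(p, end);
    }
    ParserPtr a_, b_;
};

// Builds a parser for a comment opener ("//" or "/*") each time it is called.
ParserPtr MakeCommentOpener() {
    return std::make_unique<Seq>(
        std::make_unique<Char>('/'),
        std::make_unique<Alt>(std::make_unique<Char>('/'), std::make_unique<Char>('*')));
}
```

Applied to `"/*x"`, the parser returned by `MakeCommentOpener()` succeeds and leaves `p` two characters in; applied to `"/x"`, it fails and leaves `p` where it started. The grammar's shape is only known once the constructors have run, which is the run-time generation the post contrasts with YARD.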