
Lynn Allan wrote:
Stuart Dootson wrote: Lynn - it might be interesting to use re2c (http://re2c.org/) to generate regex machines and see what their performance is like - after all, they do clain 're2c generates warning free code that is equal to hand-written code in terms of size, speed and quality.' (I'm offering no opinion on the veracity of that statement....), so (allegedly) they'd give a reasonable hand-code analogue....
Lynn Allan wrote: I took a look at it, and it looks like it could possibly be quite useful if executable size was a concern .... but it is pretty obscure
re2c seems very fast, and once you get the hang of it, not that difficult to use. For the contrived test of looking for DayOfWeek within a 1kb buffer, here are some results ... ymmv (WinXp-Sp2 Microsoft ToolKit-2003 C++ Compiler with /O2 optimize flag)
regex = "(Sunday|Sun)|(Monday|Mon) etc.(Saturday|Sat)";
By-hand parser: Elapsed for 10000 loops Ms: 110.583 By-hand parser: Elapsed for 100000 loops Ms: 1107.81
re2c generator Elapsed for 10000 loops Ms: 69.4683 re2c generator Elapsed for 100000 loops Ms: 700.546
Boost::xpressive-static-iterator: Elapsed for 10000 loops Ms: 410.492 Boost::xpressive-static-iterator: Elapsed for 100000 loops Ms: 4164.45
Boost::regex: Elapsed for 10000 loops Ms: 622.377 Boost::regex: Elapsed for 100000 loops Ms: 5965.24
Interesting! Can you confirm that re2c is not handling backreferences? That is, after a match, is there a way to access what the Nth group matched? Also, do you think you could send around the code that re2c is generating for this expression? I'm guessing that re2c is generating a DFA. Boost.Regex and Xpressive generate NFAs because DFAs aren't suited to doing backreferences[*]. I've considered adding DFA support to Xpressive, and use DFAs for those regexes that don't need the full power of NFAs. Clearly, the performance win would be worth the trouble. This would not be a trivial undertaking, however. [*] Technically, DFAs only have a problem with patterns such as "(.)\\1"; that is, when the result of the backreference is used within the pattern itself. -- Eric Niebler Boost Consulting www.boost-consulting.com