
Darren Cook wrote:
http://boost-sandbox.sf.net/libs/xpressive/doc/html/xpressive/perf.html
In short, xpressive comes out consistently ahead of Boost.Regex on short matches, and roughly on par for longer matches (with wide variation).
Interesting. This left me with two questions: 1. Why is dynamic quicker than static xpressive on some expressions?
It's only that way for gcc. On VC7.1, static xpressive is always faster. I can only guess that gcc's optimizer is at fault here.
2. Why is boost::regex quicker on longer strings? Something to do with buffering or dynamic memory usage?
I haven't fully investigated this, but I suspect that for some of those patterns, Boost.Regex is finding a clever optimization. I have noticed that if you change the pattern:

    Tom|Sawyer|Huckleberry|Finn

to:

    Tom|Sawyer|.uckleberry|Finn
               ^

then xpressive is considerably faster than Boost.Regex at finding all matches. Clearly, I need to be testing more patterns to make sure the results are representative.
I thought "Huck[[:alpha:]]+" (xpressive twice as quick) vs. "[[:alpha:]]+ing" (boost::regex twice as quick) was very curious. Is this due to some design decision, or just something waiting to be optimized?
This is a case where xpressive is finding a clever optimization that Boost.Regex is missing. When a pattern begins with a string literal, xpressive uses Boyer-Moore to search for it. It's a huge win. I have no idea why Boost.Regex is faster at matching "[[:alpha:]]+ing". It's worth looking into.
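To illustrate the idea (this is not xpressive's actual code), here is a minimal sketch of how a pattern such as "Huck[[:alpha:]]+" can be searched by first locating the leading literal with Boyer-Moore and only then checking the rest. It assumes C++17 and uses std::boyer_moore_searcher from the standard library; the helper name find_literal_then_alpha is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: finds all non-overlapping matches of the pattern
//   <literal>[[:alpha:]]+
// by locating the literal with Boyer-Moore, then extending over letters.
std::vector<std::string> find_literal_then_alpha(const std::string& text,
                                                 const std::string& literal)
{
    std::vector<std::string> matches;
    if (literal.empty())
        return matches; // nothing to anchor the search on

    // The literal is preprocessed once; the searcher can then skip over
    // stretches of text that cannot contain it.
    const std::boyer_moore_searcher searcher(literal.begin(), literal.end());
    const auto literal_len = static_cast<std::ptrdiff_t>(literal.size());

    auto pos = text.begin();
    while (true) {
        const auto hit = std::search(pos, text.end(), searcher);
        if (hit == text.end())
            break; // no further occurrence of the literal

        // Greedily consume the letters that follow the literal.
        const auto tail = hit + literal_len;
        const auto stop = std::find_if_not(tail, text.end(),
            [](unsigned char c) { return std::isalpha(c) != 0; });

        if (stop != tail) {
            matches.emplace_back(hit, stop); // literal plus one or more letters
            pos = stop;
        } else {
            pos = hit + 1; // literal not followed by a letter: keep looking
        }
    }
    return matches;
}
```

Called as find_literal_then_alpha(text, "Huck"), this returns every "Huck" that is followed by at least one letter, extended over those letters. The expensive per-position work only happens where the literal was actually found, which is where the win comes from.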
Agreed. FYI, "_" matches any one character. ~_n matches any character that is not '\n'. I also need to describe _ln, which matches a logical newline (e.g., "\n" or "\r" or "\r\n" or other line separators), and ~_ln, which matches any one character that is not a line separator.
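For readers without xpressive at hand, the behaviour of a logical newline can be approximated with std::regex. The sketch below is an emulation, not xpressive's _ln: it only recognizes "\r\n", "\r" and "\n" (not the other line separators mentioned above), and the function name split_logical_lines is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits text at logical newlines.
// "\r\n" is listed first in the alternation so that a CR LF pair is
// consumed as a single separator rather than as two.
std::vector<std::string> split_logical_lines(const std::string& text)
{
    static const std::regex logical_newline("\r\n|\r|\n");

    std::vector<std::string> lines;
    // Submatch index -1 asks the iterator for the text between matches.
    std::sregex_token_iterator it(text.begin(), text.end(), logical_newline, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it)
        lines.push_back(it->str());
    return lines;
}
```

With this, split_logical_lines("a\r\nb\nc\rd") yields the four pieces "a", "b", "c" and "d", regardless of which line-ending convention each piece was terminated with.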
_ln sounds useful. Is that in Perl/PCRE?
I don't recall where I got that idea. Perhaps from Perl 6.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com