Re: [boost] [Review] xpressive

17 Sep 2005

Eric, my apologies for not posting before; I really must get around to a
full review, but in the meantime:
...
> In short, xpressive comes out consistently ahead of Boost.Regex on short
> matches, and roughly on par for longer matches (with wide variation).
> Results are shown for both gcc 3.4 and vc7.1. The xpressive download
> includes the code for the perf test, so you can run it yourself, if you
> like.

I'm attaching the full results I get from my performance test suite (updated 
to reflect the changes you made in the xpressive one, but without the static 
expressions: I couldn't be bothered to translate them all!).  I compiled 
with VC7.1 with all the optimisations on (any suitable inline, global 
optimisations on, inline intrinsics on).  Be aware that any such results 
should be treated with extreme caution: there is absolutely no such thing as 
a "typical" regex.  I haven't run the tests with cygwin: in the past I've 
found cygwin to be a pretty poor platform for testing code performance; I
don't think the cygwin guys have put much effort into runtime efficiency,
compared to Linux, say.

For the "trivial" short matches test, I see broadly very similar results
from xpressive and Boost.Regex; I haven't counted them, but it looks like
honors even to me.  The bad news is that PCRE kicks us both into touch on
this test; however, I'm pretty sure that Boost.Regex only dropped behind PCRE
on this test section when I added protection against stack overflow (a __try
__except block).  Since I pinched this idea from GRETA, I suspect xpressive
does the same thing?  In any case that protection is a "good thing" so it's
staying in Boost.Regex whatever.

For the html document search test cases xpressive is consistently ahead, by
up to 5x, probably the Boyer-Moore code kicking in (but see below).  On this
test PCRE is somewhere in between us.

For the C++ code search test cases, Boost.Regex is consistently ahead by
about 2x (it beats PCRE on this test as well).  There is one complicated
expression that xpressive didn't compile, but which Boost.Regex and PCRE did
handle OK.

For the plain text search test cases, Boost.Regex is ahead of xpressive in 
all cases, up to 12x faster in the short text, and 25x in the very long text 
(these are the extremes though, don't pay too much attention!).

The really interesting thing about these tests is that some of the
expressions are string literals: the xpressive Boyer-Moore code should
really shine here, and yet it comes out quite a bit slower.  I'm not
completely sure what's happening here, but I would guess that the
Boost.Regex code is easier to optimise, and so executes faster, *provided*
it doesn't find a tentative match too often.  In comparison, in the html
tests most of the regular expressions begin with "<", which tends to occur
rather often in html.  So in this case, a better algorithm wins out over the
easier to optimise code.
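
To make the "tentative match" point concrete, here is a minimal sketch using
only the standard library.  It is not code from either Boost.Regex or
xpressive: count_tentative_matches and horspool_find are hypothetical helpers,
and the second one is Boyer-Moore-Horspool (a simplified Boyer-Moore variant),
chosen only because it is short.  The first function counts how often a plain
first-character scan has to start a full comparison; the second reports how
many alignments a skip-table search examines.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: a plain scan starts a full comparison every time it
// sees the literal's first character, so this count is the number of
// tentative matches it has to investigate.
std::size_t count_tentative_matches(const std::string& text,
                                    const std::string& literal)
{
    if (literal.empty()) return 0;
    return static_cast<std::size_t>(
        std::count(text.begin(), text.end(), literal[0]));
}

// Result of the skip-table search: where the literal was found (or npos),
// and how many alignments of the literal against the text were examined.
struct horspool_result
{
    std::size_t position;
    std::size_t alignments;
};

// Hypothetical helper: Boyer-Moore-Horspool search for a literal.
horspool_result horspool_find(const std::string& text, const std::string& pat)
{
    const std::size_t n = text.size(), m = pat.size();
    if (m == 0 || n < m) return { std::string::npos, 0 };

    // shift[c] = how far the window may move when c is the last character
    // under it; characters not in the pattern allow a full-length jump.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(pat[i])] = m - 1 - i;

    std::size_t alignments = 0;
    for (std::size_t pos = 0; pos + m <= n; )
    {
        ++alignments;
        if (text.compare(pos, m, pat) == 0) return { pos, alignments };
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return { std::string::npos, alignments };
}
```

On text where the first character of the literal is rare, the first count
stays small and the simple scan has little to do; on html-like input with a
literal starting with "<" it grows with the number of tags, which is where
the skip table would be expected to pay for itself.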

Ideally there would be a heuristic that would choose between the two, and
always pick the best, but without analysing the text that you're going to
search first I don't know what it would be at present :-(

Finally, I would note that in the search tests the Boost.Regex results are
hampered compared to PCRE by the need to construct a regex_iterator: the
class is a pimpl which uses a shared_ptr, so there's a couple of memory
allocations in there that PCRE doesn't have to do.  For short searches this
is a big hit, but *only* in the rarefied atmosphere of a test suite; back in
the real world iterators tend to get passed around by value, so this is
another "good thing" in general.  I haven't looked at PCRE, but I suspect
you do something similar?
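
For anyone unfamiliar with the trade-off, here is a minimal sketch of a pimpl
held through a shared_ptr.  It is not the Boost.Regex regex_iterator:
match_cursor, first_match and owners_after_copy are hypothetical names, the
"search" is just std::string::find, and std::shared_ptr stands in for
boost::shared_ptr.  It only shows where the cost goes: construction allocates
the shared state, while a copy just bumps a reference count.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for an iterator over matches; illustration only.
class match_cursor
{
    // State shared by every copy of the cursor.
    struct impl
    {
        std::string text;
        std::string literal;
        std::size_t position;
    };
    std::shared_ptr<impl> pimpl_;

public:
    // Construction pays for a heap allocation (plus the string copies).
    match_cursor(const std::string& text, const std::string& literal)
        : pimpl_(std::make_shared<impl>(
              impl{ text, literal, text.find(literal) }))
    {}

    bool at_end() const { return pimpl_->position == std::string::npos; }
    std::size_t position() const { return pimpl_->position; }

    // Number of cursor objects currently sharing the same state.
    long owners() const { return pimpl_.use_count(); }
};

// Passing by value copies only the shared_ptr: no new allocation.
std::size_t first_match(match_cursor c) { return c.position(); }

// Copies the cursor and reports how many objects share the state.
long owners_after_copy(const match_cursor& c)
{
    match_cursor copy = c;   // cheap: reference count goes from 1 to 2
    return copy.owners();
}
```

In a tight benchmark loop that constructs a fresh cursor for every short
search, the allocation dominates; in code that builds one and then passes it
around by value, the cheap copy is what matters.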

John.

John Maddock