
Eric, my apologies for not posting before. I really must get around to a full review, but in the meantime:
> In short, xpressive comes out consistently ahead of Boost.Regex on short matches, and roughly on par for longer matches (with wide variation). Results are shown for both gcc 3.4 and vc7.1. The xpressive download includes the code for the perf test, so you can run it yourself, if you like.
I'm attaching the full results I get from my performance test suite (updated to reflect the changes you made in the xpressive one, but without the static expressions: I couldn't be bothered to translate them all!). I compiled with VC7.1 with all the optimisations on (any suitable inline, global optimisations on, inline intrinsics on).

Be aware that any such results should be treated with extreme caution: there is absolutely no such thing as a "typical" regex. I haven't run the tests with cygwin: in the past I've found cygwin to be a pretty poor platform for testing code performance. I don't think the cygwin guys have put much effort into runtime efficiency, as compared to Linux say.

For the "trivial" short matches test, I see broadly very similar results from xpressive and Boost.Regex. I haven't counted them, but it looks like honors even to me. The bad news is that PCRE kicks us both into touch on this test. However, I'm pretty sure that Boost.Regex only dropped behind PCRE on this test section when I added protection against stack overflow (a __try __except block). Since I pinched this idea from GRETA, I suspect xpressive does the same thing? In any case that protection is a "good thing", so it's staying in Boost.Regex whatever.

For the html document search test cases, xpressive is consistently ahead, by up to 5x, probably the Boyer-Moore code kicking in (but see below). On this test PCRE is somewhere in between us.

For the C++ code search test cases, Boost.Regex is consistently ahead by about 2x (it beats PCRE on this test as well). There is one complicated expression that xpressive didn't compile, but which Boost.Regex and PCRE did handle OK.

For the plain text search test cases, Boost.Regex is ahead of xpressive in all cases: up to 12x faster in the short text, and 25x in the very long text (these are the extremes though, don't pay too much attention!).
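As an aside on the Boyer-Moore remark in the html results: the following is only a rough sketch of why a skip-table search can beat a plain left-to-right scan on literal text. It is not the code from xpressive or Boost.Regex. It uses the simplified Horspool variant rather than full Boyer-Moore, and the names SearchResult, naive_find and horspool_find are made up for illustration. Each function counts character comparisons as a crude stand-in for cost, which ignores the per-iteration overhead that an optimiser may or may not remove.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical result type: position of the first match (or npos) plus the
// number of character comparisons performed, used as a rough cost measure.
struct SearchResult {
    std::size_t pos;
    std::size_t comparisons;
};

// Plain scan: try every alignment and compare left to right.
SearchResult naive_find(const std::string& text, const std::string& pat) {
    SearchResult r{std::string::npos, 0};
    if (pat.empty() || pat.size() > text.size()) return r;
    for (std::size_t i = 0; i + pat.size() <= text.size(); ++i) {
        std::size_t j = 0;
        while (j < pat.size()) {
            ++r.comparisons;
            if (text[i + j] != pat[j]) break;
            ++j;
        }
        if (j == pat.size()) { r.pos = i; return r; }
    }
    return r;
}

// Horspool-style scan: compare right to left, then skip ahead by a
// precomputed distance keyed on the text character under the pattern's end.
SearchResult horspool_find(const std::string& text, const std::string& pat) {
    SearchResult r{std::string::npos, 0};
    const std::size_t m = pat.size(), n = text.size();
    if (m == 0 || m > n) return r;

    // Characters absent from the pattern allow a skip of the full length.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    // The last pattern character is deliberately left out of the table.
    for (std::size_t k = 0; k + 1 < m; ++k)
        shift[static_cast<unsigned char>(pat[k])] = m - 1 - k;

    std::size_t i = 0;
    while (i + m <= n) {
        std::size_t j = m;  // number of pattern characters still unmatched
        while (j > 0) {
            ++r.comparisons;
            if (text[i + j - 1] != pat[j - 1]) break;
            --j;
        }
        if (j == 0) { r.pos = i; return r; }
        i += shift[static_cast<unsigned char>(text[i + m - 1])];
    }
    return r;
}
```

Searching for "needle" in a text made of 100 'a' characters followed by "needle", both functions report position 100, but the plain scan makes 106 comparisons against 25 for the skip-table version. Comparison counts are not timings, of course, so this says nothing definite about which one a given compiler will make faster.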
The really interesting thing about these tests is that some of the expressions are string literals: the xpressive Boyer-Moore code should really shine here, and yet it comes out quite a bit slower. I'm not completely sure what's happening here, but I would guess that the Boost.Regex code is easier to optimise, and so executes faster, *provided* it doesn't find a tentative match too often. In comparison, in the html tests most of the regular expressions begin with "<", which tends to occur rather often in html. So in this case, a better algorithm wins out over the easier to optimise code. Ideally there would be a heuristic that would choose between the two, and always pick the best, but without analysing the text that you're going to search first I don't know what it would be at present :-(

Finally, I would note that in the search tests the Boost.Regex results are hampered compared to PCRE by the need to construct a regex_iterator: the class is a pimpl which uses a shared_ptr, so there's a couple of memory allocations in there that PCRE doesn't have to do. For short searches this is a big hit, but *only* in the rarefied atmosphere of a test suite. Back in the real world iterators tend to get passed around by value, so this is another "good thing" in general. I haven't looked at PCRE, but I suspect you do something similar?

John.