
Just an idea: for static xpressive, couldn't you detect at compile-time that the expression is truly regular, and use a DFA in that case?
Oh, sure! Why don't you submit a DFA and I'll use it in xpressive? ;-)
After nosing about on the internet a bit more, I found this interesting comparison:
http://shootout.alioth.debian.org/u32/benchmark.php?test=regexdna&lang=all
Here we see every language compared on how well it can perform on a particular regex task. The top-performer is <drumroll> Google's JavaScript V8 engine! Wow. C++ is in 5th place. The fastest C++ program submitted to the competition uses static xpressive <pats own back>. I'm not so upset about being beaten by V8. It adaptively improves its native codegen *at runtime*. What really bugs me is that we're skunked by a C library: Tcl. Grrrr. I've read a bit about Tcl's regex library; it does what Mathias is suggesting: implements both a DFA and an NFA, analyzes the pattern and chooses which to use. I've known for a while that this is the way forward, but I just don't have the time for that. (Wasn't there a GSoC project to do that for Boost.Regex?)
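To make the "analyze the pattern and choose" step concrete, here is a rough sketch of what a compile-time check could look like. It rests on several assumptions: it works on a pattern string rather than on static xpressive's expression templates, `is_dfa_friendly` is a made-up name that exists in neither xpressive nor Boost.Regex, and it only looks for two families of non-regular constructs (backreferences and lookaround). It needs C++17 for `constexpr` `std::string_view`. It is deliberately conservative: a `(?=` inside a character class gets flagged too, which only means falling back to the backtracking matcher when a DFA would have been fine.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of xpressive or Boost.Regex.
// Returns true when the pattern contains none of the constructs checked
// below, i.e. no backreference (\1..\9) and no lookaround group
// ((?=, (?!, (?<=, (?<!). Such a pattern is a candidate for a DFA.
constexpr bool is_dfa_friendly(std::string_view pat) {
    for (std::size_t i = 0; i < pat.size(); ++i) {
        if (pat[i] == '\\') {
            if (i + 1 < pat.size() && pat[i + 1] >= '1' && pat[i + 1] <= '9')
                return false;  // backreference
            ++i;               // skip the escaped character
            continue;
        }
        if (pat[i] == '(' && i + 2 < pat.size() && pat[i + 1] == '?') {
            const char c = pat[i + 2];
            if (c == '=' || c == '!')
                return false;  // lookahead
            if (c == '<' && i + 3 < pat.size() &&
                (pat[i + 3] == '=' || pat[i + 3] == '!'))
                return false;  // lookbehind
        }
    }
    return true;
}

// Evaluated entirely at compile time.
static_assert(is_dfa_friendly("agggtaaa|tttaccct"));
static_assert(!is_dfa_friendly("(a+)b\\1"));
```

A real implementation would dispatch on the result (for instance with `if constexpr`) to pick a DFA matcher or the existing backtracking one; the sketch only covers the detection half.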
My memory fails me... In any case, the regex GSoC project never got off the ground.

Nosing around the entries to the competition, I wonder how much of the performance difference is down to the regex engine and how much to other tricks the entries use. For example, I notice the top C program uses a thread pool to run everything in parallel. Cheating, I say! ;-)

John.