
I'm always suspect of benchmarks, but I figured I'd post this anyway.
LOL, nice to see my own benchmark suite used against me!
Anyway, it's not my intention to kick off yet another costly escalation in the regex war. God help me, I don't have the time. I post it here because I haven't come across a comparison of these four libraries by a third party before. Some quick observations:
- I'm pleased that on the balance, xpressive fares well here relative to the other NFAs. In the end, though, not much separates best from worst.
- All the NFA libraries seem to have large overhead that becomes very noticeable when matching short strings. I don't know why.
I spent some time trying mostly in vain to hammer this down lower with current Boost.Regex - it turns out that most of it is down to initialisation of state - Perl-style re's typically have a lot of state to initialise. The thing to remember is for that short matches against short texts the cost of a match is basically the cost of a function call for DFA's. Or to put it another way, even if the NFA's are 10x slower on these short tests, 10x almost nothing is still almost nothing (though I accept there are certain outlyer-use-cases where this overhead becomes an issue).
- I'm gobsmacked that my VERY OLD library greta handily beats pcre, boost.regex /and/ xpressive on short matches. Failing to beat yourself in any perf benchmark is kind of embarrassing.
Probably more features in xpressive to set up?
- I'm tickled that for long matches, both xpressive and boost.regex (NFAs) beat his library (DFA). How'd that happen?
Maybe reflects the preocupations of their authors?
- I find it interesting that both xpressive and boost.regex have large discontinuities in the chart at the bottom; the others don't. Huh.
Weird indeed, can't really imagine why that would happen, unless there's some kind of cache miss effect going on? Cheers, John.