
Some time ago Eric Niebler wrote:
You commented on the algorithmic complexity, but have you benchmarked against Boost.regex? I'd be very curious to see how your back-end stacks up performance-wise.
I have done some benchmarking of ANFA-regex versus Boost.Regex, and it revealed the following (quite expected) trends. For very simple match problems ANFA-regex is by about an order of magnitude slower. Moreover, the slowdown noticeably increases with the number of capturing parentheses. For real-world-like match problems, like finding paragraphs of "libs/libraries.htm" by matching it against ".*?(<p>.*?</p>)", the slowdown drops to about 1.5x--2x. However, in this case the "regex_search" function of Boost.Regex performs much faster than "regex_match" (thanks to Knuth-Morris-Pratt algorithm, AFAICS). For complex / pathological problems ANFA-regex performs much faster (any number of orders of magnitude, depending on the problem) than Boost.Regex---the linear complexity pays off. Both Boost.Regex and ANFA-regex have linear complexity for simple regexes. Highly optimized nature of Boost.Regex allows its complexity to have much lower time constant than that of ANFA-regex, resulting in much better performance. However, for more involved regexes Boost.Regex gets nonlinear complexity, with respective consequences. The most obvious way of speeding up is to optimize the actual code (or help the compiler to optimize it). The problem is, the ANFA simulation algorithm is quite simple and straightforward, so there's not much room for improvement. On the other hand, there's a whole new story in optimizing the algorithm itself. Two main slowdown factors are (1) excessive tag copying and (2) nondeterminism. I can name two ways to reduce (1). The factor (2) is all about ANFA to ADFA conversion. These optimizations are subject to additional research. And, of course, there are additional speed reserves in KMP for search problems. (A side note. An ANFA version used for benchmarking is not the version I uploaded into yahoo groups recently---I actually managed to double its speed, or something like that.) -- Best regards, Vladimir Pozdyayev.