
On 10/28/2011 10:05 AM, John Maddock wrote:
I am looking for the most efficient open-source C++ regex library.
Reading this article: http://swtch.com/~rsc/regexp/**regexp1.htmlhttp://swtch.com/~rsc/regexp/regexp1.html- It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~**rsc/regexp-img/grep1p.pnghttp://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png
This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases.
I'd be very interested in seeing the data that show this.
I confess I don't have any myself, but I recall Eric Niebler mentioning that he ran some experiments on this, but you know... lies damn lies and benchmarks and all that...
I once coded up a Thompson NFA and put it in the old Boost File Vault. I have no idea whatever became of the file vault, but with a quick google search, I found the code here: https://github.com/boost-vault/Strings-Text-Processing/blob/master/thompson-... This is basically Russ Cox's much-touted-by-him regex implementation, Boost-ified. Any interested party can confirm for themselves that it is, in fact, slower than a snail on quaaludes. I asked Russ about that. He never bothered to reply. FWIW, in the Alioth language shootout for the regex-dna test, the fastest submitted C++ program uses xpressive: http://shootout.alioth.debian.org/u32/program.php?test=regexdna&lang=gpp&id=4 It came in 3rd behind Tcl and Javascript V8, but it uses a fraction of the memory of those two. This is with gcc. I'm guessing it could have done better with MSVC or Intel. (But not 4x better, alas.) DISCLAIMER: This is a very limited test. It would be seriously unwise to draw any significant conclusions from this. -- Eric Niebler BoostPro Computing http://www.boostpro.com