Re: [Boost-users] How efficient is the boost::regex library?

29 Oct 2011

      On 10/28/2011 10:05 AM, John Maddock wrote:
...
...
...
I am looking for the most efficient open-source C++ regex library.
...
Reading this article:
http://swtch.com/~rsc/regexp/**regexp1.html<http://swtch.com/~rsc/regexp/regexp1.html>-
It
seems that GNU awk is the best overall:
http://pdos.csail.mit.edu/~**rsc/regexp-img/grep1p.png<http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png>
This is all true, but also completely irrelevant. DFA's have good worst
case behaviour, but can be many times slower for common cases.
I'd be very interested in seeing the data that show this.
I confess I don't have any myself, but I recall Eric Niebler mentioning
that he ran some experiments on this,  but you know... lies damn lies
and benchmarks and all that...
I once coded up a Thompson NFA and put it in the old Boost File Vault. I
have no idea whatever became of the file vault, but with a quick google
search, I found the code here:

https://github.com/boost-vault/Strings-Text-Processing/blob/master/thompson-...

This is basically Russ Cox's much-touted-by-him regex implementation,
Boost-ified. Any interested party can confirm for themselves that it is,
in fact, slower than a snail on quaaludes. I asked Russ about that. He
never bothered to reply.

FWIW, in the Alioth language shootout for the regex-dna test, the
fastest submitted C++ program uses xpressive:

http://shootout.alioth.debian.org/u32/program.php?test=regexdna&lang=gpp&id=4

It came in 3rd behind Tcl and Javascript V8, but it uses a fraction of
the memory of those two. This is with gcc. I'm guessing it could have
done better with MSVC or Intel. (But not 4x better, alas.)

DISCLAIMER: This is a very limited test. It would be seriously unwise to
draw any significant conclusions from this.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com