How efficient is the boost::regex library?

Good afternoon, I am looking for the most efficient open-source C++ regex library. Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png How does the boost::regex library compare? Would you recommend boost::regex as the most efficient one, or would you suggest another? Thanks for all information, Alec Taylor

I am looking for the most efficient open-source C++ regex library.
Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png
This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP). It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).
How does the boost::regex library compare?
Would you recommend boost::regex as the most efficient one, or would you suggest another?
There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks. HTH, John.

Hi: So which of these libraries is easiest to develop against? Matthew -----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of John Maddock Sent: Thursday, October 27, 2011 2:11 PM To: boost-users@lists.boost.org Subject: Re: [Boost-users] How efficient is the boost::regex library?
I am looking for the most efficient open-source C++ regex library.
Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png
This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP). It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).
How does the boost::regex library compare?
Would you recommend boost::regex as the most efficient one, or would you suggest another?
There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks. HTH, John. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Thanks John. I would be interested in seeing comparisons of
boost:regex with other regex libraries.
Yesterday I found a fuzzy logic string regex library. Does the
boost::regex library support this?
On Fri, Oct 28, 2011 at 5:10 AM, John Maddock
I am looking for the most efficient open-source C++ regex library.
Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png
This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP).
It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).
How does the boost::regex library compare?
Would you recommend boost::regex as the most efficient one, or would you suggest another?
There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks.
HTH, John. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Thanks John. I would be interested in seeing comparisons of boost:regex with other regex libraries.
It's wildly out of date, but how about: http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/vc71-performance.html
Yesterday I found a fuzzy logic string regex library. Does the boost::regex library support this?
No sorry,
HTH, John.
On Fri, Oct 28, 2011 at 5:10 AM, John Maddock
I am looking for the most efficient open-source C++ regex library.
Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png
This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP).
It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).
How does the boost::regex library compare?
Would you recommend boost::regex as the most efficient one, or would you suggest another?
There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks.
HTH, John. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Adding to what John has said, if I may I'll offer some anecdotal advice.
Efficient use of regular expressions with /any/ implementation of a
regex library has a lot more to do with you writing an intelligent
regular expression, and knowing when it's appropriate to use them.
Certainly the engine helps, but if you write a stupid regular
expression, or attempt to use an RE for a task REs aren't well-suited
for, you won't get great results.
You can find corner cases where perl-compatible REs have poor
performance, but there are features offered that make it possible to
tweak that regular expression to get better performance.
I've used regular expressions from perl, ruby, python, C#, PCRE,
boost::regex, grep, awk, sed, mysql, postgres, and probably a few
others. Most commonly I prefer grep for one-liners, and
perl-compatible engines in code; I don't run into the corner cases in
my work for the simple fact that they're corner cases -- that is to
say, you don't run into them unless you're looking for them, or you do
something otherwise out-of-the-ordinary.
-Brian
On Fri, Oct 28, 2011 at 2:50 AM, John Maddock
Thanks John. I would be interested in seeing comparisons of boost:regex with other regex libraries.
It's wildly out of date, but how about: http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/vc71-performance.html
Yesterday I found a fuzzy logic string regex library. Does the boost::regex library support this?
No sorry,
HTH, John.
On Fri, Oct 28, 2011 at 5:10 AM, John Maddock
wrote: I am looking for the most efficient open-source C++ regex library.
Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png
This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP).
It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).
How does the boost::regex library compare?
Would you recommend boost::regex as the most efficient one, or would you suggest another?
There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks.
HTH, John. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

On 28.10.2011 02:20, Alec Taylor wrote:
I would be interested in seeing comparisons of boost:regex with other regex libraries.

On 28 October 2011 00:20, Alec Taylor
Thanks John. I would be interested in seeing comparisons of boost:regex with other regex libraries.
This might also be of interest, http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/boost_xpressive.... Note that it's "lies, damn lies, statistics, and benchmarks" though.
Yesterday I found a fuzzy logic string regex library. Does the boost::regex library support this?
On Fri, Oct 28, 2011 at 5:10 AM, John Maddock
wrote: I am looking for the most efficient open-source C++ regex library.
Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png
This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP).
It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).
How does the boost::regex library compare?
Would you recommend boost::regex as the most efficient one, or would you suggest another?
There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks.
HTH, John. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Jeroen

Hello,
On Thu, Oct 27, 2011 at 1:10 PM, John Maddock
I am looking for the most efficient open-source C++ regex library.
Reading this article: http://swtch.com/~rsc/regexp/**regexp1.htmlhttp://swtch.com/~rsc/regexp/regexp1.html- It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~**rsc/regexp-img/grep1p.pnghttp://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png
This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases.
I'd be very interested in seeing the data that show this. Thanks, Will

I am looking for the most efficient open-source C++ regex library.
Reading this article: http://swtch.com/~rsc/regexp/**regexp1.htmlhttp://swtch.com/~rsc/regexp/regexp1.html- It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~**rsc/regexp-img/grep1p.pnghttp://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png
This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases.
I'd be very interested in seeing the data that show this.
I confess I don't have any myself, but I recall Eric Niebler mentioning that he ran some experiments on this, but you know... lies damn lies and benchmarks and all that... John.

On 10/28/2011 10:05 AM, John Maddock wrote:
I am looking for the most efficient open-source C++ regex library.
Reading this article: http://swtch.com/~rsc/regexp/**regexp1.htmlhttp://swtch.com/~rsc/regexp/regexp1.html- It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~**rsc/regexp-img/grep1p.pnghttp://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png
This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases.
I'd be very interested in seeing the data that show this.
I confess I don't have any myself, but I recall Eric Niebler mentioning that he ran some experiments on this, but you know... lies damn lies and benchmarks and all that...
I once coded up a Thompson NFA and put it in the old Boost File Vault. I have no idea whatever became of the file vault, but with a quick google search, I found the code here: https://github.com/boost-vault/Strings-Text-Processing/blob/master/thompson-... This is basically Russ Cox's much-touted-by-him regex implementation, Boost-ified. Any interested party can confirm for themselves that it is, in fact, slower than a snail on quaaludes. I asked Russ about that. He never bothered to reply. FWIW, in the Alioth language shootout for the regex-dna test, the fastest submitted C++ program uses xpressive: http://shootout.alioth.debian.org/u32/program.php?test=regexdna&lang=gpp&id=4 It came in 3rd behind Tcl and Javascript V8, but it uses a fraction of the memory of those two. This is with gcc. I'm guessing it could have done better with MSVC or Intel. (But not 4x better, alas.) DISCLAIMER: This is a very limited test. It would be seriously unwise to draw any significant conclusions from this. -- Eric Niebler BoostPro Computing http://www.boostpro.com
participants (8)
-
Alec Taylor
-
Andrei Lapshin
-
Brian Vandenberg
-
Eric Niebler
-
Harelick, Matthew
-
Jeroen Habraken
-
John Maddock
-
Will Mason