How efficient is the boost::regex library?

Alec Taylor

27 Oct 2011 27 Oct '11

6 a.m.

Good afternoon, I am looking for the most efficient open-source C++ regex library. Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png How does the boost::regex library compare? Would you recommend boost::regex as the most efficient one, or would you suggest another? Thanks for all information, Alec Taylor

Show replies by date

John Maddock

27 Oct 27 Oct

6:10 p.m.

...

I am looking for the most efficient open-source C++ regex library.

Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png

This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP). It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).

...

How does the boost::regex library compare?

Would you recommend boost::regex as the most efficient one, or would you suggest another?

Harelick, Matthew

6:49 p.m.

New subject: How about easiest to use? RE: How efficient is the boost::regex library?

Hi: So which of these libraries is easiest to develop against? Matthew -----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of John Maddock Sent: Thursday, October 27, 2011 2:11 PM To: boost-users@lists.boost.org Subject: Re: [Boost-users] How efficient is the boost::regex library?

...

I am looking for the most efficient open-source C++ regex library.

Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png

...

How does the boost::regex library compare?

Would you recommend boost::regex as the most efficient one, or would you suggest another?

There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks. HTH, John. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Alec Taylor

10:20 p.m.

Thanks John. I would be interested in seeing comparisons of boost:regex with other regex libraries. Yesterday I found a fuzzy logic string regex library. Does the boost::regex library support this? On Fri, Oct 28, 2011 at 5:10 AM, John Maddock <boost.regex@virgin.net> wrote:

...

...
I am looking for the most efficient open-source C++ regex library.

Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png

This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP).

It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).

...
How does the boost::regex library compare?

Would you recommend boost::regex as the most efficient one, or would you suggest another?

There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks.

HTH, John. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

John Maddock

28 Oct 28 Oct

8:50 a.m.

...

Thanks John. I would be interested in seeing comparisons of boost:regex with other regex libraries.

It's wildly out of date, but how about: http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/vc71-performance.html

...

Yesterday I found a fuzzy logic string regex library. Does the boost::regex library support this?

No sorry, HTH, John. On Fri, Oct 28, 2011 at 5:10 AM, John Maddock <boost.regex@virgin.net> wrote:

...

...
I am looking for the most efficient open-source C++ regex library.

Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png

This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP).

It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).

...
How does the boost::regex library compare?

Would you recommend boost::regex as the most efficient one, or would you suggest another?

There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks.

HTH, John. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Brian Vandenberg

1:44 p.m.

Adding to what John has said, if I may I'll offer some anecdotal advice. Efficient use of regular expressions with /any/ implementation of a regex library has a lot more to do with you writing an intelligent regular expression, and knowing when it's appropriate to use them. Certainly the engine helps, but if you write a stupid regular expression, or attempt to use an RE for a task REs aren't well-suited for, you won't get great results. You can find corner cases where perl-compatible REs have poor performance, but there are features offered that make it possible to tweak that regular expression to get better performance. I've used regular expressions from perl, ruby, python, C#, PCRE, boost::regex, grep, awk, sed, mysql, postgres, and probably a few others. Most commonly I prefer grep for one-liners, and perl-compatible engines in code; I don't run into the corner cases in my work for the simple fact that they're corner cases -- that is to say, you don't run into them unless you're looking for them, or you do something otherwise out-of-the-ordinary. -Brian On Fri, Oct 28, 2011 at 2:50 AM, John Maddock <boost.regex@virgin.net> wrote:

...

...
Thanks John. I would be interested in seeing comparisons of boost:regex with other regex libraries.

It's wildly out of date, but how about: http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/vc71-performance.html

...
Yesterday I found a fuzzy logic string regex library. Does the boost::regex library support this?

No sorry,

HTH, John.

On Fri, Oct 28, 2011 at 5:10 AM, John Maddock <boost.regex@virgin.net> wrote:

...
...
I am looking for the most efficient open-source C++ regex library.

Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png

This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP).

It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).

...
How does the boost::regex library compare?

Would you recommend boost::regex as the most efficient one, or would you suggest another?

There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks.

HTH, John. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Andrei Lapshin

11:17 a.m.

On 28.10.2011 02:20, Alec Taylor wrote:

...

I would be interested in seeing comparisons of boost:regex with other regex libraries.

http://lh3lh3.users.sourceforge.net/reb.shtml

Jeroen Habraken

2:03 p.m.

On 28 October 2011 00:20, Alec Taylor <alec.taylor6@gmail.com> wrote:

...

Thanks John. I would be interested in seeing comparisons of boost:regex with other regex libraries.

This might also be of interest, http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/boost_xpressive.... Note that it's "lies, damn lies, statistics, and benchmarks" though.

...

Yesterday I found a fuzzy logic string regex library. Does the boost::regex library support this?

On Fri, Oct 28, 2011 at 5:10 AM, John Maddock <boost.regex@virgin.net> wrote:

...
...
I am looking for the most efficient open-source C++ regex library.

Reading this article: http://swtch.com/~rsc/regexp/regexp1.html - It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png

This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases. It's also impossible to implement a DFA matcher that offers the full range of Perl regular expression features (if you think it can be done, congratulations, you've just proved that P==NP).

It's also possible to protect the regex engine against runaway "bad" expressions and bail out in those cases (this is what Boost.Regex does, it throws an exception if the complexity of obtaining a match grows too fast).

...
How does the boost::regex library compare?

Would you recommend boost::regex as the most efficient one, or would you suggest another?

There's no such thing as best - it all depends on the data being searched and the particular regular expression. In addition since most Perl-compatible libraries use much the same algorithm they're all broadly similar albeit with different quirks.

HTH, John. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Jeroen

Will Mason

3:16 p.m.

Hello, On Thu, Oct 27, 2011 at 1:10 PM, John Maddock <boost.regex@virgin.net>wrote:

...

I am looking for the most efficient open-source C++ regex library.

...
Reading this article: http://swtch.com/~rsc/regexp/**regexp1.html<http://swtch.com/~rsc/regexp/regexp1.html>- It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~**rsc/regexp-img/grep1p.png<http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png>

This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases.

I'd be very interested in seeing the data that show this. Thanks, Will

John Maddock

5:05 p.m.

...

...
I am looking for the most efficient open-source C++ regex library.

...
Reading this article: http://swtch.com/~rsc/regexp/**regexp1.html<http://swtch.com/~rsc/regexp/regexp1.html>- It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~**rsc/regexp-img/grep1p.png<http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png>

This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases.

I'd be very interested in seeing the data that show this.

I confess I don't have any myself, but I recall Eric Niebler mentioning that he ran some experiments on this, but you know... lies damn lies and benchmarks and all that... John.

Eric Niebler

29 Oct 29 Oct

5:23 a.m.

On 10/28/2011 10:05 AM, John Maddock wrote:

...

...
...
I am looking for the most efficient open-source C++ regex library.

...
Reading this article: http://swtch.com/~rsc/regexp/**regexp1.html<http://swtch.com/~rsc/regexp/regexp1.html>- It seems that GNU awk is the best overall: http://pdos.csail.mit.edu/~**rsc/regexp-img/grep1p.png<http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png>

This is all true, but also completely irrelevant. DFA's have good worst case behaviour, but can be many times slower for common cases.

I'd be very interested in seeing the data that show this.

I confess I don't have any myself, but I recall Eric Niebler mentioning that he ran some experiments on this, but you know... lies damn lies and benchmarks and all that...

I once coded up a Thompson NFA and put it in the old Boost File Vault. I have no idea whatever became of the file vault, but with a quick google search, I found the code here: https://github.com/boost-vault/Strings-Text-Processing/blob/master/thompson-... This is basically Russ Cox's much-touted-by-him regex implementation, Boost-ified. Any interested party can confirm for themselves that it is, in fact, slower than a snail on quaaludes. I asked Russ about that. He never bothered to reply. FWIW, in the Alioth language shootout for the regex-dna test, the fastest submitted C++ program uses xpressive: http://shootout.alioth.debian.org/u32/program.php?test=regexdna&lang=gpp&id=4 It came in 3rd behind Tcl and Javascript V8, but it uses a fraction of the memory of those two. This is with gcc. I'm guessing it could have done better with MSVC or Intel. (But not 4x better, alas.) DISCLAIMER: This is a very limited test. It would be seriously unwise to draw any significant conclusions from this. -- Eric Niebler BoostPro Computing http://www.boostpro.com

5005

Age (days ago)

5007

Last active (days ago)

List overview

Download

10 comments

8 participants

participants (8)

Alec Taylor
Andrei Lapshin
Brian Vandenberg
Eric Niebler
Harelick, Matthew
Jeroen Habraken
John Maddock
Will Mason