
On 10/2/2010 5:52 PM, Erik Rydgren wrote:
Hi!
My company has been using PCRE for a long while, but there have been gripes about it returning only the last matched value from a capture group. Because of this I have been searching, to no avail, for a C++ regex engine that can handle the same things the .NET implementation can. During my searches I've stumbled on several forum threads where others were looking for the same thing, but there doesn't seem to be a regular expression library in C/C++ that handles both named captures and multicapture.
I found Boost.Xpressive, and it had almost everything we need: it's open source and fast, and it has a flexible API and named captures. But alas, like every other C and C++ implementation I have found, it lacked multiple captures.
It sort of has them, just not in the form you happen to be looking for. In xpressive, you can call named regexes from other regexes. When you do that, you end up with nested match_results. If you quantify a named regex, you end up with a sequence of match_results, kind of like multicapture. Sadly, it's not very efficient to create a tree of match_results, and xpressive gives you no help in navigating this tree. It's a bit of an ugly hack. FWIW, Boost.Regex has multicapture if you compile with a certain flag, IIRC.
So, I added it.
Whoa, cool!
On top of that, I added support for balancing groups (http://blog.stevenlevithan.com/archives/balancing-groups). However, the syntax for the pop capture and the capture conditional is slightly different from the .NET version, to better fit xpressive.
Syntax for pop capture:
  dynamic: (?P<-name>stuff)
  static:  (name -= stuff)
Syntax for capture conditional:
  dynamic: (?P(name)stuff)
  static:  (name &= stuff)
There is no support for the (?<name-othername>stuff) construct.
I'll need to read up on what those constructs do. Can you send some pointers?
All captures made by a group are stored in sub_match::captures, which is a vector of sub_match_capture objects. A sub_match_capture behaves like a stripped-down sub_match: it can be written to an ostream, and it has a length and a helper function that returns the capture as a string.
The changes are in the vault and can be found here: http://tinyurl.com/3aak7mp
It can be unpacked against trunk from 2010-10-02 or the 1.44.0 release. I've run the dynamic regression tests without errors, and I have added some tests for the new functionality. The code has only been tested on Visual Studio 2010, since I don't have access to any other compiler.
Please give feedback on my changes since I would love to see them in an official release. Thanks in advance.
This sounds really great, and I have every intention of taking this change once I grok it and look over the code. Can you open a feature request ticket at http://svn.boost.org so I don't forget? I'm a little busy at the moment.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com