
On 10/2/2010 5:52 PM, Erik Rydgren wrote:
Hi!
My company has been using PCRE for a long while, but there have been gripes about it only returning the last matched value from a capture group. Because of this I have been searching for a C++ regex engine that can handle the same things the .NET implementation can, to no avail. During my searches I've stumbled on several forum threads where others were searching for the same thing, but there doesn't seem to be a regular expression library in C/C++ that handles both named captures and multicapture.
Found boost.xpressive and it had almost everything we need. It's open source, fast, has a flexible API and named captures. But alas, just like all the other C and C++ based implementations I have found, it lacked multiple captures.
It sort of has them, just not in the form you happen to be looking for. In xpressive, you can call named regexes from other regexes. When you do that, you end up with nested match_results. If you quantify a named regex, you end up with a sequence of match_results, kind of like multicapture. Sadly, it's not very efficient to create a tree of match_results, and xpressive gives you no help in navigating this tree. It's a bit of an ugly hack.
Yea, I realized that but it wasn't practical for our needs. I also did a static solution that used actions to make the captures just to try them out.
FWIW, Boost.Regex has multicapture if you compile with a certain flag, IIRC.
Ok, I didn't know that. Will take a second look at Boost.Regex then.
So, I added it.
Whoa, cool!
That is the response I was hoping for :)
On top of that I added support for balancing groups (http://blog.stevenlevithan.com/archives/balancing-groups). But the syntax for the pop capture and capture conditional is slightly different than in the .NET version, to better fit xpressive.
Syntax for pop capture: dynamic: (?P<-name>stuff) static: (name -= stuff)
Syntax for capture conditional: dynamic: (?P(name)stuff) static: (name &= stuff)
There is no support for the (?<name-othername>stuff) construct.
I'll need to read up on what those constructs do. Can you send some pointers?
I already did; this blog post explains them without fuss: http://blog.stevenlevithan.com/archives/balancing-groups. But the very short version is that a (?P<-tag>exp) first matches exp, then removes the last capture from an earlier group named tag. If the tag group hasn't captured anything yet, it fails and backtracks. The (?P(tag)exp) is a shorthand if-then-else where the else part always matches. Pseudo code: if (tag has matched) { exp must match } else { true }. To demonstrate, here are the regression definitions I've made:

; multi capture
[test175]
str=aabb
pat=(..)*
br0=aabb
cp0_0=aabb
br1=bb
cp1_0=aa
cp1_1=bb
[end]

; multi capture several groups
[test176]
str=abba
pat=(.){2}(.){2}
br0=abba
cp0_0=abba
br1=b
cp1_0=a
cp1_1=b
br2=a
cp2_0=b
cp2_1=a
[end]

; multi capture, pop capture with backreference, check capture
[test177]
str=startabccbarest
pat=^(.*?)(?P<n>.)+(?P<-n>(?P=n))+(?P(n)(?!))(.*)$
br0=startabccbarest
cp0_0=startabccbarest
br1=start
cp1_0=start
br2=
br3=rest
cp3_0=rest
[end]

; match count
[test178]
str=aabb
pat=^(?P<n>a)*(?P<-n>b)*(?P(n)(?!))$
br0=aabb
br1=
[end]

; match count, fail on pop
[test179]
str=aabbb
pat=^(?P<n>a)*(?P<-n>b)*$
[end]

; match count, fail on check
[test180]
str=aab
pat=^(?P<n>a)*(?P<-n>b)*(?P(n)(?!))$
[end]
All captures made by a group are stored in sub_match::captures, which is a vector of sub_match_capture objects. A sub_match_capture behaves like a stripped-down sub_match. It can be put in an ostream, has a length, and has a helper function for returning a string.
The changes are in the vault and can be found here: http://tinyurl.com/3aak7mp
It can be unpacked against trunk from 2010-10-02 or the 1.44.0 release. I've run the dynamic regression tests without errors and I have added some tests for the new functionality. The code has only been tested on Visual Studio 2010 since I don't have access to any other compiler.
Please give feedback on my changes since I would love to see them in an official release. Thanks in advance.
This sounds really great and I have every intention of taking this change once I grok it and look over the code. Can you open a feature request ticket at http://svn.boost.org so I don't forget, because I'm a little busy at the moment.
Will do.
--
Eric Niebler
BoostPro Computing
http://www.boostpro.com