find the subexpression that matched

Hi,
is there an immediate way to access the sub-expression that matched, instead of checking each "what" as in the following?
for (unsigned int i = 1; i < what.size(); ++i) { if (what[i].matched) {
Thanks in advance,
Lorenzo

Lorenzo Bettini wrote:
Hi
is there an immediate way to access the sub-expression that matched instead of checking each "what" as in the following?
for (unsigned int i = 1; i < what.size(); ++i) { if (what[i].matched) {
If you have more than one sub-expression in your RE, there can be any number of matched strings up to the number of sub-expressions.

Edward Diener wrote:
If you have more than one sub-expression in your RE, there can be any number of matched strings up to the number of sub-expressions.
Right, but even in this case my question could be generalized: why not provide a set of indexes of the matched sub-expressions?
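The "set of indexes" idea can be sketched as a small wrapper around the loop from the original post. This is only an illustration: `matched_indexes` is a hypothetical helper, not part of Boost.Regex, and the sketch uses `std::regex` as a stand-in because its `sub_match` exposes the same `matched` flag.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not a library function): collect the indexes of all
// sub-expressions that took part in the match. Index 0 (the whole match)
// is skipped, as in the loop from the original post.
std::vector<std::size_t> matched_indexes(const std::smatch& what) {
    std::vector<std::size_t> indexes;
    for (std::size_t i = 1; i < what.size(); ++i) {
        if (what[i].matched) {
            indexes.push_back(i);
        }
    }
    return indexes;
}
```

The helper still walks every sub-expression once, so it only tidies the calling code; it does not avoid the scan.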

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock Sent: Monday, February 07, 2005 3:30 AM To: boost@lists.boost.org Subject: Re: [boost] find the subexpression that matched
is there an immediate way to access the sub-expression that matched instead of checking each "what" as in the following?
for (unsigned int i = 1; i < what.size(); ++i) { if (what[i].matched) {
Not currently, sorry, might be a useful addition though...
I'd like to tag a question onto the coattails of this one. I haven't given it much thought, so if there's a terribly obvious reason this wouldn't work from a theoretical standpoint (or it's already in the implementation), please let me down easy. <g>
Okay, I know there's no reasonable way to parse arbitrarily-nested constructs with regexes, but it's always seemed to me that it might be almost as useful to be able to extract the number of times a captured submatch that is followed by one of the repetition operators actually matched. I also can't think of a reason it couldn't be done, off the top of my head. Your take?
That seemed to me related enough to the original question to justify appending it, rather than starting a new thread.
Reid

My take is it's already been done: see the section on repeated captures in http://www.boost.org/libs/regex/doc/captures.html
Does that answer your question?
John.

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock Sent: Thursday, February 10, 2005 3:45 AM To: boost@lists.boost.org Subject: Re: [boost] find the subexpression that matched
It may, but not in the sense I meant it. You'd have to iterate the returned captures(i) sequence and parse its contents to get a count (unless I've badly misunderstood what you meant, which is entirely possible, given the hour <g>); that's likely to be considerable overhead on top of an already slow option (and one that must be compiled in). All I wanted was a low-overhead count of each captured submatch, say, a count() member on the plain return container. Am I wrong in thinking that maintaining a count wouldn't adversely affect the performance of the algorithm to the extent the full existing option does? I don't know about others, but this is a feature I'd personally be making a lot of use of, so I'd hate to compile in a lot of overhead for the entire app. Reid

It may, but not in the sense I meant it. You'd have to iterate the returned captures(i) sequence and parse its contents to get a count (unless I've badly misunderstood what you meant, which is entirely possible, given the hour <g>);
Well then, I'm not sure I understand what it is you want: you get a count of how many times a sub-expression was (repeatedly) matched from match_results_object.captures(n).size(). Is that what you wanted or not? If not, you've lost me :-)
that's likely to be considerable overhead on top of an already slow option (and one that must be compiled in). All I wanted was a low-overhead count of each captured submatch, say, a count() member on the plain return container. Am I wrong in thinking that maintaining a count wouldn't adversely affect the performance of the algorithm to the extent the full existing option does? I don't know about others, but this is a feature I'd personally be making a lot of use of, so I'd hate to compile in a lot of overhead for the entire app.
It adds more overhead than you think: the problem is keeping the "count" correctly scoped as you backtrack etc. If we can agree on what we're actually talking about here, then something might be possible, but I'm not sure we're on the same wavelength yet <g>. John.

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock Sent: Monday, February 14, 2005 4:19 AM To: boost@lists.boost.org Subject: Re: [boost] find the subexpression that matched
It may, but not in the sense I meant it. You'd have to iterate the returned captures(i) sequence and parse its contents to get a count (unless I've badly misunderstood what you meant, which is entirely possible, given the hour <g>);
Well then, I'm not sure I understand what it is you want: you get a count of how many times a sub-expression was (repeatedly) matched from match_results_object.captures(n).size().
Is that what you wanted or not? If not you've lost me :-)
Looking more closely, yeah, I think it does. I believe I was misreading what the size() member returned; I was thinking it was the length of the matched string. I've spent a lot more time looking through PCRE source than Boost's Regex Library (my last gig used PCRE heavily); that's something I'm just starting to correct. <g>
that's likely to be considerable overhead on top of an already slow option (and one that must be compiled in). All I wanted was a low-overhead count of each captured submatch, say, a count() member on the plain return container. Am I wrong in thinking that maintaining a count wouldn't adversely affect the performance of the algorithm to the extent the full existing option does? I don't know about others, but this is a feature I'd personally be making a lot of use of, so I'd hate to compile in a lot of overhead for the entire app.
It adds more overhead than you think: the problem is keeping the "count" correctly scoped as you backtrack etc. If we can agree on what we're actually talking about here, then something might be possible, but I'm not sure we're on the same wavelength yet <g>.
Yeah, well my tuner burned out years ago <g>. Likely not. What are you referring to by the term "scope?" Sections of matched structure delimited by non-possessive capture braces? If that's not it, then I think I've gone through "The Scary Door." <g> But you're right that I hadn't thought about backtracking issues. Reid

Yeah, well my tuner burned out years ago <g>. Likely not. What are you referring to by the term "scope?" Sections of matched structure delimited by non-possessive capture braces?
Yes, I mean the bit you've matched so far, and may have to "unwind" if the current part of the search-tree doesn't find a successful match.
If that's not it, then I think I've gone through "The Scary Door." <g>
Welcome to Boost :-) John.
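The scoping problem John describes can be illustrated with a toy backtracking matcher. This is not Boost.Regex code: it hard-codes the pattern `(ab)+ab`, and the names `Counts`, `match_from` and `run` are hypothetical. It keeps two counters, one that is bumped every time the group matches and never unwound, and one that reflects only the repetitions on the path that finally succeeds.

```cpp
#include <cstddef>
#include <string>

// Toy illustration only: a hand-written backtracking matcher for (ab)+ab.
struct Counts {
    bool matched = false;
    int naive = 0;   // incremented on every group match, never unwound
    int scoped = 0;  // repetitions that belong to the successful match
};

// `reps` is the number of group repetitions on the current search path.
static bool match_from(const std::string& s, std::size_t pos, int reps, Counts& c) {
    if (s.compare(pos, 2, "ab") == 0) {
        ++c.naive;  // the group matched here, but this may be undone below
        if (match_from(s, pos + 2, reps + 1, c)) {
            return true;  // greedy attempt succeeded
        }
        // Backtrack: the repetition just tried is not part of the match.
    }
    // Try the trailing literal "ab", which must consume the rest of the input.
    if (reps >= 1 && s.compare(pos, std::string::npos, "ab") == 0) {
        c.scoped = reps;  // only the repetitions still "in scope" count
        return true;
    }
    return false;
}

Counts run(const std::string& s) {
    Counts c;
    c.matched = match_from(s, 0, 0, c);
    return c;
}
```

On the input `ababab` the greedy group first consumes all three `ab` pairs, then has to give one back so the trailing `ab` can match. The unwound counter is left too high, which is why a correct count has to be saved and restored along with the rest of the backtracking state.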
participants (4)
- Edward Diener
- John Maddock
- Lorenzo Bettini
- Reid Sweatman