find the subexpression that matched - Boost - lists.preview.boost.org


find the subexpression that matched


Lorenzo Bettini

6 Feb 2005

2:49 p.m.

Hi

is there an immediate way to access the sub-expression that matched, instead of checking each "what" as in the following?

for (unsigned int i = 1; i < what.size(); ++i) {
  if (what[i].matched) {

thanks in advance
Lorenzo


Edward Diener

6 Feb 2005

4:49 p.m.

Lorenzo Bettini wrote:

> Hi
>
> is there an immediate way to access the sub-expression that matched instead of checking each "what" as in the following?
>
> for (unsigned int i = 1; i < what.size(); ++i) {
>   if (what[i].matched) {

If you have more than one sub-expression in your RE, there can be any number of matched strings up to the number of sub-expressions.


Lorenzo Bettini

5:34 p.m.

Edward Diener wrote:

> Lorenzo Bettini wrote:
>
>> Hi
>>
>> is there an immediate way to access the sub-expression that matched instead of checking each "what" as in the following?
>>
>> for (unsigned int i = 1; i < what.size(); ++i) {
>>   if (what[i].matched) {
>
> If you have more than one sub-expression in your RE, there can be any number of matched strings up to the number of sub-expressions.

Right, but even in this case my question can be generalized: why not provide a set of the indexes of the matched sub-expressions?


John Maddock

7 Feb 2005

10:30 a.m.

> is there an immediate way to access the sub-expression that matched instead of checking each "what" as in the following?
>
> for (unsigned int i = 1; i < what.size(); ++i) {
>   if (what[i].matched) {

Not currently, sorry; it might be a useful addition though...

John.
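A minimal sketch of the "set of indexes" idea raised earlier in the thread, written as a free function rather than a library feature. The names matched_indices and classify are hypothetical and not part of Boost.Regex. std::regex is used here as a stand-in, on the assumption that boost::match_results offers the same size(), operator[] and matched members used in the loop quoted above.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost.Regex API): collect the indexes of all
// sub-expressions that took part in the match. Index 0 (the whole match)
// is skipped, exactly as in the loop from the original question.
template <class MatchResults>
std::vector<std::size_t> matched_indices(const MatchResults& what) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 1; i < what.size(); ++i) {
        if (what[i].matched) {
            indices.push_back(i);
        }
    }
    return indices;
}

// Illustrative use: three alternatives, each wrapped in its own group.
// Returns the indexes of the groups that matched at the first match found,
// or an empty vector when nothing matches.
std::vector<std::size_t> classify(const std::string& text) {
    static const std::regex re("(\\d+)|([a-z]+)|(\\s+)");
    std::smatch what;
    if (!std::regex_search(text, what, re)) {
        return {};
    }
    return matched_indices(what);
}
```

With this pattern, classify("42") yields {1} and classify("abc") yields {2}. The helper still performs the same linear scan over the sub-expressions; it only hides the scan behind one call.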


Reid Sweatman

10 Feb 2005

4:52 a.m.

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock
Sent: Monday, February 07, 2005 3:30 AM
To: boost@lists.boost.org
Subject: Re: [boost] find the subexpression that matched

>> is there an immediate way to access the sub-expression that matched instead of checking each "what" as in the following?
>>
>> for (unsigned int i = 1; i < what.size(); ++i) {
>>   if (what[i].matched) {
>
> Not currently, sorry, might be a useful addition though...

I'd like to tag a question onto the coattails of this one. I haven't given it much thought, so if there's a terribly obvious reason this wouldn't work from a theoretical standpoint (or it's already in the implementation), please let me down easy. <g>

Okay, I know there's no reasonable way to parse arbitrarily-nested constructs with regexes, but it's always seemed to me that it might be almost as useful to be able to extract the number of times a captured submatch with one of the repetition operators following actually matched. I also can't think of a reason it couldn't be done, off the top of my head. Your take?

That seemed to me related enough to the original question to justify appending it, rather than starting a new thread.

Reid


John Maddock

10:45 a.m.

> I'd like to tag a question onto the coattails of this one. I haven't given it much thought, so if there's a terribly obvious reason this wouldn't work from a theoretical standpoint (or it's already in the implementation), please let me down easy. <g>
>
> Okay, I know there's no reasonable way to parse arbitrarily-nested constructs with regexes, but it's always seemed to me that it might be almost as useful to be able to extract the number of times a captured submatch with one of the repetition operators following actually matched. I also can't think of a reason it couldn't be done, off the top of my head. Your take?

My take is it's already been done: see the section on repeated captures in http://www.boost.org/libs/regex/doc/captures.html

Does that answer your question?

John.


Reid Sweatman

14 Feb 2005

9:52 a.m.

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock
Sent: Thursday, February 10, 2005 3:45 AM
To: boost@lists.boost.org
Subject: Re: [boost] find the subexpression that matched

>> I'd like to tag a question onto the coattails of this one. I haven't given it much thought, so if there's a terribly obvious reason this wouldn't work from a theoretical standpoint (or it's already in the implementation), please let me down easy. <g>
>>
>> Okay, I know there's no reasonable way to parse arbitrarily-nested constructs with regexes, but it's always seemed to me that it might be almost as useful to be able to extract the number of times a captured submatch with one of the repetition operators following actually matched. I also can't think of a reason it couldn't be done, off the top of my head. Your take?
>
> My take is it's already been done: see the section on repeated captures in http://www.boost.org/libs/regex/doc/captures.html
>
> Does that answer your question?

It may, but not in the sense I meant it. You'd have to iterate the returned captures(i) sequence and parse its contents to get a count (unless I've badly misunderstood what you meant, which is entirely possible, given the hour <g>); that's likely to be considerable overhead on top of an already slow option (and one that must be compiled in).

All I wanted was a low-overhead count of each captured submatch, say, a count() member on the plain return container. Am I wrong in thinking that maintaining a count wouldn't adversely affect the performance of the algorithm to the extent the full existing option does? I don't know about others, but this is a feature I'd personally be making a lot of use of, so I'd hate to compile in a lot of overhead for the entire app.

Reid


John Maddock

11:18 a.m.

> It may, but not in the sense I meant it. You'd have to iterate the returned captures(i) sequence and parse its contents to get a count (unless I've badly misunderstood what you meant, which is entirely possible, given the hour <g>);

Well then, I'm not sure I understand what it is you want: you get a count of how many times a sub-expression was (repeatedly) matched from match_results_object.captures(n).size();

Is that what you wanted or not? If not you've lost me :-)

> that's likely to be considerable overhead on top of an already slow option (and one that must be compiled in). All I wanted was a low-overhead count of each captured submatch, say, a count() member on the plain return container. Am I wrong in thinking that maintaining a count wouldn't adversely affect the performance of the algorithm to the extent the full existing option does? I don't know about others, but this is a feature I'd personally be making a lot of use of, so I'd hate to compile in a lot of overhead for the entire app.

It adds more overhead than you think: the problem is keeping the "count" correctly scoped as you backtrack etc. If we can agree on what we're actually talking about here, then something might be possible, but I'm not sure we're on the same wavelength yet <g>.

John.
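For readers who cannot enable the compiled-in option mentioned above, the kind of count under discussion can be approximated with the standard library alone. This sketch does not use the Boost captures interface; it swaps in a different technique: first check that the whole text matches (unit)+, then count the non-overlapping matches of unit with std::sregex_iterator. The helper name count_repeats is hypothetical, and the result is only guaranteed to agree with the engine's own repetition count when the repetitions tile the text unambiguously.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: how many times does `unit` repeat when the whole of
// `text` is matched against (unit)+ ? Returns 0 when `text` does not match.
// This is an approximation built on std::regex, not the Boost mechanism.
std::size_t count_repeats(const std::string& text, const std::string& unit) {
    // Step 1: the entire text must consist of one or more repetitions.
    const std::regex whole("(?:" + unit + ")+");
    if (!std::regex_match(text, whole)) {
        return 0;
    }
    // Step 2: walk the text and count successive matches of a single unit.
    const std::regex one(unit);
    const std::sregex_iterator first(text.begin(), text.end(), one);
    const std::sregex_iterator last;
    return static_cast<std::size_t>(std::distance(first, last));
}
```

For example, count_repeats("1,2,3", "\\d+,?") walks over "1,", "2," and "3" and returns 3. The text is scanned twice, so this is a convenience, not a low-overhead counter of the kind Reid asks for.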


Reid Sweatman

15 Feb 2005

1:29 p.m.

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock
Sent: Monday, February 14, 2005 4:19 AM
To: boost@lists.boost.org
Subject: Re: [boost] find the subexpression that matched

>> It may, but not in the sense I meant it. You'd have to iterate the returned captures(i) sequence and parse its contents to get a count (unless I've badly misunderstood what you meant, which is entirely possible, given the hour <g>);
>
> Well then, I'm not sure I understand what it is you want: you get a count of how many times a sub-expression was (repeatedly) matched from match_results_object.captures(n).size();
>
> Is that what you wanted or not? If not you've lost me :-)

Looking more closely, yeah, I think it does. I believe I was misreading what the size() member returned; I was thinking it was the length of the matched string. I've spent a lot more time looking through PCRE source than Boost's Regex Library (my last gig used PCRE heavily); that's something I'm just starting to correct. <g>

>> that's likely to be considerable overhead on top of an already slow option (and one that must be compiled in). All I wanted was a low-overhead count of each captured submatch, say, a count() member on the plain return container. Am I wrong in thinking that maintaining a count wouldn't adversely affect the performance of the algorithm to the extent the full existing option does? I don't know about others, but this is a feature I'd personally be making a lot of use of, so I'd hate to compile in a lot of overhead for the entire app.
>
> It adds more overhead than you think: the problem is keeping the "count" correctly scoped as you backtrack etc. If we can agree on what we're actually talking about here, then something might be possible, but I'm not sure we're on the same wavelength yet <g>.

Yeah, well my tuner burned out years ago <g>. Likely not. What are you referring to by the term "scope?" Sections of matched structure delimited by non-possessive capture braces? If that's not it, then I think I've gone through "The Scary Door." <g> But you're right that I hadn't thought about backtracking issues.

Reid


John Maddock

2:04 p.m.

> Yeah, well my tuner burned out years ago <g>. Likely not. What are you referring to by the term "scope?" Sections of matched structure delimited by non-possessive capture braces?

Yes, I mean the bit you've matched so far, and may have to "unwind" if the current part of the search-tree doesn't find a successful match.

> If that's not it, then I think I've gone through "The Scary Door." <g>

Welcome to Boost :-)

John.
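The "unwind" problem can be illustrated with a toy matcher. This is a sketch only, not how Boost.Regex is implemented: match_from is a hypothetical function that hard-codes the single pattern (ab)*abc and keeps a running count of how many times the group (ab) has matched on the current search path. Whenever a path fails, the count has to be rolled back before the next alternative is tried.

```cpp
#include <cstddef>
#include <string>

// Toy backtracking matcher for the fixed pattern (ab)*abc, anchored at `pos`
// and required to consume the rest of `s`. `count` is the number of times
// the group (ab) has matched on the path currently being explored.
bool match_from(const std::string& s, std::size_t pos, int& count) {
    // Greedy branch: try one more repetition of the group first.
    if (s.compare(pos, 2, "ab") == 0) {
        ++count;                      // tentatively record the repetition
        if (match_from(s, pos + 2, count)) {
            return true;              // this path succeeded, keep the count
        }
        --count;                      // path failed: unwind the bookkeeping
    }
    // Fallback branch: stop repeating and match the literal tail.
    return s.compare(pos, std::string::npos, "abc") == 0;
}
```

On the input "ababc" the greedy branch first consumes "ab" twice, taking the count to 2, then fails to find "abc" at the end and backs up one step, so the match succeeds with a final count of 1. Without the decrement the reported count would be 2, which is the scoping error described above.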



participants (4)

Edward Diener
John Maddock
Lorenzo Bettini
Reid Sweatman