[regex] match_partial and regex_search

I noticed some surprising behavior with match_partial and regex_search. Consider:

    regex e("abc|b");
    string str("ab");
    smatch what;
    if(regex_search(str, what, e, match_default | match_partial))
    {
        cout << (what[0].matched ? "full" : "partial") << '\n';
    }

This code displays "partial". Clearly, regex_search is bombing out just as soon as it finds a match, partial or otherwise. But in this case, if it kept looking, it would find a full match. My understanding is that full matches are always preferred to partial matches. I couldn't find any discussion about this case in the regex docs or the std proposal. Did I miss it? What's the intention here? Is the std proposal underspecified?

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

> I noticed some surprising behavior with match_partial and regex_search. Consider:
>
>     regex e("abc|b");
>     string str("ab");
>     smatch what;
>     if(regex_search(str, what, e, match_default | match_partial))
>     {
>         cout << (what[0].matched ? "full" : "partial") << '\n';
>     }
>
> This code displays "partial". Clearly, regex_search is bombing out just as soon as it finds a match, partial or otherwise. But in this case, if it kept looking, it would find a full match. My understanding is that full matches are always preferred to partial matches. I couldn't find any discussion about this case in the regex docs or the std proposal. Did I miss it? What's the intention here? Is the std proposal underspecified?
It's so underspecified it's not there at all! (It got removed 'cos we couldn't figure out the right wording, even though everyone agreed it was a useful feature.)

As far as current Boost.Regex is concerned, it prefers, in order:

1) The leftmost match.
2) A full match.
3) The longest match (if it's a POSIX expression), otherwise a "depth first search" match (Perl expressions).

It's the "leftmost" bit that's getting you here. To be honest, I'm not sure what the right thing to do is here; I can imagine situations when either a full or a partial match would be the correct answer in this case. At the very least, I'll have another look at the docs.

Thanks,
John.
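The preference order above can be illustrated with a small toy model. This is only a sketch under loud assumptions: ToyMatch and toy_search are hypothetical names, the "pattern" is just a list of literal alternatives tried in order (roughly Perl-style), and none of this is how Boost.Regex is actually implemented. It shows only why "leftmost" wins over "full" for "abc|b" against "ab".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch (not a Boost.Regex type).
struct ToyMatch {
    bool found;        // false if nothing matched, fully or partially
    std::size_t pos;   // start of the match in the text
    std::size_t len;   // number of characters matched so far
    bool full;         // true = full match, false = partial match
};

// Toy stand-in for regex_search with match_partial, for literal
// alternatives only. Assumption: alternatives are tried in the order given.
ToyMatch toy_search(const std::string& text,
                    const std::vector<std::string>& alts) {
    // Rule 1: scan start positions left to right, so the leftmost wins.
    for (std::size_t pos = 0; pos < text.size(); ++pos) {
        // Rule 2: at this position, a full match beats a partial one.
        for (const std::string& alt : alts) {
            if (!alt.empty() && text.compare(pos, alt.size(), alt) == 0)
                return ToyMatch{true, pos, alt.size(), true};
        }
        // Partial match: the input ran out while it still agreed with alt.
        const std::size_t rest = text.size() - pos;
        for (const std::string& alt : alts) {
            if (rest < alt.size() && alt.compare(0, rest, text, pos, rest) == 0)
                return ToyMatch{true, pos, rest, false};
        }
    }
    return ToyMatch{false, 0, 0, false};
}
```

With the alternatives "abc" and "b", searching "ab" stops at position 0 with a partial match on "abc" and never reaches the full match "b" at position 1. Searching "xb" has no candidate at position 0, so it does reach the full match "b" at position 1.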

Following up, after sleeping on this for a bit....

John Maddock wrote:
>> I noticed some surprising behavior with match_partial and regex_search. Consider:
>>
>>     regex e("abc|b");
>>     string str("ab");
>>     smatch what;
>>     if(regex_search(str, what, e, match_default | match_partial))
>>     {
>>         cout << (what[0].matched ? "full" : "partial") << '\n';
>>     }
>>
>> This code displays "partial". Clearly, regex_search is bombing out just as soon as it finds a match, partial or otherwise. But in this case, if it kept looking, it would find a full match. My understanding is that full matches are always preferred to partial matches. I couldn't find any discussion about this case in the regex docs or the std proposal. Did I miss it? What's the intention here? Is the std proposal underspecified?
>
> It's so underspecified it's not there at all! (it got removed 'cos we couldn't figure out the right wording, even though everyone agreed it was a useful feature).
Oh, right. I was there. Duh.
> As far as current Boost.Regex is concerned: it prefers in order:
>
> 1) The leftmost match.
> 2) A full match.
> 3) The longest match (if it's a POSIX expression), otherwise a "depth first search" match (Perl expressions).
>
> It's the "leftmost" bit that's getting you here. To be honest I'm not sure what the right thing to do is here, I can imagine situations when either a full or a partial match would be the correct answer in this case.
After pondering this for a bit, I am now of the opinion that the current behavior (bombing out of regex_search on partial matches rather than searching for a full match) is correct. I figured I'd share my reasoning and record it here for posterity. There are 2 use cases for match_partial...

<<Interactive user input validation>>

In this case, the only thing that matters is whether the input is invalid. So it doesn't matter whether we return a full match or a partial match because they mean the same thing: not invalid.

<<Data pull>>

When matching buffered data, match_partial is used to find matches that span chunks. Ideally, it should be possible to use a buffering scheme together with match_partial to find the same matches as if the data hadn't been chunked. In this case, you want regex_search to quit early and return a partial match so you can read more data and retry.

In the example I gave above, matching the pattern "abc|b" against the string "ab" in a data-pull scenario, it's possible that the next chunk of text begins with a "c". In that case, the leftmost, longest match is "abc" (where the text "abc" spans the two chunks of text), not "b". Quitting early with a partial match gives users the chance to retry and find the leftmost longest match.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
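The data-pull argument can be sketched as a buffering loop. Again this is a toy under stated assumptions, not Boost.Regex code: toy_search is a hypothetical stand-in for regex_search that handles only literal alternatives, its allow_partial parameter plays the role of the match_partial flag, and find_in_chunks is a made-up helper name. On a partial match the loop keeps the unmatched tail and waits for more data; on the last chunk it searches without partial matching.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch (not a Boost.Regex type).
struct ToyMatch { bool found; std::size_t pos; std::size_t len; bool full; };

// Toy stand-in for regex_search over literal alternatives. The leftmost
// position wins; at one position a full match beats a partial one.
ToyMatch toy_search(const std::string& text,
                    const std::vector<std::string>& alts,
                    bool allow_partial) {
    for (std::size_t pos = 0; pos < text.size(); ++pos) {
        for (const std::string& alt : alts)
            if (!alt.empty() && text.compare(pos, alt.size(), alt) == 0)
                return ToyMatch{true, pos, alt.size(), true};
        if (!allow_partial) continue;
        const std::size_t rest = text.size() - pos;
        for (const std::string& alt : alts)
            if (rest < alt.size() && alt.compare(0, rest, text, pos, rest) == 0)
                return ToyMatch{true, pos, rest, false};  // input ran out
    }
    return ToyMatch{false, 0, 0, false};
}

// Collect every full match from data that arrives in chunks.
std::vector<std::string> find_in_chunks(const std::vector<std::string>& chunks,
                                        const std::vector<std::string>& alts) {
    std::vector<std::string> hits;
    std::string buf;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        buf += chunks[i];
        const bool last = (i + 1 == chunks.size());
        for (;;) {
            // Partial matches only make sense while more data may arrive.
            const ToyMatch m = toy_search(buf, alts, !last);
            if (!m.found) { buf.clear(); break; }      // nothing worth keeping
            if (!m.full) {                             // quit early, read more
                buf.erase(0, m.pos);                   // keep the partial tail
                break;
            }
            hits.push_back(buf.substr(m.pos, m.len));  // record the full match
            buf.erase(0, m.pos + m.len);               // continue after it
        }
    }
    return hits;
}
```

With the alternatives "abc" and "b", the chunks "ab" then "c" yield the single match "abc", the same result as searching the unchunked text "abc". If the second chunk is "d" instead, the retry on "abd" finds "b". Had the search on "ab" returned the full match "b" right away, the match spanning the two chunks would have been lost.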

"John Maddock" <john@johnmaddock.co.uk> wrote in message news:022901c55616$1b2bed20$030e1452@fuji...
>> I noticed some surprising behavior with match_partial and regex_search. Consider:
>>
>>     regex e("abc|b");
>>     string str("ab");
>>     smatch what;
>>     if(regex_search(str, what, e, match_default | match_partial))
>>     {
>>         cout << (what[0].matched ? "full" : "partial") << '\n';
>>     }
>>
>> This code displays "partial". Clearly, regex_search is bombing out just as soon as it finds a match, partial or otherwise. But in this case, if it kept looking, it would find a full match.

Very stupid question, I guess, but I had to ask: The regex "abc|b" means either "abc" or "b". How could "ab" provide a full match if it kept looking ... ?

Rob.

Robert Mathews wrote:
> "John Maddock" <john@johnmaddock.co.uk> wrote in message news:022901c55616$1b2bed20$030e1452@fuji...
>>> I noticed some surprising behavior with match_partial and regex_search. Consider:
>>>
>>>     regex e("abc|b");
>>>     string str("ab");
>>>     smatch what;
>>>     if(regex_search(str, what, e, match_default | match_partial))
>>>     {
>>>         cout << (what[0].matched ? "full" : "partial") << '\n';
>>>     }
>>>
>>> This code displays "partial". Clearly, regex_search is bombing out just as soon as it finds a match, partial or otherwise. But in this case, if it kept looking, it would find a full match.
>
> Very stupid question, I guess, but I had to ask:
>
> The regex "abc|b" means either "abc" or "b". How could "ab" provide a full match if it kept looking ... ?

It could match the "b" in "ab". (We're talking about regex_search, not regex_match.)

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

participants (3)
- Eric Niebler
- John Maddock
- Robert Mathews