--- In Boost-Users@yahoogroups.com, "Joshua B. Smith"
On Fri, Apr 25, 2003 at 05:35:06PM -0000, Dean wrote: <snip>
While I can believe that the design intention was that "\d{3}-" should be found in "1234567-" (at the fifth character), it seems inconsistent that it is *not* also found in "123456-" and "12345678-". I'm seeing that inconsistent behavior.
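As a point of reference only, the sketch below shows where a conventional leftmost search places `\d{3}-` in the three strings mentioned above. It uses Python's `re` module, not Boost.Regex, so it says nothing about what the Boost release under discussion actually did; the helper name `first_match_start` is made up for illustration.

```python
import re

def first_match_start(pattern, text):
    """Return the 0-based index where the leftmost match starts, or None."""
    m = re.search(pattern, text)
    return m.start() if m else None

# The three subject strings from the message above.
for s in ["1234567-", "123456-", "12345678-"]:
    print(s, first_match_start(r"\d{3}-", s))
```

In this engine the pattern is found in all three strings, at indices 4, 3 and 5 respectively. Index 4 is the fifth character, which agrees with the one case reported as working.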
It is not inconsistent, because the matcher fails to match and then keeps going. It's all about greediness. For example:
searching for a{1}b in the strings
1) ab 2) aab 3) aaab
succeeds on 1 and (after an initial failed attempt) on 3, but not on 2, because:
a{1}b on ab: matches (correct).
a{1}b on aab: fails, because it consumed the two a's and then stopped because the string is done.
a{1}b on aaab: fails on aa, then begins to scan again and finds ab, which fits the regex a{1}b.
Makes sense? <snip>
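The scanning described above can be modelled in a few lines of Python. This is a toy model, not Boost.Regex code: the function name, its parameters, and the rule "after a failed attempt, resume just past the last character that attempt examined" are all assumptions, chosen only because they reproduce the results reported in this thread.

```python
def search_resuming_after_failure(text, ch_ok, n, terminator):
    """Toy search for n characters accepted by ch_ok, then terminator.

    Assumption (hypothetical): a failed attempt is not retried one
    character later; scanning resumes after the last character the
    failed attempt looked at.
    """
    pos = 0
    while pos < len(text):
        i = pos
        # Consume up to n characters accepted by the predicate.
        while i < len(text) and i - pos < n and ch_ok(text[i]):
            i += 1
        if i - pos == n and i < len(text) and text[i] == terminator:
            return pos  # the match starts here
        # Failed attempt: resume past the last examined character.
        pos = min(i, len(text) - 1) + 1
    return None

# a{1}b on the three example strings
for s in ["ab", "aab", "aaab"]:
    print(s, search_resuming_after_failure(s, lambda c: c == "a", 1, "b"))

# \d{3}- on the strings from the original question
for s in ["1234567-", "123456-", "12345678-"]:
    print(s, search_resuming_after_failure(s, str.isdigit, 3, "-"))
```

Under these assumptions the model finds a{1}b in "ab" (index 0) and "aaab" (index 2) but not in "aab", and finds \d{3}- only in "1234567-" (index 4), which is the pattern of results reported here.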
That's what I (eventually) guessed was happening. Thanks for confirming my suspicion. However, it still seems possible that this was not the original design intention. I suppose only John Maddock can answer that question...

It seems to me that when the code finds more than 1 "a", it should either:

1) skip past all subsequent "a"s before starting the scan again. This would cause "a{1}b" to be found in "ab" but not "aab", "aaab", "aaaab", etc. This would be very "greedy". :-)

Or:

2) restart the scan 1 character after where the previous scan started. This would cause "a{1}b" to be found in "ab", "aab", "aaab", "aaaab", etc.

FWIW, I'm told that the regex searcher in the .NET Framework exhibits behavior #1. I mention that only as a point of reference -- I realize that different implementations can have somewhat different correct behaviors.

Anyway, it is either a bug or a "gotcha". I've been using regexes occasionally for over 10 years and it "got" me. :-)

Thanks again for the help!

--Dean
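The two restart options described above can be compared side by side with a small sketch. It handles only the narrow case of n copies of one literal character followed by a terminator, and the function name `find_run_then` and its `skip_run` flag are hypothetical; it is not how any particular regex library is implemented.

```python
def find_run_then(text, ch, n, terminator, skip_run):
    """Toy search for n copies of ch followed by terminator.

    skip_run=True  models option 1: after a failed attempt, skip the
                   whole run of ch before trying again.
    skip_run=False models option 2: retry one character after the
                   previous starting point.
    """
    target = ch * n + terminator
    pos = 0
    while pos < len(text):
        if text.startswith(target, pos):
            return pos  # match starts here
        if skip_run and text[pos] == ch:
            # Option 1: jump past every consecutive ch from here.
            while pos < len(text) and text[pos] == ch:
                pos += 1
        else:
            # Option 2 (or nothing to skip): advance by one character.
            pos += 1
    return None

for s in ["ab", "aab", "aaab", "aaaab"]:
    print(s,
          find_run_then(s, "a", 1, "b", skip_run=True),
          find_run_then(s, "a", 1, "b", skip_run=False))
```

With option 1 the sketch finds a{1}b only in "ab"; with option 2 it finds it in all four strings, at indices 0, 1, 2 and 3.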