--- In Boost-Users@yahoogroups.com, "Joshua B. Smith"
On Thu, Apr 24, 2003 at 11:37:34PM -0000, Dean wrote:
Hi all,
... snip ...
\d{3}-\d{2}-\d{4}
As expected, that pattern was found in "123-12-1234" but not in "1234-12-1234". However, it *was* found in "1234567-12-1234".
Is this behavior by design or is it a bug?
It is (probably) design. Intervals can specify a min and a max, for example:
\d{3,3}-\d{2,2}-\d{4,4} will match "123-12-1234" but NOT "1234567-12-1234". \d{3}-\d{2}-\d{4} will match "123-12-1234" but NOT "1234567-12-1234" either.
I'm not sure what you were trying to say above, but my understanding is that the 2 patterns you just mentioned are equivalent. The docs say "{3}" is equivalent to "{3,3}" not "{3,}".
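To make the equivalence concrete, here is a minimal sketch. It assumes std::regex (C++11, ECMAScript grammar) as a stand-in for the Boost.Regex version discussed in this thread, and the helper names found_in and intervals_agree are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` is found anywhere in `text`
// (a search, not a whole-string match).
bool found_in(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}

// Hypothetical helper: true if the "{n}" and "{n,n}" spellings of the
// SSN-style pattern give the same search result for `text`.
bool intervals_agree(const std::string& text) {
    const std::string exact  = R"(\d{3}-\d{2}-\d{4})";
    const std::string ranged = R"(\d{3,3}-\d{2,2}-\d{4,4})";
    return found_in(exact, text) == found_in(ranged, text);
}

// Small self-check over the strings mentioned in the thread.
void check_interval_equivalence() {
    assert(intervals_agree("123-12-1234"));
    assert(intervals_agree("1234-12-1234"));
    assert(intervals_agree("1234567-12-1234"));
}
```

With std::regex the two spellings behave identically on these inputs; whether the Boost release in question does the same is exactly what is being asked here.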
It will, however, yield a correct search (there is a difference between search and match). You didn't mention whether you were doing a regex_search or a regex_match?
I'm doing a search because I don't want to know whether the whole string matches but whether the regex is found in the string. Specifically, I'm doing:
m_regex.Search( sampleBody, boost::match_default | boost::match_any)
While I can believe that the design intention was that "\d{3}-" should be found in "1234567-" (at the fifth character), it seems inconsistent that it is *not* also found in "123456-" and "12345678-". I'm seeing that inconsistent behavior.
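For reference, a minimal sketch of the search/match distinction and of where a search would be expected to find "\d{3}-". It assumes std::regex (C++11) rather than the Boost.Regex Search call quoted above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Whole-string test: true only if ALL of `text` matches the SSN-style pattern.
bool whole_string_matches(const std::string& text) {
    static const std::regex ssn(R"(\d{3}-\d{2}-\d{4})");
    return std::regex_match(text, ssn);
}

// Substring test: 0-based offset of the first place "\d{3}-" is found,
// or -1 if it is not found anywhere in `text`.
long first_offset(const std::string& text) {
    static const std::regex three_digits_dash(R"(\d{3}-)");
    std::smatch m;
    if (!std::regex_search(text, m, three_digits_dash)) {
        return -1;
    }
    return static_cast<long>(m.position(0));
}

// Self-check: the last three digits before the dash are always found.
void check_search_offsets() {
    assert(first_offset("123456-") == 3);
    assert(first_offset("1234567-") == 4);   // i.e. the fifth character
    assert(first_offset("12345678-") == 5);
    assert(!whole_string_matches("1234567-12-1234"));
}
```

Under std::regex the search succeeds for all three digit runs, which is the consistent behavior I would expect; this sketch does not reproduce the inconsistency described above.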
FWIW, it's easy enough for me to work around the current behavior with a pattern like this:
(^|[^a])a{1}b
You could use this, but I wouldn't recommend it (but that's just me... regex construction is deeply personal :) ). HTH.
I realize there is more than one way to do it, and I'd be interested in what you'd recommend. FWIW, in our SSN-matching case, we'll probably just use "\b\d{3}-\d{2}-\d{4}\b".
Thanks for the reply!
--Dean
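A minimal sketch of the word-boundary version, again assuming std::regex (C++11) in place of Boost.Regex; contains_ssn is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `text` contains ddd-dd-dddd that is not glued to a neighboring
// word character (letter, digit, or underscore) on either side.
bool contains_ssn(const std::string& text) {
    static const std::regex bounded(R"(\b\d{3}-\d{2}-\d{4}\b)");
    return std::regex_search(text, bounded);
}

// Self-check over the inputs discussed in the thread.
void check_bounded_pattern() {
    assert(contains_ssn("123-12-1234"));
    assert(contains_ssn("SSN: 123-12-1234."));
    assert(!contains_ssn("1234-12-1234"));
    assert(!contains_ssn("1234567-12-1234"));
    assert(!contains_ssn("123-12-12345"));
}
```

One thing to keep in mind: \b treats letters and underscores as word characters too, so something like "x123-12-1234" would also be rejected.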