[boost] [regex] couple o' bugs

2 Jan 2006

I've been porting some test cases from Boost.Regex to Boost.Xpressive 
and tracking down the discrepancies (very few, thankfully). I've turned 
up what appear to be a couple of bugs in Boost.Regex.

The regex "a(b)?c\\1d" successfully matches the string "acd". It 
shouldn't. A back-reference to a sub-match that didn't participate in 
the match should not match. Perl, Python and xpressive all agree on this 
point.
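
For anyone who wants to check the Python side of that claim, here is a
minimal sketch using Python's standard re module. The helper name
full_match is mine, purely for illustration, and the pattern is written
as a raw string, so the doubled backslash of the C++ literal becomes a
single one.

```python
import re

# Python spelling of the pattern from the post; a raw string avoids the
# doubled backslash needed in a C++ string literal.
PATTERN = re.compile(r"a(b)?c\1d")


def full_match(text):
    """Return True if the whole string matches PATTERN (hypothetical helper)."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    # Group 1 did not participate, so the back-reference fails.
    print(full_match("acd"))    # False
    # Group 1 captured "b", so \1 must match another "b".
    print(full_match("abcbd"))  # True
```

This only shows what Python does; it says nothing about how Boost.Regex
arrives at its answer.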

As discussed previously, Boost.Regex treats [a-Z] as a legal regex, but 
it isn't. 'a' is 97 and 'Z' is 90, which makes this character range 
ill-formed, even when icase is specified.
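
As a point of comparison only, Python's re module rejects this range at
compile time, with or without its case-insensitive flag. The sketch
below assumes nothing about Boost.Regex internals; range_is_accepted is
a name I made up for the example.

```python
import re


def range_is_accepted(pattern, flags=0):
    """Return True if Python's re module compiles the pattern (hypothetical helper)."""
    try:
        re.compile(pattern, flags)
        return True
    except re.error:
        # Raised for an ill-formed pattern such as a reversed range.
        return False


if __name__ == "__main__":
    print(ord("a"), ord("Z"))                         # 97 90
    print(range_is_accepted("[a-Z]"))                 # False
    print(range_is_accepted("[a-Z]", re.IGNORECASE))  # False
    print(range_is_accepted("[A-z]"))                 # True: 65 to 122 is in order
```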

When matching "a(b+|((c)*))+d" against "abcd", Boost.Regex says the 
third sub-match should be "c", but Perl says it should not participate 
in the match. I think Perl is right here. The logic is: the quantified 
group 1 will match at least 3 times: first, it eats the b, next it eats 
the c, and finally, it matches an empty string. On this last iteration, 
the quantified group 3 will match zero times; hence, it has not 
participated in the match. (FWIW, xpressive has a bug in this area too, 
which I'm working on fixing.)
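
To compare what yet another engine reports for the sub-matches, a small
script like the following can dump every group. It uses Python's re
module; the function name report is an assumption for illustration. I
have deliberately not written down what Python reports for group 3,
since engines differ on exactly this point and Python's answer need not
agree with Perl's.

```python
import re

# Same pattern as in the post, as a Python raw string.
PATTERN = re.compile(r"a(b+|((c)*))+d")


def report(text):
    """Map each group number to its sub-match, or None if the group
    did not participate. Returns None when there is no overall match.
    (Hypothetical helper, not part of any library.)"""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    return {i: m.group(i) for i in range(PATTERN.groups + 1)}


if __name__ == "__main__":
    groups = report("abcd")
    for number, value in groups.items():
        print(number, repr(value))
```

Group 0 is the overall match; a value of None for group 3 would mean it
did not participate, which is the behaviour argued for above.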

This last bug plagues several of the test cases in test_tricky_cases.cpp.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com