
I've been porting some test cases from Boost.Regex to Boost.Xpressive and tracking down the discrepancies (very few, thankfully). I've turned up what appear to be a few bugs in Boost.Regex.

The regex "a(b)?c\\1d" successfully matches the string "acd". It shouldn't. A back-reference to a sub-match that didn't participate in the match should not match. Perl, Python and xpressive all agree on this point.

As discussed previously, Boost.Regex treats [a-Z] as a legal regex, but it isn't. 'a' is 97 and 'Z' is 90, so the range runs backwards and is ill-formed, even when icase is specified.

When matching "a(b+|((c)*))+d" against "abcd", Boost.Regex says the third sub-match should be "c", but Perl says it should not participate in the match. I think Perl is right here. The logic is: the quantified group 1 will match at least 3 times: first, it eats the b; next, it eats the c; and finally, it matches an empty string. On this last iteration, the quantified group 3 matches zero times; hence, it has not participated in the match. (FWIW, xpressive has a bug in this area too, which I'm working on fixing.) This last bug plagues several of the test cases in test_tricky_cases.cpp.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
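
For anyone who wants to check the first two points against Python's re module, here is a minimal sketch. It is not part of the Boost test suite; the helper names (backref_matches, is_valid_pattern, nested_groups) are made up for illustration. The third helper only returns whatever Python records for the nested groups so it can be compared by hand with Perl and Boost.Regex; no particular result is assumed for it here.

```python
import re


def backref_matches(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


def is_valid_pattern(pattern: str, flags: int = 0) -> bool:
    """Return True if Python's re module accepts `pattern` as well-formed."""
    try:
        re.compile(pattern, flags)
    except re.error:
        return False
    return True


def nested_groups(pattern: str, text: str):
    """Return the tuple of sub-matches for a full match, or None if no match."""
    m = re.fullmatch(pattern, text)
    return None if m is None else m.groups()


if __name__ == "__main__":
    # Back-reference to a group that did not participate: no match in Python.
    print(backref_matches(r"a(b)?c\1d", "acd"))      # False
    # When group 1 does participate, the back-reference matches normally.
    print(backref_matches(r"a(b)?c\1d", "abcbd"))    # True

    # A reversed character range is rejected, with or without case folding.
    print(is_valid_pattern("[a-Z]"))                 # False
    print(is_valid_pattern("[a-Z]", re.IGNORECASE))  # False

    # Nested quantified groups: print what Python reports and compare by hand.
    print(nested_groups(r"a(b+|((c)*))+d", "abcd"))
```

Note that the raw strings above hold a single backslash, which is what the C++ literal "a(b)?c\\1d" contains once the compiler has processed the escape.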