regex bug (regex_search throws bad_expression)

hi!

using the latest regex patch and vc7.1 i've accidentally encountered the following:

...
regex re("([^\n]*\\n+\\s+)+NEEDEDSUBITEM2:[^\\s]");
bool matched = regex_search(text, re); // bad_expression
...

i figured out those \\s -s may cause the problem. sample text attached.

and one question: having "DATA.*?ITEM1(ITEM2)?" and an input like "DATA ITEM1 ITEM1ITEM2" should ITEM2 be extracted? i think it would be good to make a note on this case in the doc.

yours, adam

begin 666 match_test.txt
M#0I)5$5-5%E014$Z#0H@(" @4U5"251%33$Z('AX> T*#0H@(" @3D5%1$5$
M4U5"251%33$Z(" @('AX>'@-"B @("!.145$141354))5$5-.B @("!X>'AX
M#0H-"DE414U465!%0CH-"B @("!354))5$5-,3H@>'AX#0H-"B @("!.145$
M141354))5$5-,3H@(" @>'AX> T*(" @($Y%141%1%-50DE414TR.B @("!X
'>'AX#0H-"@``
`
end

Adam Molnar wrote:
hi!
using the latest regex patch and vc7.1 i've accidentally encountered the following:
... regex re("([^\n]*\\n+\\s+)+NEEDEDSUBITEM2:[^\\s]"); bool matched = regex_search(text, re); // bad_expression
You are putting a new-line character in your regular expression, "([^\n". Perhaps that is causing your problem.

no, that's just a typo. the same happens with "\\n".

I think the problem is that the first repeated section, ([^\n]*\\n+\\s+)+, starts and ends with repeats, either of which can match repeated whitespace. This is what causes the matcher to thrash while trying to find a match, until it eventually gives up and throws an exception. I think you could make your expression much more precise by using:

regex re("([^\n]*\\n+)+\\s+NEEDEDSUBITEM2:[^\\s]");

By moving the \s+ outside of the repeat like this, the expression is now much more deterministic: it can only do one thing for any given input character.
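A minimal sketch of the rewritten expression in use. It swaps in std::regex (ECMAScript grammar) for Boost.Regex so that it builds without the patch discussed here, which is an assumption of this example and not what the thread tested. The helper name has_needed_subitem2 is hypothetical, and the try/catch is only a guess at how one might handle an engine that gives up on a pathological search.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): searches `text` with the
// rewritten expression, where \s+ sits outside the repeated group.
bool has_needed_subitem2(const std::string& text) {
    // Raw string literal, so the regex engine itself sees \n and \s.
    static const std::regex re(R"(([^\n]*\n+)+\s+NEEDEDSUBITEM2:[^\s])");
    try {
        return std::regex_search(text, re);
    } catch (const std::regex_error&) {
        // std::regex signals excessive complexity or stack use with
        // regex_error; this sketch simply treats that as "no match".
        return false;
    }
}
```

Called as has_needed_subitem2("ITEMTYPEB:\n    NEEDEDSUBITEM2:x\n"), this finds a match. Note that [^\s] demands a non-whitespace character directly after the colon.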
and one question: having "DATA.*?ITEM1(ITEM2)?" and an input like "DATA ITEM1 ITEM1ITEM2" should ITEM2 be extracted? i think it would be good to make a note on this case in the doc.
No for Perl regexes; not sure for POSIX regexes (non-greedy repeats don't sit well with POSIX semantics in cases like this, so I'd advise using non-greedy repeats only with Perl regexes).
John.
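The Perl-style answer can be checked with a short sketch. It again assumes std::regex with its default ECMAScript grammar as a stand-in for Boost.Regex's Perl syntax, and the helper name item2_captured is hypothetical. The non-greedy .*? stops at the first ITEM1, and the optional (ITEM2)? then matches nothing, because the next character is a space.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the optional (ITEM2) group took part
// in the first match found in `input`.
bool item2_captured(const std::string& input) {
    static const std::regex re("DATA.*?ITEM1(ITEM2)?");
    std::smatch m;
    if (!std::regex_search(input, m, re)) {
        return false;  // no match at all
    }
    return m[1].matched;  // false when the group matched nothing
}
```

For the input from the question, "DATA ITEM1 ITEM1ITEM2", the helper returns false. For "DATA ITEM1ITEM2" it returns true, since ITEM2 directly follows the first ITEM1.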

indeed refactoring the expression solved the problem, thanks! adam
participants (3)
- Adam Molnar
- Edward Diener
- John Maddock