Boost.Regex - problem with non-marking parenthesis
I have a regular expression that compiles & matches as expected, viz:
boost::regex rxBlah("([a-z](_?[a-z0-9])*)_dvs", boost::regex_constants::icase);
I would like to make the inner parentheses non-marking, so I modified the regex as below:
boost::regex rxBlah("([a-z](?:_?[a-z0-9])*)_dvs", boost::regex_constants::icase);
However, this throws a bad_expression exception on compilation, with the message "Invalid preceding regular expression". I've tested the second regex in another regex tool (The Regex Coach, http://weitz.de/regex-coach/) and it was accepted, and performed as expected - when presented with the text "a_dvs", it showed one capture, the text "a".
So - help, what have I done wrong?
Stuart Dootson
On 12/13/05, John Maddock wrote:
The flags you're passing to the expression are wrong: passing just icase is roughly the same as basic|icase, and POSIX basic expressions don't support Perl-style features like non-marking parentheses. However:
boost::regex rxBlah("([a-z](?:_?[a-z0-9])*)_dvs", boost::regex_constants::perl | boost::regex_constants::icase);
will do what you want.
BTW from 1.33.0 onwards boost::regex_constants::perl now has a value of 0 precisely to avoid this problem.
John.
Thanks, John - the change in 1.33 looks very sensible.
Stuart Dootson