[regex] ".*" expression matches twice in regex_replace?
Hi all,

I am trying to perform some simple (as I thought) text manipulation. The task was to prepend some text to a string, e.g. 1234 -> 01234. I tried to do this with the expression ".*" and the format "0$&".

I used regex_replace with the following parameters:

    std::basic_string<char> string("1234");
    std::basic_string<char> format("0$&");
    boost::regex exp(".*");
    std::basic_string<char> result;
    result = boost::regex_replace(string, exp, format);

I was very surprised when I got the result "012340", i.e. the format string was applied twice. After some testing, I found out that regex_replace matches a second time at the end of the string, where the expression matches an empty string (so only the zero is appended at the end).

When using the (undocumented) option match_not_initial_null, it basically works, but then empty input strings do not match either. The same applies to the expression ".+", which works correctly on this input but does not match empty strings. The only solution I have found so far is to use "^.*" as the expression.

Is this a bug or a feature? Or what did I get wrong with this regular expression?

Regards,
Arne
It's a feature, or at least a Perl-compatibility feature. When a match is found, the search always continues with the next possible match starting from the end of the previous match, even if the previous match ended at the end of the string. So [[:digit:]]* against 1234 always finds two matches: "1234" and then the empty string "" after the "4", irrespective of whether the "1234" occurs in the middle of a text or at the end.

John.
participants (2)
- Arne Babnik
- John Maddock