Regex: Question about greedy/non-greedy precedence
Hello,

I'm using boost::regex 1.34.0 and experience behaviour with mixed greedy and non-greedy operators in the same expression which I do not understand.

I have either one of the following input strings:

user1@hostname.com
user2

and want to keep only the usernames, so I use the following expression:

(.*?)(@.*)?

with the format string $1.

I would expect that basically the regex machine would first try to satisfy the greedy operator "?", and then fill up the non-greedy "*?" with the remaining part of the input.

However, the following does NOT work (the hostname is not removed):

std::string output;
output = boost::regex_replace(std::string("user1@hostname.com"),
                              boost::regex("(.*?)(@.*)?"),
                              std::string("$1"),
                              boost::regex_constants::format_first_only);

However, if I omit the "format_first_only", it works as expected.

What am I missing?

Thanks,
Arne

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
Arne Babnik wrote:
> Hello,
>
> I'm using boost::regex 1.34.0 and experience behaviour with mixed greedy and non-greedy operators in the same expression which I do not understand.
>
> I have either one of the following input strings:
>
> user1@hostname.com
> user2
>
> and want to keep only the usernames, so I use the following expression:
>
> (.*?)(@.*)?
>
> with the format string $1.
>
> I would expect that basically the regex machine would first try to satisfy the greedy operator "?", and then fill up the non-greedy "*?" with the remaining part of the input.
No, that's not how Perl regexes work: they move from left to right through the expression, matching each part in turn, and then backtrack if they can't satisfy something.

In the case of (.*?)(@.*)? the (.*?) part can successfully match zero characters by repeating zero times, as can (@.*)?, so there are multiple matches to the string possible, each of zero characters in length. (It's more complicated still when there are zero-length matches, but that will do for now!)

So instead try:

([^@]+)(?:@.*)?

which I believe will do what you want.

HTH, John.
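The suggestion above can be sketched as follows. This is a minimal illustration written against std::regex (C++11, ECMAScript grammar) rather than boost::regex 1.34.0, so that it builds without Boost; that substitution is an assumption, and the result has not been checked against Boost 1.34.0 itself. The helper names strip_host, strip_host_original, first_match_length and check_examples are hypothetical and do not appear in the thread. The sketch shows that the first match of the original expression is empty, so format_first_only replaces nothing, whereas ([^@]+) forces the first match to begin with the user name.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: keep only the user name, using the suggested expression.
// ([^@]+) must consume at least one character, so the first match cannot be
// empty; (?:@.*)? then swallows an optional "@host" suffix without capturing it.
std::string strip_host(const std::string& input) {
    static const std::regex re("([^@]+)(?:@.*)?");
    return std::regex_replace(input, re, std::string("$1"),
                              std::regex_constants::format_first_only);
}

// Hypothetical helper: the expression from the question, for comparison.
std::string strip_host_original(const std::string& input) {
    static const std::regex re("(.*?)(@.*)?");
    return std::regex_replace(input, re, std::string("$1"),
                              std::regex_constants::format_first_only);
}

// Length of the first match found by the original expression.
// Returns std::string::npos if there is no match at all.
std::size_t first_match_length(const std::string& input) {
    static const std::regex re("(.*?)(@.*)?");
    std::smatch m;
    if (!std::regex_search(input, m, re)) {
        return std::string::npos;
    }
    return static_cast<std::size_t>(m.length(0));
}

// Small self-check of the behaviour described in the thread.
void check_examples() {
    // The first match of the original expression is empty ...
    assert(first_match_length("user1@hostname.com") == 0);
    // ... so replacing only that match leaves the input unchanged.
    assert(strip_host_original("user1@hostname.com") == "user1@hostname.com");
    // The suggested expression keeps just the user name.
    assert(strip_host("user1@hostname.com") == "user1");
    assert(strip_host("user2") == "user2");
}
```

Calling check_examples() from main() exercises both inputs from the question. The non-capturing group (?:...) is used only because the host part is never referenced in the format string.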
> However, the following does NOT work (the hostname is not removed):
>
> std::string output;
> output = boost::regex_replace(std::string("user1@hostname.com"),
>                               boost::regex("(.*?)(@.*)?"),
>                               std::string("$1"),
>                               boost::regex_constants::format_first_only);
>
> However, if I omit the "format_first_only", it works as expected.
>
> What do I miss?
>
> Thanks,
> Arne
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (2)
- Arne Babnik
- John Maddock