RE: [boost] regex lib: non-greedy repeats w/ match_posix

I noticed that the newest regex version no longer supports non-greedy repeats "+?" in match_posix mode. What was the rationale for this?
It is deliberate: non-greedy repeats don't really fit into the POSIX matching philosophy, where repeats are neither greedy nor non-greedy; rather, it's the parentheses that determine what the best match is.
John,
I'm not sure I understand what you're saying. A regular expression either (completely) matches a given text, or it doesn't; if we ignore parentheses for a moment, then there's no room for different "matching strategies" (like, say, POSIX or not).
When searching for (i.e., grepping) a regular expression, POSIX states "left-most longest", so it shouldn't be ambiguous to determine the left-most longest match among all possible matches (with parenthesis indices breaking ties), even if the RE contains a non-greedy repeat.
Do you have a specific example at hand why you think non-greedy repeats are inappropriate for POSIX-style greps? Personally, I find it very convenient to write REs like "<b>.*?</b>", especially in POSIX mode.
Ralph
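For readers following along, here is a minimal sketch of what the non-greedy repeat in "<b>.*?</b>" buys in a Perl-style matcher. It uses std::regex with its default ECMAScript grammar as a stand-in for Boost.Regex (an assumption made only so the sketch needs no external library), and the helper first_match and the sample text are hypothetical names introduced for illustration.

```cpp
#include <regex>
#include <string>

// Returns the first match of `pattern` in `text`, or an empty string if
// nothing matches. Uses std::regex's default ECMAScript (Perl-like) grammar;
// this is a stand-in for Boost.Regex, not the library discussed in the thread.
std::string first_match(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}

// Hypothetical sample input containing two bold elements.
const std::string sample = "<b>one</b> and <b>two</b>";
```

With this helper, first_match("<b>.*?</b>", sample) yields "<b>one</b>", while the greedy form first_match("<b>.*</b>", sample) yields the whole sample. Of the two candidates starting at the leftmost position, the greedy result is the longer one, which is where the non-greedy repeat and a strict "left-most longest" rule pull in different directions.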

> I'm not sure I understand what you're saying. A regular expression either (completely) matches a given text, or it doesn't; if we ignore parentheses for a moment, then there's no room for different "matching strategies" (like, say, POSIX or not).
> When searching for (i.e., grepping) a regular expression, POSIX states "left-most longest", so it shouldn't be ambiguous to determine the left-most longest match among all possible matches (with parenthesis indices breaking ties), even if the RE contains a non-greedy repeat.
> Do you have a specific example at hand why you think non-greedy repeats are inappropriate for POSIX-style greps? Personally, I find it very convenient to write REs like "<b>.*?</b>", especially in POSIX mode.
That's actually quite a good example: it doesn't produce a "leftmost-longest" match, does it? I forgot to mention that you can mix modes if you really want to, by compiling the expression as a Perl regex and then passing match_posix to the matching functions (although there are a couple of Perl-specific features that don't work in POSIX matching mode: independent sub-expressions are one that comes to mind, and more will be added in the future). In all seriousness though, why not use Perl regexes when you want Perl-compatible features? This is the default if you don't specify a mode anyway, and it's also faster than POSIX leftmost-longest mode. John.
participants (2)
- Benzinger, Ralph
- John Maddock