[regex][1.33.1-rc2] regression wrt 1.32.0

Good day,

We observe a regression in the regex library compared to 1.32.0. The following program:

    #include <string>
    #include <iostream>
    #include <boost/regex.hpp>

    int main ()
    {
      using std::string;
      using boost::regex;
      using boost::regex_merge;

      regex expr ("(\\.(idl|cidl|cdl))?$");

      std::cerr << regex_merge (
        string ("test.cidl"),
        expr,
        string ("E.idl"),
        boost::match_default | boost::format_all ) << std::endl;
    }

prints "testE.idlE.idl" when compiled and linked against 1.33.x and "testE.idl" against previous versions.

hth,
-boris

> We observe a regression in the regex library compared to 1.32.0. The following program:
> prints "testE.idlE.idl" when compiled and linked against 1.33.x and "testE.idl" against previous versions.

I'm afraid it's not a regression, it's a fix! The expression you're using can match a zero-length string, so after it has matched the ".cidl" suffix, it then finds a second, zero-length match immediately afterwards, hence the two copies of "E.idl" in the output string.

I'm afraid this dark corner was/is under-documented in the docs, but the TR1 text (with which this version is intended to conform) is quite clear that this is the required behaviour.

As a workaround, you could specify format_first_only in the format flags (assuming you're replacing the suffix on a single filename), or you could use an expression like:

    (.)(\\.(idl|cidl|cdl))?$

and replace with:

    $1E.idl

HTH,
John.
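To make the "second match of zero length" concrete, here is a minimal sketch that lists every match a global search finds. It assumes std::regex (the standard-library descendant of the TR1 text) as a stand-in for boost::regex, and find_all_matches is a hypothetical helper name, not part of either library.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Collect (offset, length) for every match found by a global search.
// Assumption: std::regex is used here in place of boost::regex.
std::vector<std::pair<std::size_t, std::size_t>>
find_all_matches(const std::string& text, const std::regex& expr)
{
    std::vector<std::pair<std::size_t, std::size_t>> found;
    for (std::sregex_iterator it(text.begin(), text.end(), expr), end;
         it != end; ++it) {
        const std::ssub_match& whole = (*it)[0];
        // Offsets are computed from the iterators so they are always
        // relative to the start of the original string.
        found.emplace_back(
            static_cast<std::size_t>(whole.first - text.begin()),
            static_cast<std::size_t>(whole.length()));
    }
    return found;
}
```

Called with "test.cidl" and the expression "(\\.(idl|cidl|cdl))?$", this should report two matches: offset 4 with length 5 (the ".cidl" suffix) and offset 9 with length 0 (the empty match at the very end). Each of them is replaced in turn, which is where the second "E.idl" comes from.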

"John Maddock" <john@johnmaddock.co.uk> writes:
> The expression you're using can match a zero-length string, so after it's matched the ".cidl" suffix, it then finds a second match of zero-length immediately afterwards, hence the two copies of "E.idl" in the output string.

The original expression is "(\.(idl|cidl|cdl))?$". Doesn't '$' at the end mean that this expression by definition cannot match two things in a single string (since a string has only one end)?

> I'm afraid this dark corner was/is under-documented in the docs, but the TR1 text (with which this version is intended to conform) is quite clear that this is the required behaviour.

For what it's worth, perl on my box (5.8.7) also thinks there is only one match. But I guess C++ TR1 is more authoritative when it comes to regular expressions (sorry, couldn't resist sarcasm when it comes to Std C++).

> As a workaround, you could specify format_first_only in the format flags (assuming you're replacing the suffix on a single filename), or you could use an expression like:
> (.)(\\.(idl|cidl|cdl))?$

Following the logic above it will match all single letters in the string, no?

The following seems to work though:

    "^(.+?)(\.(idl|cidl|cdl))?$"

Thanks for your help,
-boris
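The two rewritten expressions discussed in this message can be compared side by side. The sketch below again assumes std::regex in place of boost::regex; the function names are hypothetical and exist only for illustration. Both expressions require at least one character before the optional suffix, so neither can match the empty string at the end of the input.

```cpp
#include <regex>
#include <string>

// John's suggestion: consume one character before the optional suffix and
// put it back via $1. The trailing $ still pins the match to the end.
std::string rename_with_leading_char(const std::string& name)
{
    static const std::regex expr("(.)(\\.(idl|cidl|cdl))?$");
    return std::regex_replace(name, expr, "$1E.idl");
}

// Boris's variant: anchor at both ends so the whole name is one match,
// with the non-greedy group capturing everything before the suffix.
std::string rename_fully_anchored(const std::string& name)
{
    static const std::regex expr("^(.+?)(\\.(idl|cidl|cdl))?$");
    return std::regex_replace(name, expr, "$1E.idl");
}
```

For inputs such as "test.cidl", "test" and "a.b.idl" both functions should produce the same result ("testE.idl", "testE.idl" and "a.bE.idl" respectively), since in each case only a single match exists.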

> The original expression is "(\.(idl|cidl|cdl))?$". Doesn't '$' at the end mean that this expression by definition cannot match two things in a single string (since a string has only one end)?

Good point, it's not that smart though...

>> I'm afraid this dark corner was/is under-documented in the docs, but the TR1 text (with which this version is intended to conform) is quite clear that this is the required behaviour.
> For what it's worth, perl on my box (5.8.7) also thinks there is only one match. But I guess C++ TR1 is more authoritative when it comes to regular expressions (sorry, couldn't resist sarcasm when it comes to Std C++).

OK, we messed up the text then (well I did actually).

>> As a workaround, you could specify format_first_only in the format flags (assuming you're replacing the suffix on a single filename), or you could use an expression like:
>> (.)(\\.(idl|cidl|cdl))?$
> Following the logic above it will match all single letters in the string, no?

No, because of the trailing $.

> The following seems to work though: "^(.+?)(\.(idl|cidl|cdl))?$"
> Thanks for your help, -boris

John,

"John Maddock" <john@johnmaddock.co.uk> writes:

>> Following the logic above it will match all single letters in the string, no?
> No, because of the trailing $.

The original expression had a trailing $ as well but it didn't help much, did it?

So what's the verdict, is this a bug or a feature?

thanks,
-boris

Boris Kolpackov wrote:
> John,
> "John Maddock" <john@johnmaddock.co.uk> writes:
>>> Following the logic above it will match all single letters in the string, no?
>> No, because of the trailing $.
> The original expression had trailing $ as well but it didn't help much, did it?
> So what's the verdict, is this a bug or a feature?

Feature. Perl has the same behavior:

    $str = 'test.cidl';
    $str =~ s/(\.(idl|cidl|cdl))?$/E.idl/g;
    print "$str\n";

... prints:

    testE.idlE.idl

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com

On 12/6/05, Eric Niebler <eric@boost-consulting.com> wrote:
> Boris Kolpackov wrote:
>> John,
>> "John Maddock" <john@johnmaddock.co.uk> writes:
>>>> Following the logic above it will match all single letters in the string, no?
>>> No, because of the trailing $.
>> The original expression had trailing $ as well but it didn't help much, did it?
>> So what's the verdict, is this a bug or a feature?
> Feature. Perl has the same behavior:
> $str = 'test.cidl'; $str =~ s/(\.(idl|cidl|cdl))?$/E.idl/g; print "$str\n";
> ... prints:
> testE.idlE.idl

Indeed it does. If the user wants only one substitution, I'd say the use of /g, which allows s/// to match multiple times, is the central mistake.

** With /g

    $ perl -le '($a = "test.cidl") =~ s/(\.(c?idl|cdl))?$/E.idl/g; print $a'
    testE.idlE.idl

** Without

    $ perl -le '($a = "test.cidl") =~ s/(\.(c?idl|cdl))?$/E.idl/; print $a'
    testE.idl

-- 
Caleb Epstein
caleb dot epstein at gmail dot com

> Indeed it does.
> If the user wants only one substitution, I'd say the use of /g, which allows s/// to match multiple times, is the central mistake.

And just to clarify: regex_replace finds and replaces *all* occurrences (equivalent to /g) unless you tell it otherwise with the format_first_only flag.

John.
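The difference between the default replace-all behaviour and format_first_only can be sketched as follows. This assumes std::regex_replace and std::regex_constants::format_first_only as stand-ins for their Boost counterparts; replace_all and replace_first are hypothetical helper names.

```cpp
#include <regex>
#include <string>

// Default behaviour: every match is replaced (like Perl's s///g),
// including a zero-length match at the end of the input.
std::string replace_all(const std::string& name)
{
    static const std::regex expr("(\\.(idl|cidl|cdl))?$");
    return std::regex_replace(name, expr, "E.idl");
}

// With format_first_only only the first match is replaced
// (like Perl's s/// without /g).
std::string replace_first(const std::string& name)
{
    static const std::regex expr("(\\.(idl|cidl|cdl))?$");
    return std::regex_replace(name, expr, "E.idl",
                              std::regex_constants::format_first_only);
}
```

For "test.cidl", replace_all should give "testE.idlE.idl" and replace_first should give "testE.idl". For a name without a suffix, such as "test", both should give "testE.idl", because the only match is the empty one at the end.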
participants (4): Boris Kolpackov, Caleb Epstein, Eric Niebler, John Maddock