Boost: Regular Expression question/bug?

Hi,

I'm trying to understand whether a regular expression I've designed is working correctly. It seems that I can make the match come out "off by one" character in certain circumstances. For instance:

The pattern ".{1,3}" finds 1 to 3 characters, however
The pattern ".[^b]{1,3}" finds 1 to 4 characters...?

For some reason, when I apply a "do not allow the character b" or any other bracket operator, the {min,max} range value seems to allow one additional character. I cannot imagine this being done on purpose. The workaround is to allow one fewer than you really want (i.e. {1,2}), but I figured I'd post something and see if anyone has seen it or could explain what is happening. Maybe it's just a bug.

The code example:

    try
    {
        boost::wregex pattern( TEXT( ".{1,3}" ) );
        boost::wcmatch matchResults;
        if( boost::regex_match( TEXT( "asd" ), matchResults, pattern ) )
        {
            // Match
        }
        else
        {
            // No match
        }
    }
    catch( runtime_error &e )
    {
        printf( "\n%%FAILURE: Regular expression error: %s.\n", e.what() );
    }

Yes, I'm running in a UTF-16 build environment in Microsoft Windows .NET using C++. If this isn't the correct venue for this message, sorry in advance.

Thanks,
Derrick

Hi Derrick,
On 15/02/07, Derrick Schommer wrote:
> Hi,
> I'm trying to understand if a regular expression I've designed is working correctly. It seems that I can make the regular expression match work "off by one" character in certain circumstances. For instance:
> The pattern: ".{1,3}" finds 1 to 3 characters, however
> The pattern: ".[^b]{1,3}" finds 1 to 4 characters...?
<dice>
The reason the second pattern finds 1-4 chars is that you've asked for '.' (any character) followed by [^b]{1,3} (1-3 of any character except b). If you want 1-3 of anything except b, then just "[^b]{1,3}" is all you need.

hth,
Darren
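To make the difference between the three patterns concrete, here is a minimal sketch. It assumes std::regex from the C++ standard library instead of boost::wregex, and narrow strings instead of the UTF-16 TEXT() literals, purely so that it builds without Boost; the helper names matches_whole and check_quantifier_scope are made up for illustration. Note that under a whole-input match the second pattern accepts two to four characters, because the leading dot always consumes one.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches the whole of `text`.
// This mirrors the whole-input test that regex_match performs in the question.
bool matches_whole(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return std::regex_match(text, re);
}

// Hypothetical helper: walks through the cases discussed in the thread.
void check_quantifier_scope() {
    // ".{1,3}": one to three characters of any kind.
    assert(matches_whole(".{1,3}", "a"));
    assert(matches_whole(".{1,3}", "asd"));
    assert(!matches_whole(".{1,3}", "asdf"));

    // ".[^b]{1,3}": one character, then one to three non-'b' characters,
    // so two to four characters in total.
    assert(!matches_whole(".[^b]{1,3}", "a"));
    assert(matches_whole(".[^b]{1,3}", "as"));
    assert(matches_whole(".[^b]{1,3}", "asdf"));
    assert(!matches_whole(".[^b]{1,3}", "asdfg"));

    // The leading dot may match 'b'; only the later characters may not.
    assert(matches_whole(".[^b]{1,3}", "bsd"));
    assert(!matches_whole(".[^b]{1,3}", "abd"));

    // "[^b]{1,3}": one to three non-'b' characters, as suggested above.
    assert(matches_whole("[^b]{1,3}", "asd"));
    assert(!matches_whole("[^b]{1,3}", "asdf"));
    assert(!matches_whole("[^b]{1,3}", "abd"));
}
```

Calling check_quantifier_scope() from main should pass every assertion. The same patterns should behave the same way with boost::wregex in its default Perl-style syntax, though that is not exercised here.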

I'll give that a try, thanks. I just cannot see why .{1,3} doesn't have the same effect, given that the only difference is the exclusion of a 'b'.

Thanks,
Derrick
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

On 2/15/07, Derrick Schommer wrote:
> I'll give that a try, thanks. Just cannot see why is it that .{1,3} doesn't have the same effect given the only difference is the exclusion of a 'b'.

I'm getting the impression that you have a wrong impression of regular expressions. When you say .[^b], the [^b] is not modifying the . expression. What you have are two regular expressions next to each other: first the dot (for one character), then [^b], which matches anything except b. So .[^b] matches 2 characters, "any-character followed by any-character-except-b".

Now when you put the repetition on it, ".[^b]{1,3}", you have "any-character followed by one-to-three-characters-that-aren't-b".

Hope that helps.
Chris
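One way to see the two side-by-side expressions is to wrap each in its own capture group and look at what each one consumed. The sketch below assumes std::regex and narrow strings rather than boost::wregex, only so that it builds without Boost; split_match and check_split are hypothetical names, and the parentheses are added here for inspection and are not part of the original pattern.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: matches the whole of `text` against "(.)([^b]{1,3})"
// and returns what each half consumed. Returns two empty strings on no match.
std::pair<std::string, std::string> split_match(const std::string& text) {
    static const std::regex re("(.)([^b]{1,3})");
    std::smatch m;
    if (!std::regex_match(text, m, re)) {
        return {std::string(), std::string()};
    }
    return {m[1].str(), m[2].str()};
}

// Hypothetical helper: checks the cases described in the reply above.
void check_split() {
    // The dot takes exactly one character; {1,3} repeats only [^b].
    auto parts = split_match("asdf");
    assert(parts.first == "a");
    assert(parts.second == "sdf");

    // The dot is free to match 'b'; the bracket part is not.
    parts = split_match("bcd");
    assert(parts.first == "b");
    assert(parts.second == "cd");
    parts = split_match("ab");
    assert(parts.first.empty() && parts.second.empty());

    // Without a repeat count, ".[^b]" matches exactly two characters.
    const std::regex two(".[^b]");
    assert(!std::regex_match(std::string("x"), two));
    assert(std::regex_match(std::string("xy"), two));
    assert(!std::regex_match(std::string("xyz"), two));
}
```

For "asdf" the first group holds "a" and the second holds "sdf", which is the "any-character followed by one-to-three-characters-that-aren't-b" reading.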

Okay, thanks, that makes a bit more sense!
Derrick
participants (3)
- Chris Uzdavinis
- Darren Garvey
- Derrick Schommer