REG_PERLEX Revisited...
Following up on the message and response quoted below: Boost regex seems to work fine on Mac OS X and on our Linux platforms, but on 32-bit Windows we have the following situation. Note that this message is a little on the long side, since I am including a short program and its output from the Windows and Linux platforms.
The brief program shown below illustrates the problem. The results are from the Linux and 32-bit Windows machines. You can see that on Windows, when using the POSIX API, I get the right offset only if I use boost::REG_PERL or boost::REG_PERLEX. On Linux, it works fine for all flags.
Program
----------
#include
Message: 4
Date: Mon, 10 Mar 2008 18:08:16 -0000
From: "John Maddock"
Subject: Re: [Boost-users] REG_PERLEX
Message-ID: <00a201c882d9$bab38360$83d56b51@fuji>

Phil Hystad wrote:
Does anyone know the definition of REG_PERLEX?
I am using the traditional Unix/POSIX regex/regcomp API supported by the Boost Regular Expression library. On a 32-bit Windows platform we are forced to pass REG_PERLEX in the regcomp flags argument, whereas for the same application a flags value of zero works on Mac OS X and Linux.
REG_PERLEX allows the engine to accept Perl style regular expressions - what kind of expressions are you using, and what differences do you observe on the different platforms - there shouldn't really be any difference in behaviour.
John.
Phil Hystad wrote:
In follow up to the message and response quoted below. Boost regex seems to work fine on Mac OS X and on our Linux platforms. But, on Windows 32 bit we have the following situation. Note this message is a little bit on the long side given that I am including a short program and the output from running on Windows and Linux platforms.
The brief program shown below illustrates this problem. The results are from the Linux and Windows 32-bit machine. You can see on Windows when using the Posix API, I get the right offset only if I use boost::REG_PERL or boost::REG_PERLEX. On Linux, it works fine for all flags.
Right, this is by design in order to be std conformant, but confusing: for POSIX regular expressions the behaviour of [x-y] is implementation defined in the latest POSIX std, while for the previous std it was *required* to be locale sensitive.

Therefore Boost.Regex is locale sensitive for POSIX regular expressions by default, which means that [A-Z] will match any single character that collates in the range 'A' to 'Z' in the current locale. On Win32 that's the default user locale, so [A-Z] will typically match "b" for example. On Linux what happens depends on the setting of LC_CTYPE.

For Perl regular expressions the default is to not be locale sensitive on character ranges, as it confuses too many people!

For POSIX style regexes you can turn off locale dependent behaviour by passing REG_NOCOLLATE in combination with whatever other flags you may be using in regcomp. For POSIX regexes with boost::regex, use the flags posix & ~collate to disable locale specific collation.

HTH,
John.