Re: [Boost-users] REG_PERLEX Revisited...

12 Mar 2008

Phil Hystad wrote:
...
...
In follow up to the message and response quoted below.  Boost regex seems
to work fine on Mac OS X and on our Linux platforms.  But, on Windows 32 bit
we have the following situation.  Note this message is a little bit
on the long side given that I am including a short program and the
output from running on Windows and Linux platforms.

The brief program shown below illustrates this problem.  The results
are from the Linux and Windows 32-bit machine.  You can see on
Windows when using the Posix API, I get the right offset only if I
use boost::REG_PERL or boost::REG_PERLEX.  On Linux, it works fine
for all flags.

Right, this is by design: it is standards-conformant, but admittedly 
confusing.  For POSIX regular expressions the behaviour of [x-y] is 
implementation-defined in the latest POSIX standard, while the previous 
standard *required* it to be locale sensitive.  Boost.Regex is therefore 
locale sensitive for POSIX regular expressions by default, which means that 
[A-Z] will match any single character that collates in the range 'A' to 'Z' 
in the current locale.  On Win32 that's the default user locale, so [A-Z] 
will typically match "b", for example.  On Linux what happens depends on the 
setting of LC_CTYPE.  For Perl regular expressions the default is not to be 
locale sensitive on character ranges, as that confuses too many people!

For POSIX-style regexes compiled through the POSIX API, you can turn off 
locale-dependent behaviour by passing REG_NOCOLLATE to regcomp, in 
combination with whatever other flags you may be using.

For POSIX regexes with boost::regex, use the flags

posix & ~collate

to disable locale-specific collation.  Here "posix" stands for whichever 
POSIX syntax flag you are using (boost::regex::extended, for example).

HTH, John.

John Maddock