Phil Hystad wrote:
As a follow-up to the message and response quoted below: Boost regex seems to work fine on Mac OS X and on our Linux platforms, but on 32-bit Windows we have the following situation. Note that this message is on the long side, since I am including a short program and its output from the Windows and Linux platforms.
The brief program shown below illustrates the problem. The results are from the Linux and 32-bit Windows machines. You can see that on Windows, when using the POSIX API, I get the right offset only if I use boost::REG_PERL or boost::REG_PERLEX. On Linux, it works fine for all flags.
Right, this is by design in order to be standard-conformant, but it is confusing. In the latest POSIX standard the behaviour of [x-y] in POSIX regular expressions is implementation-defined, while the previous standard *required* it to be locale sensitive. Boost.Regex is therefore locale sensitive for POSIX regular expressions by default, which means that [A-Z] will match any single character that collates in the range 'A' to 'Z' in the current locale. On Win32 that is the default user locale, so [A-Z] will typically match "b", for example. On Linux what happens depends on the setting of LC_CTYPE.
For Perl regular expressions the default is not to be locale sensitive on character ranges, as it confuses too many people! That is why REG_PERL and REG_PERLEX give you the offset you expect on Windows.
For POSIX-style regexes you can turn off locale-dependent behaviour by passing REG_NOCOLLATE to regcomp, in combination with whatever other flags you are using. With boost::regex, use the flags posix & ~collate to disable locale-specific collation for POSIX regexes.
HTH, John.
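To make the flag masking concrete, here is a minimal sketch. It uses the standard library's std::regex as a stand-in for boost::regex, because std::regex has a `collate` syntax option that plays the same role. Two things are assumptions here rather than anything stated above: in std::regex, collation is opt-in for every grammar, so the sketch switches it on explicitly to imitate the Boost.Regex POSIX default before masking it off again; and the helper names `without_collate` and `search_no_collate` are purely illustrative. With Boost the corresponding expression would presumably be `boost::regex::extended & ~boost::regex::collate`, but that has not been checked here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: clear the locale-sensitive collation bit from a set of
// syntax flags, leaving every other option untouched.
std::regex::flag_type without_collate(std::regex::flag_type flags) {
    return flags & ~std::regex::collate;
}

// Hypothetical helper: true if `pattern` (POSIX extended syntax, with
// character ranges compared by character code) matches anywhere in `text`.
bool search_no_collate(const std::string& pattern, const std::string& text) {
    // Start from a POSIX-style flag set that has collation switched on
    // (imitating the default described above), then mask the bit off.
    const std::regex::flag_type flags =
        without_collate(std::regex::extended | std::regex::collate);
    const std::regex re(pattern, flags);
    return std::regex_search(text, re);
}
```

With collation masked off, `search_no_collate("[A-Z]", "B")` finds a match while `search_no_collate("[A-Z]", "b")` does not, whatever the current locale is, since the range is then a plain character-code range.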