REG_PERLEX Revisited...

12 Mar 2008

As a follow-up to the message and response quoted below: Boost.Regex
seems to work fine on Mac OS X and on our Linux platforms, but on
32-bit Windows we have the following situation. Note that this message
is a little on the long side, since I am including a short program and
its output on the Windows and Linux platforms.

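The brief program shown below illustrates this problem; the results
are from a Linux machine and a 32-bit Windows machine. As you can see,
on Windows, when using the POSIX API, I get the correct offset
(rm_so=9, the start of "Great") only if I use boost::REG_PERL or
boost::REG_PERLEX. The boost::regex interface behaves similarly on
Windows: only boost::regex::normal and boost::regex::perl report pos=9.
On Linux, it works fine for all flags.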

Program
----------
#include <boost/regex.hpp>
#include <boost/regex.h>
#include <string>
#include <iostream>

using namespace std;

static const char* szPattern="[A-Z][a-z]*";
static const char* szString="small is Great for the Big and Tall";

void f1_(boost::regex::flag_type flag, const char* flag_str)
{
   cout << "\nUsing boost::regex, flag=" << flag << " (" << flag_str << ")" << endl;
   std::string s = szString;
   boost::regex re(szPattern, flag);
   boost::match_results<std::string::const_iterator> what;
   // Only report offsets if the search actually found something.
   if (!boost::regex_search(s, what, re)) { std::cout << "no match" << std::endl; return; }
   std::cout << "pos=" << what.position() << " len=" << what.length() << std::endl;
}

void f2_(int flag, const char* flag_str)
{
   cout << "\nUsing Posix, flag=" << flag << " (" << flag_str << ")" << endl;
   regex_t pattern;
   int x = regcomp(&pattern, szPattern, flag);
   if ( x != 0 ) { std::cout << "regcomp - error" << std::endl; return; }
   regmatch_t matches[5];
   x = regexec(&pattern, szString, 5, matches, 0);
   regfree(&pattern);  // release the compiled expression on both paths below
   if ( x != 0 ) { std::cout << "regexec - error" << std::endl; return; }
   std::cout << "matches[0].rm_so=" << matches[0].rm_so << std::endl;
   std::cout << "matches[0].rm_eo=" << matches[0].rm_eo << std::endl;
}

#define f1(x) f1_(x, #x)
#define f2(x) f2_(x, #x)

int main()
{
   cout << "Regex=" << szPattern << endl;
   cout << "Input=" << szString << endl;

   f1(boost::regex::normal);
   f1(boost::regex::basic);
   f1(boost::regex::extended);
   f1(boost::regex::awk);
   f1(boost::regex::grep);
   f1(boost::regex::egrep);
   f1(boost::regex::sed);
   f1(boost::regex::perl);
   f2(0); // default
   f2(boost::REG_EXTENDED);
   f2(boost::REG_BASIC);
   f2(boost::REG_PERL);
   f2(boost::REG_AWK);
   f2(boost::REG_GREP);
   f2(boost::REG_EGREP);
   f2(boost::REG_PERLEX);
   return 0;
}

Output From Windows
--------------------------
Regex=[A-Z][a-z]*
Input=small is Great for the Big and Tall

Using boost::regex, flag=0 (boost::regex::normal)
pos=9 len=5

Using boost::regex, flag=2162689 (boost::regex::basic)
pos=0 len=5

Using boost::regex, flag=2163456 (boost::regex::extended)
pos=0 len=5

Using boost::regex, flag=2097920 (boost::regex::awk)
pos=0 len=5

Using boost::regex, flag=2293761 (boost::regex::grep)
pos=0 len=5

Using boost::regex, flag=2294528 (boost::regex::egrep)
pos=0 len=5

Using boost::regex, flag=2162689 (boost::regex::sed)
pos=0 len=5

Using boost::regex, flag=0 (boost::regex::perl)
pos=9 len=5

Using Posix, flag=0 (0)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=1 (boost::REG_EXTENDED)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=0 (boost::REG_BASIC)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=2817 (boost::REG_PERL)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=513 (boost::REG_AWK)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=1024 (boost::REG_GREP)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=1025 (boost::REG_EGREP)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=2048 (boost::REG_PERLEX)
matches[0].rm_so=9
matches[0].rm_eo=14

Linux (Red Hat) Output
----------------------------
Regex=[A-Z][a-z]*
Input=small is Great for the Big and Tall

Using boost::regex, flag=0 (boost::regex::normal)
pos=9 len=5

Using boost::regex, flag=2162689 (boost::regex::basic)
pos=9 len=5

Using boost::regex, flag=2163456 (boost::regex::extended)
pos=9 len=5

Using boost::regex, flag=2097920 (boost::regex::awk)
pos=9 len=5

Using boost::regex, flag=2293761 (boost::regex::grep)
pos=9 len=5

Using boost::regex, flag=2294528 (boost::regex::egrep)
pos=9 len=5

Using boost::regex, flag=2162689 (boost::regex::sed)
pos=9 len=5

Using boost::regex, flag=0 (boost::regex::perl)
pos=9 len=5

Using Posix, flag=0 (0)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=1 (boost::REG_EXTENDED)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=0 (boost::REG_BASIC)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=2817 (boost::REG_PERL)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=513 (boost::REG_AWK)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=1024 (boost::REG_GREP)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=1025 (boost::REG_EGREP)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=2048 (boost::REG_PERLEX)
matches[0].rm_so=9
matches[0].rm_eo=14
...
Date: Mon, 10 Mar 2008 18:08:16 -0000
From: "John Maddock" <john@johnmaddock.co.uk>
Subject: Re: [Boost-users] REG_PERLEX
To: <boost-users@lists.boost.org>
Message-ID: <00a201c882d9$bab38360$83d56b51@fuji>
Phil Hystad wrote:
...
> Does anyone know the definition of REG_PERLEX?
> I am using the traditional Unix/POSIX regex/regcomp API supported by
> the Boost Regular Expression library. On a 32-bit Windows platform we
> are forced to pass REG_PERLEX in the regcomp flags argument, whereas
> the same application gets by with a zero flag value to regcomp on
> Mac OS X and Linux.
REG_PERLEX allows the engine to accept Perl-style regular expressions.
What kind of expressions are you using, and what differences do you
observe on the different platforms? There shouldn't really be any
difference in behaviour.
John.
