Hi, the following regular expression works in perl mode, but does not work in extended mode. What's the reason? I think I haven't used any Perl-specific construct, or did I? Where?

...
#define NUM  "[+-]?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?"
#define WORD "[\\w\\-\\.~/]+"
#define EQU  "\\s*\\=\\s*"

const string sUserRE_VarAndVal = WORD EQU NUM;

#if 0
#define REGEX_ENGINE boost::regex::extended
#else
#define REGEX_ENGINE boost::regex::perl
#endif

regex reNum(sUserRE_VarAndVal, REGEX_ENGINE);
...
U. Mutlu
the following regex expression works in perl-mode, but does not work in extended-mode.
It would help if you were more specific about what went wrong. Did the constructor throw an exception? Did the extended engine fail to match your input string? If so, what input text are you using?
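The two failure modes asked about above can be told apart with a small test harness. The following is only a sketch: it uses std::regex from the standard library instead of boost::regex so that it builds without Boost, and the names Outcome and try_regex are hypothetical helpers, not part of either library. Boost.Regex is expected to signal a bad pattern with an exception in a similar way, but that is not checked here.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: distinguishes a rejected pattern from a failed match.
enum class Outcome { BadPattern, NoMatch, Match };

// Hypothetical helper: compiles `pattern` with the given grammar and tries to
// match the whole of `input` against it.
Outcome try_regex(const std::string& pattern, const std::string& input,
                  std::regex::flag_type grammar) {
    try {
        const std::regex re(pattern, grammar);  // throws std::regex_error on a bad pattern
        return std::regex_match(input, re) ? Outcome::Match : Outcome::NoMatch;
    } catch (const std::regex_error&) {
        return Outcome::BadPattern;
    }
}
```

Calling try_regex once per grammar with the same pattern and input shows whether the pattern is rejected outright or merely fails to match.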
I think I haven't used any perl-specific construct, or did I? Where?
I only use the Perl-compatible engine myself. I checked the POSIX extended syntax and did not see any obvious problems, though I did wonder about this:
#define WORD "[\\w\\-\\.~/]+"
I suspected that Perl would support the escape sequence \w
inside a [] character set, but POSIX extended might not.
The tests I ran confirmed those suspicions.
I tested this regex in a Perl one-liner and it matched the
input string "alpha only". I tried the same regex in egrep
and it didn't work. egrep did match that input string when
I replaced "\w" with "a-z".
I find it much easier to program with regular expressions if
I test them using a command-line tool first.
|+| M a r k |+|
Mark Stallard
Business Application Services
Global Business Services Information Technology
Raytheon Company
(business)
978-436-8487
(cell)
617-331-5443
stallard@raytheon.com
880 Technology Drive
Billerica, MA 01821
www.raytheon.com
This message contains information that may be confidential and privileged.
Unless you are the addressee (or authorized to receive mail for the
addressee), you should not use, copy or disclose to anyone this message or
any information contained in this message. If you have received this message
in error, please so advise the sender by reply e-mail and delete this
message. Thank you for your cooperation.
From: "U.Mutlu"
Mark R Stallard wrote, On 02/13/2015 03:03 PM:
U. Mutlu
wrote: #define WORD "[\\w\\-\\.~/]+"
I suspected that Perl would support the escape sequence \w inside a [] character set, but POSIX extended might not. The tests I ran affirmed those suspicions.
Thank you. But on the "POSIX Extended Regular Expression Syntax" page the escape sequence \w is clearly listed:
http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/syntax/...
Doesn't this indicate a possible error, either in the regex library or in that documentation?
--
Uenal
U. Mutlu
Thank you. But on the "POSIX Extended Regular Expression Syntax" page the escape sequence \w is very well listed:
I should have explained this more thoroughly. I did not say that POSIX does not support \w at all. What I said was:
I suspected that Perl would support the escape sequence \w inside a [] character set, but POSIX extended might not. The tests I ran affirmed those suspicions.
POSIX extended supports \w, but NOT inside a [] character set. For POSIX extended, the regex [\w\-\.~/]+ is the same as [w\-\.~/]+. Inside of [], POSIX extended sees \w as a literal "w". You can test this yourself if you have an up-to-date version of the egrep shell command.
|+| M a r k |+|
Mark R Stallard wrote, On 02/16/2015 02:57 PM:
POSIX extended supports \w, but NOT inside a [] character set. For POSIX extended, the regex [\w\-\.~/]+ is the same as [w\-\.~/]+. Inside of [], POSIX extended sees \w as a literal "w".
OK, I see. So in general it is better to use the equivalent character class, as in [[:word:]], rather than \w inside []. Thanks for the clarification.

Uenal
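The conclusion of the thread (avoid \w inside [] when targeting POSIX extended) can be sketched as a pattern that sticks to POSIX character classes. This is an illustration under assumptions, not the original poster's final code: it uses std::regex rather than boost::regex so that it is self-contained, it replaces \d and \s as well as \w, and it writes the word class as [[:alnum:]_] instead of the [:word:] name mentioned above. The constants mirror the macros from the first message, and matches_var_and_val is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Pattern pieces rewritten without \w, \d, \s or backslash escapes inside [].
// These bodies are an assumed rewrite of the WORD, EQU and NUM macros.
static const std::string kWord = "[[:alnum:]_.~/-]+";  // '-' is last, so it is a literal
static const std::string kEqu  = "[[:space:]]*=[[:space:]]*";
static const std::string kNum  = "[+-]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][+-]?[0-9]+)?";

// Hypothetical helper: true if `text` as a whole has the form WORD = NUM
// under the given grammar (e.g. std::regex::extended or std::regex::ECMAScript).
bool matches_var_and_val(const std::string& text, std::regex::flag_type grammar) {
    const std::regex re(kWord + kEqu + kNum, grammar);
    return std::regex_match(text, re);
}
```

Because the pattern uses only bracket expressions, POSIX classes, and escapes of special characters outside brackets, the same string should be usable with either grammar; whether boost::regex treats it identically in both modes is an assumption worth testing before relying on it.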
participants (2)
-
Mark R Stallard
-
U.Mutlu