[regex][bug] Regex::regex_search does not handle \b and \B correctly in Win32

Hello Folks,

I found an issue with regex_search() and the "\b" and "\B" assertions on Win32 and created a patch (attached). Please review it.

Issue: \b and \B do not match at several word boundaries, e.g.

    regPat = /\bis\b/;
    regPat.exec("This\u00C0is\u00C0bad"); // Does not match
    regPat.exec("Thisあisあbad"); // Does not match

Desc: In the ECMA-262 regex spec (15.10.2.6), only the characters [0-9A-Za-z_] count as word characters for the "\b" and "\B" assertions; every other character should be treated as a non-word character. The Boost.Regex implementation for Win32 uses GetStringTypeEx() with the C1_ALPHA | C1_DIGIT flags to determine the character type. However, that API does not differentiate [0-9A-Za-z_] from other characters (e.g. European characters, Kanji); it only separates linguistic characters from everything else, which does not meet the spec.

Patch: Patch for w32_regex_traits.hpp, attached. It modifies isctype() to determine the character type without using the API.

Regards,
Hak Matsuda
Lead Dev.
CRI Middleware inc.
340 Brannan St #400, San Francisco, CA 94107

--
Thanks! HAK

Hakuro wrote:
Hello Folks. I found an issue with regex_search() and the "\b" and "\B" assertions on Win32 and created a patch (attached). Please review it. In the ECMA-262 regex spec (15.10.2.6), only the characters [0-9A-Za-z_] count as word characters for the "\b" and "\B" assertions; every other character should be treated as a non-word character. The Boost.Regex implementation for Win32 uses GetStringTypeEx() with the C1_ALPHA | C1_DIGIT flags to determine the character type. However, that API does not differentiate [0-9A-Za-z_] from other characters (e.g. European characters, Kanji); it only separates linguistic characters from everything else, which does not meet the spec.
The current behaviour is deliberate: if you really want "C" locale behaviour, then use basic_regex<char, cpp_regex_traits<char> > as your regular expression type. Crippling Win32 locale support is, IMO, not the right thing to do.

Thanks,
John Maddock.
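
For readers following the thread, the ASCII-only word rule that the report cites from ECMA-262 can be sketched roughly as below. This is only an illustration using the standard library: the helper names is_ecma_word_char, is_word_boundary and find_whole_word are hypothetical, and the code is neither the attached patch nor anything taken from Boost.Regex or w32_regex_traits.hpp.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Boost.Regex code): a character is a "word"
// character only if it is in [0-9A-Za-z_], as the report describes.
inline bool is_ecma_word_char(wchar_t c) {
    return (c >= L'0' && c <= L'9') ||
           (c >= L'A' && c <= L'Z') ||
           (c >= L'a' && c <= L'z') ||
           c == L'_';
}

// Hypothetical helper: true when the position just before s[pos] is a
// \b boundary, i.e. exactly one of the two neighbouring characters is a
// word character. The ends of the string count as non-word.
inline bool is_word_boundary(const std::wstring& s, std::size_t pos) {
    const bool before = pos > 0 && is_ecma_word_char(s[pos - 1]);
    const bool after = pos < s.size() && is_ecma_word_char(s[pos]);
    return before != after;
}

// Hypothetical helper: emulates searching for \bword\b by returning the
// index of the first occurrence of `word` that has a boundary on both
// sides, or std::wstring::npos if there is none.
inline std::size_t find_whole_word(const std::wstring& s,
                                   const std::wstring& word) {
    if (word.empty()) return std::wstring::npos;
    for (std::size_t i = s.find(word); i != std::wstring::npos;
         i = s.find(word, i + 1)) {
        if (is_word_boundary(s, i) && is_word_boundary(s, i + word.size())) {
            return i;
        }
    }
    return std::wstring::npos;
}
```

Under this rule, find_whole_word(L"This\u00C0is\u00C0bad", L"is") finds the standalone "is" because U+00C0 is treated as a non-word character. A locale-aware classification that treats U+00C0 as alphabetic sees no boundary there, which is the deliberate behaviour described in the reply.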