
Hi all,

I'm having trouble with the behaviour of the wildcard character when using Boost.Regex with Unicode strings. I would expect a "." to match one character, not one byte, but that's not what I'm seeing. I assumed a single wildcard would match any single character, but for multi-byte UTF-8 characters I have to use several wildcards to match one character. Can someone explain whether this is expected behaviour, or whether there are flags that control it?

What I'm trying to do is match a pattern (in UTF-8) against a string (in UTF-8). I'm creating ICU UnicodeStrings because I ran into other problems with plain UTF-8 char* strings, and my platform doesn't support wchar_t. I can show examples of the non-UnicodeString problems if needed.

I'm using Boost 1.42.

The test program follows. Its output is:

    $ g++ regex2.cc -l icui18n -l icuuc -l icudata -lboost_regex -o example && ./example
    unicodeString tests
    failed
    Success!

Test program:

    #include <iostream>
    #include <boost/regex.hpp>
    #include <boost/regex/icu.hpp>

    using namespace boost;
    using namespace std;

    int main() {
        // UTF-8 bytes for U+00A3 U+00D8 U+00B2 (pound sign, O with stroke, superscript two).
        // String literals avoid the narrowing error that {0xC2, ...} gives in C++11 and later.
        static const char input[] = "\xC2\xA3\xC3\x98\xC2\xB2";
        UnicodeString uInput(input);

        const char match1[] = "\xC2\xA3.\xC2\xB2";   // one .
        const char match2[] = "\xC2\xA3..\xC2\xB2";  // two .s
        UnicodeString uMatch1(match1);
        UnicodeString uMatch2(match2);

        u16match what;
        cout << "unicodeString tests" << endl;

        // one . fails
        if (u32regex_search(uInput, what, make_u32regex(uMatch1, regex::extended))) {
            cout << "Success!" << endl;
        } else {
            cout << "failed" << endl;
        }

        // two . succeeds
        if (u32regex_search(uInput, what, make_u32regex(uMatch2, regex::extended))) {
            cout << "Success!" << endl;
        } else {
            cout << "failed" << endl;
        }
        return 0;
    }