[xpressive] wide character blank not matching properly

It seems that xpressive's blank isn't matched properly to a space when used with wide strings. Here is a short example:

```cpp
#include <boost/xpressive/xpressive_static.hpp>
#include <string>
#include <iostream>

#if 1 // 1 for wide character, 0 for normal.
#  define REGEX_TYPE       boost::xpressive::wsregex
#  define REGEX_ITER_TYPE  boost::xpressive::wsregex_iterator
#  define REGEX_MATCH_TYPE boost::xpressive::wsmatch
#  define STR_TYPE         std::wstring
#  define STREAM_TYPE      std::wcout
#  define STR(s)           L##s
#else
#  define REGEX_TYPE       boost::xpressive::sregex
#  define REGEX_ITER_TYPE  boost::xpressive::sregex_iterator
#  define REGEX_MATCH_TYPE boost::xpressive::smatch
#  define STR_TYPE         std::string
#  define STREAM_TYPE      std::cout
#  define STR(s)           s
#endif

int main()
{
    using namespace boost::xpressive;

    REGEX_TYPE expr = ( s1= ( STR("something") >> +( (*blank >> +alpha) ) ) );

    STR_TYPE strings[] = { STR("something"), STR("something "),
                           STR("somethingelse"), STR("something else") };
    const int numStrings(sizeof(strings) / sizeof(strings[0]));

    for(int i(0); numStrings != i; ++i)
    {
        const STR_TYPE & s(strings[i]);
        STREAM_TYPE << "String \"" << s << "\":" << std::endl;

        // Print capture group 1 of every match found in s.
        for(REGEX_ITER_TYPE current(s.begin(), s.end(), expr), end; current != end; ++current)
        {
            const REGEX_MATCH_TYPE & what(*current);
            const STR_TYPE match(what[1]);
            STREAM_TYPE << "\t" << match << std::endl;
        }
    }
    return 0;
}
```

I would expect the last string to match (as it does with narrow characters), but it doesn't. Using space instead, or using (set= STR(' '),STR('\t')), seems to work in both cases.

I'm using MSVC 2005 SP1. I'm not sure whether the problem is in the compiler's char traits and how blank uses them (if it does), or somewhere in the bowels of xpressive.

Thanks,
JF

Jean-Francois Bastien wrote:
> It seems that xpressive's blank isn't matched properly to a space when used with wide strings.

Short answer: you're right.

Long answer: blank is an oddball, in that there is no ctype mask for it. The TR1 spec says of the characters matched by blank: "an implementation-defined subset of the characters for which isspace(c, getloc()) returns true, otherwise returns false." So you really can't rely on blank in portable programs. Many std libraries support blank in the ctype facet anyway. MSVC's is not one of them, so I had to fudge it.

Looking at the TR1 spec, I see that C++0x is picking up the isblank() and iswblank() functions from C99. It only makes sense that ctype and regex be updated accordingly. (Incidentally, C99 iswblank() in the "C" locale returns true for L' ' and L'\t' and false for everything else, which is what xpressive should do.)

I've just fixed this in HEAD. Thanks for the report. You can work around it by using (set=' ','\t').

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com