John,

Thanks for the help! I've managed to create my own traits classes and even got everything to compile, but I found that it would not work :-)

Right now I am doing intermediate encoding/decoding between wchar_t and the local encoding (which is determined by the locale). However, I do not like that approach much. I am intrigued by what you said about converting data from UTF-8 to UTF-32 on the fly. It is absolutely not a problem to convert my Unicode strings to UTF-8 encoded strings. Where could I read about those on-the-fly conversions, and what limitations do they have (e.g. how are locale settings handled)?

Thanks,
Andrei

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of John Maddock
Sent: Tuesday, March 21, 2006 12:36
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] [regex] Working with wchar_t on older UNIX platforms

Now I tried to integrate wregex in the software, but it just would not compile, complaining about a missing wstring (and BOOST_NO_WREGEX being defined). I tried to make up my own regex character traits class, but this does not seem to help, because some other classes/types (such as sub_match) make use of basic_string<charT>.
Is there any way to bypass the problem?

OK, all the following comments apply to 1.33.1. There are two easy options and one harder option:

Easy option #1: use STLport if it supports wstring.

Easy option #2: use the ICU/Unicode support in 1.33.1 to search your data directly (as long as it's in UTF-8, UTF-16 or UTF-32 format). You'll get back iterators into your data (whatever encoding it's in), so there are no problems determining offsets etc.

The slightly harder option, as you've guessed already: write your own traits class. From 1.33 onwards you can use vector<charT> in place of basic_string<charT> in the traits class. If you take a look at the traits class used by the Unicode/ICU support code it should give you the general idea, and there are docs here: http://www.boost.org/libs/regex/doc/concepts.html#traits

And finally... if your data is in MBCS format you might get some ideas from the Unicode support code in 1.33.x: basically, in order to handle multibyte encodings it converts from UTF-8 or UTF-16 to UTF-32 code points on the fly. Of course this requires that the on-the-fly conversions are bidirectional; this works OK for Unicode, but I'm not sure how far you would get with other encodings.

HTH, John.
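
To illustrate what "bidirectional on-the-fly conversion" means for UTF-8, here is a minimal standard-library-only sketch. It is not the Boost.Regex code: the function names (decode_forward, decode_backward, decode_all_forward, decode_all_backward) are hypothetical, and it assumes the input is already well-formed UTF-8, so it does no validation. The point is only that a UTF-8 stream can be stepped through in either direction, because continuation bytes are recognisable on their own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost API): decode the code point whose
// first byte is at s[pos], and advance pos past it.
// ASSUMPTION: s is well-formed UTF-8 and pos < s.size().
std::uint32_t decode_forward(const std::string& s, std::size_t& pos)
{
    unsigned char lead = static_cast<unsigned char>(s[pos++]);
    if (lead < 0x80) return lead;                  // single-byte (ASCII)
    int extra = (lead >= 0xF0) ? 3 : (lead >= 0xE0) ? 2 : 1;
    std::uint32_t cp = lead & (0x3F >> extra);     // payload bits of the lead byte
    while (extra-- > 0)                            // 6 payload bits per trailing byte
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos++]) & 0x3F);
    return cp;
}

// Hypothetical helper: move pos back to the start of the previous code
// point and decode it. This is the "bidirectional" part.
// ASSUMPTION: s is well-formed UTF-8 and 0 < pos <= s.size().
std::uint32_t decode_backward(const std::string& s, std::size_t& pos)
{
    // Continuation bytes have the form 10xxxxxx; skip back over them.
    do { --pos; } while ((static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80);
    std::size_t tmp = pos;                         // decode without moving pos forward
    return decode_forward(s, tmp);
}

// Walk the whole string front to back, yielding UTF-32 code points.
std::vector<std::uint32_t> decode_all_forward(const std::string& s)
{
    std::vector<std::uint32_t> out;
    for (std::size_t pos = 0; pos < s.size(); )
        out.push_back(decode_forward(s, pos));
    return out;
}

// Walk the whole string back to front, yielding UTF-32 code points.
std::vector<std::uint32_t> decode_all_backward(const std::string& s)
{
    std::vector<std::uint32_t> out;
    for (std::size_t pos = s.size(); pos > 0; )
        out.push_back(decode_backward(s, pos));
    return out;
}
```

Because pos always stays a byte offset into the original string, a match found while walking code points can be reported as positions in the UTF-8 data, which is the property described above for the ICU/Unicode support. Stepping backwards works here only because UTF-8 trailing bytes are self-identifying; a multibyte encoding without that property would need some other way to find the previous character boundary.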