I am sorry, my last message contained a mistake. I meant to say that I want the search to treat all the data as UTF-32 rather than UTF-8 (as I incorrectly wrote). I don't know whether I am making myself clear (I am not very good at expressing myself). What I really want to do is a Unicode search on the available data.

Anjaly G S

On Mon, 2007-10-01 at 09:42 +0100, John Maddock wrote:
Anjaly wrote:
The regex documentation says that the size of the data type passed to make_u32regex determines the character encoding (UTF-8, UTF-16 or UTF-32).
*For construction of the regex object*.
The search algorithms operate independently on any of UTF8/16/32.
I passed wchar_t (whose size I think is 4) so that the buffer encoding is treated as utf8 by u32regex_search regardless. Actually I am trying to do a utf8 search.
Except the data file you sent *was not valid UTF8* !
It looks like it's probably UTF16LE; in that case it's up to you to decode the byte order mark and read the text into something that Boost.Regex can handle (for example platform-native UTF16). ICU should have some file IO routines for doing that kind of thing, for example for loading a file into a UnicodeString type.
HTH, John.
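
To illustrate the step John describes (decoding the byte order mark and turning the raw bytes into text the search can work on), here is a minimal sketch. It assumes the file has already been read into a byte buffer and that it starts with a UTF-16 BOM. It uses only the C++ standard library rather than the ICU routines mentioned above, and the helper name `decode_utf16_with_bom` is hypothetical, not part of Boost.Regex or ICU.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost.Regex or ICU): decode a byte buffer
// that starts with a UTF-16 byte order mark into UTF-32 code points.
std::u32string decode_utf16_with_bom(const std::string& bytes)
{
    if (bytes.size() < 2)
        throw std::runtime_error("buffer too short to hold a BOM");

    const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
    const unsigned char b1 = static_cast<unsigned char>(bytes[1]);

    bool little_endian;
    if (b0 == 0xFF && b1 == 0xFE)
        little_endian = true;       // FF FE: UTF-16LE
    else if (b0 == 0xFE && b1 == 0xFF)
        little_endian = false;      // FE FF: UTF-16BE
    else
        throw std::runtime_error("no UTF-16 byte order mark found");

    if ((bytes.size() - 2) % 2 != 0)
        throw std::runtime_error("odd number of bytes after the BOM");

    // Read one 16-bit code unit at byte offset i, honouring the byte order.
    auto unit = [&](std::size_t i) -> char32_t {
        const char32_t lo = static_cast<unsigned char>(bytes[little_endian ? i : i + 1]);
        const char32_t hi = static_cast<unsigned char>(bytes[little_endian ? i + 1 : i]);
        return (hi << 8) | lo;
    };

    std::u32string out;
    for (std::size_t i = 2; i < bytes.size(); i += 2) {
        char32_t c = unit(i);
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 2 >= bytes.size())
                throw std::runtime_error("truncated surrogate pair");
            const char32_t low = unit(i + 2);
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
            i += 2;                 // skip the low surrogate just consumed
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        out.push_back(c);
    }
    return out;
}
```

The result holds one code point per element with the BOM stripped. Going by the documentation discussed above, a sequence of 4-byte code units should be treated as UTF-32 by the u32regex search algorithms, but that is worth checking against your Boost version before relying on it.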