
On Fri, Nov 4, 2011 at 22:54, Andrey Moshbear <andrey.vul@gmail.com> wrote:
On Fri, Nov 4, 2011 at 11:28, Igor R <boost.lists@gmail.com> wrote:
If I have a string that is in UTF-8, how do I tell the path constructor?
path p1 ("my utf8 data", SOME_CODECVT);
I think it is a matter of passing the right SOME_CODECVT. What is it? The path::value_type is wchar_t, according to the docs.
On Windows you should convert it to UTF-16.
Word of warning: the Boost utf8 codecvt facet causes undefined behavior if your data contains any code points above U+FFFF. You'll have to hack do_in and do_out so that they emit and parse surrogate pairs, respectively. Also, hack do_length to increment its counter by 2 for each code point above 0xFFFF.
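For reference, here is a minimal sketch of the surrogate-pair arithmetic those patched functions would need. It is standalone C++ and does not patch the Boost facet itself; the function names are hypothetical. to_surrogates is what a patched do_in would emit for a supplementary code point, from_surrogates is what a patched do_out would parse back, and utf16_units is the per-code-point increment a do_length that counts wchar_t units would use.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16
// surrogate pair (high, low).
std::pair<char16_t, char16_t> to_surrogates(char32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    char32_t v = cp - 0x10000;                               // 20 significant bits
    return { static_cast<char16_t>(0xD800 + (v >> 10)),      // top 10 bits
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) };  // bottom 10 bits
}

// Recombine a high/low surrogate pair into the original code point.
char32_t from_surrogates(char16_t hi, char16_t lo) {
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + ((static_cast<char32_t>(hi) - 0xD800) << 10)
                   + (static_cast<char32_t>(lo) - 0xDC00);
}

// Number of UTF-16 code units a code point occupies:
// 2 above U+FFFF, otherwise 1.
std::size_t utf16_units(char32_t cp) {
    return cp > 0xFFFF ? 2 : 1;
}
```

For example, U+1F600 maps to the pair 0xD83D 0xDE00, and utf16_units returns 2 for it.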
For my rewrite of the UTF-8 to UTF-16/32 conversion, look at https://github.com/moshbear/fastcgipp/blob/master/src/utf8_cvt.cpp. Although it still decodes values above U+10FFFF, it is more RFC 3629 compliant than utf8_codecvt_facet. It also supports true UTF-16.
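To show what strict RFC 3629 decoding involves, here is a minimal C++17 sketch. It is not the code from the linked repository, and decode_utf8_strict is a hypothetical name. It rejects overlong forms, encoded surrogates (U+D800..U+DFFF), values above U+10FFFF, stray continuation bytes and truncated sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Decode UTF-8 into code points; returns std::nullopt on any ill-formed input.
std::optional<std::u32string> decode_utf8_strict(const std::string& in) {
    // Smallest value each sequence length may encode; anything below is overlong.
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        if (b < 0x80) {                          // ASCII fast path
            out.push_back(b);
            ++i;
            continue;
        }
        std::size_t len = 0;
        char32_t cp = 0;
        if (b >= 0xC2 && b <= 0xDF)      { len = 2; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { len = 3; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { len = 4; cp = b & 0x07; }
        else return std::nullopt;  // continuation byte, 0xC0/0xC1, or 0xF5..0xFF
        if (in.size() - i < len) return std::nullopt;          // truncated
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;       // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        assert(len >= 2 && len <= 4);
        if (cp < min_for_len[len]) return std::nullopt;        // overlong
        if (cp > 0x10FFFF) return std::nullopt;                // beyond Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt; // surrogate
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Feeding each decoded code point through a surrogate-pair encoder would then give well-formed UTF-16 suitable for a wchar_t path on Windows.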