Re: [Boost-users] boost::filesystem::path in UTF-8 on Windows

5 Nov 2011

On Sat, Nov 5, 2011 at 12:43, John M. Dlugosz <mpbecey7gu@snkmail.com> wrote:
...
On Fri, Nov 4, 2011 at 11:28, Igor R <boost.lists@gmail.com> wrote:
...
...
...
On Windows you should convert it to UTF-16.
I know that is how it stores it internally.
My question is "how".  Given that I have file names encoded in UTF-8,
how do I make the Boost path class accept them, and operate
conveniently enough to be worth using instead of plain strings?
On Fri, Nov 4, 2011 at 22:54, Andrey Moshbear <andrey.vul@gmail.com> wrote:
...
For my rewrite of UTF-8 to UTF-16/32, look at
https://github.com/moshbear/fastcgipp/blob/master/src/utf8_cvt.cpp.
So this is a codecvt that I should use as the extra argument, that works
better than the undocumented one that came with Boost?
And the Boost UTF-8 <-> UTF-32 one is indeed documented:
http://www.boost.org/doc/libs/1_47_0/libs/serialization/doc/codecvt.html.
It just won't work correctly with code points outside the BMP (above
U+FFFF) if you use a 16-bit wchar_t as the char type.

The code itself isn't that self-documenting, though, which makes
hacking in the U+10FFFF limit and surrogate pair parsing more
work than simply rewriting the codecvt.
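
To make the U+10FFFF limit and the surrogate pair handling concrete, here is a
minimal sketch of a strict UTF-8 to UTF-16 decoder. It is not the code from
the fastcgipp link or from Boost; utf8_to_utf16 is a hypothetical name, and
the sketch assumes only the standard library (C++11 for char16_t).

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from Boost or fastcgipp): decode UTF-8 into
// UTF-16 and throw std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;      // code point being assembled
        std::size_t len;       // total bytes in this sequence
        std::uint32_t lowest;  // smallest code point allowed for this length
        if (lead < 0x80)                { cp = lead;        len = 1; lowest = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; lowest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; lowest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; lowest = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, encoded surrogates, and anything past U+10FFFF.
        if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

A converter that stops at 16 bits has no equivalent of the last branch, which
is where code points above U+FFFF get lost.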
...
And, the implicit answer is that this is indeed how I do it?
But:
1) When I write something like
  path p2= p1 / "Foo" / s1 / name;
there is no place to pass the extra codecvt argument.  I thought it might
take strings and keep the existing encoding, but it actually uses the
default code page.  How can I use path in a simple and convenient manner
given that in this program all the strings I will use with it are already in
UTF-8?
Make a std::wstringstream.
Imbue it with locale(locale::classic(), new Utf8_cvt).
Use operator<< to build up a path.
Call .str() to get the string.
Pass that to the path constructor.
...
2) How can I write a line like:
  path p2 (somestring, codecvt());
in a portable manner?  On the Mac the internal representation is char, so
will it object to having the codecvt passed?  Once I set things up, I want
the bulk of the source code to be the same on all platforms, so writing the
argument on Windows and leaving it out on Mac is not acceptable.
Since the Mac APIs assume char, wide UTF encodings won't work there:
the libraries look for a char 0 as the terminator, not a wchar_t 0.
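
A small illustration of the terminator problem, as a sketch only:
visible_length_as_char is a hypothetical helper that shows how many bytes a
char-based API would see if handed the raw storage of a UTF-16 string.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: count the bytes before the first zero byte when the
// storage of a UTF-16 string is read as a plain char string.
std::size_t visible_length_as_char(const std::u16string& s)
{
    // Each ASCII character in UTF-16 has one zero byte, so strlen stops
    // almost immediately; the exact count depends on byte order.
    return std::strlen(reinterpret_cast<const char*>(s.c_str()));
}
```

For u"Foo" this returns fewer than 3 on any byte order, while the UTF-8
string "Foo" passes through a char API intact.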

The best solution is to #ifdef _WIN32 the UTF-8 to UTF-16 conversion code.
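
One way that #ifdef could look, as a rough sketch rather than a tested
recipe: native_string and to_native are hypothetical names. The Windows
branch assumes the Win32 MultiByteToWideChar call with CP_UTF8; the other
branch passes the UTF-8 bytes through untouched.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

#ifdef _WIN32
typedef std::wstring native_string;  // UTF-16 on Windows
#else
typedef std::string native_string;   // UTF-8 bytes, passed through unchanged
#endif

// Hypothetical helper: turn a UTF-8 string into whatever the platform's
// path type stores natively.
native_string to_native(const std::string& utf8)
{
#ifdef _WIN32
    if (utf8.empty())
        return native_string();
    // First call asks for the required length in wchar_t units.
    const int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                      static_cast<int>(utf8.size()), NULL, 0);
    if (n <= 0)
        return native_string();  // conversion failed; caller sees empty string
    native_string out(static_cast<std::size_t>(n), L'\0');
    // Second call performs the conversion into the sized buffer.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        static_cast<int>(utf8.size()), &out[0], n);
    return out;
#else
    return utf8;
#endif
}
```

Since path can be constructed from both std::string and std::wstring, the
calling code can then be written once, e.g. path(to_native(s1)), with the
#ifdef confined to this one function.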