
Dave Abrahams wrote: ...
OK. You're designing a portable library that talks to the OS. It has the following functions:
T get_path( ... );
void process_path( T );
What do you use for T? string or utf8_string?
I'm even less of an expert on encodings at the OS boundary than I am on encodings in general, but I'll take a shot at this one.
OK, according to all the experts (like you), we should be trafficking in UTF-8 everywhere, so I guess I'd say T is utf8_string (well, T is boost::filesystem::path, but that begs the same questions, ultimately).
My answer is different. T is std::string, and:
- on POSIX OSes, this string is taken directly from the OS and given directly to the OS, without any conversion;
- on Windows, this string is UTF-8 and is converted to UTF-16 before being given to the OS, and converted from UTF-16 after being received from it. This conversion should tolerate broken UTF-16 because the OS does so as well.
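
A rough sketch of that boundary policy, in case it helps to make it concrete. The names to_native and from_native are hypothetical, made up for this illustration and not taken from any existing library. The Windows branch uses MultiByteToWideChar and WideCharToMultiByte with no error flags, so ill-formed input does not make the call fail; as far as I know it is replaced with U+FFFD instead, which is lossy, so treat this as one possible reading of "tolerate", not the definitive one.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: UTF-8 std::string -> UTF-16 for the wide Windows APIs.
// Flags are 0 (no MB_ERR_INVALID_CHARS), so bad input does not fail the call.
std::wstring to_native(const std::string& s)
{
    if (s.empty()) return std::wstring();
    int len = static_cast<int>(s.size());
    int n = MultiByteToWideChar(CP_UTF8, 0, s.data(), len, nullptr, 0);
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s.data(), len, &w[0], n);
    return w;
}

// Hypothetical helper: UTF-16 received from the OS -> UTF-8 std::string.
// Flags are 0 (no WC_ERR_INVALID_CHARS), so broken UTF-16 does not fail the call.
std::string from_native(const std::wstring& w)
{
    if (w.empty()) return std::string();
    int len = static_cast<int>(w.size());
    int n = WideCharToMultiByte(CP_UTF8, 0, w.data(), len,
                                nullptr, 0, nullptr, nullptr);
    std::string s(static_cast<std::size_t>(n), '\0');
    WideCharToMultiByte(CP_UTF8, 0, w.data(), len, &s[0], n, nullptr, nullptr);
    return s;
}

#else

// POSIX: the bytes are passed through untouched in both directions,
// whatever encoding (if any) they happen to be in.
std::string to_native(const std::string& s) { return s; }
std::string from_native(const std::string& s) { return s; }

#endif
```

With helpers like these, a well-formed path round-trips on either platform: from_native(to_native("dir/file.txt")) gives back "dir/file.txt". On POSIX the round trip is the identity for any byte sequence; on Windows that only holds for input that is valid UTF-8.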