
On Wed, Jan 19, 2011 at 7:54 PM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
Agreed, again if Microsoft could move by default to UTF-8 for the various locales instead of using the current encodings then this whole discussion would be moot.
For the time being we would need to do something like this even if a complete transcoding is not possible:
std::string filepath(get_path_in_utf8());
std::fstream file(utf8_to_locale_encoding(filepath));
everywhere the implementation (STL, etc.) expects native encoding. This is the ugliest part of the whole transition. Boost could hide this completely by using the wide-char interfaces and doing CreateFileW(utf8_to_winapi_wide(filepath), ...).
It also could be an opportunity for alternate implementations of STL which would handle it transparently.
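To make the wide-char route a bit more concrete, here is a rough sketch of what a helper like utf8_to_winapi_wide might do internally. The name utf8_to_utf16 is made up for illustration and is not an existing Boost or WinAPI function. It assumes well-formed UTF-8 input, does not validate continuation bytes, and does not reject overlong forms. On Windows one would more likely call MultiByteToWideChar with CP_UTF8 and pass the result to CreateFileW.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only): transcode UTF-8 bytes to UTF-16
// code units. Assumes well-formed input; stray lead bytes and truncated
// sequences become U+FFFD, continuation bytes are not validated.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 1;      // sequence length in bytes
        char32_t cp = lead;       // code point being assembled
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else if (lead >= 0x80) {
            // Stray continuation byte or invalid lead byte.
            out.push_back(static_cast<char16_t>(0xFFFD));
            ++i;
            continue;
        }
        if (i + len > in.size()) {
            // Sequence is cut off at the end of the input.
            out.push_back(static_cast<char16_t>(0xFFFD));
            break;
        }
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Outside the BMP: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Since wchar_t is 16 bits wide on Windows, the resulting code units could be copied into a std::wstring there before being handed to the W-suffixed API functions.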
Hmmmm ... I'm starting to come round to your std::string == UTF-8 point of view.
The one thing that would still annoy me is that std::string's interface was clearly designed for single-byte == single-character/codepoint/whatever operation. I don't suppose anyone will be adding .begin_character()/.end_character() methods to std::string any time soon.
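For what it's worth, code-point access does not have to live inside std::string; a free function over the bytes can do the job. The sketch below is only an illustration. The name utf8_code_points is invented, not part of the standard library or Boost. It assumes mostly well-formed UTF-8, replaces obviously broken sequences with U+FFFD, and does not check for overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical free function (illustration only): decode a UTF-8 encoded
// std::string into a sequence of code points, so that "characters" can be
// counted and visited independently of the byte-oriented std::string API.
std::vector<char32_t> utf8_code_points(const std::string& s)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        if (i + len > s.size()) {
            // Sequence is cut off at the end of the string.
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) {
            // Bad continuation byte: emit a replacement and resync.
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

With this, the string "a", the euro sign, "b" occupies five bytes in the std::string, but utf8_code_points returns three elements. A lazy iterator pair would avoid the temporary vector, at the price of a more involved implementation.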
This is where the (Boost.)Locale and (Boost.)Unicode libraries could provide insight into how to extend the std::string interface, or be the testbed for new additions to the standard library related to string manipulation. (Provided the standard adopts UTF-8 as a native encoding. Or does it already?)

Matus