
On Thu, Nov 3, 2011 at 4:14 PM, Stephan T. Lavavej <stl@exchange.microsoft.com> wrote:
[Alf P. Steinbach]
I found that the Visual C++ implementation of C library I/O generally does not support console input of international characters. It can deal with narrow-character input from the current codepage, provided that codepage is not UTF-8.
Changing the console's codepage isn't the right magic. See http://blogs.msdn.com/b/michkap/archive/2008/03/18/8306597.aspx
With _O_U16TEXT, VC8+ can write Unicode to the console perfectly. However, I believe that input was broken up to and including VC10, and that it's been fixed in VC11.
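For concreteness, the usual incantation is a one-line mode switch before any wide output (a minimal sketch, assuming the Microsoft CRT headers; the sample string is mine):

    #include <fcntl.h>   // _O_U16TEXT
    #include <io.h>      // _setmode, _fileno
    #include <stdio.h>   // wprintf

    int main()
    {
        // Put stdout into UTF-16 mode; wide output now reaches the
        // console intact instead of being narrowed via the codepage.
        _setmode(_fileno(stdout), _O_U16TEXT);
        wprintf(L"\x043f\x0440\x0438\x0432\x0435\x0442, world\n");
    }

Note that once a stream is in _O_U16TEXT mode, only wide functions may be used on it.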
(I don't know about UTF-8. For reasons that are still mysterious to me, UTF-8 typically isn't handled as well as people expect it to be. Windows really really likes UTF-16 for Unicode. In practice, this is not a big deal, because UTF-8 and UTF-16 are losslessly convertible.)
I've found that for a multi-platform library, the most straightforward strategy for handling Unicode is to use UTF-8, which when running on Windows gets converted to UTF-16 just before calling a SomethingSomethingW function (see the P.S. below). Why not the other way around (use UTF-16 and convert to UTF-8 before calling POSIX functions)? Because:

- most portable Unicode-aware libraries use UTF-8;
- many Unicode-unaware libraries just work with UTF-8;
- even on Windows, last time I checked MinGW still doesn't support std::wstring, which makes it difficult to manage UTF-16 strings (assuming portability is important).

Emil Dotchevski
Reverge Studios, Inc.
http://www.revergestudios.com/reblog/index.php?n=ReCode
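P.S. For concreteness, the Windows-side shim might look roughly like this (a minimal sketch; the helper name to_utf16 is mine, and the error handling is illustrative):

    #include <windows.h>
    #include <stdexcept>
    #include <string>

    // Convert UTF-8 to UTF-16 so the result can be passed to a ...W API.
    std::wstring to_utf16(std::string const & utf8)
    {
        if (utf8.empty())
            return std::wstring();
        // First call computes the required length in wide characters.
        int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                    utf8.data(), (int)utf8.size(), 0, 0);
        if (!n)
            throw std::runtime_error("invalid UTF-8");
        std::wstring utf16(n, L'\0');
        // Second call performs the actual conversion into the buffer.
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8.data(), (int)utf8.size(), &utf16[0], n);
        return utf16;
    }

    // Usage, at the last moment before the W call:
    //   CreateFileW(to_utf16(path).c_str(), ...);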