
On Tue, Apr 26, 2011 at 9:27 PM, Artyom <artyomtnk@yahoo.com> wrote:
From: Mathias Gaunard <mathias.gaunard@ens-lyon.org>
On 26/04/2011 11:17, Sebastian Redl wrote:
GCC has options to control both the source (-finput-charset) and the execution character set (-fexec-charset). They both default to UTF-8. However, MSVC is more complicated. It will try to auto-detect the source character set, but while it can detect UTF-16, it will treat everything else as the system narrow encoding (usually a Windows-xxxx codepage) unless the file starts with a UTF-8-encoded BOM. The worse problem is that, except for a very new, poorly documented, and probably experimental pragma, there is *no way* to change MSVC's execution character set away from the system narrow encoding.
A long time ago, I asked Vladimir Prus to help me add an option to Boost.Build that would automatically prepend the BOM to source files when using MSVC, but unfortunately he was never able to help me do this.
The problem is that even if the source is UTF-8 with a BOM, "שלום" would be encoded according to the locale's 8-bit codepage, such as 1255 or 936, and not as a UTF-8 string (codepage 65001).
It is rather stupid, but this is how MSVC works or understands the place of UTF-8 in this world.
It's not stupid. It's because the ANSI versions of the Win32 API functions expect these encodings. To me, having the encoding of an ordinary string literal follow the source file's encoding is the stupid idea.
Unicode and Visual Studio is just broken...
Artyom
-- Ryou Ezoe