
From: Mathias Gaunard <mathias.gaunard@ens-lyon.org>

On 30/04/2011 18:45, Vladimir Prus wrote:
On 26/04/2011 11:17, Sebastian Redl wrote:
GCC has options to control both the source (-finput-charset) and the execution character set (-fexec-charset). They both default to UTF-8. However, MSVC is more complicated. It will try to auto-detect the source character set, but while it can detect UTF-16, it will treat everything else as the system narrow encoding (usually a Windows-xxxx codepage) unless the file starts with a UTF-8-encoded BOM. The worse problem is that, except for a very new, poorly documented, and probably experimental pragma, there is *no way* to change MSVC's execution character set away from the system narrow encoding.
A long time ago, I asked Vladimir Prus to help me add an option to Boost.Build that would automatically prepend the BOM to source files when using MSVC, but unfortunately he was never able to help me do this.
Well, if you have a command that can prepend a BOM to a file, you can easily modify 'actions compile-c-c++' in msvc.jam to run that command.
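For illustration, such a command could be built around something like the following. This is only a minimal sketch: the names kUtf8Bom, has_utf8_bom, with_utf8_bom and prepend_bom_to_file are hypothetical and not part of Boost.Build or msvc.jam, and the sketch assumes the file is small enough to read into memory and is already UTF-8 encoded.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// The three bytes of the UTF-8 encoded byte order mark (U+FEFF).
// (Hypothetical helper names throughout; nothing here comes from Boost.Build.)
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Returns true if `content` already begins with the UTF-8 BOM.
bool has_utf8_bom(const std::string& content) {
    return content.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

// Returns `content` with a UTF-8 BOM in front; content that already
// starts with a BOM is returned unchanged, so the call is idempotent.
std::string with_utf8_bom(const std::string& content) {
    return has_utf8_bom(content) ? content : kUtf8Bom + content;
}

// Rewrites the file at `path` in place so that it starts with a BOM.
// Returns false if the file cannot be read or written.
bool prepend_bom_to_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::string content((std::istreambuf_iterator<char>(in)),
                        std::istreambuf_iterator<char>());
    in.close();
    if (has_utf8_bom(content)) return true;  // nothing to do

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << kUtf8Bom << content;
    return static_cast<bool>(out);
}
```

A build action would more likely write the result to a temporary copy than modify the source file in place; the in-place variant is shown only because it is shorter.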
It would be nice if I could only do this when the source files have been tagged as utf-8 or something like that.
A few points:

1. -fexec-charset can be simulated in MSVC with #pragma setlocale(".XXXX"), where XXXX is the codepage. However, 65001 (UTF-8) can't be used!
2. -finput-charset can be simulated with the same setlocale pragma, which again can't be 65001 (UTF-8); alternatively, the input can be UTF-8 if you add a BOM. In fact, the BOM is needed for files that contain non-ASCII characters.

But the bigger question is: what exactly do you want to do with the BOM, and how would it help you write "cross-platform" software? If you write for MSVC, add the BOM in the first place. If you write cross-platform, cross-compiler software, MSVC's incompatibility with the rest of the world makes it impossible to use UTF-8 in a cross-platform way, because the only real Unicode strings with MSVC are L"" literals, and those are encoded as UTF-16, while the whole non-Windows world uses UTF-32 as the wide character encoding.

So basically, until the Microsoft Visual Studio team takes UTF-8 seriously and either supports the 65001 codepage as expected or provides GCC-like options for the input and exec encodings, I don't see how this BOM would be useful.

Does anybody know how to open a bug or feature request for MSVC, so that MSVC11/201[^0] would support it properly?

My $0.02,
Artyom
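To make the wide-character point concrete, here is a small sketch. The names kEAcuteUtf8, wide_char_bits and wide_units_for_astral are made up for illustration. It assumes a C++11 compiler, and it spells non-ASCII characters with escapes so that neither the source nor the execution character set affects the result.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// U+00E9 written as explicit UTF-8 bytes. The source text is pure ASCII,
// so the bytes in the binary do not depend on any compiler charset setting.
const std::string kEAcuteUtf8 = "\xC3\xA9";

// Width of wchar_t in bits: 16 with MSVC, 32 with GCC on typical Linux systems.
constexpr std::size_t wide_char_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Number of wchar_t units needed for U+1F600, a character outside the BMP:
// two (a surrogate pair) where wchar_t is 16 bits, one where it is 32 bits.
std::size_t wide_units_for_astral() {
    return std::wstring(L"\U0001F600").size();
}
```

Code that counts or indexes wchar_t units therefore sees different values for the same text depending on the platform, which is why L"" literals alone do not give portable Unicode strings.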