Re: [boost] [locale] Review results for Boost.Locale library

1 May 2011

      ...
From: Mathias Gaunard <mathias.gaunard@ens-lyon.org>
On 30/04/2011 18:45, Vladimir Prus wrote:
...
...
On 26/04/2011 11:17, Sebastian Redl wrote:
...
GCC has options to control both the source (-finput-charset) and the
execution character set (-fexec-charset). They both default to UTF-8.
However, MSVC is more complicated. It will try to auto-detect the source
character set, but while it can detect UTF-16, it will treat everything
else as the system narrow encoding (usually a Windows-xxxx codepage)
unless the file starts with a UTF-8-encoded BOM. The worse problem is
that, except for a very new, poorly documented, and probably
experimental pragma, there is *no way* to change MSVC's execution
character set away from the system narrow encoding.

A long time ago, I asked Vladimir Prus to help me add an option to
Boost.Build that would allow to automatically prepend the BOM to source
files when using MSVC, but unfortunately he was never able to help me do
this.

Well, if you have a command that can prepend BOM to a file, you can
easily modify 'actions compile-c-c++' in msvc.jam to run that command.

It would be nice if I could only do this when the source files have been
tagged as utf-8 or something like that.

A few points:

1. -fexec-charset in MSVC can be simulated with

   #pragma setlocale(".XXXX") where XXXX is the codepage.

   However, 65001 (UTF-8) can't be used!

2. -finput-charset can either be set by the same setlocale pragma,
   in which case it can't be 65001 (UTF-8) either, or it can be
   UTF-8 if you add a BOM.

   But in fact a BOM is needed for files that contain non-ASCII characters.
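
To make the BOM point concrete, here is a minimal sketch of the check a build step could apply before handing a UTF-8 source file to MSVC. The helper names (has_non_ascii, has_utf8_bom, add_bom_if_needed) are hypothetical and are not part of Boost.Build or msvc.jam; the sketch assumes the file content has already been read into a std::string and really is UTF-8.

```cpp
#include <string>

// UTF-8 encoding of U+FEFF, the byte order mark discussed above.
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// True if any byte falls outside the 7-bit ASCII range.
bool has_non_ascii(const std::string& bytes) {
    for (unsigned char c : bytes) {
        if (c > 0x7F) return true;
    }
    return false;
}

// True if the buffer already starts with the UTF-8 BOM.
bool has_utf8_bom(const std::string& bytes) {
    return bytes.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

// Hypothetical helper: prepend the BOM only when the source contains
// non-ASCII bytes and does not carry a BOM yet; otherwise return it as-is.
std::string add_bom_if_needed(const std::string& source) {
    if (has_non_ascii(source) && !has_utf8_bom(source)) {
        return kUtf8Bom + source;
    }
    return source;
}
```

Pure-ASCII sources come back unchanged, so a step like this would only touch the files where the encoding actually matters.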

But the bigger question is: what exactly do you want to do with the BOM,
and how would it help you write "cross-platform" software?

If you write only for MSVC, add the BOM in the first place. If you write
cross-platform, cross-compiler software, MSVC's incompatibility with
the rest of the world actually makes it impossible
to use UTF-8 in a cross-platform way, because the only
real Unicode strings with MSVC are L"" literals, and they
are encoded as UTF-16,
while all of the non-Windows world uses UTF-32 as the wide character
encoding.
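
The difference in wide literal encodings can be observed directly. The following is only a sketch with hypothetical function names; it assumes a compiler that accepts \U universal character names in wide literals and infers the encoding from the width of wchar_t.

```cpp
#include <cstddef>
#include <string>

// Number of wchar_t code units the compiler emits for U+1F600,
// a code point outside the Basic Multilingual Plane.
std::size_t wide_units_for_non_bmp() {
    const std::wstring s = L"\U0001F600";
    return s.size();
}

// Expected count given the width of wchar_t: a surrogate pair when
// wchar_t is 16 bits (UTF-16), a single unit when it is 32 bits (UTF-32).
std::size_t expected_units() {
    return sizeof(wchar_t) == 2 ? 2 : 1;
}
```

Code that indexes or measures L"" strings therefore sees different lengths for the same text depending on the platform, which is part of why wide literals alone do not give portable Unicode strings.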

So basically, I can say that until the Microsoft Visual Studio
team takes UTF-8 seriously and either supports the 65001
codepage as expected or provides GCC-like options
for the input and exec encodings, I don't see how
this BOM would be useful.

Does anybody know how to open a bug or feature request
for MSVC, so that MSVC11 /201[^0] would support it
properly?

My $0.02

Artyom