Re: [boost] [strings][unicode] Proposals for Improved String Interoperability in a Unicode World

29 Jan 2012

      ...
> From: Yakov Galka <ybungalobill@gmail.com>
>> [...] What probably should be done is that compilers should be compelled to
>> support UTF-8 as the source character set in a unified way.
>
> Yes, it could be nice. It would solve half the problem, which is a huge
> step forward given the current mood of the committee. However, embedding
> Unicode string literals in source code is still not something you routinely
> do. Internationalization usually uses external string tables.

Not right. Sometimes you do want non-ASCII symbols in the source code;
what is wrong with having © in a text or a € symbol in the code?

Also, the fact that C++ does not define Unicode as a source encoding is
a design problem in the standard; there is nothing wrong with having
Unicode literals in the source code.

In fact, the ONLY modern compiler that does not support them is Visual Studio;
all the others I have ever used (gcc, clang, intel, sunstudio) work fine
with UTF-8.
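To make the point about UTF-8 literals concrete, here is a minimal sketch. It assumes the file is saved as UTF-8 without a BOM and compiled by gcc or clang with their default settings (UTF-8 source and narrow execution character sets); other compilers or flags may encode the literals differently. The names `copyright_sign`, `euro_sign`, `literal_bytes` and `check_utf8_literals` are hypothetical, made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: this file is UTF-8 without a BOM, built by gcc or clang with
// default charsets, so the literals below are stored as raw UTF-8 bytes.
const char copyright_sign[] = "©";  // U+00A9, expected bytes C2 A9
const char euro_sign[] = "€";       // U+20AC, expected bytes E2 82 AC

// Copies the bytes of a narrow literal into a std::string, dropping the NUL.
template <std::size_t N>
std::string literal_bytes(const char (&lit)[N]) {
    return std::string(lit, N - 1);
}

// Hypothetical self-check: fails if the literals were not stored as UTF-8.
void check_utf8_literals() {
    assert(literal_bytes(copyright_sign) == "\xC2\xA9");
    assert(literal_bytes(euro_sign) == "\xE2\x82\xAC");
}
```

Under those assumptions `sizeof(euro_sign)` is 4: three UTF-8 bytes plus the terminating NUL.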
...
>> I once asked volodya if it were feasible to implement this in the build
>> system (add a BOM for MSVC), but he didn't seem to think it was worth
>> it.
>
> I don't understand. MSVC already understands BOM, and GCC has already been
> fixed according to
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33415 (didn't test it).

A few points:

1. A BOM should not be used in source code: no compiler except MSVC uses it, and most
   do not support it.

   A BOM is totally stupid for UTF-8, as UTF-8 has no "byte order", so it should
   just die for UTF-8.

2. Adding a UTF-8 BOM makes MSVC encode narrow literals in the ANSI encoding,
   which makes the BOM even more useless (crap... sorry) with MSVC.

Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/