
From: Yakov Galka <ybungalobill@gmail.com>
[...] What probably should be done is that compilers should be compelled to
support UTF-8 as the source character set in a unified way.
Yes, it would be nice. It would solve half the problem, which is a huge step forward given the current mood of the committee. However, embedding Unicode string literals in source code is still not something you routinely do. Internationalization usually uses external string tables.
Not right. Sometimes you do want non-ASCII symbols in the source code; what is wrong with having © in the text or the € symbol in the code? Also, the fact that C++ does not define a Unicode source character set is a design problem in the standard; there is nothing wrong with having Unicode literals in the source code. In fact, the ONLY modern compiler that does not support them is Visual Studio; all the others I have ever used (gcc, clang, intel, sunstudio) work fine with UTF-8.
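To make the point about narrow literals concrete, here is a minimal sketch of how one could check what a given compiler actually stores for a non-ASCII literal. It assumes the source file is saved as UTF-8 without a BOM. The helper name stored_as_utf8 and the two wrapper functions are hypothetical, made up for this illustration only.

```cpp
#include <cstring>

// Hypothetical helper: true if the bytes the compiler stored for `literal`
// are exactly the UTF-8 bytes given (as hex escapes) in `expected_bytes`.
inline bool stored_as_utf8(const char* literal, const char* expected_bytes) {
    return std::strcmp(literal, expected_bytes) == 0;
}

// Assumption: this file is saved as UTF-8. Whether these return true
// depends on the compiler and its settings, which is the whole point.
inline bool copyright_literal_is_utf8() {
    return stored_as_utf8("©", "\xC2\xA9");      // U+00A9 is C2 A9 in UTF-8
}

inline bool euro_literal_is_utf8() {
    return stored_as_utf8("€", "\xE2\x82\xAC");  // U+20AC is E2 82 AC in UTF-8
}
```

Calling copyright_literal_is_utf8() and euro_literal_is_utf8() from a small test program on each compiler shows whether the literal bytes were passed through unchanged or converted to some other narrow encoding.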
I once asked volodya if it were feasible to implement this in the build
system (add a BOM for MSVC), but he didn't seem to think it was worth it.
I don't understand. MSVC already understands the BOM, and GCC has already been fixed according to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33415 (I didn't test it).
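For reference, the "add a BOM for MSVC" step discussed above comes down to checking for and prepending three bytes. The following is only a rough sketch of such a step, assuming the file contents have already been read into a std::string. The function names has_utf8_bom, add_utf8_bom and strip_utf8_bom are hypothetical and not part of any existing build system.

```cpp
#include <string>

// True if the buffer starts with the UTF-8 byte order mark EF BB BF.
inline bool has_utf8_bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Returns the contents with a BOM prepended, unless one is already there.
inline std::string add_utf8_bom(const std::string& bytes) {
    return has_utf8_bom(bytes) ? bytes : std::string("\xEF\xBB\xBF") + bytes;
}

// Returns the contents without a leading BOM, for compilers that reject it.
inline std::string strip_utf8_bom(const std::string& bytes) {
    return has_utf8_bom(bytes) ? bytes.substr(3) : bytes;
}
```

A build step could apply add_utf8_bom to a temporary copy of each source file before handing it to MSVC and leave the original file untouched for the other compilers.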
A few points:

1. A BOM should not be used in source code; no compiler except MSVC uses it, and most do not support it. A BOM is totally stupid for UTF-8, since UTF-8 has no "byte order", so it should just die for UTF-8.
2. Setting a UTF-8 BOM makes narrow literals get encoded in the ANSI encoding, which makes the BOM even more useless (crap... sorry) with MSVC.

Artyom Beilis
--------------
CppCMS - C++ Web Framework: http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/