
On Sun, Jan 29, 2012 at 02:49, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 01/28/2012 08:48 PM, Yakov Galka wrote:
The user can just write
cout<< u8"您好世界";
Even better is:
cout<< "您好世界";
which *just works* on most compilers (e.g. GCC: http://ideone.com/lBpMJ) and needs some trickery on others (MSVC: save as UTF-8 without BOM).
No, that's just wrong. That's not the model that C++ uses. By not storing it with a BOM, you're essentially tricking MSVC into believing the file is ANSI (Windows-1252 on western systems), and thus avoiding the conversion from the source character set to the execution character set, since those happen to be the same.
The way a C++ compiler is supposed to work is that all of your source is in the source character set, regardless of the type of string literal you use. Then the compiler will convert your source character set to the execution character set for narrow string literals, to the wide execution character set for wide string literals, to UTF-8 for u8 literals, etc.
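As an illustration of the model described above, here is a minimal sketch of how the literal prefix, rather than the file encoding, selects the encoding of the stored string. It spells U+00E9 with a universal-character-name so the result does not depend on how the file is saved. The names `code_units` and `u8_is_c3_a9` are made up for this example, and the narrow-literal value is deliberately left unchecked because it depends on the execution character set the compiler is configured with.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating NUL.
// (Hypothetical helper, written so that it also compiles as C++11.)
template <typename Char, std::size_t N>
constexpr std::size_t code_units(const Char (&)[N]) {
    return N - 1;
}

// A u8 literal is always UTF-8: U+00E9 is stored as the two bytes 0xC3 0xA9.
constexpr bool u8_is_c3_a9() {
    return static_cast<unsigned char>(u8"\u00e9"[0]) == 0xC3 &&
           static_cast<unsigned char>(u8"\u00e9"[1]) == 0xA9;
}

// Encodings fixed by the prefix, whatever the source file encoding is.
constexpr std::size_t utf8_units  = code_units(u8"\u00e9"); // UTF-8: 2 units
constexpr std::size_t utf16_units = code_units(u"\u00e9");  // UTF-16: 1 unit
constexpr std::size_t utf32_units = code_units(U"\u00e9");  // UTF-32: 1 unit

// Encoding chosen by the implementation: the narrow execution character set.
// The value is implementation-defined, so no particular result is assumed.
constexpr std::size_t narrow_units = code_units("\u00e9");
```

With GCC's default UTF-8 execution character set `narrow_units` comes out as 2, whereas a Windows-1252 execution character set would give 1; that difference is exactly the conversion step the narrow literal goes through.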
Sorry for not being clear enough. I agree and I've not said otherwise. The second 'cout' line *is* a hack. I admit it won't work if you mix such string literals with wide literals or external identifiers containing Unicode. The intent was to show how it could be done if the effort was focused on making narrow string literals "Unicode compatible".
[...]
What probably should be done is that compilers should be compelled to support UTF-8 as the source character set in a unified way.
Yes, it could be nice. It would solve half the problem, which is a huge step forward given the current mood of the committee. However, embedding Unicode string literals in source code is still not something you routinely do. Internationalization usually uses external string tables.
I once asked volodya if it were feasible to implement this in the build system (add a BOM for MSVC), but he didn't seem to think it was worth it.
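For readers unfamiliar with the idea, a build step that "adds a BOM" would amount to something like the sketch below. It only shows the byte-level operation on an in-memory buffer; the function names `has_utf8_bom` and `with_utf8_bom` are hypothetical and are not part of any existing build system.

```cpp
#include <cassert>
#include <string>

// True if the buffer starts with the UTF-8 byte order mark (EF BB BF).
inline bool has_utf8_bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Returns the buffer with a UTF-8 BOM in front, adding one only if missing.
inline std::string with_utf8_bom(const std::string& bytes) {
    if (has_utf8_bom(bytes)) {
        return bytes;
    }
    std::string result = std::string("\xEF\xBB\xBF") + bytes;
    assert(has_utf8_bom(result)); // postcondition of this helper
    return result;
}
```

The helper is idempotent, so running such a step twice over the same source file would not stack a second BOM on top of the first.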
I don't understand. MSVC already understands the BOM, and GCC has already been fixed according to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33415 (didn't test it).
On Sun, Jan 29, 2012 at 03:12, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
I think you should consider the points being made in N3334. While that proposal is in my opinion not good enough, it raises an important issue that is often present with std::string-based or similar designs.
A function that takes a std::string, or a boost::filesystem::path for that matter, necessarily causes the callee to copy the data into a heap-allocated buffer, even if there is no need to.
Use of the range concept would solve that issue, but then that requires making the function a template. A type-erased range would be possible, but that has significant performance overhead. A string_ref or path_ref is maybe the lesser evil.
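To make the trade-off concrete, here is a minimal sketch of what such a non-owning reference type could look like. This `string_ref` is a hypothetical toy written for this example, not the interface proposed in N3334, and `count_char` is an invented function that stands in for any API that only needs to read its argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical minimal non-owning view over a character sequence.
// It stores only a pointer and a length, so constructing one never allocates.
class string_ref {
public:
    string_ref(const char* s, std::size_t n) : data_(s), size_(n) {
        assert(s != nullptr || n == 0);
    }
    // From a NUL-terminated C string: one strlen, no copy.
    string_ref(const char* s) : string_ref(s, s ? std::strlen(s) : 0) {}
    // From a std::string: no copy, but the string must outlive the view.
    string_ref(const std::string& s) : data_(s.data()), size_(s.size()) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }

private:
    const char* data_;
    std::size_t size_;
};

// A non-template function that accepts literals and std::string alike
// without forcing a heap-allocated copy of the characters.
inline std::size_t count_char(string_ref s, char c) {
    std::size_t n = 0;
    for (const char* p = s.begin(); p != s.end(); ++p) {
        if (*p == c) ++n;
    }
    return n;
}
```

A call such as `count_char("a.b.c", '.')` or `count_char(some_std_string, '/')` then goes through the same non-template function, and in both cases the view points at the caller's existing characters instead of a fresh buffer. The price is the usual lifetime caveat: the view must not outlive the data it refers to.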
+1
This topic has been raised here in the program-options context: http://boost.2283326.n4.nabble.com/program-options-Some-methods-take-const-c...
--
Yakov