
On Tue, Jan 31, 2012 at 10:57, Daryle Walker <darylew@hotmail.com> wrote:
----------------------------------------
Date: Mon, 30 Jan 2012 00:24:30 -0800 From: Artyom
----- Original Message -----
From: Beman Dawes <bdawes@acm.org>
What probably should be done is that compilers should be required to support UTF-8 as the source character set in a uniform way.
Makes sense to me.
Why don't you write up an issue for the C and C++ committees? My
[snip]
Another possibility is to start lobbying compiler vendors, or at least Microsoft, to support UTF-8 both with and without BOM.
It is not only a BOM-or-no-BOM issue. It is mostly about the ability to define the execution character set, i.e. the character set used for normal "some text" literals, and the input character set. What is even more important is that C++ compilers must support UTF-8 for both of them.
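To make the execution-character-set point concrete, here is a minimal sketch, assuming a C++11 compiler. The helper names (narrow_length, is_utf8_e_acute, exec_charset_looks_like_utf8) are made up for illustration. The universal character name \u00E9 does not depend on how the source file is encoded, so the bytes that end up in the narrow literal show only what the compiler chose as its execution character set.

```cpp
#include <cstddef>

// Number of chars before the terminating NUL.
// Written recursively so it is a valid C++11 constexpr function.
constexpr std::size_t narrow_length(const char* s) {
    return *s == '\0' ? 0 : 1 + narrow_length(s + 1);
}

// True if s holds exactly the UTF-8 encoding of U+00E9 (bytes 0xC3 0xA9).
// The length check comes first, so s[1] is only read when it exists.
constexpr bool is_utf8_e_acute(const char* s) {
    return narrow_length(s) == 2
        && static_cast<unsigned char>(s[0]) == 0xC3
        && static_cast<unsigned char>(s[1]) == 0xA9;
}

// Implementation-dependent: the compiler converts \u00E9 to whatever
// its execution character set is before storing it in the literal.
constexpr bool exec_charset_looks_like_utf8 = is_utf8_e_acute("\u00E9");
```

A compiler whose execution character set is UTF-8 should make exec_charset_looks_like_utf8 true; one using a single-byte code page would store one byte and make it false. This probes one character only, so treat it as a hint rather than a full detection scheme.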
This probably isn't the right post to respond to, but I don't want to spend forever figuring it out.
Not every system is an 8/16/32(/64)-bit computer using ASCII/Latin-1/UTF-8. C++ (from C) was designed so that a user with a 9/36/81-bit EBCDIC system and one with an 8/16/32/64-bit UTF-16 system can write programs for each other (with the appropriate cross-compiler). We don't want to be obnoxiously prejudiced against systems that don't match the current configuration trends.
(I was originally going to write "9/36/72", but then realized that larger types only have to be a multiple of char, not of each other, so my new system breaks more common-programmer assumptions. BTW, that's 9-bit bytes (char), 36-bit words (short and int), and 81-bit long-words (long and long long). I wonder if anyone here can fabricate this custom hardware, to mess people up.)
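The "multiple of char, not each other" rule can be sketched as a compile-time check. This is a rough model under stated assumptions, not the standard's wording: it treats each type as a storage size in bits, uses the usual minimum widths (8/16/16/32/64), and the names is_multiple and is_plausible_layout are invented for this example.

```cpp
#include <climits>

// True if a type of `bits` bits occupies a whole number of chars.
constexpr bool is_multiple(unsigned bits, unsigned char_bits) {
    return bits % char_bits == 0;
}

// Rough model of the layout rules, with every size given in bits.
// char_bits >= 8 is tested first, so the modulus never divides by zero.
constexpr bool is_plausible_layout(unsigned char_bits, unsigned short_bits,
                                   unsigned int_bits, unsigned long_bits,
                                   unsigned long_long_bits) {
    return char_bits >= 8
        // Every type is a whole number of chars...
        && is_multiple(short_bits, char_bits)
        && is_multiple(int_bits, char_bits)
        && is_multiple(long_bits, char_bits)
        && is_multiple(long_long_bits, char_bits)
        // ...is at least as wide as the minimum range requires...
        && short_bits >= 16 && int_bits >= 16
        && long_bits >= 32 && long_long_bits >= 64
        // ...and is no smaller than the type before it.
        && short_bits <= int_bits && int_bits <= long_bits
        && long_bits <= long_long_bits;
}

// The platform this is compiled on should pass its own check.
static_assert(is_plausible_layout(CHAR_BIT, sizeof(short) * CHAR_BIT,
                                  sizeof(int) * CHAR_BIT,
                                  sizeof(long) * CHAR_BIT,
                                  sizeof(long long) * CHAR_BIT),
              "host layout violates the modelled rules");
```

Under this model is_plausible_layout(9, 36, 36, 81, 81) holds even though 81 is not a multiple of 36, while a layout such as (8, 16, 32, 36, 64) is rejected because 36 is not a multiple of the 8-bit char.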
Daryle W.
Thanks Daryle. I'm aware of this issue and have therefore refrained from talking about UTF-8 only. The wording I'm interested in is "the execution character set is capable of storing any Unicode data". This would mean UTF-8 on systems that have CHAR_BIT==8 and are compatible with ASCII, UTF-EBCDIC on IBM mainframes, and perhaps UTF-32 on a DSP with CHAR_BIT==32 and sizeof(char) == sizeof(long). Yet another option is to restrict the requirement to hosted implementations only. -- Yakov
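The mapping described above can be written down as a small compile-time selection. This is only a sketch of the idea, assuming C++11. The enum UnicodeForm and the function pick_unicode_form are hypothetical names, and probing the code of 'A' (0x41 in ASCII, 0xC1 in EBCDIC) is a simplification chosen for this example, not a complete character-set test.

```cpp
#include <climits>

// Candidate encodings for an execution character set that can hold
// any Unicode data (names invented for this sketch).
enum class UnicodeForm { utf8, utf_ebcdic, utf32, unspecified };

// Picks an encoding from the width of char and the code of 'A'.
// Nested conditionals keep this a single-return C++11 constexpr function.
constexpr UnicodeForm pick_unicode_form(unsigned char_bits, unsigned code_of_A) {
    return char_bits == 32                        ? UnicodeForm::utf32
         : (char_bits == 8 && code_of_A == 0x41)  ? UnicodeForm::utf8
         : (char_bits == 8 && code_of_A == 0xC1)  ? UnicodeForm::utf_ebcdic
         : UnicodeForm::unspecified;
}

// What the rule would select for the platform compiling this file.
constexpr UnicodeForm this_platform =
    pick_unicode_form(CHAR_BIT, static_cast<unsigned char>('A'));
```

Anything that falls through to UnicodeForm::unspecified (a 9-bit char, for instance) is exactly the kind of system for which the proposed wording would have to leave the choice to the implementation.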