
On Fri, Jan 21, 2011 at 6:25 AM, Matus Chochlik <chochlik@gmail.com> wrote:
Dear list,
following the whole string encoding discussion I would like to make some suggestions.
From the whole debate it is becoming clear that an instant switch from encoding-agnostic/platform-native std::string to UTF-8-encoded std::string is not likely to happen.
Then it was proposed that we create a utf8_t string type that would be used *together* (for all eternity) with the standard basic_string<>. While I see the advantages here, I (as I already said elsewhere) have the following problem with this approach:
Using a name like utf8_t, u8string, string_utf8, etc. suggests, at least to me (and I've discussed this off the list with several people), that UTF-8 is still something special, and IMO it also sends the message that it is OK to remain forever with the various encodings and std::string as it is today. We should *IMO* endorse the opposite.
IMO, any serious Unicode string proposal has to address UTF-8 strings, UTF-16 strings, UTF-32 strings, and probably UTF strings where the particular UTF encoding is established at runtime. Applications that deal with Asian languages, do a lot of random access, or would otherwise pay a performance or storage penalty will demand more than just UTF-8 strings. There might be other variants, too, such as a BMP-string. If a Unicode string library provides a strong design framework that is clearly articulated, then an initial implementation would only have to provide the most needed types: UTF-8 and UTF-16/BMP. I really doubt any proposal will get taken very seriously if it only supports one of the UTF encodings. --Beman
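
To make the storage and random-access trade-off concrete, here is a minimal sketch that measures the same three-character Japanese text in each UTF encoding. The function names (storage_bytes, code_points, sample_utf8, and so on) are hypothetical and not taken from any proposal on this list. The sketch assumes a C++0x compiler with char16_t/char32_t, a 2-byte char16_t and a 4-byte char32_t, and well-formed input; it does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes of storage taken by the code units of any basic_string.
template <class CharT>
std::size_t storage_bytes(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}

// UTF-8: every byte except a continuation byte (10xxxxxx) starts a code point,
// so counting (or indexing by) code points needs a linear scan.
std::size_t code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(utf8[i]);
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}

// UTF-16: every unit except a low (trailing) surrogate starts a code point.
std::size_t code_points(const std::u16string& utf16) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        if (utf16[i] < 0xDC00 || utf16[i] > 0xDFFF) ++n;
    }
    return n;
}

// UTF-32: one code unit per code point, so no scan is needed.
std::size_t code_points(const std::u32string& utf32) {
    return utf32.size();
}

// U+65E5 U+672C U+8A9E, written with escapes so the result does not
// depend on the encoding of the source file.
inline std::string    sample_utf8()  { return "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E"; }
inline std::u16string sample_utf16() { return u"\u65E5\u672C\u8A9E"; }
inline std::u32string sample_utf32() { return U"\u65E5\u672C\u8A9E"; }

// Usage sketch: the same three code points in all three encodings.
inline void demo() {
    assert(code_points(sample_utf8())  == 3);
    assert(code_points(sample_utf16()) == 3);
    assert(code_points(sample_utf32()) == 3);
}
```

Under those assumptions the text takes 9 bytes as UTF-8, 6 bytes as UTF-16 and 12 bytes as UTF-32, which is the kind of difference that makes a UTF-8-only string type a hard sell for CJK-heavy applications.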