Re: [boost] [string] proposal

21 Jan 2011

      On Fri, Jan 21, 2011 at 6:25 AM, Matus Chochlik <chochlik@gmail.com> wrote:
...
Dear list,
following the whole string encoding discussion I would like
to make some suggestions.
...
From the whole debate it is becoming clear that an
instant switch from encoding-agnostic/platform-native
std::string to UTF-8-encoded std::string is not likely
to happen.
Then it was proposed that we create a utf8_t string type
that would be used *together* (for all eternity) with
the standard basic_string<>. While I see the advantages
here, I (as I already said elsewhere) have the following
problem with this approach:
Using a name like utf8_t, u8string, string_utf8, etc.
suggests, at least to me (and I've consulted several
people about this off the list), that UTF-8 is still
something special, and IMO it also sends the message
that it is OK to remain forever with the various encodings
and std::string as it is today. We should *IMO* endorse
the opposite.
IMO, any serious Unicode string proposal has to address UTF-8 strings,
UTF-16 strings, UTF-32 strings, and probably UTF strings where the
particular UTF encoding is established at runtime. Applications that
deal with Asian languages, do a lot of random access, or would
otherwise pay a performance or storage penalty will demand more than
just UTF-8 strings. There might be other variants, too, such as a
BMP-string. If
a Unicode string library provides a strong design framework that is
clearly articulated, then an initial implementation would only have to
provide the most needed types: UTF-8 and UTF-16/BMP.
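
To make the framework idea a bit more concrete, here is a minimal sketch of
one way the encoding could be a template parameter, so that UTF-8, UTF-16,
and UTF-32 strings share a single interface. It assumes a C++0x compiler
with char16_t and char32_t. Every name in it (utf8_tag, utf16_tag,
utf32_tag, append, ustring) is hypothetical and is not part of Boost or of
any proposal in this thread. Input validation and the runtime-selected
encoding case are left out.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding tags: each one names its code unit type.
struct utf8_tag  { typedef char     unit; };
struct utf16_tag { typedef char16_t unit; };
struct utf32_tag { typedef char32_t unit; };

// Append one code point, assumed to be a valid Unicode scalar value.
// UTF-8: one to four code units per code point.
inline void append(std::basic_string<char>& s, char32_t cp) {
    if (cp < 0x80) {
        s += static_cast<char>(cp);
    } else if (cp < 0x800) {
        s += static_cast<char>(0xC0 | (cp >> 6));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        s += static_cast<char>(0xE0 | (cp >> 12));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        s += static_cast<char>(0xF0 | (cp >> 18));
        s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// UTF-16: one code unit inside the BMP, a surrogate pair outside it.
inline void append(std::basic_string<char16_t>& s, char32_t cp) {
    if (cp < 0x10000) {
        s += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;
        s += static_cast<char16_t>(0xD800 | (cp >> 10));
        s += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
    }
}

// UTF-32: always exactly one code unit per code point.
inline void append(std::basic_string<char32_t>& s, char32_t cp) {
    s += cp;
}

// One string template for all encodings; only the storage differs.
template <class Tag>
class ustring {
public:
    typedef typename Tag::unit unit;

    void push_back(char32_t cp) { append(units_, cp); ++length_; }

    std::size_t length() const     { return length_; }        // code points
    std::size_t unit_count() const { return units_.size(); }  // code units
    std::size_t byte_size() const  { return units_.size() * sizeof(unit); }

private:
    std::basic_string<unit> units_;
    std::size_t length_ = 0;
};
```

With this sketch, the two code points U+65E5 U+672C occupy 6 bytes in a
ustring<utf8_tag> but 4 bytes in a ustring<utf16_tag>, which is the kind of
storage difference that matters for Asian text. Only in ustring<utf32_tag>
does unit_count() always equal length(), which is what makes constant-time
random access by code point straightforward there.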

I really doubt any proposal will get taken very seriously if it only
supports one of the UTF encodings.

--Beman

Beman Dawes