Re: [boost] [string] proposal

28 Jan 2011

On Thu, Jan 27, 2011 at 10:57 PM, Patrick Horgan <phorgan1@gmail.com> wrote:
...
On 01/27/2011 04:45 AM, Matus Chochlik wrote:
...
... elision by patrick ...
In general? Nothing. I do not have (nor did I have in the past)
anything against a general efficient encoding-agnostic string
if it is called general_string. But std::string IMO is and always
has been primarily about handling text. I certainly do not know
anyone who would store an MPEG inside std::string.
You may think it strange, but there's a lot of code out there that uses
std::string as a binary buffer.
You're right; just because I don't use it that way does not mean that
it cannot be done. That is why I said, in one of my previous posts, that
I'm OK if we call it 'text' instead of string.
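To make the "binary buffer" usage concrete, here is a minimal sketch (the function name make_binary_buffer is made up for illustration, it is not from any library). It relies only on the fact that std::string stores arbitrary char values, including NUL and bytes that are not valid UTF-8, and that size() counts bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: fills a std::string with raw bytes that are
// not text in any encoding (embedded NULs, 0xFF).
inline std::string make_binary_buffer() {
    const char raw[] = {'\x00', '\xFF', '\xD8', '\x00', '\x41'};
    // The (pointer, length) constructor copies all bytes and does not
    // stop at the first NUL.
    return std::string(raw, sizeof raw);
}
```

Calling make_binary_buffer().size() gives the byte count of the array, and nothing in std::string objects to the contents. That is exactly why it works as a buffer and why it gives no guarantees about text.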

[snip-of-things-that-we-basically-agree-upon/]
...
...
Usability. It is usually more difficult to use the super-generic,
everything-solving things. I repeat, for probably the 10th time, that I'm not
against such a string in general, but this is not std::string.
And neither would a string that enforced utf-8 encoding be std::string.  We
already have one in the spec, and it's not that.
Yes, also see above. But the main reason why I strongly oppose
any mention of 'utf8' in the name of the general-text-handling-class
is basically the same as why I would oppose the
general-floating-point-handling-classes in C++ being called
'IEEE_754_float' and 'IEEE_754_double' instead of just plain
'float' and 'double'.

I (and many others around here) have dealt with various text
encodings and all those problems they cause in "non-ascii"
environments, so many times, that my blood pressure skyrockets :)
every time I hear that term.
And I do not want to be reminded about it every time I deal
with text. Let us mention the encoding only when necessary.

[snip/]
...
No.  You're not trying to solve the same problem at all!  (And neither of
you are trying to deal with std::string.)
You, Dean, are trying to solve an efficiency problem caused by mutable
strings, and note that an external view can interpret as any encoding
desired.  You correctly point out that this is more general and flexible,
that it has a power that can be applied to many things while giving you all
the efficiency advantages of immutable data types.  (Although why a general
buffer for immutable data would be called string which is normally
associated with text _is_ a bit confusing.  I suspect you've gone down a
road you never intended trying to make this point.)
You, Matus, are trying to solve a problem caused by a plethora of possible
encodings and the extra work that has to be done every time you have to deal
with them, by specifying that a string will have an encoding type associated
with it, (and in particular utf-8 as the natural default), and that the
specialized string itself will enforce the encoding as well as provide ways
to convert other encodings to it.  (And I think the natural way to do this
is with code conversion facets.)  You correctly point out that this
specificity allows a power in solving this one particular problem that a
more general solution wouldn't be able to match.  A general string with a
view into it would allow you to get invalidly encoded data into it (N.B.: for
an immutable string, _into it_ would have a different meaning), and you would
only know about this after the fact.
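The difference between "enforced by the string" and "discovered after the fact" can be sketched as follows. This is only an illustration under stated assumptions: the names is_valid_utf8 and utf8_text are hypothetical, and the check is a hand-written validator following the well-formed byte ranges of the UTF-8 definition, not the code conversion facets mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical validator: true if 'bytes' is well-formed UTF-8
// (rejects overlong forms, surrogates and values above U+10FFFF).
inline bool is_valid_utf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) { ++i; continue; }        // ASCII
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;       // allowed range of 2nd byte
        if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; }
        else if (b0 == 0xE0)               { len = 3; lo = 0xA0; } // no overlongs
        else if (b0 == 0xED)               { len = 3; hi = 0x9F; } // no surrogates
        else if (b0 >= 0xE1 && b0 <= 0xEF) { len = 3; }
        else if (b0 == 0xF0)               { len = 4; lo = 0x90; } // no overlongs
        else if (b0 >= 0xF1 && b0 <= 0xF3) { len = 4; }
        else if (b0 == 0xF4)               { len = 4; hi = 0x8F; } // <= U+10FFFF
        else return false;                        // invalid lead byte
        if (n - i < len) return false;            // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}

// Hypothetical encoding-enforcing string: invalid data cannot get in,
// because construction fails up front.
class utf8_text {
public:
    explicit utf8_text(std::string bytes) : bytes_(std::move(bytes)) {
        if (!is_valid_utf8(bytes_))
            throw std::invalid_argument("not valid UTF-8");
    }
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};
```

With a general buffer plus a view, the caller would have to remember to run something like is_valid_utf8 later. With the enforcing class, an invalid sequence is rejected at the point where it would enter the string.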
These are both great things.  Kudos to you both.  You're both right.  You
guys keep arguing apples and orangutans and it makes it hard for others to
talk about either one of your ideas because you're so busy going back and
forth telling each other that the other doesn't get what they're trying to
say.
Believe me, Patrick, I have had exactly the same feeling (about the
apples and orangutans) the whole time I've participated in the immutable
vs. unicode string discussion. I know that Dean tries to focus
on performance and does not care about encodings, and I do care
about performance, just not as much as Dean does.

The reason why I kept participating in this 'bike-shed-quarrel' is that
I would hate to see the outcome be one just-another-super-efficient-string
and one just-another-unicode-string. There are plenty of those already.

I would like to see *text* handling in C++ addressed
*in the standard*, not only on the byte-sequence level, but on
the code-point/character/word/etc. level.
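A small sketch of the gap between the byte-sequence level and the code-point level (count_code_points is a hypothetical name, and the function assumes its input is already valid UTF-8):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string ASSUMED to hold
// valid UTF-8. Every byte that is not a continuation byte (10xxxxxx)
// starts a new code point.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char b : utf8) {
        if ((b & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For a string holding "na" "\xC3\xAF" "ve" (the word "naive" with a diaeresis on the i), size() reports 6 bytes while count_code_points reports 5. Characters and words are further levels above this that std::string does not address either.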
...
I wish you'd split into threads like [immutable string] and [unicode
string].
I am starting to like the idea of immutability, and if it indeed has
so many advantages I don't see why the text class could
not be built on the immutable_string class.
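Just to illustrate the layering, a rough sketch follows. It assumes a very simple immutable_string that shares its storage between copies. Both classes and all member names here are hypothetical and not taken from any actual proposal in this thread, and the text class does no encoding validation at all.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable byte string: contents are fixed at
// construction, and copies share the same storage.
class immutable_string {
public:
    explicit immutable_string(std::string bytes)
        : data_(std::make_shared<const std::string>(std::move(bytes))) {}
    const std::string& bytes() const { return *data_; }
    bool shares_storage_with(const immutable_string& other) const {
        return data_ == other.data_;
    }
private:
    std::shared_ptr<const std::string> data_;
};

// Hypothetical text class built on top of it. "Modifying" operations
// return a new value instead of changing the existing one.
class text {
public:
    explicit text(std::string utf8) : storage_(std::move(utf8)) {}
    text appended(const text& tail) const {
        return text(storage_.bytes() + tail.storage_.bytes());
    }
    const immutable_string& storage() const { return storage_; }
private:
    immutable_string storage_;
};
```

Copying a text then only copies a shared pointer, and appended() leaves both operands untouched, which is where the claimed efficiency and safety advantages of immutability would come from.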

Best,

Matus