
On Thu, Jan 27, 2011 at 10:57 PM, Patrick Horgan <phorgan1@gmail.com> wrote:
On 01/27/2011 04:45 AM, Matus Chochlik wrote:
... elision by patrick ... In general? Nothing. I do not have (nor did I have in the past) anything against a general efficient encoding-agnostic string if it is called general_string. But std::string IMO is and always has been primarily about handling text. I certainly do not know anyone who would store an MPEG inside std::string.
You may think it strange, but there's a lot of code out there that uses std::string as a binary buffer.
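To make the binary-buffer usage concrete, here is a minimal sketch. The helper names make_binary_buffer and length_seen_by_c_apis and the byte values are hypothetical, chosen only for illustration; the point is that std::string tracks its own length and therefore keeps embedded NUL bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: packs raw bytes, including embedded NULs, into a
// std::string by passing an explicit length to the constructor.
std::string make_binary_buffer() {
    const char raw[] = {'\x00', '\x00', '\x01', '\xBA', '\x44', '\x00'};
    return std::string(raw, sizeof raw);
}

// Hypothetical helper: how many bytes a C API that expects a NUL-terminated
// string would see. It stops at the first NUL byte.
std::size_t length_seen_by_c_apis(const std::string& s) {
    return std::strlen(s.c_str());
}
```

The string itself reports all six bytes through size(), while anything that treats c_str() as text stops at the first NUL. That mismatch is one reason the "binary buffer" and "text" uses sit uneasily in one type.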
You're right, just because I don't use it that way does not mean that it cannot be done; that is why I said in one of my previous posts that I'm OK if we call it 'text' instead of string. [snip-of-things-that-we-basically-agree-upon/]
Usability. It is usually more difficult to use super-generic, everything-solving things. For probably the tenth time, I repeat that I'm not against such a string in general, but this is not std::string.
And neither would a string that enforced utf-8 encoding be std::string. We already have one in the spec, and it's not that.
Yes, also see above. But the main reason why I strongly oppose any mention of 'utf8' in the name of the general-text-handling-class is basically the same as why I would oppose the general-floating-point-handling-classes in C++ being called 'IEEE_754_float' and 'IEEE_754_double' instead of just plain 'float' and 'double'. I (and many others around here) have dealt with various text encodings, and all the problems they cause in "non-ascii" environments, so many times that my blood pressure skyrockets :) every time I hear that term. And I do not want to be reminded about it every time I deal with text. Let us mention the encoding only when necessary. [snip/]
No. You're not trying to solve the same problem at all! (And neither of you are trying to deal with std::string.)
You, Dean, are trying to solve an efficiency problem caused by mutable strings, and note that an external view can interpret the data in any encoding desired. You correctly point out that this is more general and flexible, that it has a power that can be applied to many things while giving you all the efficiency advantages of immutable data types. (Although why a general buffer for immutable data would be called string, which is normally associated with text, _is_ a bit confusing. I suspect you've gone down a road you never intended trying to make this point.)
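A rough sketch of the idea as described here, not of Dean's actual proposal: the names immutable_buffer, shares_storage_with, and view_as_latin1 are all hypothetical. The buffer holds bytes that never change, so copies can share storage, and a separate view function decides how to interpret those bytes.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable byte buffer: the bytes are fixed at construction,
// so copying only copies a shared pointer to the same storage.
class immutable_buffer {
public:
    explicit immutable_buffer(std::string bytes)
        : data_(std::make_shared<std::string>(std::move(bytes))) {}

    const std::string& bytes() const { return *data_; }

    // True when both objects refer to the very same underlying storage.
    bool shares_storage_with(const immutable_buffer& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<const std::string> data_;
};

// Hypothetical external view: interprets each byte as one ISO-8859-1
// (Latin-1) code point. A different view could read the same bytes as UTF-8.
std::u32string view_as_latin1(const immutable_buffer& buf) {
    std::u32string out;
    for (unsigned char byte : buf.bytes()) {
        out.push_back(static_cast<char32_t>(byte));
    }
    return out;
}
```

Note that nothing in the buffer knows or checks the encoding; that knowledge lives entirely in whichever view is applied.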
You, Matus, are trying to solve a problem caused by a plethora of possible encodings and the extra work that has to be done every time you have to deal with them, by specifying that a string will have an encoding type associated with it (and in particular utf-8 as the natural default), and that the specialized string itself will enforce the encoding as well as provide ways to convert other encodings to it. (And I think the natural way to do this is with code conversion facets.) You correctly point out that this specificity allows a power in solving this one particular problem that a more general solution wouldn't be able to match. A general string with a view into it would allow you to get invalidly encoded data into it (N.B. for an immutable string _into it_ would have a different meaning) and you would only know about this after the fact.
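The difference between enforcing the encoding and finding out after the fact can be sketched as follows. This is only an illustration under assumptions: utf8_text and is_valid_utf8 are hypothetical names, not part of any proposal in this thread, and the validator is a plain hand-written check rather than a code conversion facet.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Returns true if `bytes` is well-formed UTF-8. Rejects truncated sequences,
// overlong forms, surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) { ++i; continue; }  // single-byte (ASCII) sequence
        std::size_t extra = 0;               // number of continuation bytes
        unsigned long cp = 0;                // decoded code point
        unsigned long min = 0;               // smallest value for this length
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else return false;                   // stray continuation or bad lead
        if (n - i <= extra) return false;    // sequence runs past the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        if (cp > 0x10FFFF) return false;                  // out of range
        i += extra + 1;
    }
    return true;
}

// Hypothetical text class that enforces its encoding: invalid data is
// rejected at construction, so it can never get "into" the object.
class utf8_text {
public:
    explicit utf8_text(std::string bytes) : bytes_(std::move(bytes)) {
        if (!is_valid_utf8(bytes_)) {
            throw std::invalid_argument("not valid UTF-8");
        }
    }
    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};
```

With a general buffer plus view, the same check would have to run whenever a view is applied, which is the "only know about this after the fact" case.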
These are both great things. Kudos to you both. You're both right. You guys keep arguing apples and orangutans and it makes it hard for others to talk about either one of your ideas because you're so busy going back and forth telling each other that the other doesn't get what they're trying to say.
Believe me, Patrick, I have had exactly the same feeling (about the apples and orangutans) the whole time I've participated in the immutable vs. unicode string discussion. I know that Dean tries to focus on performance and does not care about encodings, and I do care about performance, just not as much as Dean does. The reason why I kept participating in this 'bike-shed-quarrel' is that I would hate to see the outcome be 1 just-another-super-efficient-string and 1 just-another-unicode-string. There are plenty of those already. I would like to see *text* handling in C++ addressed *in the standard* not only on the byte-sequence level, but on the code-point/character/word/etc. level.
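A small illustration of the gap between the byte-sequence level and the code-point level. The function name count_code_points is hypothetical, and the sketch assumes its input is already valid UTF-8; it counts every byte that is not a continuation byte.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a string assumed to hold valid UTF-8 by counting
// the bytes that start a sequence (continuation bytes look like 10xxxxxx).
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 bytes of "na\xC3\xAFve", std::string::size() reports 6 while count_code_points reports 5. Characters and words would need still more machinery than this.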
I wish you'd split into threads like [immutable string] and [unicode string].
I'm starting to like the idea of immutability, and if it indeed has so many advantages I don't see why the text class could not be built on the immutable_string class. Best, Matus