
On Fri, Jan 21, 2011 at 5:48 PM, Robert Ramey <ramey@rrsd.com> wrote:
Matus Chochlik wrote:
Using a name like utf8_t or u8string, string_utf8, etc., at least to me (and I've consulted several people about this off the list), suggests that UTF-8 is still something special, and IMO it also sends the message that it is OK to remain forever with the various encodings and std::string as it is today.
rather than viewing std::string as a sequence of character encodings, view it as a sequence of bytes along with a few extra functions compared to std::vector. Lots of programs use std::string in this way without depending upon any behavior related to character encoding.
Of course, this is what has been referred to during the discussion as encoding-agnostic usage. But if I use a string to refer to the same thing on different platforms (path, URL, proper name, etc.), then I would like the byte sequence to be the same, for the following reason: today data is commonly sent over the network between computers on different platforms. Even if you don't care on one machine which byte sequence represents a string of logical characters, you have to worry about it when you send it to another machine, because that machine might interpret the sequence differently. To avoid data corruption there has to be agreement on a common representation at some point during the transfer. In the past this was not such a big deal, because computers were standalone and the transcoding could be handled manually. But today moving data around is so prevalent that doing the transcoding explicitly every time becomes infeasible.
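To make the interpretation problem concrete, here is a minimal sketch of how the same logical character ends up as different byte sequences under two encodings. The helper names encode_latin1 and encode_utf8 are made up for this illustration and are not part of any library; the UTF-8 encoder does no validation (it does not reject surrogates or out-of-range values).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one code point as ISO-8859-1.
// Only meaningful for U+0000..U+00FF; higher values are truncated.
std::string encode_latin1(std::uint32_t cp) {
    return std::string(1, static_cast<char>(cp & 0xFF));
}

// Hypothetical helper: encode one code point as UTF-8 (no validation).
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {                     // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {             // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {           // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                             // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+00E9 (e with acute accent), encode_latin1 yields the single byte 0xE9 while encode_utf8 yields the two bytes 0xC3 0xA9. A receiver that assumes the other encoding will misread the text, which is why the two ends have to agree on one representation.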
now, consider utf8_string as a sequence of character encodings which might be implemented in terms of std::string. It's a different thing and should have a different name.
This would mean that if someone uses, for example, a class member variable that you intended to be just a byte sequence as a character sequence, he would have to make a copy.
We should *IMO* endorse the opposite.
It is not our proper role to endorse or deprecate programming practices. It's a fool's errand in any case. The best anyone can do is provide alternatives and explain why he thinks they are superior.
OK, by "endorsing" I meant here not just talking about it and convincing people that it is superior without proving it (as became clear to me in the other thread of the debate), but actually implementing something better than the current std::string, with the properties described above, and letting the "market" decide. But in the end you have to believe in what you are doing.
My suggestion is the following:
Let us create a class called boost::string that will have all the properties that a string handling class in 2011+ A.D.
What happens in 2021 A.D. when it is discovered that "they did it wrong"?
Then the people who find that out will do a lot of complaining about it, and eventually they will create something even better. I'm not so naive as to think that we will create a string class which will be used for the next 500 years :) But if we create something that makes life easier for the next 10-20 years, then it will be worth the effort.
should have, basically what std::string should have been.
what you (or we, or someone else) think string should have been.
Of course I don't think that I alone can come up with the "uber_string", but this is Boost with all its gurus :) so if there is a place where a good string class can be born then it is IMO here.
This idea depends upon a few presumptions which are not true. a) that std::string is used only for character encodings.
No, I imagine it to be (partially) backward compatible with std::string, but also to have Unicode-aware features, so it can be used as both the byte sequence and the logical-character sequence.
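As a rough illustration of "both the byte sequence and the logical-character sequence", here is a minimal sketch. The class name utf8_text and its members are hypothetical and are not a proposed Boost interface. It assumes the stored bytes are valid UTF-8, and it counts code points, which is only an approximation of logical characters (combining sequences are counted per code point).

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch, not an existing Boost or standard class.
class utf8_text {
public:
    // Assumption: the caller passes valid UTF-8; nothing is checked here.
    explicit utf8_text(const std::string& bytes) : bytes_(bytes) {}

    // Byte-sequence view: existing std::string-based code keeps working.
    const std::string& bytes() const { return bytes_; }
    std::size_t byte_size() const { return bytes_.size(); }

    // Character-sequence view: count code points by skipping
    // UTF-8 continuation bytes (those of the form 10xxxxxx).
    std::size_t code_point_count() const {
        std::size_t n = 0;
        for (std::string::size_type i = 0; i < bytes_.size(); ++i) {
            unsigned char c = static_cast<unsigned char>(bytes_[i]);
            if ((c & 0xC0) != 0x80) ++n;
        }
        return n;
    }

private:
    std::string bytes_;
};
```

For the UTF-8 bytes of "naïve" (where ï is 0xC3 0xAF), byte_size() reports 6 and code_point_count() reports 5, while bytes() still hands the raw sequence to code that treats it as plain std::string data.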
b) that someone can know all the things that std::string might be used for as it is
I think we can make reasonable assumptions.
c) that someone now has the knowledge to design a new version of std::string which will never need be changed.
I never said anything like this, see above.
Basically, if you're going to make a "new" thing - fine - just make sure you give it a new name.
I'm not thinking about it as a completely new thing, more like a future std::string 2.0, an upgrade not a replacement.

BR,
Matus