
On Sat, 22 Jan 2011 01:56:36 +0800 Dean Michael Berris <mikhailberis@gmail.com> wrote:
I think strings are different from the encoding they're interpreted as. Let's fix the problem of a string data structure first, then tack on encoding/decoding as something that depends on the string abstraction.
That gets back to the problem that I was originally trying to solve with the UTF types: that a string needs a way to carry around its encoding. A UTF-8 type could be built on such a thing very easily.
Hmm... I OTOH don't think the encoding should be part of the string. The encoding is really external to the string, more like a function that is applied to the string.
It's a property of the string. It may change, but some encoding (even if it's just "none") should be associated with a particular string throughout its existence. Otherwise you might as well use the existing std::string.
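To make the "encoding is a property of the string" position concrete, here is a minimal sketch. The names `encoding`, `tagged_string`, and `concat` are hypothetical and invented for illustration only; they are not taken from the UTF classes discussed in this thread or from any existing library.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical encoding tag; "none" means raw bytes with no interpretation.
enum class encoding { none, utf8, utf16, utf32 };

// Hypothetical wrapper: a raw byte buffer plus the encoding it is read as.
class tagged_string {
public:
    explicit tagged_string(std::string bytes, encoding enc = encoding::none)
        : bytes_(std::move(bytes)), enc_(enc) {}

    const std::string& bytes() const { return bytes_; }
    encoding enc() const { return enc_; }

    // The association may change, but the string always has exactly one.
    void reinterpret_as(encoding enc) { enc_ = enc; }

private:
    std::string bytes_;
    encoding enc_;
};

// Concatenation is only meaningful when both sides agree on the encoding.
inline tagged_string concat(const tagged_string& a, const tagged_string& b) {
    assert(a.enc() == b.enc());
    return tagged_string(a.bytes() + b.bytes(), a.enc());
}
```

The point of the sketch is only that the tag travels with the bytes, so an operation such as `concat` can check it; a plain std::string has nowhere to keep that information.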
If you can wrap the string in a UTF-8, UTF-16, UTF-32 encoder/decoder then that should be the way to go. However building it into the string is not something that will scale in case there are other encodings that would be supported -- think about not just Unicode, but things like Base64, Zip, <insert encoding here>.
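The "encoding as a function applied to the string" position could look roughly like the sketch below. The function name `decode_utf8` is hypothetical. It is a simplified decoder: it checks lead and continuation bytes and truncation, but it does not reject overlong forms or surrogate code points, so it should not be read as a complete UTF-8 validator.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical free function: the string stays a plain byte container and
// the encoding is applied from outside. Simplified: overlong forms and
// surrogate code points are not rejected.
inline std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        char32_t cp;
        // The lead byte determines the sequence length and the first payload bits.
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > bytes.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Under this design a Base64 or Zip codec would be another free function over the same std::string, which is the scaling argument made above; the cost is that nothing stops a caller from applying the wrong decoder to a given buffer.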
I assume that there is some unique identification for each language and encoding, or that one could be created. But that's too big a task for one volunteer developer, so my UTF classes are intended only to handle the three types that can encode any Unicode code-point.
Ultimately the underlying string should be efficient and should behave predictably under every operation. It should be lightweight so that it can be referred to in many different situations, and there should be no built-in limit on what you can use a string for.
You've just described std::string. Or alternately, std::vector<char>.
-- 
Chad Nelson
Oak Circle Software, Inc.