Re: [boost] [string] proposal

22 Jan 2011


On Sat, 22 Jan 2011 01:56:36 +0800
Dean Michael Berris <mikhailberis@gmail.com> wrote:
...
...
...
>>> I think strings are different from the encoding they're interpreted
>>> as. Let's fix the problem of a string data structure first, then tack
>>> on encoding/decoding as something that depends on the string
>>> abstraction.
>> That gets back to the problem that I was originally trying to solve
>> with the UTF types: that a string needs a way to carry around its
>> encoding. A UTF-8 type could be built on such a thing very easily.
> Hmm... I OTOH don't think the encoding should be part of the string.
> The encoding is really external to the string, more like a function
> that is applied to the string.
It's a property of the string. It may change, but some encoding (even
if it's just "none") should be associated with a particular string
throughout its existence. Otherwise you might as well use the existing
std::string.
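A minimal sketch of that position, assuming a hypothetical tagged_string class and encoding enum (neither is from the UTF proposal or from Boost): the encoding tag travels with the bytes for the string's whole lifetime, "none" is a legitimate tag, and the tag can only change together with the data.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical encoding tags; "none" marks raw bytes with no interpretation.
enum class encoding { none, utf8, utf16, utf32 };

// Sketch only: a byte string that always carries its encoding, so code
// receiving it never has to guess how the bytes should be read.
class tagged_string {
public:
    explicit tagged_string(std::string bytes, encoding tag = encoding::none)
        : bytes_(std::move(bytes)), enc_(tag) {
        check();
    }

    const std::string& bytes() const { return bytes_; }
    encoding enc() const { return enc_; }

    // The encoding may change, but only together with the data,
    // so the tag and the bytes cannot drift apart.
    void assign(std::string bytes, encoding tag) {
        bytes_ = std::move(bytes);
        enc_ = tag;
        check();
    }

private:
    // Cheap sanity check: UTF-16/UTF-32 data must hold whole code units.
    void check() const {
        assert(enc_ != encoding::utf16 || bytes_.size() % 2 == 0);
        assert(enc_ != encoding::utf32 || bytes_.size() % 4 == 0);
    }

    std::string bytes_;
    encoding enc_;
};
```

With this design, a function that takes a tagged_string can branch on enc() instead of assuming UTF-8, which is the practical difference from passing a bare std::string.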
...
> If you can wrap the string in a UTF-8, UTF-16, or UTF-32 encoder/decoder,
> then that should be the way to go. However, building it into the string
> will not scale if other encodings need to be supported; think about not
> just Unicode, but things like Base64, Zip, <insert encoding here>.
I assume that there is some unique identification for each language and
encoding, or that one could be created. But that's too big a task for
one volunteer developer, so my UTF classes are intended only to handle
the three types that can encode any Unicode code point.
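To make the "three types" concrete: UTF-8, UTF-16 and UTF-32 can each represent every Unicode scalar value (0 to 0x10FFFF, excluding the surrogate range 0xD800 to 0xDFFF). The functions in this sketch (is_scalar_value, to_utf8, to_utf16, to_utf32) are hypothetical illustrations of the three encoding forms, not the UTF classes mentioned above.

```cpp
#include <cassert>
#include <string>

// True for every code point that any of the three UTF forms can encode.
bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// UTF-8: one to four bytes per code point.
std::string to_utf8(char32_t cp) {
    assert(is_scalar_value(cp));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// UTF-16: one code unit, or a surrogate pair above the BMP.
std::u16string to_utf16(char32_t cp) {
    assert(is_scalar_value(cp));
    if (cp < 0x10000) return std::u16string(1, static_cast<char16_t>(cp));
    cp -= 0x10000;  // 20 bits remain, split 10/10 across the pair
    return std::u16string{static_cast<char16_t>(0xD800 + (cp >> 10)),
                          static_cast<char16_t>(0xDC00 + (cp & 0x3FF))};
}

// UTF-32: exactly one code unit per code point.
std::u32string to_utf32(char32_t cp) {
    assert(is_scalar_value(cp));
    return std::u32string(1, cp);
}
```

For example, U+1F600 becomes four bytes in UTF-8, a surrogate pair in UTF-16, and a single unit in UTF-32; no other encoding family is needed to cover the full code space.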
...
> Ultimately, the underlying string should be efficient, and operations
> on it should behave predictably. It should be lightweight, so that it
> can be referred to in many different situations, and there should be
> an infinite number of possibilities for what you can use a string for.
You've just described std::string. Or alternately, std::vector<char>.
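The opposing view in the quoted text, where the string stays a plain byte container and an encoding is a function applied to it, could be sketched as a free function over std::string. The name decode_utf8 is hypothetical; the sketch validates only lead and continuation bytes and does not reject overlong forms or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode UTF-8 bytes held in an ordinary std::string into code points.
// The string itself knows nothing about encodings; the caller decides.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len;
        std::uint32_t cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > s.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

The trade-off between the two positions shows here: nothing stops a caller from passing Latin-1 or Base64 text to this function, which is exactly the information a string that carries its encoding would preserve.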
-- 
Chad Nelson
Oak Circle Software, Inc.