Re: [boost] [string] --> [text] ?

28 Jan 2011

On Fri, Jan 28, 2011 at 6:47 AM, Gregory Crosswhite
<gcross@phys.washington.edu> wrote:
...
Since there has been a lot of talk about what the name of a new immutable
string class should be, may I toss the name "boost::text" into the ring?
Hmm... Unfortunately it denotes the wrong thing for my case.
...
 The advantage of this name is that it explicitly conveys what it is meant
for: working with human-readable text encoded in some
implementation-specific form.  The name "string" would then continue to have
its current interpretation as a string of contiguous 8-bit chars.
Right, so then I can keep saying 'string' and meaning it in the
computer science context. :)
...
It has also been suggested that different classes be created for different
UTF encodings.  I propose that boost::text have the internal encoding be an
implementation (and potentially platform-specific) detail.  Since at the end
of a series of manipulations with the rope-like data structure one will
have to do a final transformation to convert the text into a string of bytes
anyway, that provides a natural point at which the desired encoding of the
string of bytes can be specified.
This was the point of my 'view' template idea: the view would take
care of presenting the underlying data in the appropriate encoding.
...
That is, given a boost::text object "t",
one could convert it into a UTF-8 string by calling "t.utf8_c_str()", a
UTF-16 string by calling "t.utf16_c_str()", and so on, depending on what the
underlying API is expecting.
And then you run into the problem of having a ton of member functions
that encapsulate the conversion logic, instead of having multiple
types do the conversion. The member-function approach will not scale
and would be hell to manage.
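
For illustration, here is a minimal sketch of the multiple-types
alternative, written in C++0x/C++11. Everything in it is hypothetical:
the `utf8` and `utf16` encoding types, the `encode` function, and the
use of a `std::u32string` of code points as the internal form. None of
it is an existing Boost interface, and it does no validation of its
input. The point is only that supporting a new encoding means adding a
new type, not a new member function on the text class.

```cpp
#include <string>

// Hypothetical encoding types (not part of Boost). Each one knows how to
// append a single code point to its own kind of output string.
// Assumption: input code points are valid Unicode scalar values.
struct utf8 {
    typedef std::string string_type;
    static void append(string_type& out, char32_t cp) {
        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
};

struct utf16 {
    typedef std::u16string string_type;
    static void append(string_type& out, char32_t cp) {
        if (cp < 0x10000) {                    // one code unit
            out += static_cast<char16_t>(cp);
        } else {                               // surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 | (cp >> 10));
            out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
        }
    }
};

// One function template serves every encoding type; the conversion logic
// lives in the encoding, not in the text class.
template <class Encoding>
typename Encoding::string_type encode(const std::u32string& code_points) {
    typename Encoding::string_type out;
    for (char32_t cp : code_points) Encoding::append(out, cp);
    return out;
}
```

With this shape, `encode<utf8>(t)` and `encode<utf16>(t)` play the role
of the proposed `t.utf8_c_str()` and `t.utf16_c_str()`.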
...
Some of these calls might require recoding the
text to a different encoding, so the internal encoding of boost::text could
be optimized to whatever is most likely to be needed on that platform so
that it is least likely to need recoding.  Alternatively, the encoding could
be specified as a parameter to the constructor and be carried around as a
runtime parameter since nobody needs to know what it is until the final
encoding of the string.
Hmmm... So why isn't boost::text just a typedef to `view<some_encoding>`?

And more to the point, why do you need to make the final encoding a
runtime choice when it can easily be made a compile-time choice? Even
if you needed to switch encodings, you can always linearize the text
into a character buffer at some point.
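
A minimal sketch of the compile-time version, in C++0x/C++11, under
stated assumptions: the `view` template here is a stand-in and not the
actual design discussed on the list, the `utf16` and `utf32` encoding
types are hypothetical, the underlying data is assumed to be a
`std::u32string` of valid code points, and UTF-16 as the default is an
arbitrary pick for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical encoding types; each appends one code point to a buffer
// of its own code-unit type.
struct utf32 {
    typedef char32_t unit_type;
    static void append(std::vector<unit_type>& out, char32_t cp) {
        out.push_back(cp);
    }
};

struct utf16 {
    typedef char16_t unit_type;
    static void append(std::vector<unit_type>& out, char32_t cp) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 | (cp & 0x3FF)));
        }
    }
};

// Non-owning view: the encoding is fixed at compile time by the template
// argument. The viewed string must outlive the view.
template <class Encoding>
class view {
public:
    explicit view(const std::u32string& code_points)
        : code_points_(&code_points) {}

    // Linearize into a contiguous buffer of code units in this encoding.
    std::vector<typename Encoding::unit_type> linearize() const {
        std::vector<typename Encoding::unit_type> buffer;
        for (char32_t cp : *code_points_) Encoding::append(buffer, cp);
        return buffer;
    }

private:
    const std::u32string* code_points_;
};

// "text" is then just a name for the view in whichever encoding is
// picked as the default (UTF-16 is an arbitrary choice here).
typedef view<utf16> text;
```

Switching encodings then means constructing a different `view` over the
same data and calling `linearize()` on it; no encoding tag has to be
carried around at runtime.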

-- 
Dean Michael Berris
about.me/deanberris