Re: [boost] [UTF String] UTF String library 1.5 ready for perusal

12 Feb 2011


On Sat, 12 Feb 2011 11:00:31 -0800
Jeremy Maitin-Shepard <jeremy@jeremyms.com> wrote:
...
The size in code-points *is* the size of the string, according to
the view of the string that the class exposes.
Ok, but what would I actually want to use that for?
What do you use string.length() for? :-) Answering that efficiently
is one of several reasons the UTF string classes keep track of the
code-point count.
std::string::length specifies the amount of memory required to
represent it as encoded, and is useful if you intend to pass it to
something else as a char array, length pair.  Given that number of
code points is directly related to neither the memory required nor the
number of logical characters/glyphs/size it will take up to display,
it seems it is unlikely to be useful in many cases.
But for those few cases where it *would* be useful, I see no reason not
to provide it. It costs essentially nothing, since the count is
originally provided by the same function that validates the encoded
data when it's put into a UTF type, and is used for other things as
well. And people are used to being able to retrieve the size of a
string; eliminating that function would discomfort some developers.
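
To illustrate why the count is nearly free, here is a minimal sketch of
a UTF-8 validator that tallies code points in the same pass. The name
validate_and_count and its signature are hypothetical, chosen for this
example only; this is not the library's actual code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the UTF String library's API.
// Returns true if s is well-formed UTF-8; on success, count holds the
// number of code points found during the same validation pass.
bool validate_and_count(const std::string& s, std::size_t& count) {
    // Smallest code point that legitimately needs 1, 2, 3, or 4 bytes.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t n = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (len > s.size() - i) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (cp < min_cp[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        ++n;       // one more code point, counted while validating
        i += len;
    }
    count = n;
    return true;
}
```

The only cost beyond validation is the single increment per decoded
sequence, so storing the result alongside the string adds no extra pass.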
...
In cases where there is a limit on the maximum length of a string, I
believe that is almost certainly going to be in terms of the encoded
length in a particular encoding (e.g. UTF-8 or UTF-16), rather than
in code points.
Well, that's easily available too, via T.coded().length().
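
As a rough sketch of how the two sizes can sit side by side, consider
the toy class below. The class name utf8_sketch is hypothetical and the
class is not the library's implementation; only the coded() accessor
name is taken from the expression above. It assumes its input is
already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch, not the UTF String library's actual class.
// Assumes the bytes passed in are already valid UTF-8.
class utf8_sketch {
public:
    explicit utf8_sketch(const std::string& bytes)
        : coded_(bytes), length_(0) {
        // In valid UTF-8, every byte that is not a continuation byte
        // (bit pattern 10xxxxxx) starts a new code point.
        for (std::size_t i = 0; i < coded_.size(); ++i)
            if ((static_cast<unsigned char>(coded_[i]) & 0xC0) != 0x80)
                ++length_;
    }

    // Size in code points, computed once at construction and cached.
    std::size_t length() const { return length_; }

    // The underlying encoded bytes; coded().length() is the byte count.
    const std::string& coded() const { return coded_; }

private:
    std::string coded_;
    std::size_t length_;
};
```

With this sketch, a five-code-point string containing one two-byte
character reports length() == 5 and coded().length() == 6, so a caller
can pick whichever measure a given limit is expressed in.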
-- 
Chad Nelson
Oak Circle Software, Inc.