
On Sat, 12 Feb 2011 20:19:09 +0100 Matus Chochlik <chochlik@gmail.com> wrote:
What do you use string.length() for? :-) Answering that efficiently is one of several reasons the UTF string classes keep track of the length.
std::string::length() gives the amount of memory required to represent the string as encoded, and is useful if you intend to pass it to something else as a (char array, length) pair. Given that the number of code points is directly related to neither the memory required nor the number of logical characters/glyphs/size it will take up to display, it seems unlikely to be useful in many cases. [...]
How about size() returning the required storage size for the string as in number of bytes and length() returning the number of code points?
Wouldn't that confuse any STL algorithm that uses the number of elements? Anything that cares about the number of elements seems to use size() to retrieve it, since length() is only provided by strings. In any case, both measurements are easily available already. T.length() (or T.size()) gives the length in code-points, i.e. the size it would be as a UTF-32 string. T.coded() exposes the underlying encoded type, so T.coded().length() gives the amount of memory needed for the encoded data.
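To make the two measurements concrete, here is a minimal sketch using only std::string. It does not use the UTF string classes discussed in this thread; count_code_points is a hypothetical helper introduced purely for illustration, and it assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the library under discussion.
// Counts code points in a UTF-8 string by counting every byte that is
// not a continuation byte (continuation bytes look like 10xxxxxx).
// Assumes well-formed UTF-8; no validation is done.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For a string holding the bytes "na\xC3\xAFve \xE2\x82\xAC" (seven code points, two of them multi-byte), size() reports 10 bytes of storage while count_code_points reports 7, which is the same distinction as the one between T.coded().length() and T.length() above.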
length() could be used for example when allocating an array of code-points (char32_t) where the string could be 'expanded' from UTF-8 for algorithms that require true random-access.
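As a rough illustration of that use case, the sketch below counts the code points first, allocates once, and then decodes into a std::u32string. The name expand_to_utf32 is hypothetical and not taken from any library in this thread; the code assumes C++11 and well-formed UTF-8 input, and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: expands well-formed UTF-8 into UTF-32 so that
// code points can be indexed in constant time.
// Malformed input is not detected and yields unspecified code points.
std::u32string expand_to_utf32(const std::string& utf8) {
    // The code-point count is what sizes the destination buffer.
    std::size_t count = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++count;
    }

    std::u32string out;
    out.reserve(count);

    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i++]);
        char32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < utf8.size(); --extra) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i++]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

After the expansion, out[k] is the k-th code point, which is the random access the UTF-8 form cannot provide directly.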
True, though the utf32_t type makes that unnecessary most of the time.
-- 
Chad Nelson
Oak Circle Software, Inc.