
On 01/20/2011 12:52 PM, Mostafa wrote:
... elision by patrick ...
On second thought, is there really a need to access the underlying data of utf8_t? I argue that having a view of the underlying data via iterators accomplishes just as much (*), and is more in line with the STL tradition of containers and iterators, not to mention the better encapsulation it affords the interface. Do clients really need to know, and potentially develop a dependency on, the fact that utf8_t (for now?) is really just a wrapper for std::string?

What type would be returned by operator* on the iterator for a utf8_string? char32_t? What do you do about combining characters? Return them one at a time and let the application deal with it? That's what I think. I don't see what else you could do.

There are a lot of other issues. Assuming it has the same interface as std::string, how would you do max_size()? How about the comparison operators? There's:

    template<typename charT, typename traits, typename Allocator>
    bool operator<=(const basic_string<charT, traits, Allocator>& lhs, const charT* rhs);

What would the equivalent be for utf8_string? For the above, the rhs is in effect converted to basic_string for the comparison. For a utf8_string, what if the rhs doesn't convert to UTF-8? Should there be some conversion facet that can be specified for the rhs?

std::string's comparison operators are supposed to take linear time. These would

capacity() is supposed to return the largest number of characters the string can hold without reallocation. Would you return that by assuming the smallest characters, which take only one byte each?

std::string's operator[] is supposed to work in constant time. This one couldn't. It would be fun to make it, but it would have to differ in some ways from the specification of std::string.

How about push_back or insert? What do they take for the argument? A char32_t holding a UTF-32 code unit? Of course you'd have to insert combining characters one part at a time.

If you have LC_COLLATE set to en_US.utf8 then std::sort should just work. (Replace en_ with whatever is used in your locale.)

Patrick
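
To make the operator*, operator[] and push_back questions above a bit more concrete, here is a rough sketch of what the pieces might look like on top of a plain std::string. The names append_utf8, decode_utf8 and code_point_at are made up for illustration and are not part of any proposed utf8_string interface. The sketch assumes well-formed input and does no validation at all.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for push_back(char32_t): append one code point,
// encoded as UTF-8. Assumes cp is a valid Unicode scalar value
// (at most 0x10FFFF and not a surrogate); nothing is checked.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical stand-in for the iterator's operator* followed by operator++:
// decode the code point that starts at byte offset pos and move pos past it.
// Assumes s is well-formed UTF-8 and pos points at a lead byte.
inline char32_t decode_utf8(const std::string& s, std::size_t& pos) {
    unsigned char lead = static_cast<unsigned char>(s[pos++]);
    if (lead < 0x80) return lead;                        // single-byte (ASCII)
    int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1; // continuation bytes
    char32_t cp = lead & (0x3F >> extra);                // payload bits of lead
    while (extra-- > 0)
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos++]) & 0x3F);
    return cp;
}

// Hypothetical stand-in for operator[]: index n counts code points, so the
// string has to be walked from the start. Linear time, not constant.
// Assumes n is less than the number of code points in s.
inline char32_t code_point_at(const std::string& s, std::size_t n) {
    std::size_t pos = 0;
    while (n-- > 0) decode_utf8(s, pos);
    return decode_utf8(s, pos);
}
```

With this, an "e" followed by U+0301 (combining acute accent) goes in as two separate append_utf8 calls and comes back out as two separate char32_t values, which is the "return them one at a time and let the application deal with it" behaviour. It also shows why capacity() is awkward: the same number of bytes holds anywhere from one quarter as many code points to exactly as many, depending on what is stored.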