
On Fri, 21 Jan 2011 01:35:15 -0800 Patrick Horgan <phorgan1@gmail.com> wrote:
On 01/20/2011 12:52 PM, Mostafa wrote:
On second thought, is there really a need to access the underlying data of utf8_t? I argue that having a view of the underlying data via iterators accomplishes just as much (*), and is more in line with the STL tradition of containers and iterators, not to mention the better encapsulation it affords the interface. Do clients really need to know, and potentially develop a dependency on, the fact that utf8_t (for now?) is really just a wrapper for std::string?
What type would be returned by operator* on the iterator for a utf8_string? [...]
Which iterator? ;-) As I'd envisioned it, there would be three: an element iterator using char, a code-point iterator using char32_t, and a true character iterator using a custom class. The custom class might be ugly and hard to work with, but would be guaranteed to do the right thing.
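To make the difference between the first two levels concrete, here is a rough sketch of what the code-point view would yield, as opposed to the char view. The names next_code_point and code_points are made up for illustration and are not part of any proposed interface; the decoder assumes its input is already well-formed UTF-8 and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes the code point starting at s[i] and
// advances i past it. Assumes s holds well-formed UTF-8.
inline char32_t next_code_point(const std::string& s, std::size_t& i) {
    assert(i < s.size());
    unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80) return lead;                  // 1-byte sequence (ASCII)
    // Number of continuation bytes implied by the lead byte.
    int extra = (lead >= 0xF0) ? 3 : (lead >= 0xE0) ? 2 : 1;
    char32_t cp = lead & (0x3F >> extra);          // payload bits of the lead byte
    while (extra-- > 0) {
        assert(i < s.size());
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Hypothetical code-point view: the values a char32_t iterator would
// produce on dereference, collected into a vector for simplicity.
inline std::vector<char32_t> code_points(const std::string& s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) out.push_back(next_code_point(s, i));
    return out;
}
```

For the string "e\xCC\x81" (an 'e' followed by U+0301, a combining acute accent), the element view sees three chars and the code-point view sees two values, while a true character iterator would have to treat the pair as a single character. That last level is the one that needs the custom class.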
There are a lot of other issues. Assuming it has the same interface as std::string, how would you do max_size()? How about the comparison operators? [...]
max_size would have to operate on char elements, as there's no other accurate answer. Comparison operators would either operate on code-points or, through Boost.Locale, characters.
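As a sketch of the code-point option: for well-formed UTF-8, comparing the bytes as unsigned values gives the same ordering as comparing the decoded code points, so a code-point comparison needs no decoding and stays linear in the length of the strings. The function name utf8_less is hypothetical, and this is plain code-point order, not the locale-aware collation that Boost.Locale would provide.

```cpp
#include <algorithm>
#include <string>

// Hypothetical free function: orders two well-formed UTF-8 strings by
// code point. The casts make the byte comparison unsigned regardless of
// whether plain char is signed on the platform.
inline bool utf8_less(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](char x, char y) {
            return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
        });
}
```

With this, "z" (U+007A) sorts before "\xC3\xA9" (U+00E9), which in turn sorts before "\xE2\x82\xAC" (U+20AC). max_size needs no sketch: it would simply report the limit in char elements, as the underlying std::string does.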
What would the equivalent be for utf8_string? For the above, the rhs is in effect converted to basic_string for the comparison. For a utf8_string, what if the rhs doesn't convert to utf-8? Should there be some conversion facet able to be specified for the rhs?
The more people discuss it, the more I think automatic conversions from std::string to the UTF types are the wrong way to go about it. It would be convenient, and would do the right thing in 90% of cases -- but it would do absolutely the *wrong* thing in the other 10%, where the std::string does *not* contain the encoding that the UTF constructor assumes. And most developers wouldn't think about that until they ran into it the hard way, after their programs were in widespread use.
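One possible shape for the alternative, purely as a sketch: make the conversion explicit and have it check its input. The class name utf8_string_sketch and the helper is_valid_utf8 are invented for illustration, and throwing std::invalid_argument is just one assumed policy. Note the limit of this approach: validation catches byte sequences that cannot be UTF-8 (most non-ASCII Latin-1 text, for example), but it cannot detect text in another encoding that happens to form valid UTF-8.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical check: true if s is well-formed UTF-8. Rejects stray or
// missing continuation bytes, overlong forms, surrogates, and values
// above U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0, n = s.size();
    while (i < n) {
        unsigned char lead = static_cast<unsigned char>(s[i++]);
        if (lead < 0x80) continue;                 // ASCII
        int extra;
        char32_t cp, lowest;                       // lowest: smallest non-overlong value
        if      ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; lowest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; lowest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; lowest = 0x10000; }
        else return false;                         // continuation byte or invalid lead
        for (; extra > 0; --extra) {
            if (i >= n) return false;              // truncated sequence
            unsigned char c = static_cast<unsigned char>(s[i++]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
    }
    return true;
}

// Hypothetical wrapper: no implicit conversion from std::string, and
// construction fails loudly if the bytes are not UTF-8.
class utf8_string_sketch {
public:
    explicit utf8_string_sketch(const std::string& bytes) : data_(bytes) {
        if (!is_valid_utf8(data_)) throw std::invalid_argument("not valid UTF-8");
    }
    const std::string& bytes() const { return data_; }
private:
    std::string data_;
};
```

Because the constructor is explicit, a std::string cannot silently turn into the UTF type in a function call or a comparison; the developer has to write the conversion out, which is the point at which the encoding question gets asked.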
std::string's comparison operators are supposed to take linear time. [...]
Obviously the hypothetical boost::string would have some slight differences from std::string. It would have to.
--
Chad Nelson
Oak Circle Software, Inc.