
On Thu, Aug 11, 2011 at 14:41, Daniel James <dnljms@gmail.com> wrote:
> On 11 August 2011 12:03, Artyom Beilis <artyomtnk@yahoo.com> wrote:
>> The problem is policy; the problem is that Boost just can't decide once and forever that std::string is UTF-8...
>
> Even if there was a consensus within boost, that isn't feasible. We don't own std::string, so we don't have a say in what it represents.

Of course it's feasible. We have the right to say what it represents in the interface of *our* libraries. If Boost.ProgramOptions, Boost.Locale and Sqlite did it, surely we can adopt this policy for the rest of the libraries.

> There's a lot of existing code which is not based on that assumption - we can't just wish it out of existence and boost should be compatible with it.

Most existing code working with plain chars is either encoding-agnostic or already wrong.

As for the design of the proposed library: it mixes two orthogonal concepts, namely encoding and storage. The two should be separate.

I don't like reference-counted strings. Passing strings by reference is not that hard. Moreover, the many atomic memory-bus locks that reference counting incurs on a multiprocessor system degrade performance.

The 'unicode' support (codepoint iteration, etc.) is purely algorithmic and thus should be independent of the way the data is stored. I would like to see something like `codepoints(any_char_iterator_range)` returning a range of codepoints.

-- Yakov