Re: [boost] Re: Re: Re: Any interest in adding unicode support to boost?

20 Oct 2004

Erik Wien wrote:
...
> Right now I have a single encoded_string class that has two template
> parameters, namely encoding and encoding_traits. encoding_traits is a
> class where all encoding-specific implementation is kept, and this
> class is used to set up the encoded_string class to correctly represent
> strings in the given encoding.

Yes, that's close to what I thought.

Do not repeat the basic_string mistake of making encoding_traits a template
parameter. A traits class is never used in this way. Your encoding_traits is
actually a policy.

A traits class is independent of the components that use it. It is basically 
a mapping from a type to something; in your case, a mapping between the 
encoding parameter and the operations.

So, encoding_traits aside, you essentially have string<utf8>.
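
To make the distinction concrete, here is a minimal sketch, not the proposed
library's actual interface. The tag types utf8 and utf16, the sequence_length
operation, and the length() member are hypothetical names chosen for
illustration; only encoded_string and encoding_traits come from the discussion
above. The string takes a single template parameter and looks its traits up
from the encoding, instead of accepting them as a second parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical encoding tags (assumed names, for illustration only).
struct utf8 {};
struct utf16 {};

// Traits: a mapping from an encoding tag to its code unit type and operations.
// It is independent of any class that uses it.
template <class Encoding> struct encoding_traits;

template <> struct encoding_traits<utf8> {
    using code_unit = char;
    // Number of code units in the sequence introduced by `lead`.
    // Well-formed input is assumed; there is no validation here.
    static std::size_t sequence_length(code_unit lead) {
        unsigned char c = static_cast<unsigned char>(lead);
        if (c < 0x80) return 1;
        if ((c >> 5) == 0x06) return 2;
        if ((c >> 4) == 0x0E) return 3;
        return 4;
    }
};

template <> struct encoding_traits<utf16> {
    using code_unit = char16_t;
    // A high (lead) surrogate introduces a two-unit sequence.
    static std::size_t sequence_length(code_unit lead) {
        return (lead >= 0xD800 && lead <= 0xDBFF) ? 2 : 1;
    }
};

// One template parameter only: the traits are looked up, not passed in.
template <class Encoding>
class encoded_string {
    using traits = encoding_traits<Encoding>;
public:
    using code_unit = typename traits::code_unit;

    explicit encoded_string(std::basic_string<code_unit> units)
        : units_(std::move(units)) {}

    // Number of code points, found by hopping from lead unit to lead unit.
    std::size_t length() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < units_.size();
             i += traits::sequence_length(units_[i]))
            ++n;
        return n;
    }

private:
    std::basic_string<code_unit> units_;
};

// Usage: the same class template serves both encodings.
inline void example_usage() {
    encoded_string<utf8> s("h\xC3\xA9llo");   // "héllo", 6 bytes
    assert(s.length() == 5);                  // 5 code points
}
```

With this arrangement encoded_string<utf8> is a single type, and supporting
another encoding means specializing encoding_traits rather than supplying a
different second template argument.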
...
> The iterators used are bidirectional, not random access (impossible
> on UTF-8 and UTF-16) and they are as of now not constant. It IS
> possible to assign a code unit to a UTF-8 encoded string through an
> iterator, even if the resulting code unit sequence would be longer
> than the one the iterator is pointing to. The underlying container is
> automatically resized to make room for the new sequence. (This is of
> course slow!)

Mutable iterators are another basic_string mistake, one that effectively
rules out efficient reference counting. ;-) Just make the iterators constant.
The same functionality can be obtained with explicit erase/insert/replace
members.
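
As a rough sketch of that shape, assuming a hypothetical utf8_string class
(the class name, the helper functions, and the replace signature below are
invented for illustration and are not part of any proposed interface):
iteration is read-only and yields code points, while the one operation that
can change the length of the underlying storage is an explicit member. The
sketch does not implement reference counting and assumes well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Number of bytes in the UTF-8 sequence introduced by `lead`.
inline std::size_t utf8_seq_len(char lead) {
    unsigned char c = static_cast<unsigned char>(lead);
    if (c < 0x80) return 1;
    if ((c >> 5) == 0x06) return 2;
    if ((c >> 4) == 0x0E) return 3;
    return 4;
}

// Decode the code point whose sequence starts at byte offset i.
inline char32_t utf8_decode(const std::string& s, std::size_t i) {
    std::size_t n = utf8_seq_len(s[i]);
    unsigned char lead = static_cast<unsigned char>(s[i]);
    char32_t cp = (n == 1) ? lead : (lead & (0xFF >> (n + 1)));
    for (std::size_t k = 1; k < n; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    return cp;
}

// Encode one code point as 1 to 4 UTF-8 bytes.
inline std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

class utf8_string {
public:
    // Read-only bidirectional iterator: dereferencing returns a code point
    // by value, so nothing can be assigned through it.
    class const_iterator {
    public:
        const_iterator(const std::string* s, std::size_t i) : s_(s), i_(i) {}
        char32_t operator*() const { return utf8_decode(*s_, i_); }
        const_iterator& operator++() {
            i_ += utf8_seq_len((*s_)[i_]);
            return *this;
        }
        const_iterator& operator--() {
            // Step back over continuation bytes (10xxxxxx) to the lead byte.
            do { --i_; }
            while ((static_cast<unsigned char>((*s_)[i_]) & 0xC0) == 0x80);
            return *this;
        }
        bool operator==(const const_iterator& o) const { return i_ == o.i_; }
        bool operator!=(const const_iterator& o) const { return i_ != o.i_; }
        std::size_t offset() const { return i_; }
    private:
        const std::string* s_;
        std::size_t i_;
    };

    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}

    const_iterator begin() const { return {&bytes_, 0}; }
    const_iterator end() const { return {&bytes_, bytes_.size()}; }

    // Explicit mutation: replace the code point at pos (pos != end()).
    // The storage may grow or shrink; treat existing iterators as invalid.
    void replace(const_iterator pos, char32_t cp) {
        std::size_t i = pos.offset();
        bytes_.replace(i, utf8_seq_len(bytes_[i]), utf8_encode(cp));
    }

    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};

// Usage: a one-byte code point is replaced by a three-byte one.
inline void example_usage() {
    utf8_string s("abc");
    utf8_string::const_iterator it = s.begin();
    ++it;                   // now at 'b'
    s.replace(it, 0x20AC);  // 'b' becomes U+20AC (EURO SIGN)
    assert(s.bytes().size() == 5);
}
```

Because the only way to modify the string is through a member function, a
reference-counted implementation would know exactly when it has to unshare
its buffer; with mutable iterators, any call to a non-const begin() would
have to be treated as a potential write.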

Peter Dimov