
On Tue, 19 Oct 2004 18:32:50 +0200, Erik Wien <wien@start.no> wrote:
> ----- Original Message ----- From: "Rogier van Dalen" <rogiervd@gmail.com>
>> I've recently started on the first draft of a Unicode library.
>
> Interesting. Is there a discussion going on about this library that I
> have missed, or haven't you posted anything about it yet? I'd hate to
> start something like this if an effort is already being made on the
> subject.
It's in the planning stage; I have a preliminary implementation of some parts. Your message prompted me to bring my ideas out into the open.
>> I think a definition of unicode::code as uint32_t would be much
>> better. The problem is that codecvt is only implemented for wchar_t
>> and char, so it's not possible to make a Unicode codecvt without
>> manually adding (dummy) implementations of
>> codecvt<unicode::code, char, mbstate_t> to the std namespace. I guess
>> this is the reason that Ron Garcia just used wchar_t.
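(As an aside, to make concrete what such a codecvt<unicode::code, char, mbstate_t> facet would do on output: its do_out would essentially perform the UTF-32-to-UTF-8 conversion below. This is only a rough sketch; encode_utf8 is an illustrative name, not something from the library, and it does no validation of the input code point.)

```cpp
#include <cstdint>
#include <string>

// Encode one UTF-32 code point as a UTF-8 byte sequence. This is the
// core of what a codecvt<unicode::code, char, mbstate_t>::do_out would
// have to do; the free-function form here is purely for illustration.
std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // One byte: plain ASCII.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx.
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```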
> I don't really feel locking the code unit size to 32 bits is a good
> solution either, as strings would then become unnecessarily large.
As I tried to show, the choice of the underlying buffer is templated. It could be a std::string, an SGI rope<wchar_t>, or anything else; a char-based buffer would automatically make it a UTF-8-encoded string, and so on. I agree with you (and with the Unicode standard) that using strings of UTF-16 is probably best for most practical applications.

The interface, however, should IMHO always use UTF-32 (I agree with the Unicode standard here too):

    codepoint_string<...> s = ....;

I think *s.begin() should return a UTF-32 code point. The codecvt class converts to UTF-32 because it didn't occur to me to do anything else; and why would you?

Regards,
Rogier
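P.S. To make the "*s.begin() returns UTF-32" idea concrete, here is a rough sketch of the decoding a UTF-16-backed codepoint_string iterator would do on the fly: surrogate pairs in the buffer are combined into single UTF-32 code points. The name decode_utf16 and the eager vector-based interface are illustrative only, and a real implementation would have to deal with unpaired surrogates.

```cpp
#include <cstdint>
#include <vector>

// Decode a UTF-16 code unit sequence into UTF-32 code points -- the
// same transformation a codepoint_string iterator over a UTF-16 buffer
// would perform lazily, one code point per dereference.
std::vector<std::uint32_t> decode_utf16(const std::vector<std::uint16_t>& in)
{
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t u = in[i];
        // High surrogate followed by another unit: combine the pair.
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t low = in[++i];
            u = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00);
        }
        out.push_back(u);
    }
    return out;
}
```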