[boost] Re: Any interest in adding unicode support to boost?

19 Oct 2004

      ...
In article <cl268e$okv$1@sea.gmane.org>, "Robert Ramey" <ramey@rrsd.com>
wrote:
...
I think you should spend a little more time investigating the following:
a) The "vault" files section has code by A Barbati which addresses issues
related to unicode.
b) Ron Garcia contributed codecvt facets for unicode that have been
incorporated into boost and are currently used by two boost libraries
(serialization and program options).
c) ANSI library functions exist for converting strings and characters
to/from wstrings/wchar_t's in accordance with the currently selected
locale. Not all libraries implement these functions, however.
So it's not clear to me what exactly needs to be done here, other than
fixing up some older standard libraries.  I don't think that's what you
had in mind.
"Miro Jurisic" <macdev@meeroh.org> wrote in message
news:macdev-320EB0.01505419102004@sea.gmane.org...
There is a lot of Unicode work to be done in the standard C++ library and
boost. C++ currently has no Unicode-aware string abstraction, and this is a
big problem for anyone who has to deal with Unicode strings in C++ code.
std::string is poorly suited for any Unicode-savvy work, for many reasons --
mainly having to do with the fact that std::string and STL and boost
algorithms using std::string::iterator don't know how to handle strings in
accordance with the Unicode spec.
Hmmm - it would never occur to me to use std::string for characters wider
than 8 bits.  I studied this issue in some detail and concluded that if one
uses unicode or another 2 or 4 byte encoding, the simplest and most natural
way is to use std::wstring (a synonym for std::basic_string<wchar_t>).  At
this point the only issues would be

a) implementations which are not based on basic_string (I don't know if
there are any of these around)
b) input/output to other encodings such as utf-8 or ? - this is handled by
codecvt facets.
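
To make point b) concrete, here is a rough sketch of the conversion a UTF-8
codecvt facet has to perform at the I/O boundary: UTF-8 bytes come in, one
wide character per code point goes out. This is a hand-written stand-in, not
the boost facet itself; the name utf8_to_wide is made up for illustration, and
I only handle 1- to 3-byte sequences so the result also fits a 16-bit wchar_t.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not a Boost or standard library API.
// Decodes UTF-8 into a std::wstring, one wide character per code point.
// Assumption: only 1- to 3-byte sequences (U+0000..U+FFFF) are accepted.
// Overlong forms and surrogate values are not rejected in this sketch.
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long code_point = 0;
        std::size_t trailing = 0;  // number of continuation bytes expected

        if (lead < 0x80) {                    // 0xxxxxxx: plain ASCII
            code_point = lead;
        } else if ((lead & 0xE0) == 0xC0) {   // 110xxxxx: 2-byte sequence
            code_point = lead & 0x1F;
            trailing = 1;
        } else if ((lead & 0xF0) == 0xE0) {   // 1110xxxx: 3-byte sequence
            code_point = lead & 0x0F;
            trailing = 2;
        } else {
            throw std::runtime_error("unsupported or invalid UTF-8 lead byte");
        }

        // i < utf8.size() holds here, so the subtraction cannot underflow.
        if (trailing > utf8.size() - i - 1) {
            throw std::runtime_error("truncated UTF-8 sequence");
        }

        for (std::size_t k = 1; k <= trailing; ++k) {
            const unsigned char b = static_cast<unsigned char>(utf8[i + k]);
            if ((b & 0xC0) != 0x80) {         // continuation must be 10xxxxxx
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            code_point = (code_point << 6) | (b & 0x3F);
        }

        out.push_back(static_cast<wchar_t>(code_point));
        i += trailing + 1;
    }
    return out;
}
```

With this, utf8_to_wide("a\xC3\xA9") yields a two-element wstring, so once
the data is inside the program each element is a whole character. A real
facet does the same work incrementally inside the stream buffer.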

I believe that STL and boost algorithms that handle std::string can (or
should) be able to handle any std::basic_string<?>. That is my basis for
the view that unicode shouldn't be a big issue.  Of course if one wants to
handle unicode as a std::string containing, say, a UTF-8 encoding of unicode
characters, then that would be a separate issue.  I don't think anyone
would want to do that.
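
A small sketch of what I mean, with made-up names (count_units, sample_wide,
sample_utf8) used only for illustration. The same template works for any
std::basic_string<CharT>, but it counts elements of the string, and in the
UTF-8 case an element is a byte rather than a character.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Generic over the character type: compiles unchanged for std::string and
// std::wstring. It counts string elements (code units), not characters.
template <typename CharT>
std::size_t count_units(const std::basic_string<CharT>& text, CharT unit) {
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), unit));
}

// The word "naive" spelled with U+00EF as a wide string: 5 elements.
inline std::wstring sample_wide() { return std::wstring(L"na\u00EFve"); }

// The same word as UTF-8 bytes in a std::string: U+00EF is encoded as the
// two bytes 0xC3 0xAF, so the string has 6 elements.
inline std::string sample_utf8() { return std::string("na\xC3\xAFve"); }
```

Here sample_wide().size() is 5 while sample_utf8().size() is 6, and an
algorithm walking std::string::iterator sees 0xC3 and 0xAF as two separate
elements. Even the wide version is one element per code point only, which is
not the same thing as one element per user-visible character when combining
marks are involved.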

I'm willing to be convinced I'm wrong about this, but I just don't see it
yet.

Robert Ramey