[boost] Re: Any interest in adding unicode support to boost?

19 Oct 2004

      Erik Wien wrote:
...
Hi. I am in the process of planning a library for handling Unicode
strings in C++, and would like to probe the interest in the boost
community for something like that. I read through the Unicode
discussion that was up back in April, and from what I could gather
there was some amount of interest, but no one felt comfortable taking
on the task as of yet.

I am hoping to be able to run this project as my Bachelor's Thesis in
Computer Engineering (not sure if that is the correct translation from
Norwegian), and if it gets approved by my college, two other
programmers and I will spend one semester working exclusively on this
(in collaboration with the boost community, of course). At the end of
that semester I hope the library (or at least parts of it) will be in
such a state that it can be submitted for review by boost.

The library should ultimately have support for at least basic handling
of Unicode strings (in all encodings), collation of strings, and other
locale-specific operations. The library should also be (to the extent
that is possible) integrated with the standard C++ library (and boost)
to get as much functionality as possible "for free". I am thinking of,
among other things, the std::locale class and compatibility with
iostreams. How these requirements are fulfilled will be determined as
the project (hopefully) moves forward.

A few points you probably already know:

1) Wide characters and Unicode characters are not necessarily the same thing
for any given implementation.
2) There are quite a few Unicode encodings.
3) The idea is to be able to plug a Unicode encoding into the same
standard library templates and boost templates which now support 'char' and
'wchar_t'. In other words, ideally you want to treat your Unicode encoding as
just another character type, with extra smarts depending on the encoding.
The extra smarts would be used in specializations.
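
To make point 3 a little more concrete, here is a rough sketch of what
"extra smarts in specializations" could look like. It is only an
illustration under assumptions, not a proposed design: the function name
code_point_count is hypothetical, and char16_t, char32_t and the u"" / U""
literals come from C++11, which postdates this thread. The generic template
treats every code unit as one code point, which holds for UTF-32, and an
overload for UTF-16 adds the encoding-specific knowledge about surrogate
pairs.

```cpp
#include <cstddef>
#include <string>

// Generic case: assumes one code unit per code point.
// That assumption holds for UTF-32 (char32_t), not for UTF-8 or UTF-16.
template <typename CharT>
std::size_t code_point_count(const std::basic_string<CharT>& s) {
    return s.size();
}

// Encoding-specific "smarts" for UTF-16 (hypothetical overload):
// a surrogate pair is two code units but a single code point, so only
// units that are NOT trailing (low) surrogates are counted.
// Assumes well-formed UTF-16 input.
inline std::size_t code_point_count(const std::basic_string<char16_t>& s) {
    std::size_t n = 0;
    for (char16_t unit : s) {
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

With this, the same call works on std::u32string and std::u16string, and a
string holding 'a' followed by U+1F600 reports two code points in both
cases, even though the UTF-16 string holds three code units.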

In the past in comp.std.c++ I attempted to promote the idea that all
standard library functionality which deals generally in characters and
strings should be parameterized on the character type, for the sake of
orthogonality and the future. While most of it is, there is still some
functionality which is not, i.e. exceptions, file names, and locale
message files, and which assumes that only narrow characters exist. I
am still amazed that programmers from countries which would normally use
wide characters as Unicode encodings, such as the Japanese, have not made
more of an issue of this, but perhaps they are so used to their far more
difficult DBCS roots that pursuing wide characters everywhere, much less a
real Unicode encoding, is a minor issue for them.

Edward Diener