[boost] Re: Boost Unicode support ideas

15 Apr 2004


In article <022101c4220f$a0c363f0$a8500352@fuji>,
"John Maddock" <john@johnmaddock.co.uk> wrote:
...
- The standard facets (and the locale class itself, in that it is a
   functor for comparing basic_strings) are tied to facilities such as 
   std::basic_string and std::ios_base which are not suitable for 
   Unicode support.
Why not?  Once the locale facets are provided, the std iostreams will
"just work", that was the whole point of templating them in the first place.
I have already gone over this in other posts, but, in short, std::basic_string
makes performance guarantees that are at odds with Unicode strings.
basic_string is a sequence of code points, no more, no less; all performance 
guarantees for basic_string can be met as such.
If all you want basic_string for is a sequence of code points, you should use a 
vector<codePointT> instead, as vector does not provide additional methods that 
would be at best deceptive and at worst dangerous when applied to Unicode 
strings.
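
To illustrate what "deceptive" can mean here, the following is a minimal
sketch and not code from this thread. It assumes C++11's std::u32string (which
postdates this discussion) as the sequence of code points, and all function
names are hypothetical. It builds two canonically equivalent spellings of
e-acute and applies ordinary basic_string operations to them.

```cpp
#include <cstddef>
#include <string>

// Two canonically equivalent spellings of "e with acute accent".
// Assumption: std::u32string stands in for "a sequence of code points".
inline std::u32string precomposed_e_acute() { return U"\u00E9"; }   // U+00E9
inline std::u32string decomposed_e_acute()  { return U"e\u0301"; }  // U+0065 U+0301

// basic_string equality compares code points one by one; it knows nothing
// about canonical equivalence.
inline bool code_point_equal(const std::u32string& a, const std::u32string& b) {
    return a == b;
}

// basic_string::find matches single code points, so it reports a plain 'e'
// inside the decomposed form even though the user sees an accented letter.
inline bool contains_plain_e(const std::u32string& s) {
    return s.find(U'e') != std::u32string::npos;
}
```

Under these assumptions the two strings have sizes 1 and 2, compare unequal,
and only the decomposed one "contains" a plain e, although a reader would call
them the same one-character string.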
...
Iterator adapters for normalisation / composition / compression would also be 
useful additions.
Likewise adapters for iterating "characters" and "glyphs".
Leaving compression out, as I don't see what it has to do with Unicode strings 
per se, I don't think they would merely be useful additions; I think they would 
be required in order for a Boost Unicode library to meet my expectations.
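
As a rough sketch of the lowest layer such adapters would sit on, the function
below walks UTF-16 code units and yields code points. It is an illustration
only, not a proposed Boost interface: the names are hypothetical, it assumes
C++11 string types, and replacing unpaired surrogates with U+FFFD is one
possible policy among several. Normalisation and character or glyph iteration
would need Unicode data tables and are not attempted here.

```cpp
#include <cstddef>
#include <string>

inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Decode UTF-16 code units into code points.
// Assumption: an unpaired surrogate is replaced by U+FFFD rather than
// reported as an error.
inline std::u32string decode_utf16(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        if (is_high_surrogate(u) && i + 1 < in.size() &&
            is_low_surrogate(in[i + 1])) {
            // Combine a surrogate pair into one supplementary code point.
            const char32_t hi = u - 0xD800;
            const char32_t lo = in[i + 1] - 0xDC00;
            out.push_back(0x10000 + (hi << 10) + lo);
            ++i;  // skip the low surrogate that was just consumed
        } else if (is_high_surrogate(u) || is_low_surrogate(u)) {
            out.push_back(0xFFFD);  // unpaired surrogate
        } else {
            out.push_back(u);       // BMP code point, one code unit
        }
    }
    return out;
}
```

A real adapter would expose this lazily as an iterator instead of building a
second string, but the decoding step would be the same.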
...
Working on sequences of code points always requires care: clearly one could 
erase a low surrogate and leave a high surrogate "orphaned" behind, for 
example.  One would need to make it clear in the documentation that potential 
problems like this can occur.
It is precisely because this interface is dangerous that I believe that it 
should not be the default interface to a Unicode string. It is rarely useful and 
often harmful. It does not make it easy to do things right.
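
The orphaned-surrogate hazard can be reproduced in a few lines. This is a
minimal sketch, assuming a C++11 std::u16string holding UTF-16 code units;
the helper names are hypothetical and are not part of any proposed library.

```cpp
#include <cstddef>
#include <string>

inline bool is_lead(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }  // high surrogate
inline bool is_trail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }  // low surrogate

// True if every surrogate in s belongs to a high+low pair.
inline bool is_well_formed_utf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_lead(s[i])) {
            if (i + 1 >= s.size() || !is_trail(s[i + 1])) return false;
            ++i;  // skip the matching low surrogate
        } else if (is_trail(s[i])) {
            return false;  // low surrogate with no preceding high surrogate
        }
    }
    return true;
}

// Erase a single code unit by index, which is all basic_string::erase
// knows how to do. Returns the modified copy.
inline std::u16string erase_code_unit(std::u16string s, std::size_t index) {
    s.erase(index, 1);
    return s;
}
```

For the string u"a\U0001D11Eb" (four code units), erasing index 2 removes the
low surrogate and leaves an ill-formed string, while erasing index 0 does not.
Nothing in the basic_string interface distinguishes the two calls.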
...
Unicode is such a large and complex issue that it's actually pretty hard to 
keep even a small fraction of the issues in one's mind at a time, hence my 
suggestion to split the issue up into a series of steps.
The problem is that I think that some of the steps you propose do not take us in 
the direction of a useful Unicode string abstraction in boost, but merely 
provide convenient wrappers for the simple problems without tackling the 
complicated problems. I don't have a problem with solving simple problems first, 
but I would like to have a reason to believe that solving those simple problems 
gets us closer to solving the hard problems at a later time; I am not convinced 
the approach you propose fits that bill.

meeroh

-- 
If this message helped you, consider buying an item
from my wish list: <http://web.meeroh.org/wishlist>

Miro Jurisic