RE: [boost] Re: Any interest in adding unicode support to boost?

22 Oct 2004

      ...
...
"Miro Jurisic" <macdev@meeroh.org> wrote in message news:macdev-
...
I am not sure I buy this. I think that if you want to have unchecked
Unicode data, you should use a vector<char*_t>. Unicode strings have
well-defined invariants with respect to canonicalization and
well-formedness, and I think that a Unicode string abstraction
should enforce those invariants.
Having intermediate states that are invalid and a final state that is
valid is not a feature, it's a bug. It's a silent failure that I want
to know about.
From: Eric Niebler
Erik Wien wrote:
Amen. ;)
No fair bringing religion into this. ;-) I'll repeat what I said before
-- this would be an unfortunate design, and you'll hear about it from
your users. If you force people to do their bit twiddling in
vector<char*_t>, then you impose an extra allocation and a copy to get
it into a unicode::string, and most people won't bother.
If it imposes a copy I certainly won't use it. What I'm interested in are
functions to compare utf*-encoded arrays and create sort keys from those
same arrays. In truth, all I need are Unicode-aware versions of strcoll,
strxfrm, and strlwr that don't require locking a mutex around the global
locale in my multithreaded code. But other things such as substring and
regex matching would also be welcome.
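
To make the request concrete, here is a minimal sketch of a comparison that takes raw char arrays and an explicit locale object instead of consulting the global locale. It uses the standard std::collate facet; the helper name collate_compare is hypothetical. It is not Unicode-aware on its own (that depends entirely on the locale supplied), and whether a given standard library copies or locks internally is implementation-specific, so treat it as an illustration of the interface shape only.

```cpp
#include <cstddef>
#include <locale>

// Hypothetical helper: three-way compare of two char arrays using the
// collation rules of an explicitly supplied locale object. The global
// locale is never read or modified here.
int collate_compare(const std::locale& loc,
                    const char* a, std::size_t a_len,
                    const char* b, std::size_t b_len)
{
    const std::collate<char>& coll =
        std::use_facet<std::collate<char> >(loc);
    // compare() takes pointer ranges, so the caller does not have to
    // build a string object first. It returns -1, 0, or 1.
    return coll.compare(a, a + a_len, b, b + b_len);
}
```

A caller would construct the std::locale once per thread (or once at startup) and pass it down, for example collate_compare(std::locale::classic(), p, n, q, m).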

I'm sure it's already been discussed ad nauseam, but the fact that atof,
the ostringstream ctor/dtor, printf, tolower, etc. may all lock a mutex
around access to the global locale has forced me to strip them out of quite
a bit of code. In my tests this was fine on a uniprocessor, marginal on a
dual, but on a quad or better it was the single biggest bottleneck.

I almost always want to treat strings as char arrays. Most operations are
simply copying/moving or checking to see if the string is in a set or hash.
If I need to do a lot of locale-aware comparisons I'm going to generate sort
keys first.
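
The sort-key approach can be sketched as follows: compute one key per string up front, then sort on the keys with plain byte-wise comparison, so the collation machinery runs once per string instead of once per comparison. This is a sketch under assumptions: it uses std::collate::transform with an explicitly passed locale, and the function name sort_by_collation_key is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <locale>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the input strings ordered by the
// collation rules of `loc`, computing each sort key exactly once.
std::vector<std::string> sort_by_collation_key(
    const std::vector<std::string>& items, const std::locale& loc)
{
    const std::collate<char>& coll =
        std::use_facet<std::collate<char> >(loc);

    // Pair each string with its sort key (key first, original second).
    std::vector<std::pair<std::string, std::string> > keyed;
    keyed.reserve(items.size());
    for (std::size_t i = 0; i < items.size(); ++i) {
        const std::string& s = items[i];
        keyed.push_back(std::make_pair(
            coll.transform(s.data(), s.data() + s.size()), s));
    }

    // Pairs order by key first, using ordinary string comparison;
    // no locale is involved during the sort itself.
    std::sort(keyed.begin(), keyed.end());

    // Strip the keys and return the originals in sorted order.
    std::vector<std::string> out;
    out.reserve(keyed.size());
    for (std::size_t i = 0; i < keyed.size(); ++i)
        out.push_back(keyed[i].second);
    return out;
}
```

The trade-off is extra memory for the keys in exchange for cheaper comparisons, which pays off when each string takes part in many comparisons.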

Glen

Glen Knowles
