RE: [boost] Re: Any interest in adding unicode support to boost?

"Miro Jurisic" <macdev@meeroh.org> wrote in message news:macdev-
I am not sure I buy this. I think that if you want to have unchecked Unicode data, you should use a vector<char*_t>. Unicode strings have well-defined invariants with respect to canonicalization and well-formedness, and I think that a Unicode string abstraction should enforce those invariants.
Having intermediate states that are invalid and a final state that is valid is not a feature, it's a bug. It's a silent failure that I want to know about.
From: Eric Niebler
Erik Wien wrote:
Amen. ;)
No fair bringing religion into this. ;-) I'll repeat what I said before -- this would be an unfortunate design, and you'll hear about it from your users. If you force people to do their bit twiddling in vector<char*_t>, then you impose an extra allocation and a copy to get it into a unicode::string, and most people won't bother.
If it imposes a copy I certainly won't use it. What I'm interested in are functions to compare utf* encoded arrays and create sort keys from those same arrays. In truth, all I need are Unicode-aware versions of strcoll, strxfrm, and strlwr that don't require locking a mutex around the global locale in my multithreaded code. But other things such as substring and regex matching would also be welcome.

I'm sure it's already been discussed ad nauseam, but the fact that atof, the ostringstream ctor/dtor, printf, tolower, etc. may all take a mutex around access to a global locale has forced me to strip them out of quite a bit of code. In my tests this was fine on a uniprocessor, marginal on a dual, but on a quad or better it was the single biggest bottleneck.

I almost always want to treat strings as char arrays. Most operations are simply copying/moving or checking to see if the string is in a set or hash. If I need to do a lot of locale-aware comparisons I'm going to generate sort keys first.

Glen
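To make the "generate sort keys first" approach concrete, here is a minimal sketch in standard C++. It is not Unicode-aware: it uses the std::collate facet of a std::locale object passed in by the caller, so that the code never has to ask for the global locale. The helper names make_sort_key and sort_with_keys are made up for illustration, and whether a given standard library still takes a lock inside use_facet or transform is an implementation detail this sketch does not control.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: build a sort key for `s` using the collation rules
// of an explicitly supplied locale (instead of the global one).
std::string make_sort_key(const std::string& s, const std::locale& loc) {
    const std::collate<char>& coll = std::use_facet<std::collate<char>>(loc);
    return coll.transform(s.data(), s.data() + s.size());
}

// Hypothetical helper: compute each key once, then sort by plain
// lexicographic comparison of the keys. No collation calls happen
// inside the sort itself.
std::vector<std::string> sort_with_keys(const std::vector<std::string>& items,
                                        const std::locale& loc) {
    std::vector<std::pair<std::string, std::string>> keyed;
    keyed.reserve(items.size());
    for (const std::string& s : items) {
        keyed.emplace_back(make_sort_key(s, loc), s);
    }
    // Pairs compare by key first, then by the original string on ties.
    std::sort(keyed.begin(), keyed.end());

    std::vector<std::string> out;
    out.reserve(keyed.size());
    for (auto& kv : keyed) {
        out.push_back(std::move(kv.second));
    }
    return out;
}
```

Calling sort_with_keys(items, std::locale::classic()) orders the strings with the classic "C" collation. A real Unicode-aware version would swap make_sort_key for a key generator that works directly on the utf* encoded array, which is the piece being asked for here.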
participants (1)
-
Glen Knowles