Re: [boost] Boost.Unicode (was Re: Boost.Locale)

16 Dec 2010

      ...
...
>> 2. Case conversion is locale dependent. For example, if the locale is
>> Turkish, then upper("i")=="İ", while upper("i")=="I" for other languages.
> Simple case conversions are the easy 1:1 language- and context-agnostic
> mappings. I can't do the more complex conversions because they depend on
> specific languages and contexts.
> Thankfully case folding is neither language- nor context-dependent, and is
> probably what most people want rather than case conversion.
Then don't do case conversion!

Do just case folding. For such "simple" and incorrect case conversion I don't
need a sophisticated Unicode library. I can use the standard operating system
API, or even std::locale::ctype, very successfully (which is what I do in
Boost.Locale when the user prefers a non-ICU-based backend).
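Here is a minimal sketch of the "simple" approach described above. It uses only the C++ standard library: the std::ctype<wchar_t> facet of a std::locale maps each character independently, so the result always has exactly as many characters as the input. The name simple_upper is hypothetical and is not part of Boost.Locale.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: "simple" 1:1 upper-casing with the standard
// std::ctype<wchar_t> facet of the given locale (classic "C" locale by default).
// Every character is mapped to exactly one character, so the output length
// always equals the input length; "ß" can never become "SS" this way.
std::wstring simple_upper(std::wstring s,
                          const std::locale& loc = std::locale::classic())
{
    const auto& ct = std::use_facet<std::ctype<wchar_t>>(loc);
    // ctype::toupper(low, high) converts the range [low, high) in place.
    ct.toupper(&s[0], &s[0] + s.size());
    return s;
}
```

For ASCII input, simple_upper(L"strasse") gives L"STRASSE". For L"straße" the result still has six characters, whatever the locale maps "ß" to.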

Case conversion is:

- context dependent: Greek letter "Σ" is converted to "σ" or to "ς", according to position in the word.
- locale dependent: Turkish i goes to İ
- not 1-to-1: German ß goes to SS in upper case.
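The three properties listed above can be illustrated with a hand-written subset of the rules. The names upper_char, to_upper, is_greek and lower_greek are hypothetical and are not Boost.Locale or ICU APIs. The Greek word-boundary test is a deliberate simplification of Unicode's Final_Sigma condition, used here only for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch, NOT a real implementation: correct results need the
// full Unicode data (SpecialCasing.txt) plus locale tailorings, e.g. from ICU.

// Locale dependent and not 1-to-1: one input character may map to a
// language-specific letter or to several letters.
std::u32string upper_char(char32_t c, bool turkish)
{
    if (c == U'i')
        return turkish ? U"\u0130" : U"I";      // Turkish: dotted capital I
    if (c == U'\u00DF')
        return U"SS";                           // German sharp s: two letters
    if (c >= U'a' && c <= U'z')
        return std::u32string(1, static_cast<char32_t>(c - U'a' + U'A'));
    return std::u32string(1, c);                // everything else unchanged here
}

std::u32string to_upper(const std::u32string& s, bool turkish)
{
    std::u32string out;
    for (char32_t c : s)
        out += upper_char(c, turkish);
    return out;
}

// Assumption: a neighbour "inside the word" is any code point in the Greek
// block U+0370..U+03FF (Unicode's real rule uses cased/case-ignorable chars).
bool is_greek(char32_t c) { return c >= 0x0370 && c <= 0x03FF; }

// Context dependent: capital sigma becomes final sigma at the end of a word.
std::u32string lower_greek(const std::u32string& s)
{
    std::u32string out = s;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c == U'\u03A3') {
            bool after_letter = i > 0 && is_greek(s[i - 1]);
            bool before_letter = i + 1 < s.size() && is_greek(s[i + 1]);
            out[i] = (after_letter && !before_letter) ? U'\u03C2' : U'\u03C3';
        } else if (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2) {
            out[i] = static_cast<char32_t>(c + 0x20);   // Α..Ω -> α..ω
        }
    }
    return out;
}
```

With these rules, to_upper(U"i", true) gives U"İ" and to_upper(U"stra\u00DFe", false) gives U"STRASSE". The second result is one character longer than its input. lower_greek turns "ΟΔΟΣ" into "οδος", with a final sigma.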

So if you don't do this right, just don't do it.
I'm not sure about case folding, but AFAIK it is not 1-to-1 either; I may be
wrong, though.
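For reference on the folding question: Unicode's CaseFolding.txt defines both a simple (1:1) folding and a full folding. Under full folding, ß folds to "ss", so full folding is not 1:1. Σ and ς both fold to σ, which keeps folding context-free, apart from the optional Turkic (T) entries. The sketch below hard-codes only a few of those entries. The names fold_char, fold and caseless_equal are hypothetical.

```cpp
#include <string>

// Hypothetical sketch: a few entries copied from CaseFolding.txt, enough to
// show simple (status C/S, always 1:1) vs. full (status F, may expand)
// folding. Turkic (T) entries are deliberately not applied.
std::u32string fold_char(char32_t c, bool full)
{
    if (c >= U'A' && c <= U'Z')                 // C: ASCII capitals
        return std::u32string(1, static_cast<char32_t>(c + 0x20));
    if (c == U'\u03A3' || c == U'\u03C2')       // C: Σ and ς both fold to σ
        return U"\u03C3";
    if (c == U'\u00DF')                         // F: ß -> "ss"; no simple entry
        return full ? U"ss" : U"\u00DF";
    return std::u32string(1, c);
}

std::u32string fold(const std::u32string& s, bool full)
{
    std::u32string out;
    for (char32_t c : s)
        out += fold_char(c, full);
    return out;
}

// Caseless matching: compare the folded forms of both strings.
bool caseless_equal(const std::u32string& a, const std::u32string& b,
                    bool full = true)
{
    return fold(a, full) == fold(b, full);
}
```

With full folding, "STRASSE" and "straße" match. With simple folding they do not, because ß stays ß.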
...
> Yes, it definitely is; but you could still have a "general" collation that
> would work well enough for most languages.
For a general collation that works "well" in most languages I can use
strcmp... I don't need a Unicode library for this.
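Here is a concrete version of the strcmp point. std::strcmp compares bytes as unsigned char, so on valid UTF-8 input it produces code point order rather than any language's alphabetical order. sort_bytewise is a hypothetical helper written for this illustration.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: sort UTF-8 strings with plain std::strcmp.
// Byte order on UTF-8 equals code point order: all capitals sort before all
// lowercase letters, and accented letters sort after 'z'.
std::vector<std::string> sort_bytewise(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end(),
              [](const std::string& a, const std::string& b) {
                  return std::strcmp(a.c_str(), b.c_str()) < 0;
              });
    return words;
}
```

Sorting {"apple", "Zebra", "éclair", "zoo"} this way yields "Zebra", "apple", "zoo", "éclair".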

Artyom