Re: [boost] Boost.Unicode (was Re: Boost.Locale)

15 Dec 2010

On 15/12/2010 18:50, Artyom wrote:
...
> A few notes and questions: you say that your library is locale-agnostic,
> but I see a contradiction between what you say and what you need to implement:
My personal belief is that the locale matters for few things, and it's a 
big burden to set up and manage.

So if I can avoid having to choose one, I'd rather do that, and only 
specify one when I really need it.
...
> 1. AFAIK, boundary analysis is locale-dependent.
Tailoring of break properties is not supported: the default values are used.
The specification in question (UAX #29) barely mentions tailoring anyway.

One way to achieve locale-dependent behaviour here would be to swap the 
property database for a tailored one.
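To illustrate what swapping the database could look like, here is a minimal sketch, assuming a hypothetical `break_database` callable that maps a code point to a toy word class. It is not UAX #29, which uses Word_Break property values (ALetter, MidLetter, Numeric, ...) and many more rules; it only keeps the core idea of "no break between two letters, break everywhere else", so that a tailored table changes the result without touching the algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy word classes; the real UAX #29 uses Word_Break property values
// from the Unicode Character Database.
enum class word_class { letter, other };

// Hypothetical "database": maps a code point to its class.
using break_database = std::function<word_class(char32_t)>;

// Default table (toy): ASCII letters and digits are word characters.
word_class default_db(char32_t c) {
    bool alnum = (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
                 (c >= U'0' && c <= U'9');
    return alnum ? word_class::letter : word_class::other;
}

// Hypothetical tailored table: same as the default, but '-' joins words.
word_class hyphen_joining_db(char32_t c) {
    return c == U'-' ? word_class::letter : default_db(c);
}

// Boundary positions (including 0 and s.size()): never break between two
// word characters, break everywhere else. The algorithm never changes;
// only the database passed in does.
std::vector<std::size_t> word_boundaries(const std::u32string& s,
                                         const break_database& db) {
    std::vector<std::size_t> out;
    if (s.empty()) return out;
    out.push_back(0);
    for (std::size_t i = 1; i < s.size(); ++i) {
        bool both_letters = db(s[i - 1]) == word_class::letter &&
                            db(s[i]) == word_class::letter;
        if (!both_letters) out.push_back(i);
    }
    out.push_back(s.size());
    return out;
}
```

With the default table, U"well-known fact" breaks at {0, 4, 5, 10, 11, 15}; with the hyphen-joining table, "well-known" stays a single segment and the result is {0, 10, 11, 15}.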
...
> 2. Case conversion is locale-dependent: for example, if the locale is Turkish,
>    then upper("i")=="İ", while upper("i")=="I" for other languages.
Simple case conversions are the easy 1:1, language- and context-agnostic 
mappings.

I can't do the more complex conversions because they depend on specific 
languages and contexts.

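Thankfully, case folding is neither language- nor context-dependent (the 
only language-specific entries in CaseFolding.txt are two optional Turkic 
mappings, excluded by default), and it is probably what most people want 
rather than case conversion.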
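A toy sketch of the distinction, assuming ASCII-only tables (the real data lives in UnicodeData.txt, SpecialCasing.txt and CaseFolding.txt): `simple_upper` is a 1:1 mapping, with a hypothetical `case_tailoring::turkic` switch standing in for the language-specific rule, while `fold` needs no locale at all, which is what makes it suitable for caseless comparison. All names here are illustrative, not an existing API.

```cpp
#include <cassert>
#include <string>

// Hypothetical switch standing in for a language-specific tailoring.
enum class case_tailoring { none, turkic };

// Simple (1:1, context-free) uppercase mapping, toy ASCII subset.
char32_t simple_upper(char32_t c, case_tailoring t = case_tailoring::none) {
    if (t == case_tailoring::turkic && c == U'i') return U'\u0130'; // İ
    if (c == U'\u0131') return U'I'; // dotless ı -> I in every language
    if (c >= U'a' && c <= U'z') return c - (U'a' - U'A');
    return c;
}

// Simple default case folding (status C/S), toy ASCII subset.
char32_t simple_fold(char32_t c) {
    if (c >= U'A' && c <= U'Z') return c + (U'a' - U'A');
    return c;
}

std::u32string upper(std::u32string s, case_tailoring t = case_tailoring::none) {
    for (char32_t& c : s) c = simple_upper(c, t);
    return s;
}

std::u32string fold(std::u32string s) {
    for (char32_t& c : s) c = simple_fold(c);
    return s;
}

// Caseless comparison only needs folding, so no locale is involved.
bool caseless_equal(const std::u32string& a, const std::u32string& b) {
    return fold(a) == fold(b);
}
```

Note that `fold(U"\u0130")` leaves İ unchanged here, mirroring CaseFolding.txt, which gives U+0130 only a full (F) and a Turkic (T) folding, and no simple one.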
...
> 3. Collation **is** locale-dependent, as text sorting in different languages
>    is very different, even if they use the same script (Latin, for example).
Yes, it definitely is; but you could still have a "general" collation 
that would work well enough for most languages.

I had listed it as a 'maybe', but I had forgotten how complicated the 
official algorithm (the Unicode Collation Algorithm) is. So I won't be 
adding collation support for a while.
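A full implementation would follow the Unicode Collation Algorithm with its default table (DUCET) plus per-language tailorings. As a much smaller illustration of why a "general" ordering already beats plain code point comparison, here is a toy two-level comparison, assuming ASCII only: letters are compared ignoring case first, and case differences only break ties (lowercase first, as in DUCET's tertiary level). `make_key` and `collate_less` are hypothetical names, and this is not the UCA.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy sort key with two levels: primary (letter identity, case ignored)
// and a case level (0 = lowercase or non-letter, 1 = uppercase).
struct sort_key {
    std::u32string primary;
    std::vector<int> case_level;
};

sort_key make_key(const std::u32string& s) {
    sort_key k;
    for (char32_t c : s) {
        bool is_upper = c >= U'A' && c <= U'Z';
        k.primary.push_back(is_upper ? c + (U'a' - U'A') : c);
        k.case_level.push_back(is_upper ? 1 : 0);
    }
    return k;
}

// Compare every primary weight first; only if the strings are equal at
// that level does case decide the order.
bool collate_less(const std::u32string& a, const std::u32string& b) {
    sort_key ka = make_key(a), kb = make_key(b);
    if (ka.primary != kb.primary) return ka.primary < kb.primary;
    return ka.case_level < kb.case_level;
}
```

Code point order puts "Banana" before "apple"; sorting with `collate_less` gives apple, apricot, Banana, and "ab" sorts before "Ab".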

Mathias Gaunard