
On 15/12/2010 18:50, Artyom wrote:
A few notes and questions: you say that your library is locale agnostic, but I see a contradiction between what you say and what you need to implement.
My personal belief is that the locale matters for only a few things, and that it's a big burden to set up and manage. So if I can avoid having to choose one, I'd rather do that, and only specify one when I really need it.
1. AFAIK boundary analysis is locale dependent.
Tailoring of break properties is not supported: the default values are used. The specification in question (UAX #29) barely mentions tailoring anyway. One way to get locale-dependent behaviour here would be to swap the database for a tailored one.
2. case conversion is locale dependent: for example, if the locale is Turkish then upper("i")=="İ", while upper("i")=="I" for other languages.
Simple case conversions are the easy 1:1 mappings that depend on neither language nor context. I can't do the more complex conversions because they depend on specific languages and contexts. Thankfully case folding is neither language- nor context-dependent, and is probably what most people want rather than case conversion.
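To make the distinction concrete, here is a small sketch in Python rather than in the library itself. The helpers `simple_upper` and `caseless_equal` are hypothetical names of mine; `simple_upper` only approximates a 1:1 mapping by refusing any expansion, and `str.casefold` stands in for locale-independent case folding.

```python
def simple_upper(text: str) -> str:
    """Approximate a 1:1 uppercase mapping (an assumption, not the
    library's table): keep a character unchanged whenever its full
    uppercase form is not exactly one code point."""
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after full case folding (str.casefold)."""
    return a.casefold() == b.casefold()


print(simple_upper("straße"))               # 1:1 mapping leaves ß alone
print("straße".upper())                     # full mapping expands ß
print("i".upper())                          # no Turkish tailoring applied
print(caseless_equal("Straße", "STRASSE"))  # folding ignores the expansion
```

Caseless matching through folding needs no locale at all, which is why it is usually the better tool for comparisons.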
3. collation **is** locale dependent, as text sorting in different languages is very different, even if they use the same script (Latin for example)
Yes, it definitely is; but you could still have a "general" collation that would work well enough for most languages. I had listed it as a 'maybe', but I had forgotten how complicated the official algorithm is. So collation support won't come for a while.
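As a rough illustration of what a "general" ordering buys over plain code point order, here is a Python sketch. The key function `general_key` is a hypothetical stand-in of mine: it strips combining marks and case-folds, which is far cruder than the official collation algorithm and is not what the library does.

```python
import unicodedata


def general_key(text: str) -> tuple:
    """Crude language-neutral sort key (an assumption for illustration):
    compare base letters without accents or case, then fall back to the
    original string to break ties deterministically."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), text)


words = ["zebra", "Äpfel", "apple", "Zoo"]

# Code point order: uppercase first, accented letters last.
print(sorted(words))

# Accent- and case-insensitive order, closer to what readers expect.
print(sorted(words, key=general_key))
```

This puts "Äpfel" next to "apple", which suits German but not Swedish, where Ä sorts after Z; that residual difference is the locale-dependent part.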