
----- Original Message ----
From: Mathias Gaunard <mathias.gaunard@ens-lyon.org>
- I need to finish support for word, sentence and line boundaries
- The ABI needs to be more clearly defined to guarantee backward and upward compatibility
- The convert and segment subsystem must be clearly separated into its own library and namespace
- The system must be made SIMD-ready
- Simple case conversion should be added
- General case folding (and maybe collation) should be added

Nothing among these is particularly difficult.

A few notes and questions. You say that your library is locale-agnostic, but I see a contradiction between that claim and what you still need to implement:

1. AFAIK, boundary analysis is locale-dependent.
2. Case conversion is locale-dependent. For example, if the locale is Turkish then upper("i") == "İ", while upper("i") == "I" for other languages.
3. Collation **is** locale-dependent, as text sorting differs greatly between languages, even when they use the same script (Latin, for example).

Artyom
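
P.S. To make point 2 concrete, here is a minimal sketch of why a case-conversion function needs some notion of locale. The name upper_simple and the boolean turkic flag are hypothetical and are not part of the library under discussion; the function only covers ASCII letters and the dotted/dotless i pair, so treat it as an illustration rather than a real implementation.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration, not the library's API).
// Uppercases one code point. Only ASCII letters and U+0131 are handled;
// every other code point is returned unchanged.
char32_t upper_simple(char32_t cp, bool turkic)
{
    // Turkish and Azerbaijani: "i" uppercases to U+0130 (capital I with dot above).
    if (turkic && cp == U'i')
        return static_cast<char32_t>(0x0130);

    // U+0131 (dotless i) uppercases to plain "I" in every locale.
    if (cp == static_cast<char32_t>(0x0131))
        return U'I';

    if (cp >= U'a' && cp <= U'z')
        return static_cast<char32_t>(cp - (U'a' - U'A'));

    return cp;
}

// Applies the per-code-point mapping to a whole UTF-32 string.
std::u32string upper_simple(const std::u32string& s, bool turkic)
{
    std::u32string out;
    out.reserve(s.size());
    for (char32_t cp : s)
        out.push_back(upper_simple(cp, turkic));
    return out;
}
```

With turkic set to false, "i" maps to U+0049; with turkic set to true, the same input maps to U+0130. The same input gives two different correct answers, so a fully locale-agnostic interface cannot express both.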