
On 24/04/2011 22:01, Ryou Ezoe wrote:
Collation and Conversions: Japanese doesn't have the concepts of case and accent. Since we don't have these concepts, we never need them.
I believe all CJK characters can be decomposed into radicals, and the composed and decomposed forms are equivalent, so you could still want normalization. Also, converting between halfwidth and fullwidth katakana could have some uses.
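As a minimal sketch of the second point, NFKC normalization already folds halfwidth katakana into their fullwidth equivalents (the locale name and input string below are only illustrative):

    #include <boost/locale.hpp>
    #include <iostream>
    #include <string>

    int main() {
        // The locale name is just an example; any ICU-backed locale will do.
        boost::locale::generator gen;
        std::locale loc = gen("ja_JP.UTF-8");

        // NFKC folds compatibility characters, so halfwidth katakana come out
        // as their fullwidth equivalents.
        std::string halfwidth = "ｶﾀｶﾅ"; // halfwidth katakana, UTF-8 source
        std::string folded =
            boost::locale::normalize(halfwidth, boost::locale::norm_nfkc, loc);

        std::cout << folded << "\n"; // prints the fullwidth "カタカナ"
    }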
Boundary analysis: What is the definition of a boundary, and how does it analyse it? It sounds too smart a name for the small things it actually does.
It uses the boundary analysis algorithms defined by the Unicode standard, which don't use heuristics or anything like that. Remember that Boost.Locale is just a wrapper around ICU, which is the really smart library.
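To make that concrete, here is a rough sketch of the word-boundary interface (the text and locale name are just examples; the segmentation comes from ICU's implementation of UAX #29):

    #include <boost/locale.hpp>
    #include <iostream>
    #include <string>

    int main() {
        namespace bl = boost::locale;
        bl::generator gen;
        std::locale loc = gen("en_US.UTF-8"); // example locale

        std::string const text = "Hello, boundary analysis!";

        // Iterate over the segments between word boundaries; without a rule
        // filter such as word_any this also yields spaces and punctuation.
        bl::boundary::ssegment_index words(bl::boundary::word,
                                           text.begin(), text.end(), loc);
        for (bl::boundary::ssegment_index::iterator it = words.begin();
             it != words.end(); ++it)
            std::cout << "[" << *it << "] ";
        std::cout << "\n";
    }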
I'd rather call it strtok with hard-coded delimiters. Japanese doesn't separate words with spaces, so unless we perform really complicated natural language processing (which can never be perfect, since we will never have a complete Japanese dictionary), we can't split Japanese text into words. Also, Japanese doesn't have a concept of word wrap, so "find appropriate places for line breaks" is unnecessary. Actually, there are some rules for line breaking in Japanese.
You can still break at punctuation marks, and there are places where you should definitely not break. Thai, Lao, Chinese and Japanese do require the use of dictionaries or heuristics to correctly distinguish words. However, the default algorithm provided by Unicode still gives a best-effort implementation without those things.
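For illustration, a sketch of the line-boundary case (the Japanese sentence and locale name are chosen arbitrarily): even without a dictionary, the default UAX #14 rules give usable break opportunities between most kana and kanji, and they roughly won't allow a break before closing punctuation such as "。".

    #include <boost/locale.hpp>
    #include <iostream>
    #include <string>

    int main() {
        namespace bl = boost::locale;
        bl::generator gen;
        std::locale loc = gen("ja_JP.UTF-8"); // example locale

        std::string const text = "これはペンです。それは本です。"; // example text

        // Each segment ends at a permitted line-break opportunity (UAX #14).
        bl::boundary::ssegment_index chunks(bl::boundary::line,
                                            text.begin(), text.end(), loc);
        for (bl::boundary::ssegment_index::iterator it = chunks.begin();
             it != chunks.end(); ++it)
            std::cout << "[" << *it << "]";
        std::cout << "\n";
    }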