Re: [boost] [locale] Review results for Boost.Locale library

25 Apr 2011


On Mon, Apr 25, 2011 at 6:04 AM, Artyom <artyomtnk@yahoo.com> wrote:
...
...
From: Ryou Ezoe <boostcpp@gmail.com>
...
Number and Date formatting:
There are so many possible ways to express numbers.
Some people want comma separation every 3 digits, others want every 4 digits.
Some want it to be 100万 (万 means 10000), some want 百万 (百 means 100).
Formatting based on locale doesn't work because there is no uniform format.
Have you actually read the manuals?
This is the output of:
  std::cout << bl::format("{1}\n{1,num}\n{1,spell}\n") % 1000000 ;
in the ja_JP.UTF-8 locale:
  1000000
  1,000,000
  百万
Not so bad, is it?
Not bad.
Still, I doubt anybody wants to use Boost.Locale just for that.
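
For reference, the 3-digit versus 4-digit grouping mentioned above can be
sketched with the standard library alone. This is only an illustration under
stated assumptions: fixed_grouping and format_grouped are hypothetical names
made up for this sketch, not part of Boost.Locale, and the code does not
produce 万 or 百万 style output.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: groups digits in blocks of `size`.
class fixed_grouping : public std::numpunct<char> {
public:
    explicit fixed_grouping(char size) : grouping_(1, size) {}

protected:
    char do_thousands_sep() const override { return ','; }
    // Each char is a group width; the last one repeats for all higher groups.
    std::string do_grouping() const override { return grouping_; }

private:
    std::string grouping_;
};

// Formats `value` with a comma every `group_size` digits.
std::string format_grouped(long value, int group_size) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(),
                          new fixed_grouping(static_cast<char>(group_size))));
    out << value;
    return out.str();
}
```

With this sketch, format_grouped(1000000, 3) gives "1,000,000" and
format_grouped(1000000, 4) gives "100,0000", which is the digit grouping
behind the 100万 reading.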
...
...
Collation and Conversions:
Japanese doesn't have the concepts of case and accent.
Since we don't have these concepts, we never need them.
Irrelevant. Even if this feature is not required
for CJK, it is required for other languages, like many
other things (spaces, plural forms).
...
Boundary analysis:
What is the definition of a boundary, and how is it analysed?
It sounds too smart for the small thing it actually does.
I'd rather call it strtok with hard-coded delimiters.
Japanese doesn't separate words with spaces.
So unless we perform really complicated natural language
processing (which can never be perfect, since we will never have
a complete Japanese dictionary),
we can't split Japanese text into words.
OK, this is the word splitting
  |私|は|日本|の|東京都|に|住|んでいます|。|私|は|大|きな|家|に|住|んでいます|。
of the text:
  私は日本の東京都に住んでいます。私は大きな家に住んでいます。
To me, it looks like splitting by contiguous runs of kana and kanji.
I don't think I ever need that kind of splitting.
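
To make that hypothesis concrete, here is a minimal sketch of a splitter that
cuts wherever the script class changes. It is an assumption about what the
output above resembles, not ICU's or Boost.Locale's actual algorithm; the
names classify and split_by_script and the coarse code point ranges are made
up for this illustration.

```cpp
#include <string>
#include <vector>

// Coarse script classes, enough for this illustration only.
enum class script { hiragana, katakana, kanji, other };

script classify(char32_t c) {
    if (c >= 0x3041 && c <= 0x309F) return script::hiragana;
    if (c >= 0x30A0 && c <= 0x30FF) return script::katakana;
    if (c >= 0x4E00 && c <= 0x9FFF) return script::kanji;  // CJK Unified Ideographs
    return script::other;  // punctuation, Latin letters, everything else
}

// Splits text wherever the script class changes. Every "other" character
// (for example the full stop U+3002) becomes a token of its own.
std::vector<std::u32string> split_by_script(const std::u32string& text) {
    std::vector<std::u32string> tokens;
    for (char32_t c : text) {
        const script s = classify(c);
        const bool extend = !tokens.empty() && s != script::other &&
                            classify(tokens.back().back()) == s;
        if (extend) {
            tokens.back().push_back(c);
        } else {
            tokens.push_back(std::u32string(1, c));
        }
    }
    return tokens;
}
```

Run on the first sentence above, this sketch yields the same nine pieces as
the quoted output. A dictionary-based segmenter would be expected to differ
on other inputs.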
...
I assume it is not perfect, and I don't know Japanese well enough to
say, but I can at least see words like:
 私 - I
 日本 - Japan
 東京都 - City of Tokyo
But this is not defined only by "space-based" separation.
Also, for some languages like Thai, ICU uses dictionaries.
So it is not a naive algorithm that separates text by
spaces.
...
Also, Japanese doesn't have a concept of word wrap.
So "find appropriate places for line breaks" is unnecessary.
Actually, there are some rules for line breaks in Japanese.
These rules are too complicated, and they require more than text processing.
Same for Chinese and Korean.
This is the possible line-break separation of the same sentences above:
|私|は|日|本|の|東|京|都|に|住|ん|で|い|ま|す。|私|は|大|き|な|家|に|住|ん|で|い|ま|す。|
At least I can see that it does not allow a line to start with "。".
We have a lot of characters that should not be the initial character of a line.
But there is no uniform rule.
And it must work together with font rendering.
Simple text processing doesn't suffice.
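
As a rough illustration of the "must not start a line" rule discussed here,
the sketch below allows a break before every character except those in a
small forbidden set. The set and the names forbidden_at_line_start and
break_opportunities are assumptions made for this example; real rule sets are
much larger, and this is not the algorithm ICU uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative subset of characters that should not start a line
// (an assumption for this sketch, not a complete rule set):
// U+3002, U+3001, U+300D, U+300F, U+FF09.
bool forbidden_at_line_start(char32_t c) {
    static const std::u32string forbidden = U"\u3002\u3001\u300D\u300F\uFF09";
    return forbidden.find(c) != std::u32string::npos;
}

// Returns every index i such that a line may break before text[i].
// The start and end of the text are not reported.
std::vector<std::size_t> break_opportunities(const std::u32string& text) {
    std::vector<std::size_t> breaks;
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (!forbidden_at_line_start(text[i])) {
            breaks.push_back(i);
        }
    }
    return breaks;
}
```

For the sentences above this gives a break between every pair of characters
except in front of "。", which matches the quoted output. It says nothing
about fonts or rendering.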
...
...
Of course, strtok is still a handy tool and I appreciate yet another design.
But I think it's better handled by a more generic library, like Boost
String Algorithms.
It is far more complicated than strtok.
Bottom line: I see that you haven't really tried
to use this library or understand how it
works.
I'm sorry, but it makes me doubt the review
you sent.
Artyom
-- 
Ryou Ezoe