
On Mon, Apr 25, 2011 at 6:04 AM, Artyom <artyomtnk@yahoo.com> wrote:
From: Ryou Ezoe <boostcpp@gmail.com>
Number and Date formatting: There are so many possible ways to express numbers. Some people want comma separation every 3 digits, others want every 4 digits. Some want 100万 (万 means 10000); some want 百万 (百 means 100). Formatting based on locale doesn't work because there is no uniform format.
Have you actually read the manuals?
This is the output of:
std::cout << bl::format("{1}\n{1,num}\n{1,spell}\n") % 1000000 ;
in the ja_JP.UTF-8 locale:
1000000
1,000,000
百万
Not so bad, is it?
Not bad. Still, I doubt anybody wants to use Boost.Locale just for that.
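To make the 3-digit versus 4-digit grouping question above concrete, here is a minimal sketch using only the standard library's `std::numpunct` facet. It does not use Boost.Locale; the names `group_by` and `format_grouped` are hypothetical and exist only for this illustration.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: groups digits in blocks of n,
// using ',' as the separator.
class group_by : public std::numpunct<char> {
public:
    explicit group_by(char n) : grouping_(1, n) {}

protected:
    char do_thousands_sep() const override { return ','; }
    // A one-element grouping string means "repeat this group size".
    std::string do_grouping() const override { return grouping_; }

private:
    std::string grouping_;
};

// Formats value with the given number of digits per group.
std::string format_grouped(long value, char digits_per_group) {
    std::ostringstream out;
    // The locale takes ownership of the facet allocated here.
    out.imbue(std::locale(std::locale::classic(), new group_by(digits_per_group)));
    out << value;
    return out.str();
}
```

With this sketch, `format_grouped(1000000, 3)` yields `1,000,000` and `format_grouped(1000000, 4)` yields `100,0000`. The group size is a property of the facet, so a locale-driven design can in principle carry either convention.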
Collation and Conversions: Japanese doesn't have the concepts of case and accent. Since we don't have these concepts, we never need them.
Irrelevant. Even if this feature is not required for CJK, it is required for other languages, like many other things (spaces, plural forms).
Boundary analysis: What is the definition of a boundary and how does it analyse? It sounds too smart for such a small thing as it actually does. I'd rather call it strtok with hard-coded delimiters. Japanese doesn't separate words with spaces. So unless we perform really complicated natural language processing (which can never be perfect, since we will never have a complete Japanese dictionary), we can't split Japanese text into words.
OK, this is the word splitting
|私|は|日本|の|東京都|に|住|んでいます|。|私|は|大|きな|家|に|住|んでいます|。
of the text:
私は日本の東京都に住んでいます。私は大きな家に住んでいます。
To me, it looks like splitting by contiguous runs of kana and kanji. I don't think I will ever need that kind of splitting.
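The heuristic described in the previous remark (split wherever the script changes between kanji, hiragana, and katakana) can be sketched as follows. This only illustrates that hypothesis; it is not a claim about how ICU or Boost.Locale implement word boundaries. The names `script_class`, `classify`, and `split_by_script` are hypothetical, and the code point ranges cover only the basic Unicode blocks.

```cpp
#include <string>
#include <vector>

enum class script_class { kanji, hiragana, katakana, other };

// Coarse classification by basic Unicode block (an assumption of this
// sketch; extension blocks and half-width forms are ignored).
script_class classify(char32_t c) {
    if (c >= 0x4E00 && c <= 0x9FFF) return script_class::kanji;
    if (c >= 0x3040 && c <= 0x309F) return script_class::hiragana;
    if (c >= 0x30A0 && c <= 0x30FF) return script_class::katakana;
    return script_class::other;
}

// Splits text into maximal runs of characters with the same class.
std::vector<std::u32string> split_by_script(const std::u32string& text) {
    std::vector<std::u32string> runs;
    for (char32_t c : text) {
        if (runs.empty() || classify(runs.back().back()) != classify(c)) {
            runs.push_back(std::u32string(1, c));  // start a new run
        } else {
            runs.back().push_back(c);              // extend the current run
        }
    }
    return runs;
}
```

Applied to 私は日本の東京都に住んでいます。 this sketch produces the same nine pieces as the boundaries shown above for that sentence. A match on one sample does not show that the library works this way.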
I assume it is not perfect, and I don't know Japanese well enough to say, but I can at least see that it finds words like:
私 - I
日本 - Japan
東京都 - City of Tokyo
But this is not defined only by "space-based" separation. Also, for some languages, such as Thai, ICU uses dictionaries.
So it is not a naive algorithm that separates text by spaces.
Also, Japanese doesn't have a concept of word wrap. So "find appropriate places for line breaks" is unnecessary. Actually, there are some rules for line breaking in Japanese. These rules are too complicated and they require more than text processing. The same goes for Chinese and Korean.
These are the possible line-break positions for the same sentences above.
|私|は|日|本|の|東|京|都|に|住|ん|で|い|ま|す。|私|は|大|き|な|家|に|住|ん|で|い|ま|す。|
At least I can see that it does not allow a line to start with "。".
We have a lot of characters that should not be the first character of a line. But there is no uniform rule. And it must work together with font rendering. Simple text processing doesn't suffice.
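The single rule visible in the output above (a break is allowed between any two characters, except before a character that may not start a line) can be sketched like this. The set of prohibited characters below is a small, deliberately incomplete assumption chosen for illustration, and `forbidden_at_line_start` and `break_opportunities` are hypothetical names, not part of any library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative, incomplete set of characters that may not start a line:
// ideographic full stop, ideographic comma, closing corner brackets,
// full-width closing parenthesis.
bool forbidden_at_line_start(char32_t c) {
    static const std::u32string set = U"\u3002\u3001\u300D\u300F\uFF09";
    return set.find(c) != std::u32string::npos;
}

// Returns every index i such that a line break may occur just before
// text[i]. Index 0 and the end of the text are not reported.
std::vector<std::size_t> break_opportunities(const std::u32string& text) {
    std::vector<std::size_t> breaks;
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (!forbidden_at_line_start(text[i])) {
            breaks.push_back(i);
        }
    }
    return breaks;
}
```

For ます。私 this reports breaks before す and before 私, but none before 。, which matches the "す。" grouping in the output above. It says nothing about the rendering-dependent rules mentioned in the reply.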
Of course, strtok is still a handy tool and I appreciate yet another design. But I think it would be better handled by a more generic library, like Boost String Algorithms.
It is far more complicated than strtok.
Bottom line: I see that you haven't really tried to use this library or to understand how it works.
I'm sorry, but it makes me doubt the review you sent.
Artyom
-- Ryou Ezoe