Fw: [locale] Formal review of Boost.Locale library

Begin forwarded message:

Date: Thu, 07 Apr 2011 08:49:20 +0900
From: Darren Cook <darren@dcook.org>
Subject: [locale] Formal review of Boost.Locale library

Hello, I'm not on the devel list but just had a look at the Boost.Locale library documentation. Feel free to forward this to the list, or discard it.
- What is your evaluation of the documentation?
* No prev/next page links. [This should be fixed ASAP, as it discourages evaluating the library]
* Confusing first page, as it seems to only talk about other libraries, not the one in question. (?)
* The examples are not motivating, and just end up looking complex.
* I'd have liked to see examples in more languages, in particular some Asian languages, to comfort me that the library has taken their issues into consideration. If the itch being scratched here is just for a couple of European languages then the library may have design flaws.
* The backends page ( http://cppcms.sourceforge.net/boost_locale/html/using_localization_backends.... ) looks interesting, and may be the start of justifying the reason for this library to exist. It could be expanded with something more concrete: complete examples of doing something useful with different backends, with timings and exe sizes we could compare.
- What is your evaluation of the potential usefulness of the library?
I would choose to use ICU directly, as I couldn't see a reason not to, and I'd be concerned that the wrapper would not have wrapped some obscure function in ICU that I find I end up needing.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
A quick reading.
- Are you knowledgeable about the problem domain?
I've used ICU (it is essential in some situations), though possibly not from C/C++.

Darren

--
Darren Cook, Software Researcher/Developer
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)

Hello,
- What is your evaluation of the documentation? [SNIP]
* The examples are not motivating, and just end up looking complex.
Can you be more specific?
* I'd have liked to see examples in more languages, in particular some Asian languages, to comfort me that the library has taken their issues into consideration. If the itch being scratched here is just for a couple of European languages then the library may have design flaws.
Unfortunately I don't know any East Asian language, neither Chinese nor Japanese, so it would be very hard for me to write reasonably sane examples in those languages. I do know Hebrew, Russian, English and Ukrainian, so those languages were chosen for the examples, and I added some other points like Arabic-locale numbers. As for East Asian language issues, the relevant unit tests do include East Asian text, and I rely heavily on the correctness of third-party libraries like ICU.
* The backends page ( http://cppcms.sourceforge.net/boost_locale/html/using_localization_backends.... ) looks interesting, and may be the start of justifying the reason for this library to exist.
The biggest difference between using and not using the ICU backend is linking with the ICU library, which weighs about 18 MB on Linux/x86_64. Timings really depend on the specific case and are not always closely comparable; sometimes I find ICU faster, sometimes slower. I must admit that after recent modifications, Boost.Locale's ICU backend has performance comparable to the other backends like std or posix. In my experience, when I use Boost.Locale in CppCMS I can almost always use the std or posix backends without particular problems; ICU is a required part only when I need more advanced features.
It could be expanded with something more concrete: complete examples of doing something useful with different backends, with timings and exe sizes we could compare.
Actually, take a look at this table, and at the general examples: almost everything is usable with std or other backends, but the quality and nuances differ. For example, in the ar_EG.UTF-8 locale, whether the number 103 is represented as "103" or as "١٠٣". So almost all examples are valid (see the table); this is the general idea behind multiple-backend support.
- What is your evaluation of the potential usefulness of the library?
I would choose to use ICU directly, as I couldn't see a reason not to, and I'd be concerned that the wrapper would not have wrapped some obscure function in ICU that I find I end up needing.
To answer, I first point you to this: http://cppcms.sourceforge.net/boost_locale/html/appendix.html#why_icu this: http://cppcms.sourceforge.net/boost_locale/html/appendix.html#why_icu_api_is... and also this: http://cppcms.sourceforge.net/boost_locale/html/appendix.html#rationale_why And let me remind you that not everything is implemented in terms of ICU (message translation, for instance), which allows the non-ICU backends to be used successfully. However, in certain cases ICU seems to be the only reasonable option (and the most fully featured). For example, on Solaris and FreeBSD, where neither the OS nor the GCC compiler provides reasonable locale support, ICU is the only usable backend.
Darren
Thanks for the comments. Artyom

From: Chad Nelson <chad.thecomfychair@gmail.com>
Begin forwarded message: [snip]
- What is your evaluation of the documentation?
[snip]
* Confusing first page as it seems to only talk about other libraries, not the one in question. (?)
After reading the introduction page once again, I realized it may be a little bit misleading. So here is the rewritten text that I think should appear in the general description of the Boost.Locale library:

----------------------------------------------------
Boost.Locale is a library that provides high-quality localization facilities in a C++ way. It gives powerful tools for development of cross platform localized software - the software that talks to user in its language.

Provided features:

* Correct case conversion, case folding and normalization.
* Collation (sorting), including support for 4 Unicode collation levels.
* Date, time, timezone and calendar manipulations, formatting and parsing, including transparent support for calendars other than Gregorian.
* Boundary analysis for characters, words, sentences and line breaks.
* Number formatting, spelling and parsing.
* Monetary formatting and parsing.
* Powerful message formatting (string translation) including support for plural forms, using GNU catalogs.
* Character set conversion.
* Transparent support for 8-bit character sets like Latin1.
* Support for char and wchar_t.
* Experimental support for C++0x char16_t and char32_t strings and streams.

Boost.Locale enhances and unifies the standard library's API so that it becomes useful and convenient for development of cross-platform and "cross-culture" software. In order to achieve this goal, Boost.Locale uses the state-of-the-art Unicode and localization library: ICU - International Components for Unicode.

Boost.Locale creates the natural glue between the C++ locales framework, iostreams, and the powerful ICU library.

Boost.Locale provides non-ICU-based localization support as well. It is based on the operating system's native API or on the standard C++ library's support. Sacrificing some less important features, Boost.Locale becomes a less powerful but lighter and easier-to-deploy-and-use library.
-----------------------------------------------------------
Artyom Beilis

Artyom wrote:
It gives powerful tools for development of cross platform localized software - the software that talks to user in its language.
Granted, English is not my native language, but is it correct to say "talks to user in its language"? Shouldn't be "in his/her language"? Gevorg

From: Gevorg Voskanyan <v_gevorg@yahoo.com> Artyom wrote:
It gives powerful tools for development of cross platform localized software - the software that talks to user in its language.
Granted, English is not my native language, but is it correct to say "talks to
user in its language"? Shouldn't be "in his/her language"?
Gevorg
Yes, I think you are right. It is better to write "his/her language". Artyom

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Artyom
Sent: Wednesday, April 13, 2011 7:09 AM
To: boost@lists.boost.org
Subject: Re: [boost] Fw: [locale] Formal review of Boost.Locale library
From: Gevorg Voskanyan <v_gevorg@yahoo.com> Artyom wrote:
It gives powerful tools for development of cross platform localized software - the software that talks to user in its language.
Granted, English is not my native language, but is it correct to say "talks to
user in its language"? Shouldn't be "in his/her language"?
Gevorg
Yes, I think you are right. It is better to write "his/her language".
Is "in the user language" a bit less Politically Correct ;-)

Paul

---
Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK
+44 1539 561830 07714330204
pbristow@hetp.u-net.com

On Apr 13, 2011, at 4:32 AM, Paul A. Bristow wrote:
-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Artyom
Sent: Wednesday, April 13, 2011 7:09 AM
To: boost@lists.boost.org
Subject: Re: [boost] Fw: [locale] Formal review of Boost.Locale library
From: Gevorg Voskanyan <v_gevorg@yahoo.com> Artyom wrote:
It gives powerful tools for development of cross platform localized software - the software that talks to user in its language.
Granted, English is not my native language, but is it correct to say "talks to
user in its language"? Shouldn't be "in his/her language"?
Gevorg
Yes, I think you are right. It is better to write "his/her language".
Is "in the user language" a bit less Politically Correct ;-)
Yeah, I would say "talks to the user in the user's language" - or, as seems to be popular in English computer science, "in her language".

From: Gordon Woodhull it seems to be popular in English computer science to use "in her language"
Which is silly given that in English grammar the correct generic pronoun used when gender is not known is "he", not "she". This is more a political than linguistic issue. Best regards, Robert

On Apr 13, 2011, at 6:50 AM, Robert Kawulak wrote:
From: Gordon Woodhull it seems to be popular in English computer science to use "in her language"
Which is silly given that in English grammar the correct generic pronoun used when gender is not known is "he", not "she". This is more a political than linguistic issue.
And partially OT. Apologies. I stand by avoiding gendered pronouns when referring to users in documentation, but it's up to individual taste. No "its" though. ;-)

On 13 April 2011 12:25, Gordon Woodhull <gordon@woodhull.com> wrote:
I stand by avoiding gendered pronouns when referring to users in documentation, but it's up to individual taste. No "its" though. ;-)
Just say, "talks to the user in their language". Here's a discussion at a more appropriate forum: http://english.stackexchange.com/questions/48/gender-neutral-pronoun

On Wed, Apr 13, 2011 at 04:34, Daniel James <dnljms@gmail.com> wrote:
Just say, "talks to the user in their language".
Of course, the link mentions that mixing numbers with "the user" and "their" invites criticism :) Pluralizing is usually the easiest non-controversial fix: "talks to users in their languages". Or you can borrow the usual solution used in French, which is to use the male gender throughout, and add a footnote stating that, "The use of the male gender in this document is for convenience and shortness, and should not be interpreted to exclude any other". (The accords for gender on verbs and adjectives make the his/her solution a nuisance.) ~ Scott

On Wed, 2011-04-13 at 12:34 +0100, Daniel James wrote:
On 13 April 2011 12:25, Gordon Woodhull <gordon@woodhull.com> wrote:
I stand by avoiding gendered pronouns when referring to users in documentation, but it's up to individual taste. No "its" though. ;-)
Just say, "talks to the user in their language". Here's a discussion at a more appropriate forum:
http://english.stackexchange.com/questions/48/gender-neutral-pronoun
Please don't do that. As noted on the stackexchange page, '"singular they" also enjoys a long history of criticism', and it certainly sounds wrong to me. It is correct, albeit a bit verbose, to use 'his or her', and if you don't wish to choose one gender over another, then please use both genders in that way. [My wife was a newspaper copyeditor for some time, and after having my grammar corrected for years, now I'm doing it to other people, scary...] -Hal

I'm all for good documentation, but in this case, aren't we really fussing over 'pointless' details? This is documentation written for programmers, so whilst grammar is important in getting your point across clearly it seems to be irrelevant in this case. I know that I for one am not going to stress over the use of 'his' vs 'their' vs 'his or her' etc when what I'm REALLY interested in is the actual information about the library I'm trying to use. On Thu, Apr 14, 2011 at 12:33 PM, Hal Finkel <half@halssoftware.com> wrote:
On Wed, 2011-04-13 at 12:34 +0100, Daniel James wrote:
On 13 April 2011 12:25, Gordon Woodhull <gordon@woodhull.com> wrote:
I stand by avoiding gendered pronouns when referring to users in
documentation, but it's up to individual taste. No "its" though. ;-)
Just say, "talks to the user in their language". Here's a discussion at a more appropriate forum:
http://english.stackexchange.com/questions/48/gender-neutral-pronoun
Please don't do that. As noted on the stackexchange page, '"singular they" also enjoys a long history of criticism', and it certainly sounds wrong to me. It is correct, albeit a bit verbose, to use 'his or her', and if you don't wish to choose one gender over another, then please use both genders in that way. [My wife was a newspaper copyeditor for some time, and after having my grammar corrected for years, now I'm doing it to other people, scary...]
-Hal

Here is my review. I am Japanese. I use the Microsoft Windows OS.

Short conclusion: this library is badly designed and completely useless for the Japanese language and for Windows programmers. It must be rejected, or else it must drop support for MSVC and the Windows OS.

First of all, std::locale is very poorly designed. It's useless for a language like Japanese. So anything built on top of std::locale is useless too. This library is no exception.

Collation and conversions are unnecessary for Japanese. We don't have such concepts. So I skip this.

Numbers, time and currency formatting and parsing: simply replacing words is useless for Japanese. But this is not a big problem. The real issue is...

Message formatting (translation): this API is very badly designed. I am really disappointed about the use of hard-coded char and std::string. It's not just the output (return type) that matters. We Japanese Windows programmers natively use wchar_t, so the input (function parameters) must support it too.

Further reading of the document reveals that the author of this library believes "should use wide strings everywhere" is a myth. http://cppcms.sourceforge.net/boost_locale/html/recommendations_and_myths.ht... I strongly object to calling that a myth. Well, it's true you don't need to use wide strings, but using both char and wchar_t is not practical. This library treats wchar_t like a second-class citizen.

We should use one encoding and one type in one program. For a program that runs on the Windows OS, the encoding should be UTF-16, because Windows' native encoding is UTF-16. The type should be wchar_t, because MSVC's wchar_t encoding is UTF-16. So we use wchar_t. All native Win32 APIs accept wchar_t (assuming its encoding is UTF-16); the ANSI versions of the Win32 APIs are just wrappers of the native wide versions. It's just not practical to convert between wchar_t and char merely for passing strings to the locale library. Besides, what encoding should we use for char on Windows? It isn't fixed under Windows and MSVC. MSVC doesn't support UTF-8 literals.

So in order to use this library, we need to store all UTF-8 encoded strings in a file and load them at runtime, or convert them at runtime. Isn't it silly to do that just to use a localization library? So we need character set conversion between UTF-8 and UTF-16 in order to use this library. Does this library support this conversion? No. Why doesn't it support character set conversion between UTFs? Why does it hard-code char and std::string? A hard-coded char and std::string parameter is really, really, REALLY bad design.

I suspect the author of this library doesn't have any serious experience with Windows programming or with languages like Japanese. What kind of joke is it that it doesn't support the Win32 API backend in MSVC? "All of the examples that come with Boost.Locale are designed for UTF-8 and it is the default encoding used by Boost.Locale." Then don't pretend it supports MSVC and the Windows OS in the first place! Using UTF-8 on Windows is not a practical option. wchar_t is NOT a second-class citizen!

Sorry about my bad language. But this library is just completely useless for the Japanese Windows programmer. It must be rejected if this library is intended to support Japanese and the Windows OS. If Boost accepts this library, drop the support for MSVC and the Windows OS so that Windows programmers know it doesn't work for them.

--
Ryou Ezoe

Here is my review. I am Japanese. I use Microsoft Windows OS.
First of all, before I begin to answer, I'd like to note a few important points:

1. Wide characters are not second-class citizens in Boost.Locale; everything is fully supported in the wide API.
2. I mostly use UTF-8 examples as they are much more portable, but the whole API is mirrored in a wide version.
3. You are welcome to take a look at the libs/locale/examples/w*.cpp files...
4. UTF-16 even has a performance advantage on the Windows platform, as it is the native encoding for ICU.
Short conclusion. This library is badly designed and completely useless for Japanese language and Windows programmer. It must be rejected or drop support of MSVC and Windows OS.
First of all, std::locale is very poorly designed. It's useless for language like Japanese. So anything built on top of std::locale is useless too. This library is no exception.
std::locale is a container, not the content. The content is filled in by ICU or by Win32 API calls.
Numbers, Time and Currency formatting and parsing Simply replacing words is useless for Japanese. But this is not a big problem. The real issue is...
Messages Formatting (Translation)
This API is very badly designed.
More specific issues?
I am really disappointed about the use of hard-coded char and std::string. It's not just the output (return type) that matters. We Japanese Windows programmers natively use wchar_t, so the input (function parameters) must support it too.
What is wrong with wformat, or with wformat(translate("")), or wgettext? Every message formatting API is fully wide-enabled.
Further reading of the document reveals that the author of this library believes "should use wide strings everywhere" is a myth. http://cppcms.sourceforge.net/boost_locale/html/recommendations_and_myths.ht...
I strongly object about that is a myth.
I'm sorry, but in the context of cross-platform programming wide characters are quite useless, so I can't recommend using them, as they may be UTF-16 or UTF-32. However, for Windows-only development the wide API is fine and fully supported.
Well, it's true you don't need to use wide strings, but using both char and wchar_t is not practical. This library treat wchar_t like a second class citizen.
It does not; if you feel otherwise, please point to specifics.
We should use one encoding and one type in one program. For a program that runs on the Windows OS, the encoding should be UTF-16, because Windows' native encoding is UTF-16. The type should be wchar_t, because MSVC's wchar_t encoding is UTF-16. So we use wchar_t. All native Win32 APIs accept wchar_t (assuming its encoding is UTF-16); the ANSI versions of the Win32 APIs are just wrappers of the native wide versions. It's just not practical to convert between wchar_t and char merely for passing strings to the locale library.
And Boost.Locale uses only the wide Win32 API, for exactly this reason.
Besides, what encoding should we use for char on Windows? It isn't fixed under Windows and MSVC. MSVC doesn't support UTF-8 literals.
So in order to use this library, we need to store all UTF-8 encoded strings in a file and load them at runtime, or convert them at runtime. Isn't it silly to do that just to use a localization library?
This is the most popular and most widely used message catalog format around. http://cppcms.sourceforge.net/boost_locale/html/appendix.html#why_gnu_gettex... So this is the way they work: catalogs are converted on load, then remain UTF-16 and are not converted again.
So, we need char set conversion between UTF-8 and UTF-16 in order to use this library. Does this library support this conversion? No.
I'm sorry? Have you read the documentation? It does.
Why it doesn't support char set conversion between UTFs?
It does.
Why it hardcode char and std::string?
There are a few very specific places in the library where only "char *" or "std::string" is used; for the rest of the purposes a wide API is fully provided.
A hard-coded char and std::string parameter is really, really, REALLY bad design.
Once again, you probably hadn't looked deeply enough into the documentation.
I suspect the author of this library doesn't have any serious experience with Windows programming or with languages like Japanese. What kind of joke is it that it doesn't support the Win32 API backend in MSVC?
I'm sorry? The Win32 API backend is supported by MSVC, GCC/MinGW and even Cygwin!
"All of the examples that come with Boost.Locale are designed for UTF-8 and it is the default encoding used by Boost.Locale."
There are plenty of wide examples in the documentation and in the library sources.
Then don't pretend it supports MSVC and the Windows OS in the first place! Using UTF-8 on Windows is not a practical option. wchar_t is NOT a second-class citizen!
It is not. I really don't understand how you got to that conclusion. Try to be more specific.
Sorry about my bad language. But this library is just completely useless for the Japanese Windows programmer. It must be rejected if this library is intended to support Japanese and the Windows OS. If Boost accepts this library, drop the support for MSVC and the Windows OS so that Windows programmers know it doesn't work for them.
-- Ryou Ezoe
I'd suggest looking at the documentation and the examples again. Artyom

Here is my review. I am Japanese. I use Microsoft Windows OS.
First of all, before I begin to answer, I'd like to note a few important points:
1. Wide characters are not second-class citizens in Boost.Locale; everything is fully supported in the wide API.
On Fri, Apr 15, 2011 at 4:20 AM, Artyom <artyomtnk@yahoo.com> wrote: translate only accepts char * or std::string.
2. I mostly use UTF-8 examples as they are much more portable, but the whole API is mirrored in a wide version.
3. You are welcome to take a look at the libs/locale/examples/w*.cpp files...
4. UTF-16 even has a performance advantage on the Windows platform, as it is the native encoding for ICU.
Short conclusion. This library is badly designed and completely useless for Japanese language and Windows programmer. It must be rejected or drop support of MSVC and Windows OS.
First of all, std::locale is very poorly designed. It's useless for language like Japanese. So anything built on top of std::locale is useless too. This library is no exception.
std::locale is a container, not the content. The content is filled in by ICU or by Win32 API calls.
Numbers, Time and Currency formatting and parsing Simply replacing words is useless for Japanese. But this is not a big problem. The real issue is...
Messages Formatting (Translation)
This API is very badly designed.
More specific issues?
I am really disappointed about the use of hard-coded char and std::string. It's not just the output (return type) that matters. We Japanese Windows programmers natively use wchar_t, so the input (function parameters) must support it too.
What is wrong with wformat or with wformat(translate("")) or wgettext?
As I said, using only UTF-8 is impossible on Windows, because MSVC doesn't support UTF-8 literals yet. The input (the parameter of translate) must support UTF-16 as well.
Every message formatting API is fully wide enabled.
Further reading of the document reveals that the author of this library believes "should use wide strings everywhere" is a myth. http://cppcms.sourceforge.net/boost_locale/html/recommendations_and_myths.ht...
I strongly object about that is a myth.
I'm sorry, but in the context of cross-platform programming wide characters are quite useless.
So this library doesn't work well on Windows. Good cross platform.
So I can't recommend using wide characters, as they may be UTF-16 or UTF-32; however, for Windows-only development the wide API is fine and fully supported.
You say "fully supported" without taking wide characters as an input?
Well, it's true you don't need to use wide strings, but using both char and wchar_t is not practical. This library treat wchar_t like a second class citizen.
It does not; if you feel otherwise, please point to specifics.
We should use one encoding and one type in one program. For a program that runs on the Windows OS, the encoding should be UTF-16, because Windows' native encoding is UTF-16. The type should be wchar_t, because MSVC's wchar_t encoding is UTF-16. So we use wchar_t. All native Win32 APIs accept wchar_t (assuming its encoding is UTF-16); the ANSI versions of the Win32 APIs are just wrappers of the native wide versions. It's just not practical to convert between wchar_t and char merely for passing strings to the locale library.
And Boost.Locale uses only the wide Win32 API, for exactly this reason.
Besides, what encoding should we use for char on Windows? It isn't fixed under Windows and MSVC. MSVC doesn't support UTF-8 literals.
So in order to use this library, we need to store all UTF-8 encoded strings in a file and load them at runtime, or convert them at runtime. Isn't it silly to do that just to use a localization library?
This is the most popular and most widely used message catalog format around.
http://cppcms.sourceforge.net/boost_locale/html/appendix.html#why_gnu_gettex...
So this is the way they work: catalogs are converted on load, then remain UTF-16 and are not converted again.
So, we need char set conversion between UTF-8 and UTF-16 in order to use this library. Does this library support this conversion? No.
I'm sorry? Have you read the documentation? It does.
Why it doesn't support char set conversion between UTFs?
It does
Hmm, I can't find it in the documentation.
Why it hardcode char and std::string?
There are a few very specific places in the library where only "char *" or "std::string" is used; for the rest of the purposes a wide API is fully provided.
A hard-coded char and std::string parameter is really, really, REALLY bad design.
Once again, you probably hadn't looked deeply enough into the documentation.
I suspect the author of this library doesn't have any serious experience with Windows programming or with languages like Japanese. What kind of joke is it that it doesn't support the Win32 API backend in MSVC?
I'm sorry? The Win32 API backend is supported by MSVC, GCC/MinGW and even Cygwin!
It looks like the documentation is bad, then. http://cppcms.sourceforge.net/boost_locale/html/using_localization_backends.... It says: "You need GCC-4.x to use it. Only UTF-8 encoding is supported."
"All of the examples that come with Boost.Locale are designed for UTF-8 and it is the default encoding used by Boost.Locale."
There are plenty of wide examples in the documentation and in the library sources.
Then don't pretend it supports MSVC and the Windows OS in the first place! Using UTF-8 on Windows is not a practical option. wchar_t is NOT a second-class citizen!
It is not.
I really don't understand how you got to that conclusion.
Try to be more specific.
Sorry about my bad language. But this library is just completely useless for the Japanese Windows programmer. It must be rejected if this library is intended to support Japanese and the Windows OS. If Boost accepts this library, drop the support for MSVC and the Windows OS so that Windows programmers know it doesn't work for them.
-- Ryou Ezoe
I'd suggest looking at the documentation and the examples again.
Artyom
-- Ryou Ezoe

GCC supports UTF-8 literals. This library can be used on Windows if I use MinGW or Cygwin. Still, I don't want to use UTF-8 on Windows.

--
Ryou Ezoe

From: Ryou Ezoe <boostcpp@gmail.com>
Subject: Re: [boost] Fw: [locale] Formal review of Boost.Locale library
To: boost@lists.boost.org
Date: Thursday, April 14, 2011, 11:03 PM

GCC supports UTF-8 literals. This library can be used on Windows if I use MinGW or Cygwin. Still, I don't want to use UTF-8 on Windows.
I'll specifically answer this once again. I assume the following misled you in the documentation.
winapi backend [snip]
* You need GCC-4.x to use it. * Only UTF-8 encoding is supported.
Meaning: you can't use GCC 3.4, but you can use the 4.x series, as 3.x does not support wide characters. Sorry if it misled you. MSVC is a first-class citizen and is actually much better supported than GCC...
-- Ryou Ezoe
Artyom

Saving a source file in UTF-8 with a BOM doesn't make MSVC use UTF-8 as the char encoding. It just happens to work because ASCII is compatible with the UTF-8 encoding; it's still a multibyte character set. I use Windows with a Japanese locale. In my environment, a string literal with no encoding prefix becomes Shift-JIS.

So this library takes whatever encoding the compiler uses for char and uses it as the unique identifier for the corresponding translated text.

If you really want to use UTF-8, then you have to use C++0x's u8 encoding prefix. Not many compilers support it yet, though.

--
Ryou Ezoe

Saving a source file in UTF-8 with a BOM doesn't make MSVC use UTF-8 as the char encoding. It just happens to work because ASCII is compatible with the UTF-8 encoding; it's still a multibyte character set. I use Windows with a Japanese locale. In my environment, a string literal with no encoding prefix becomes Shift-JIS.
So this library takes whatever encoding the compiler uses for char and uses it as the unique identifier for the corresponding translated text.
If you really want to use UTF-8, then you have to use C++0x's u8 encoding prefix. Not many compilers support it yet, though.
It actually does. Take a look at libs/locale/examples/wboundary.cpp: add a UTF-8 BOM to it, compile it with MSVC, and it works perfectly. There are no problems with that. I usually work in a Hebrew locale, so it does not "work" for me either; however, adding a BOM solves the problem.
-- Ryou Ezoe
On the same note I'd like to point you to this table: http://cppcms.sourceforge.net/boost_locale/html/appendix.html#tested_compile... You'll see that MSVC 2008 and 2010 are fully supported with all of the icu, winapi and std backends. I've had reports that MSVC 2005 works fully as well. Artyom

Take a look at libs/locale/examples/wboundary.cpp: it uses the L encoding prefix. Of course MSVC uses UTF-16 for the L prefix. In fact, you can save wboundary.cpp in any encoding that supports Japanese, such as Shift-JIS, EUC-JP or JIS, as well as the UTF encodings, and it still works: MSVC automatically converts it for you.
A string literal with no encoding prefix is still ASCII or some other MBCS encoding.

On Fri, Apr 15, 2011 at 5:42 AM, Artyom <artyomtnk@yahoo.com> wrote:
Saving a source file in UTF-8 with a BOM doesn't make MSVC use UTF-8 as the char encoding. It just happens to work because ASCII is compatible with the UTF-8 encoding; it's still a multibyte character set. I use Windows with a Japanese locale. In my environment, a string literal with no encoding prefix becomes Shift-JIS.
So this library takes whatever encoding the compiler uses for char and uses it as the unique identifier for the corresponding translated text.
If you really want to use UTF-8, then you have to use C++0x's u8 encoding prefix. Not many compilers support it yet, though.
It actually does.
Take a look on libs/locale/examples/wboundary.cpp. Add UTF-8 BOM to it, compile it with MSVC it works perfectly.
There is no problems with that.
I'm working usually in Hebrew locale so it does not "work" for me either. However Adding BOM solves problems.
-- Ryou Ezoe
On the same note I'd like to address you to this table
http://cppcms.sourceforge.net/boost_locale/html/appendix.html#tested_compile...
You'll see that MSVC 2008 and 2010 are fully supported with all of the icu, winapi, and std backends.
I've got reports that MSVC 2005 works fully as well.
Artyom
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Ryou Ezoe

On Fri, Apr 15, 2011 at 5:55 AM, Artyom <artyomtnk@yahoo.com> wrote:
Well, except for the last שָלוֹם עוֹלָם part.
Because the sources are in UTF-8 ;-) and it is "Hello World" in Hebrew (with vowel marks) :-)
It can be UTF-16 as well. It's just that other encodings can't represent Hebrew characters.
The best is just to use UTF-8 (source code) anywhere - MSVC handles it just fine...
I agree with that part.
But the point is that string literals with no encoding prefix are evil. If I write char s[] = "あいうえお"; MSVC uses the Shift-JIS encoding. Japanese programmers are sure to write something like translate("日本語").
Artyom
-- Ryou Ezoe

Because the sources are in UTF-8 ;-) and it is "Hello World" in Hebrew (with vowel marks) :-)
It can be UTF-16 as well. It's just that other encodings can't represent Hebrew characters.
The best is just to use UTF-8 (source code) anywhere - MSVC handles it just fine...
I agree with that part.
But the point is that string literals with no encoding prefix are evil. If I write
char s[] = "あいうえお";
MSVC uses the Shift-JIS encoding.
Japanese programmers are sure to write something like translate("日本語").
The point is that in the source you should write:
MessageBoxW(wgettext("Language").c_str(), wgettext("Japanese language").c_str(), MB_OK) // (ASCII source)
And it would be translated by the dictionaries to:
MessageBoxW(L"言語", L"日本語", MB_OK); // in the real calls, not the source code
The point is: if you use Japanese inline, then you don't need to translate the strings; and if you do translate, then write the source code with ASCII strings, which work with any encoding and are understandable to any software. This is how all translation systems I know work. Having the original text strings in Japanese and translating them to French or Hebrew is a very, very bad idea, as it is much simpler to find somebody able to translate the text from English to Hebrew than from Japanese to Hebrew. English and ASCII are the "source code" of the final text. It is "Anglo-Centric", but so is the software development world. Artyom

You can't just force all programmers to use English. Ideally everyone would use it, but that cannot be. If you think of it that way, this library will never be used by people who don't know English. If you don't mind that, then fine.
On Fri, Apr 15, 2011 at 6:14 AM, Artyom <artyomtnk@yahoo.com> wrote:
Because the sources are in UTF-8 ;-) and it is "Hello World" in Hebrew (with vowel marks) :-)
It can be UTF-16 as well. It's just that other encodings can't represent Hebrew characters.
The best is just to use UTF-8 (source code) anywhere - MSVC handles it just fine...
I agree with that part.
But the point is that string literals with no encoding prefix are evil. If I write
char s[] = "あいうえお";
MSVC uses the Shift-JIS encoding.
Japanese programmers are sure to write something like translate("日本語").
The point is that in the source you should write:
MessageBoxW(wgettext("Language").c_str(),wgettext("Japanese language").c_str(),MB_OK)
// (ASCII source)
And it would be translated by the dictionaries to
MessageBoxW(L"言語",L"日本語",MB_OK); // In real calls, not the source code
The point is: if you use Japanese inline, then you don't need to translate the strings; and if you do translate, then write the source code with ASCII strings, which work with any encoding and are understandable to any software.
This is how all translation systems I know work.
Having the original text strings in Japanese and translating them to French or Hebrew is a very, very bad idea, as it is much simpler to find somebody able to translate the text from English to Hebrew than from Japanese to Hebrew.
English and ASCII are the "source code" of the final text. It is "Anglo-Centric", but so is the software development world.
Artyom
-- Ryou Ezoe

You can't just force all programmers to use English. Ideally everyone would use it, but that cannot be. If you think of it that way, this library will never be used by people who don't know English. If you don't mind that, then fine.
The point is that this is not a restriction I personally came up with; it is the way localization works, and should work, in the world. If a programmer on the team does not know how to express himself in English, then you probably need to do one of the following: 1. Hire somebody else, because someone who cannot write English will have problems in many other areas directly related to programming. 2. Let him develop non-localized or non-interactive software.
One of the things I try to convey in this Boost.Locale project is not only the tools but also the methodology. You probably know how hard localization can be: how hard it is for an English speaker to understand that text may not have spaces (your case), or may be written right-to-left (my case), or may have no concept of letter case (both our cases). So when you develop software that can handle multiple languages, you must work according to a certain methodology. Unfortunately, it is much easier to learn C++ than the whole rich diversity of cultures in the world. So the methodology is no less important than the library (even more so).
An example? How would a Japanese programmer who works only in Japanese write the following code that displays the message "私は2つのシャツを持っている"? (I used Google Translate, so I'm not responsible for it.) He would probably write:
int n=2; MessageBox((wformat(translate("私は{1}つのシャツを持っている")) % n).str());
What is the problem with that? That you actually need to write:
MessageBox((wformat(translate("I have 1 shirt","I have {1} shirts")) % n).str());
Why? Because Japanese has no plural forms (as far as I remember), so if you write in Japanese you will likely miss this point, and you will not be able to support other languages correctly. Of course, the same can happen the other way around. So no, you can't say "my programmer knows only one language, so I'll let him write the text in his own language and not in English", because you'll get bad software.
Your programmers should be aware of basic linguistics and know how to express themselves in English. And if they don't? That is what code review is for. Artyom

Here is a sad fact: it's very hard to find a Japanese programmer who also knows English. Believe me, there are many Japanese Boost users, but they don't contribute here because they don't know English one bit. Requiring English means this library will never be used by Japanese programmers. I really don't like the current situation either, but it cannot be changed.
Please don't build a plural-handling system. It's bad for localization; plurals should not be handled by software that will be localized to all known languages. You can use:
I have 2 shirt(s).
-- Ryou Ezoe

On 14/4/2011 23:59, Ryou Ezoe wrote:
Please don't build a plural-handling system. It's bad for localization; plurals should not be handled by software that will be localized to all known languages.
Absolutely not; I consider this a core feature of any localization library, because it insulates me from some hard problems (there are languages with more than one plural form, for example) that I really don't want to care about.
You can use.
I have 2 shirt(s).
I'd consider any software that uses such messages at best sloppily written and at worst broken or outdated.
Regards, Fabio

On Thu, Apr 14, 2011 at 11:59 PM, Ryou Ezoe <boostcpp@gmail.com> wrote:
Here is a sad fact: it's very hard to find a Japanese programmer who also knows English. Believe me, there are many Japanese Boost users, but they don't contribute here because they don't know English one bit. Requiring English means this library will never be used by Japanese programmers.
I really don't like the current situation either, but it cannot be changed.
Please don't build a plural-handling system. It's bad for localization; plurals should not be handled by software that will be localized to all known languages. You can use:
I have 2 shirt(s).
This does not work with languages with multiple plural forms. Been there, done that. The results look really ugly in Slovak and some other Slavonic languages, and I'm sure in many non-Slavonic languages as well. I like the approach used by Boost.Locale. BR, Matus

----- Original Message ----
From: Ryou Ezoe <boostcpp@gmail.com> To: boost@lists.boost.org Sent: Fri, April 15, 2011 12:59:03 AM Subject: Re: [boost] Fw: [locale] Formal review of Boost.Locale library
Here is a sad fact: it's very hard to find a Japanese programmer who also knows English. Believe me, there are many Japanese Boost users, but they don't contribute here because they don't know English one bit. Requiring English means this library will never be used by Japanese programmers.
I really don't like the current situation either, but it cannot be changed.
I'll try to explain more about the ASCII requirement. One important feature of the catalogs is that if the translation string does not exist in the catalog, the original one is shown. This allows translating the software part by part and, more than that, allows doing so independently of its development. In the real world, very little software has 100% of its messages translated, so such a fallback is very important. Indeed, you see an untranslated English message once in a while in most software, but since English is generally the international language, most users tolerate it to a certain level. What would happen if the message accidentally shown were in Japanese, and the user had no chance of finding out what it means at all? All literate users know English at least at some basic level and would probably be able to google for the text... Japanese? No way! That is why the original strings should be kept in English.
The second reason is performance. What is the encoding of the source string: Latin-1, ASCII, UTF-8, Shift-JIS? Even if it is the known and widely used UTF-8, you would have to do a charset conversion, while with ASCII you just cast, as it is an (almost) universal subset of all encodings.
So there are plenty of good reasons to do this. If you have trouble getting programmers to know English, let them write in poor English and then ask a copywriter to review their strings; and if they really struggle, use Romaji. Believe me, you have more trouble handling source file encodings correctly.
Please don't build a plural-handling system. It's bad for localization; plurals should not be handled by software that will be localized to all known languages. You can use:
I have 2 shirt(s).
I'm sorry, but almost all languages except the East Asian ones have plural forms. You have no idea how silly it sounds in Russian when you miss them. So no: it is a very important part, and "I have 2 shirt(s)" is bad localization. Artyom

Why ASCII? ASCII is not portable; the only portable characters are the basic source character set. You can't use $, @, and ` in portable code. If you drop support for pre-C++0x compilers, you can use the u8, u, and U encoding prefixes, but a plain char string can still be UTF-8 or any other encoding. If this library accepts char const * and std::string, then it will receive null-terminated binary strings that can be in any encoding. Your idea of "use ASCII in order to localize your software" doesn't work. This library will be ignored, just like program_options. -- Ryou Ezoe

Sorry to go off on a tangent here, but what exactly is the objection to program_options?
On Sat, Apr 16, 2011 at 1:05 AM, Ryou Ezoe <boostcpp@gmail.com> wrote:
Why ASCII? ASCII is not portable; the only portable characters are the basic source character set. You can't use $, @, and ` in portable code. If you drop support for pre-C++0x compilers, you can use the u8, u, and U encoding prefixes, but a plain char string can still be UTF-8 or any other encoding.
If this library accepts char const * and std::string, then it will receive null-terminated binary strings that can be in any encoding.
Your idea of "use ASCII in order to localize your software" doesn't work. This library will be ignored, just like program_options.
-- Ryou Ezoe

program_options handles wide characters in a really stupid way. It accepts wchar_t but converts it to char for internal storage, and the conversion is done by assuming that each wchar_t object's integral value represents an ASCII code point. For output, it converts char back to wchar_t the same way. What I want from program_options is: accept a wchar_t string, store it as is, then return it as is. Because program_options internally stores strings in char without properly handling the character encoding, it gives me completely broken results.
On Sat, Apr 16, 2011 at 1:36 AM, Joshua Boyce <raptorfactor@raptorfactor.com> wrote:
Sorry to go off on a tangent here, but what exactly is the objection to program_options?
On Sat, Apr 16, 2011 at 1:05 AM, Ryou Ezoe <boostcpp@gmail.com> wrote:
Why ASCII? ASCII is not portable; the only portable characters are the basic source character set. You can't use $, @, and ` in portable code. If you drop support for pre-C++0x compilers, you can use the u8, u, and U encoding prefixes, but a plain char string can still be UTF-8 or any other encoding.
If this library accepts char const * and std::string, then it will receive null-terminated binary strings that can be in any encoding.
Your idea of "use ASCII in order to localize your software" doesn't work. This library will be ignored, just like program_options.
-- Ryou Ezoe
-- Ryou Ezoe

On 4/14/2011 5:14 PM, Artyom wrote:
Because the sources are in UTF-8 ;-) and it is "Hello World" in Hebrew (with vowel marks) :-)
It can be UTF-16 as well. It's just that other encodings can't represent Hebrew characters.
The best is just to use UTF-8 (source code) anywhere - MSVC handles it just fine...
I agree with that part.
But the point is that string literals with no encoding prefix are evil. If I write
char s[] = "あいうえお";
MSVC uses the Shift-JIS encoding.
Japanese programmers are sure to write something like translate("日本語").
The point is that in the source you should write:
MessageBoxW(wgettext("Language").c_str(),wgettext("Japanese language").c_str(),MB_OK)
// (ASCII source)
And it would be translated by the dictionaries to
MessageBoxW(L"言語",L"日本語",MB_OK); // In real calls, not the source code
The point is: if you use Japanese inline, then you don't need to translate the strings; and if you do translate, then write the source code with ASCII strings, which work with any encoding and are understandable to any software.
This is how all translation systems I know work.
Having the original text strings in Japanese and translating them to French or Hebrew is a very, very bad idea, as it is much simpler to find somebody able to translate the text from English to Hebrew than from Japanese to Hebrew.
This is just plain silly. Telling someone who uses language X that in order to translate their code to language Y they need to first translate it to English and then translate the English to language Y, and that this is somehow superior, is just inane reasoning. Supporting such reasoning by saying that "everyone" does it that way is equally inane, for many obvious reasons. It is much better to admit that the "translate" part of a locale library may be flawed, or limited, than to have to justify such illogical nonsense. I am not saying that your following the way GNU gettext does translation is not understandable, in that you chose to follow a popular way of doing something. But I am saying that trying to justify it in the illogical way that you do, given a perfectly reasonable objection, is completely wrong.

[snip/]
The point is: if you use Japanese inline, then you don't need to translate the strings; and if you do translate, then write the source code with ASCII strings, which work with any encoding and are understandable to any software.
This is how all translation systems I know work.
Having the original text strings in Japanese and translating them to French or Hebrew is a very, very bad idea, as it is much simpler to find somebody able to translate the text from English to Hebrew than from Japanese to Hebrew.
This is just plain silly. Telling someone who uses language X that in order to translate their code to language Y they need to first translate it to English and then translate the English to language Y, and that this is somehow superior, is just inane reasoning. Supporting such reasoning by saying that "everyone" does it that way is equally inane, for many obvious reasons. It is much better to admit that the "translate" part of a locale library may be flawed, or limited, than to have to justify such illogical nonsense.
This is, however, basically the only reasonable thing to do given the current state of things and the whole string-literal encoding mess in C++. One alternative that people who don't speak English can use is to transliterate the strings into the plain Latin alphabet (7-bit ASCII): for example, instead of translate("Пожалуйста") you write translate("pa-zhal-sta");
I am not saying that your following the way GNU gettext does translation is not understandable, in that you chose to follow a popular way of doing something. But I am saying that trying to justify it in the illogical way that you do, given a perfectly reasonable objection, is completely wrong.
What other options do you see, the support for UTF-8 character literals being as it is ? BR, Matus

On Fri, 15 Apr 2011 10:14:29 +0200 Matus Chochlik <chochlik@gmail.com> wrote:
This is just plain silly. Telling someone who uses language X that in order to translate their code to language Y they need to first translate it to English and then translate the English to language Y, and that this is somehow superior, is just inane reasoning. Supporting such reasoning by saying that "everyone" does it that way is equally inane, for many obvious reasons. It is much better to admit that the "translate" part of a locale library may be flawed, or limited, than to have to justify such illogical nonsense.
This is, however, basically the only reasonable thing to do, with the current state of things and the whole string literal encoding mess in C++.
+1.
One of the alternatives that people who don't speak English can use, is to transliterate the strings to plain Latin alphabet (7-bit ASCII) i.e. for example instead of doing translate( "Пожалуйста") you do translate("pa-zhal-sta");
+1 -- I was going to suggest that myself, until I saw that you'd already done it. :-) -- Chad Nelson Oak Circle Software, Inc. * * *

This is how all translation systems I know work.
Having the original text strings in Japanese and translating them to French or Hebrew is a very, very bad idea, as it is much simpler to find somebody able to translate the text from English to Hebrew than from Japanese to Hebrew.
This is just plain silly. Telling someone who uses language X that in order to translate their code to language Y they need to first translate it to English and then translate the English to language Y, and that this is somehow superior, is just inane reasoning. Supporting such reasoning by saying that "everyone" does it that way is equally inane, for many obvious reasons. It is much better to admit that the "translate" part of a locale library may be flawed, or limited, than to have to justify such illogical nonsense.
I am not saying that your following the way GNU gettext does translation is not understandable, in that you chose to follow a popular way of doing something. But I am saying that trying to justify it in the illogical way that you do, given a perfectly reasonable objection, is completely wrong.
I would point you to this answer as well: http://article.gmane.org/gmane.comp.lib.boost.devel/217983
I hadn't remembered all the reasons at 1 AM, especially all the encoding- and charset-related issues. There are quite a lot of good reasons to limit yourself to ASCII. I understand that it may not always be "nice", but it is the best guideline for developing applications for different cultures. One more important thing: when you have a small customer and do not have enough resources to support their language directly, the customer can easily localize the application themselves. Given a good program like KDE's "lokalize", it would take them only a few hours to translate all the strings in a small application. But if the original strings are not English, this will never happen. The methodology is very important as well. Artyom

For example, one Japanese programmer may write translate("あ"), for which MSVC uses the Shift-JIS encoding. At the same time, a Korean programmer may write translate("궇"), for which I think (I don't know Korean) MSVC uses the KS X 1001 encoding. At the binary level, these are the same bytes. What should we do? Using string literals with no prefix is practically dangerous.
On Fri, Apr 15, 2011 at 6:01 AM, Ryou Ezoe <boostcpp@gmail.com> wrote:
On Fri, Apr 15, 2011 at 5:55 AM, Artyom <artyomtnk@yahoo.com> wrote:
well, except last שָלוֹם עוֹלָם part.
Because the sources are in UTF-8 ;-) and it is "Hello World" in Hebrew (with vowel marks) :-)
It can be UTF-16 as well. It's just that other encodings can't represent Hebrew characters.
The best is just to use UTF-8 (source code) anywhere - MSVC handles it just fine...
I agree with that part.
But the point is that string literals with no encoding prefix are evil. If I write
char s[] = "あいうえお";
MSVC uses the Shift-JIS encoding.
Japanese programmers are sure to write something like translate("日本語").
Artyom
-- Ryou Ezoe
-- Ryou Ezoe

At the binary level, these are the same bytes.
What should we do? Using string literals with no prefix is practically dangerous.
You should do the following:
1. Do not include localized text directly in the sources; that is what the localization system is for.
2. If you have to, then use UTF-8 in the sources, so that char * strings remain UTF-8 and L"..." prefixed strings are handled correctly.
3. When you use MSVC, add a UTF-8 BOM (even though it is otherwise useless, this is how MSVC distinguishes between local encodings and UTF-8). Bad, but this is reality.
In fact, almost all the unit tests in Boost.Locale use inline UTF-8, and believe me, it made my life significantly simpler. Artyom

On Fri, Apr 15, 2011 at 6:25 AM, Artyom <artyomtnk@yahoo.com> wrote:
At the binary level, these are the same bytes.
What should we do? Using string literals with no prefix is practically dangerous.
You should do following.
1. Do not include localized text directly in the sources; that is what the localization system is for.
How can we force programmers not to use any characters outside the basic source character set? It's impossible; they write whatever they can. At the very least you need to warn in the documentation not to use characters outside the basic source character set, but people will still write them. Since this is a localization library, I bet all Japanese users expect it to handle Japanese input and give us the mapped translation as output.
2. If you have to, then use UTF-8 in the sources, so that it remains UTF-8
How many times do I have to explain? MSVC automatically converts the encoding of the contents of string literals with no encoding prefix. There is no way to prevent it unless you use the proper encoding prefix, u8, which most compilers don't support yet.
for char * strings as well, and would be handled correctly for L"..." prefixes.
But translate expects char.
3. When you use MSVC, add a UTF-8 BOM (even though it is otherwise useless, this is how MSVC distinguishes between local encodings and UTF-8).
Bad, but this is reality.
In fact, almost all the unit tests in Boost.Locale use inline UTF-8, and believe me, it made my life significantly simpler.
Your life may be simple, since you don't need to worry about the complicated Japanese environment. But mine is not.
Artyom
-- Ryou Ezoe

On 14/4/2011 23:35, Ryou Ezoe wrote:
On Fri, Apr 15, 2011 at 6:25 AM, Artyom<artyomtnk@yahoo.com> wrote:
You should do following.
1. Do not include localized text directly in the sources; that is what the localization system is for.
How can we force programmers not to use any characters outside the basic source character set? It's impossible.
At least it is very impractical. Although I agree that programmers should use English as the base language for both comments and strings (I still remember the problems that German comments caused Star/OpenOffice), I also believe that you can never force all programmers to do so. Another problem is that even if you do use English as your application's "native" language, you might still need characters that are prone to encoding problems, e.g. the © in copyright messages, or accented names. (We had both issues just last month.) I would expect a localization library to take care of this for me. I don't yet know how serious an issue that is; at the very least it should be thoroughly documented. Regards, Fabio

On 14/4/2011 23:25, Artyom wrote:
3. When you use MSVC add UTF-8 BOM (even thou it is useless but this is the way MSVC distinguishes between local encodings and UTF-8)
Bad but this is reality.
IIRC Clang will not compile source files with a BOM, which would effectively render the code unportable. Regards, Fabio

On 15.04.2011 1:49, Fabio Fracassi wrote:
On 14/4/2011 23:25, Artyom wrote:
3. When you use MSVC add UTF-8 BOM (even thou it is useless but this is the way MSVC distinguishes between local encodings and UTF-8)
Bad but this is reality.
IIRC Clang will not compile source files with a BOM, which would effectively render the code unportable.
Any code that uses characters outside the "basic source character set" (96 characters) is not portable anyway. EBCDIC-encoded sources probably cannot be compiled by Clang either.
-- Sergey Cheban

On 15.04.2011 11:05, Sergey Cheban wrote:
On 15.04.2011 1:49, Fabio Fracassi wrote:
On 14/4/2011 23:25, Artyom wrote:
3. When you use MSVC add UTF-8 BOM (even thou it is useless but this is the way MSVC distinguishes between local encodings and UTF-8)
Bad but this is reality.
IIRC Clang will not compile source files with a BOM, which would effectively render the code unportable. Any code that uses characters outside the "basic source character set" (96 characters) is not portable anyway. EBCDIC-encoded sources probably cannot be compiled by Clang either.
Also, Clang has been updated to ignore BOMs, so tip-of-tree Clang will compile such files. I have no idea what it will do with non-ASCII characters in the source, though.
Sebastian

On 15.04.2011 1:01, Ryou Ezoe wrote:
But the point is that string literals with no encoding prefix are evil. If I write
char s[] = "あいうえお";
MSVC uses the Shift-JIS encoding.
If you want to use non-wide UTF-8 strings in MSVC, try storing the file as UTF-8 WITHOUT the BOM. Then the compiler will store the strings in the executable as is, without any conversion. Unfortunately, the wide strings (L"あいうえお") will be broken in this case. Anyway, it's a problem with MSVC, not Boost. -- Sergey Cheban

1. Wide characters are not second-class citizens in Boost.Locale; everything is fully supported in the wide API.
translate only accepts char * or std::string.
OK, let me explain. Yes, boost::locale::translate receives "char *" as input, as it uses ASCII text as the key. You write the code originally in English using ASCII, and the keys are translated to wide strings or to narrow ones according to the context. So basically your keys, or original text, should be ASCII (and gettext will warn you if they are not). The translations, however, can be anything you wish. If you want to embed Japanese in the text you are welcome to do so, but in that case you do not need translate. Only when the message is not found in the dictionary is it converted to a wide string, by simply casting each ASCII character to a wide one. This is how gettext (and most other translation systems) work. There is nothing wrong with this.
As I said, using only UTF-8 is impossible on Windows, because MSVC doesn't support UTF-8 literals yet. The input (the parameter of translate) must support UTF-16 as well.
Actually it does... Take a look at libs/locale/examples/w*. You need a UTF-8 BOM and it will work perfectly (I know it is stupid to add a BOM, but that is how MSVC recognizes Unicode).
Further reading of the documentation reveals that the author of this library believes that "you should use wide strings everywhere" is a myth: http://cppcms.sourceforge.net/boost_locale/html/recommendations_and_myths.ht...
I strongly object to calling that a myth.
I'm sorry, but in the context of cross-platform programming wide characters are quite useless.
So this library doesn't work well on Windows. So much for "cross-platform".
It is not a problem of the library but a problem of the badly defined wchar_t (or the Windows API). And wide characters are fully supported.
So I can't recommend using wide characters, as they may be UTF-16 or UTF-32; however, for Windows-only development the wide API is fine and fully supported.
You say "fully supported" without taking wide characters as input?
It takes ASCII as input; wide characters should not be used for translation keys at all. See above.
Why doesn't it support charset conversion between UTFs?
It does
Hmm, I can't find it in the documentation.
http://cppcms.sourceforge.net/boost_locale/html/namespaceboost_1_1locale_1_1...
I suspect the author of this library doesn't have any serious experience with Windows programming or with languages like Japanese. What kind of joke is it that the Win32 API backend is not supported with MSVC?
I'm sorry? The Win32 API backend is supported by MSVC, GCC/MinGW, and even Cygwin!
It looks like the documentation is misleading, then. http://cppcms.sourceforge.net/boost_locale/html/using_localization_backends....
Two points:
You need GCC-4.x to use it.
Meaning you can't use GCC 3.4, only the 4.x series, as 3.x does not support wide characters. Sorry if that misled you. MSVC is a first-class citizen and is actually much better supported than GCC.
Only UTF-8 encoding is supported.
OK, you misread the documentation. This part describes locale generation: http://cppcms.sourceforge.net/boost_locale/html/locale_gen.html The encoding there defines the encoding of narrow strings, not wide ones; the wide-string encoding is UTF-16 or UTF-32 according to sizeof(wchar_t). If you want to use the locale ja_JP.Shift-JIS, meaning you want to treat narrow strings as Shift-JIS-encoded strings, then you should use either the ICU or std backends.
Ryou Ezoe
I'd strongly recommend that you read the documentation and look at the examples carefully before making such statements. Artyom

On 14/04/2011 21:11, Artyom wrote:
Yes, boost::locale::translate receives "char *" as input, as it uses ASCII text as the key.
You write the code originally in English using ASCII, and the messages are translated to wide or narrow strings according to the context.
So basically your keys, or the original text, should be ASCII (and gettext would warn you if they are not).
However, the translations can be anything you wish.
If you want to embed Japanese in the text you are welcome to do so, but in that case you do not need translate().
Only when the message is not found in the dictionary is it converted to a wide string, by simply casting each ASCII character to a wide one.
This is how gettext (and most other translation systems) works.
There is nothing wrong with this.
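The fallback Artyom describes, where an ASCII key with no dictionary entry becomes a wide string by casting each character, can be sketched in a few lines (an illustration of the behavior, not Boost.Locale's actual implementation):

```cpp
#include <string>

// Widen an ASCII-only key by casting each character to wchar_t.
// This is only safe because the key is restricted to ASCII: for
// code points above 127 a real charset conversion would be needed.
std::wstring widen_ascii(const std::string& key) {
    std::wstring out;
    out.reserve(key.size());
    for (char c : key)
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    return out;
}
```

The ASCII restriction on keys is what makes this cast-per-character fallback well defined on every platform regardless of the narrow encoding.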
Hi, Having worked in Japan for a while, I can assure you that most programmers there not only do not speak English but also do not read it. The workflow you describe is actually impracticable for them. If, for instance, a Japanese company is building a product for the Japanese, Korean and Chinese markets (and that's it), it's a bit cumbersome to suggest they should hire a programmer who understands English just to take care of the translation code. And the same goes for the localization company they will likely subcontract the translations to... MAT.

14.04.2011 20:45, Ryou Ezoe writes:
> Here is my review. I am Japanese. I use Microsoft Windows OS.
> Short conclusion: this library is badly designed and completely useless for the Japanese language and for Windows programmers. It must be rejected, or drop support for MSVC and Windows.

I'm a Russian Windows programmer. My native language is Russian, I know English and have very basic knowledge of German, Arabic, Hebrew, Chinese and Japanese. I disagree with you.

> This API is very badly designed. I am really disappointed about the use of hard-coded char and std::string. It's not just the output (return type) that matters. We, Japanese Windows programmers, natively use wchar_t, so the input (function parameters) must support it too.

1. The native type for strings under Windows is really WCHAR, not wchar_t.
2. According to the C++0x specification, "Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales". For the MSVC compiler, the size of wchar_t is 16 bits. So the wchar_t type IS a second-class citizen for the MSVC compiler. It is not a problem of the boost::locale library.

> Besides, what encoding should we use for char on Windows? It isn't fixed under Windows and MSVC. MSVC doesn't support UTF-8 literals.

MSVC supports UTF-8 literals for source files stored as UTF-8 with a byte order mark (0xEF,0xBB,0xBF).

> So we need character set conversion between UTF-8 and UTF-16 in order to use this library. Does this library support this conversion? No. Why doesn't it support character set conversion between UTFs?

Agreed. I see no way to convert strings between UTF-X and UTF-Y without specifying the locale. But please note the Boost.Unicode library that is under development now. It has the utf_transcoder template (http://mathias.gaunard.com/unicode/doc/html/boost/unicode/utf_transcoder.html).
I propose to wait for the Boost.Unicode review and to add something like it to Boost.Locale if Boost.Unicode is not accepted.

> Why does it hardcode char and std::string?

The wchar_t strings seem to be supported; see boost::locale::wformat for example. I see no problems here. -- Sergey Cheban

Does this library support this conversion? No. Why doesn't it support character set conversion between UTFs?
Agree. I see no way to convert strings between UTF-X and UTF-Y without specifying the locale.
Actually there is: http://cppcms.sourceforge.net/boost_locale/html/group__codepage.html#ga878bd... For example:
std::wstring w = boost::locale::conv::to_utf<wchar_t>("some text","UTF-8");
std::string n = boost::locale::conv::from_utf(L"some text","UTF-8");
Artyom
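To see what a UTF-to-UTF conversion like the one above actually has to do, here is a minimal standalone UTF-8 to UTF-32 decoder (an illustration written for this discussion; it is not Boost.Locale's code and performs no error recovery beyond rejecting malformed input):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into UTF-32 code points.
// Lead byte patterns: 0xxxxxxx (1 byte), 110xxxxx (2), 1110xxxx (3),
// 11110xxx (4); each continuation byte contributes its low 6 bits.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        int len;
        if (b < 0x80)              { cp = b;        len = 1; }
        else if ((b >> 5) == 0x6)  { cp = b & 0x1F; len = 2; }
        else if ((b >> 4) == 0xE)  { cp = b & 0x0F; len = 3; }
        else if ((b >> 3) == 0x1E) { cp = b & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (int k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, the two bytes 0xC3 0xA9 decode to the single code point U+00E9 ("é"). A production converter would additionally reject overlong sequences and surrogate code points.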

14.04.2011 23:29, Artyom writes:
>>> Does this library support this conversion? No.
>>> Why doesn't it support character set conversion between UTFs?
>> Agree. I see no way to convert strings between UTF-X and UTF-Y without specifying the locale.
> Actually there is:
> std::wstring w = boost::locale::conv::to_utf<wchar_t>("some text","UTF-8");
> std::string n = boost::locale::conv::to_utf<char>(L"some text","UTF-8");

It seems to me that this method has non-zero overhead:
1. The "UTF-8" charset string is passed to to_utf() as std::string const&.
2. The wconv_to_utf object is created on the heap, initialized with open() and used to convert the string. By the way, why don't you create it on the stack?
3. When initializing, the wconv_to_utf object tries to normalize the charset (one more string allocation) and then to find it in the supported-charsets table.
4. Finally, codepage 65001 is passed to MultiByteToWideChar(). MultiByteToWideChar() probably has its own overhead related to finding the appropriate converter for the specified codepage.

I think we need something more lightweight. -- Sergey Cheban
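The per-call cost Sergey lists (charset-name normalization, table lookup, heap allocation) is exactly the kind of work that can be hoisted into a converter object constructed once and reused. A hypothetical sketch of that design point, showing only the normalization step; this class is invented for illustration and is not part of Boost.Locale:

```cpp
#include <cctype>
#include <string>

// Hypothetical charset identifier: the name is normalized once, at
// construction (lowercased, punctuation stripped), so repeated
// conversions would not repeat the normalization or its allocation.
class CharsetId {
public:
    explicit CharsetId(const std::string& name) {
        for (char c : name) {
            unsigned char u = static_cast<unsigned char>(c);
            if (std::isalnum(u))
                normalized_ += static_cast<char>(std::tolower(u));
        }
    }
    const std::string& normalized() const { return normalized_; }
private:
    std::string normalized_;
};
```

With this shape, "UTF-8", "utf8" and "Utf-8" all normalize to "utf8" exactly once, and a long-lived converter keyed on the normalized name pays the lookup only at construction time.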

Hello all! The C++0x FDIS is publicly available: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3290.pdf File size: 10.2 MB. - Denis

On Apr 13, 2011, at 4:54 AM, Denis Shevchenko wrote:
References: <20110407090144.44dcb027@ubuntu> <963365.20313.qm@web36706.mail.mud.yahoo.com> <408287.41352.qm@web112906.mail.gq1.yahoo.com> <558785.93255.qm@web36702.mail.mud.yahoo.com> <007201cbf9b5$444301f0$ccc905d0$@hetp.u-net.com> In-Reply-To: <007201cbf9b5$444301f0$ccc905d0$@hetp.u-net.com>
Please don't Reply when creating a new message. I nearly deleted your message, unread, along with the thread you replied to. Josh
participants (19)
-
Artyom
-
Chad Nelson
-
Daniel James
-
Denis Shevchenko
-
Edward Diener
-
Fabio Fracassi
-
Gevorg Voskanyan
-
Gordon Woodhull
-
Hal Finkel
-
Joshua Boyce
-
Joshua Juran
-
Mathieu Champlon
-
Matus Chochlik
-
Paul A. Bristow
-
Robert Kawulak
-
Ryou Ezoe
-
Scott McMurray
-
Sebastian Redl
-
Sergey Cheban