
----- Original Message ----
From: Edward Diener <eldiener@tropicsoft.com>
1. Documentation
The layout of the main page is decent, but I would have expected a discussion there, or as a first topic, of what Locale brings that the C++ standard locale does not have. I was disappointed not to find such a discussion.
Actually, the standard locale provides many things in a limited way; each topic is discussed in its own section, so I don't think it was necessary to provide a specific explanation for each of them. More details can be found here: http://cppcms.sourceforge.net/boost_locale/html/appendix.html#rationale_why
[snip]
a. Introduction to C++ Standard Library localization support
The common critical problems of the C++ locale portion of the standard library seem spurious to me. The problems mentioned are really about implementations or programmer usage, not the C++ locale library itself. The only valid problem I find mentioned there is that the C++ standard library did not attempt to standardize any locale names. This makes using C++ locales based on locale names non-portable.
There is much more mentioned than that:
1. I've seen lots of libraries broken by setting the global locale (see the sketch below).
2. Almost all standard C++ libraries have bugs in their locale implementations; for example, both GCC's libstdc++ and SunStudio's standard library may generate invalid UTF-8?!
3. Some things are defective by design.
4. Some things are badly implemented.
5. Some things (like message formatting) are simply non-existent.
6. Most standard libraries provide only the C and POSIX locales.
Are these few? They are all mentioned. I assume you haven't worked much with the locales of the standard C++ library, because there are lots of issues.
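To illustrate point 1, a small hypothetical sketch of how an innocent call to std::locale::global() by a library breaks unrelated code (the de_DE.UTF-8 locale must be installed on the system, otherwise the std::locale constructor throws):

    #include <locale>
    #include <sstream>
    #include <iostream>

    int main() {
        // Imagine a third-party library doing this behind your back:
        std::locale::global(std::locale("de_DE.UTF-8"));

        // Any stream created afterwards copies the global locale...
        std::ostringstream cfg;
        cfg << 3.14;                      // ...so this now writes "3,14"
        std::cout << cfg.str() << "\n";   // downstream parsers expecting '.' break
    }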
Unfortunately the issues there make a very weak argument for the Locale library itself.
The library unifies and fixes existing problems.
b. Locale generation
I would have liked it if the doc here specified where one finds valid lists of language, country, encoding, and variant which make up a locale name. Without this information, the one valid problem mentioned regarding C++ locale is also a problem with Locale.
Actually, I explicitly referred to the ISO-639 and ISO-3166 standards. These lists are updated once in a while, and I don't think the Boost.Locale library should duplicate them.
The note about wide strings and 8-bit encoding makes no sense to me at all. If I am using a wide string encoding, why would I not be using wide string iostreams?
You may choose not to specify the encoding; in that case it is assumed to be US-ASCII. But I strongly recommend against doing that: I recommend always using UTF-8, as is mentioned in the documentation.
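For illustration, a minimal sketch of generating a UTF-8 locale and installing it; the locale name is only an example, any ISO-639/ISO-3166 pair works the same way:

    #include <boost/locale.hpp>
    #include <iostream>

    int main() {
        boost::locale::generator gen;
        // Locale name: language_COUNTRY.ENCODING; UTF-8 is the recommended encoding.
        std::locale loc = gen("en_US.UTF-8");
        // Install it globally and imbue the streams that will use it.
        std::locale::global(loc);
        std::cout.imbue(loc);
    }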
c. Collation
There is no explanation about what 'collation' is about. This is very disappointing, as it makes the rest of the discussion difficult to follow.
Especially for you and those who are not familiar with localization terminology, there is a glossary: http://cppcms.sourceforge.net/boost_locale/html/appendix.html#glossary A quick glance at it would answer your question.
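To make the section easier to follow, a small sketch of what collation gives you; the exact ordering depends on the backend, but the point is that the generated locale can be used directly as a comparator because std::locale carries a std::collate facet:

    #include <boost/locale.hpp>
    #include <algorithm>
    #include <string>
    #include <vector>

    int main() {
        boost::locale::generator gen;
        std::locale loc = gen("en_US.UTF-8");

        std::vector<std::string> words = { "Zebra", "apple", "Banana" };
        // Plain byte comparison would give: Banana, Zebra, apple.
        // Locale-aware collation gives the dictionary order: apple, Banana, Zebra.
        std::sort(words.begin(), words.end(), loc);
    }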
This is one reason why I dislike documentation which attempts to teach by example. It always seems to assume that throwing examples at the reader, before anything about the classes/templates in the examples has been explained, is somehow an effective way of learning a library. Instead it just creates confusion and unfortunately serves as a way for a library implementer to avoid explaining how the classes in his library actually work or relate to each other.
It takes some time to learn what std::locale is and how it works. I had written a small introduction to std::locale, but I do expect developers and users to open the documentation of the standard C++ library in order to understand things deeply. I can't cover every possible topic in this tutorial; I expect a user who comes to localize software to learn about it not only from the given tutorial but also from other sources, in the same way a library that provides TCP/IP and networking support assumes that you know a little about sockets, network addresses, ports and so on.
[snip]
d. Conversions
"You may notice that there are existing functions to_upper and to_lower in the Boost.StringAlgo library. The difference is that these function operate over an entire string instead of performing incorrect character-by-character conversions."
I do not understand how these conversion functions use a locale. The example gives: boost::locale::to_upper(gruben) used in a stream. Is this function using the locale imbued in the iostream ?
Actually, clicking the function names in the tutorial takes you to the reference documentation, which shows that they receive a locale as a parameter. In most applications that use a single language in their interface, the global locale is expected to be set, which makes everything much easier.
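For example, both call forms in a minimal sketch (the strings are just examples):

    #include <boost/locale.hpp>
    #include <iostream>

    int main() {
        boost::locale::generator gen;
        std::locale loc = gen("en_US.UTF-8");

        // Explicit locale parameter:
        std::cout << boost::locale::to_upper("hello world", loc) << "\n";

        // With the global locale installed, the parameter can be omitted:
        std::locale::global(loc);
        std::cout << boost::locale::to_upper("hello world") << "\n";
    }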
[snip]
e. Numbers, Time and Currency formatting and parsing
A bunch of ICU flags are mentioned but with no indication about how these are supposed to be used by iostreams. These flags look like they are supposed to be used by C-like format printf statements but since Locale uses iostreams I can not understand their purpose with Locale.
First of all, these are not ICU flags. They are stream manipulators, just like std::hex or std::setprecision. I don't know how familiar you are with stream manipulators, but they behave the same way as the standard library's manipulators.
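A small sketch of the manipulators in use with an imbued stream (the actual output depends on the locale and the backend):

    #include <boost/locale.hpp>
    #include <iostream>
    #include <ctime>

    int main() {
        boost::locale::generator gen;
        std::locale loc = gen("en_US.UTF-8");
        std::cout.imbue(loc);

        using namespace boost::locale;
        std::cout << as::number   << 10345.345    << "\n";  // locale-aware number
        std::cout << as::currency << 1234.56      << "\n";  // money formatting
        std::cout << as::date     << std::time(0) << "\n";  // today's date
    }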
f. Messages Formatting (Translation)
GNU gettext should be explained where it interfaces with Locale. Just telling someone to learn GNU gettext is not adequate. Other than that, the explanation is pretty thorough.
The point is that there is a huge knowledge base about GNU Gettext. I could write many pages about it, but I try to keep the documentation simple and clean. You can always learn more from the large number of tutorials around. You will have to familiarize yourself with translation programs like Poedit or Lokalize to work with dictionaries, and you will have to learn the best practices. Localization is a very wide topic and it is not trivial to explain it in the scope of this document; that is why external sources are required, in the same way Boost.Asio does not need to explain what a socket is.
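To keep it simple, here is only the minimal wiring between the generator and the Gettext dictionaries; the messages path and domain below are hypothetical and must match your own compiled .mo files:

    #include <boost/locale.hpp>
    #include <iostream>

    int main() {
        boost::locale::generator gen;
        gen.add_messages_path(".");        // where the <lang>/LC_MESSAGES/*.mo files live
        gen.add_messages_domain("my_app"); // the gettext domain (hypothetical name)

        std::locale loc = gen("de_DE.UTF-8");
        std::cout.imbue(loc);

        // translate() returns a proxy object; the dictionary lookup happens
        // when it is written to the (imbued) stream.
        std::cout << boost::locale::translate("Hello World") << std::endl;
    }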
g. Character Set Conversions
An explanation of what character sets are, and what character set conversions entail, should be the beginning of this documentation.
Character set conversion is more of a utility function; Boost.Locale goes well beyond this.
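Still, for orientation, a small sketch of the conversion utilities (assuming the backend recognizes the "Latin1" charset name):

    #include <boost/locale.hpp>
    #include <string>

    int main() {
        using namespace boost::locale::conv;

        std::string latin1 = "Gr\xFC\xDF Gott";               // ISO-8859-1 bytes
        std::string utf8   = to_utf<char>(latin1, "Latin1");  // Latin1 -> UTF-8
        std::string back   = from_utf(utf8, "Latin1");        // UTF-8  -> Latin1
        std::wstring wide  = utf_to_utf<wchar_t>(utf8);       // UTF-8  -> wide (UTF-16/32)
    }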
h. Localized Text Formatting
"Each format specifier is enclosed within {} brackets.."
These are not "brackets" but "braces". Brackets are '[]'.
Noted, thanks.
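For completeness, a small sketch of the brace placeholders in use (the format string itself is just an example):

    #include <boost/locale.hpp>
    #include <iostream>
    #include <ctime>

    int main() {
        boost::locale::generator gen;
        std::cout.imbue(gen("en_US.UTF-8"));

        // {1}, {2}, ... refer to the parameters fed in with operator%.
        std::cout << boost::locale::format("Today is {1,date} and I have {2,num} messages")
                     % std::time(0) % 42
                  << std::endl;
    }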
i. In general
It is confusing to me how generated locales affect the functionality of the different sections presented under 'Using Boost.Locale'. In a number of situations I am looking at classes or functions and I have no idea how these pick up a locale. I do understand that when used with iostreams the locale is determined by the locale imbued in the iostream. But outside of iostreams I do not understand from the documentation what locale is being used.
The reference documentation refers in many places to signatures like std::string foobar(std::string c, std::locale const &loc = std::locale()); The default argument std::locale() means "take the default locale", which is the global one. This is not a concept of Boost.Locale but of the standard C++ library.
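For example, a sketch of that pattern with a hypothetical foobar function; the default argument std::locale() is simply a copy of whatever the global locale currently is:

    #include <locale>
    #include <string>

    // A hypothetical function following the same convention as the reference docs.
    std::string foobar(std::string const &c, std::locale const &loc = std::locale())
    {
        // ... do something locale-dependent with c ...
        return c;
    }

    int main() {
        std::locale::global(std::locale::classic()); // whatever the application installs
        foobar("text");                 // uses the global locale via the default argument
        foobar("text", std::locale());  // exactly equivalent
    }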
If it is the C++ global locale, the documentation should say so. This entire issue about how locales are actually being used in various parts of the library should be explained as part of an overall explanation of the library. I find this good overall explanation of the library the major flaw in the documentation.
It is something that is part of the standard library, not Boost.Locale. Maybe it is not always clear, but this is why you sometimes need to get your hands dirty and actually write some code to get used to the concepts, in the same way you will never understand how Boost.Asio works until you write some code with it.
[snip]
In general I think that using the global locale is a bad programming practice when one specifically intends to work with locales. Unfortunately it was hard for me to understand how individual locales are used with each of the parts of the library from the documentation. But I will assume for the time being, because it seems the only correct design, that each part of the library which is documented can work with some non-global locale which is created and passed around as necessary.
Actually, 99% of programs around use the global locale; most interactive programs have one user who speaks one language, so setting the locale globally more than makes sense. For example, GtkMM and Gtk support only the global locale and it serves them quite well. The situation where you need several locales in one program is generally a client-server solution. For example, CppCMS, the web framework I originally developed Boost.Locale for, keeps the locale in a special context and in the output stream. It is really domain dependent, and the developer should decide how to move the locale around.
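A rough sketch of the client-server style, with a hypothetical render_date helper and example locale names; each stream gets its own locale and the global one is never touched:

    #include <boost/locale.hpp>
    #include <sstream>
    #include <string>
    #include <ctime>

    // Each "request" renders with its own locale, independent of the global one.
    std::string render_date(std::locale const &loc)
    {
        std::ostringstream out;
        out.imbue(loc);
        out << boost::locale::as::date << std::time(0);
        return out.str();
    }

    int main() {
        boost::locale::generator gen;
        std::string for_german   = render_date(gen("de_DE.UTF-8"));
        std::string for_japanese = render_date(gen("ja_JP.UTF-8"));
    }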
The only design flaw which I could discover in the library was in message translation.
If so, I hope I'll convince you that the decision Boost.Locale made is the right one and gives the best results for software localization. Sometimes it is hard to understand why the best practices are the way they are until you get burned, especially for a topic like localization where each person sees only his side of the story (his language, his culture), and that is natural.
The fact that translation always begins from English (or perhaps some other narrow-character language) to something else is horrendous. I can understand that the Locale implementer wanted to use something popular that already exists, but an idea so tremendously flawed in its conception either needs to be changed, if possible, or discarded for something better. I do understand that translation is just one part of this large library, but I hope that the implementer understands how ridiculous it is to assume that non-English programmers are going to be willing to translate from English to language X rather than from their own language to language X.
I'll explain it the way I explained it before: there are many reasons to have English as the core/source language rather than the developer's native language. Leaving aside the technical notes (I'll come to them later), here is why English source strings are the best practice.

a) Most software around is only partially translated; translations are frequently out of sync with the main development line, and beta versions usually ship with only limited translation support. Now consider yourself a beta tester of a program developed in Japan by programmers who do not know English well. You try to open a file and you see the message:
"File this is bad format" // Bad English
or:
"これは不正なファイル形式です。" // Good Japanese (actually translated with Google :-)
Oops?! I hope it is clearer now. Even those of us who do not speak English well are already used to seeing English as the international language and can handle partially translated software, but with this "natural" method it would be all Greek to us.

b) In many cases, especially with GNU Gettext, your customer or just a volunteer can take a dictionary template, sit for an hour or two with Poedit, and produce an acceptable-quality translation of a medium-size program; load it, test it, and send it back. This actually happens 95% of the time in the open source world, and it happens in the closed source world as well. The reason: it is easy and accessible to everyone. You do not have to be a programmer to do it; you do not even have to be a professional translator with a degree in linguistics to translate messages from English to your own language. That is why it is the best practice, and that is why all translation systems around use the same technique. It is not ridiculous, it is not strange; it is reality, and actually not such a bad reality.

Now the technical reasons:

1. There is a total mess with encodings between different compilers, so it would be quite hard to rely on the charset of localized strings in the sources. On Windows it would likely be one of the 12xx or 9xx code pages; on Unix it would likely be UTF-8. And it is actually impossible to make both MSVC and GCC see the same UTF-8 encoded string L"שלום-سلام-peace" in the same source, because MSVC wants a BOM and all the other "sane" compilers do not accept a BOM in sources at all.

2. You want to be able to convert the string to the target character type very quickly if it is missing from the dictionary. Since you cannot use wide Unicode strings in sources because of the problem above, using wide strings would force a charset conversion, whereas for English and ASCII it is just a byte-by-byte cast. You do not want to do a charset conversion for every string at runtime.

3. All translation systems (not only Gettext) assume ASCII keys as input, and you do not want to create yet another translation file format, because you would then need to:
a) port it to other languages, as projects may want to use a unified system;
b) develop nice GUI tools like Lokalize or Poedit (which have been under development for years);
c) try to convince all users around that your reinvented wheel is better than gettext (and it would not be).

I hope it is clear enough now. (I'll keep this description for future reviewers, as I had never thought somebody would actually ask about it.)
I am assuming that all other parts of the library support both narrow character encodings and wide character encodings fully, and that at least UTF-16 is always supported using wide characters. It was really hard for me to make out from the docs whether or not this was the case.
A small note: wchar_t != UTF-16. In fact, wide strings are UTF-16 only on Windows; on all other platforms they are UTF-32. That is why I almost never refer to UTF-16 directly but rather to wide or narrow strings. For best portability, use UTF-8 and narrow strings.
I believe a great deal of work was put into the library, and that this work is invaluable in bringing locale usage into C++ in a better way than it is currently supported in C++ locale.
But I would have to vote "No" on accepting the library into Boost at the current time, with some provisos which would most likely lead me to change to a "Yes" vote in the future.
1) The documentation should explain the differences and improvements of Locale over C++ locale in a good general way.
It is mentioned in the rationale and throughout the tutorial several times. If you have specific questions or problems, I can add them to the tutorial.
2) The documentation should explain the use of locales for each of the topics.
3) A number of topics should discuss what they are about in general, and more time should be given to discuss how the classes/templates relate to each other.
I can't explain all Unicode/localization-related topics in the tutorial, in the same way a TCP/IP library can't explain socket concepts from scratch.
4) Message translation should be reconsidered. I don't mean to say that the way it is done is enough to have the library rejected, but I can not believe that it is not a flawed system no matter what the popularity of Gnu gettext might be.
It will not happen, because this is a matter of best practice and the correct approach. These things were not invented by me but by many experienced people who have worked in this area for years. In the same way, you would not tell your student to use gets, even though it is possible to use in certain situations, simply because it is bad practice.
My major trouble with the library, which has led to my "No" vote, is that I can not really understand from the documentation how to use the library or what it really offers over and above C++ locales. I realize the library programmer may not himself be a native English speaker, but if I can not really understand a library from its documentation in a way that makes sense to me, I can not vote for its inclusion into Boost. I strongly suspect that if I were to understand the functionality of the library through a more rigorous explanation of its topics, and each topic's relationship to a locale class and various encodings, I might well vote for its inclusion into Boost. But for now, and in the state in which the documentation resides for me, I can not do so. So I hope this review will be understood, at least partially, as a request to improve the docs as much as anything else.
One way or the other, thank you for the review, and I hope you will reconsider your vote given the detailed reasons I have laid out.

Artyom