[locale] Review of the proposed Boost.Locale library

Hello, this is my review of the Locale library proposed for inclusion in Boost. I vote to accept the proposed Boost.Locale as a Boost library. - What is your evaluation of the design? The library is nicely structured into individual components, as far as I can infer from the documentation and having a glance at some of the source files. I like that it is possible to use different backends, utilizing the large ICU library when available. - What is your evaluation of the implementation? I was having only a superficial look at some of the source files, so I can not comment much. What I have seen looks clean to me. While reading the source code I think I could benefit from even more code comments. There are a lot already, but on the other hand there exists for example a source file libs/locale/src/shared/mo_lambda.cpp apparently containing a whole state machine or something without any word how it is used. I also stumbled over an odd comment in file libs/locale/src/std/numeric.cpp in line 163: "// workaround common bug - substitute NBSP with ordinary space". As a reader who did not already write the code I can not infer much from comments such as this one - I mean "What is the common bug then?" If this comment refers to the situation that e.g. the locale implementation of GCCs standard library implementation generates invalid UTF-8 when using NBSP as thousands seperator, then the comment should say that. Well, my point about comments is a minor issue of course. I do not suggest that the author goes over all the code again just to add or change comments. I am a bit worried over the size of the libraries binary. On GNU/Linux x86_64 the built libboost_locale.so is almost 10MB large and codesize according to the "size" utility is almost 1MB. This comes in addition to the underlying ICU library. Is there any potential for reducing codesize? Due to the same concern about codesize I have also one concrete suggestion relating to the comparator template of the collation component. I suggest to not make the default comparision level a template parameter, instead to store it as a member. This way, when instantiating container templates the number of instantiations can be kept down. Or am I mistaken in thinking this would lead to more compact code? - What is your evaluation of the documentation? It is nicely structured. I could easily find the documentation for any specific component I was using and the documentation is also very helpful if you have a task at hand and want to find out which component to use (and whether such a component is available in the proposed Boost.Locale). There are some issues I have: Phrasing is sometime odd. For example right in the beginning, under the headline "What is Boost.Locale" there is a sentence which reads "C++ offers a very good base for localization via the existing C++ locale facets std::num_put, std::ctype, std::collate etc. But these are very limited and sometimes buggy by design." I can only guess what that means. Superficially it seems like a contradiction to me, something which is excellent can not be buggy by design, not even sometimes. Okay, minor issue again, it just stands out because it is only the fourth or fifth sentence or so. There are some typos (I guess) where it is important to be correct: Under "Messages Formatting (Translation)", subheading "Indirect Message Translation" all functions are documented as "The source text is not copied.". From the emphasis in the original document I think I can conclude that some of these functions copy and some do not. The reference section of the documentation should document the requirements on the CharType which many functions/classes are parameterized upon (can it be char/wchar_t only?). The reference documentation for token_iterator and break_iterator should not have individual entries for operator++, operator--, operator== and other operators which are always required for iterators, unless the documentation adds information. Instead, it should just be stated to which iterator concepts these templates conform. Documentation for the dereferening operators (operator*) of these templates states that the underlying sequences iterated over can not be modified with these operators. As this is the case, could a usage hint be incorporated stating that users should always use const_iterators as arguments to these templates? If a user sometimes employs for example std::string::iterator and sometimes std::string::const_iterator this would lead to unnecessary code bloat. Also, operator-> is missing from these templates. - What is your evaluation of the potential usefulness of the library? It is potentially very useful. The only objection I could conceive of is that ICU already provides much of the functionality of Boost.Locale. This library also duplicates functionality of Matthew Wilsons excellent Fastformat library (http://fastformat.sourceforge.net/), though the latter does not feature localization support as complete as that of the proposed Boost.Locale. - Did you try to use the library? With what compiler? Did you have any problems? I used the CMake way to build this library on an Opensuse system on x86_64. Compiler is GCC 4.5.1 (or a Svn snapshot roughly corresponding). The Boost version built against is 1.46.1, as delivered by Opensuse. I did not encounter any problems building the library. I played around with some of the components of the library and ran the supplied test cases. There is one inconsistency I noted, the following program generates output "Currency: $100.11" when the environment variable LANG is set to "en_US.UTF-8" and variable LC_MONETARY is set to "de_DE.UTF-8". Shouldn't the library honor LC_MONETARY? If LANG is set to "de_DE.UTF-8" "Currency: 100,11 €" is put out. Program: ------------------------------------------------- #include <ios> #include <iostream> #include <ostream> #include <locale> #include "boost/locale.hpp" int main() { namespace loc = boost::locale; using std::cout; loc::generator gen; cout.imbue(gen("")); cout << "Currency: " << loc::as::currency << 100.11 << '\n'; } ------------------------------------------------- - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? Reading the tuturial section of the Documentation. Glancing over the reference section. I also had a look at some of the source files, but only superficially. - Are you knowledgeable about the problem domain? Only marginally. With best regards, Volker Lukas

From: Volker Lukas <vlukas@gmx.de>
I vote to accept the proposed Boost.Locale as a Boost library.
I'm glad to hear.
- What is your evaluation of the implementation? I was having only a superficial look at some of the source files, so I can not comment much. What I have seen looks clean to me. While reading the source code I think I could benefit from even more code comments. There are a lot already, but on the other hand there exists for example a source file libs/locale/src/shared/mo_lambda.cpp apparently containing a whole state machine or something without any word how it is used.
Good point. It is C expression parser. I think I need to comment it better.
I also stumbled over an odd comment in file libs/locale/src/std/numeric.cpp in line 163: "// workaround common bug - substitute NBSP with ordinary space". As a reader who did not already write the code I can not infer much from comments such as this one - I mean "What is the common bug then?" If this comment refers to the situation that e.g. the locale implementation of GCCs standard library implementation generates invalid UTF-8 when using NBSP as thousands seperator,
Yes, I mentioned this bug in tutorial.
then the comment should say that. Well, my point about comments is a minor issue of course.
Noted
I do not suggest that the author goes over all the code again just to add or change comments.
I'm still glad to hear about possible problems.
I am a bit worried over the size of the libraries binary. On GNU/Linux x86_64 the built libboost_locale.so is almost 10MB large and codesize according to the "size" utility is almost 1MB. This comes in addition to the underlying ICU library. Is there any potential for reducing codesize?
You likely compiled it with debug information (this is default for CMake build), without debug information it should be relatively small if I'm not mistaken about 700K. Also you can always reduce its size by turning unused backends off. I always kept in mind an embedded use of it on Linux platforms (which is important for CppCMS) so it can be not so big, it is very configurable.
Due to the same concern about codesize I have also one concrete suggestion relating to the comparator template of the collation component. I suggest to not make the default comparision level a template parameter, instead to store it as a member. This way, when instantiating container templates the number of instantiations can be kept down. Or am I mistaken in thinking this would lead to more compact code?
Not really, because most of the code is actually the backend that deals with comparison. And finally it goes as a parameter to the underlying ICU or Win32API function.
[snip]
There are some issues I have: Phrasing is sometime odd. For example right in the beginning, under the headline "What is Boost.Locale" there is a sentence which reads "C++ offers a very good base for localization via the existing C++ locale facets std::num_put, std::ctype, std::collate etc. But these are very limited and sometimes buggy by design." I can only guess what that means. Superficially it seems like a contradiction to me, something which is excellent can not be buggy by design, not even sometimes. Okay, minor issue again, it just stands out because it is only the fourth or fifth sentence or so.
Actually I mean the base is good (std::locale, facets idea, integration with iostreams) but implementation and actual existing facets quite bad designed.
There are some typos (I guess) where it is important to be correct: Under "Messages Formatting (Translation)", subheading "Indirect Message Translation" all functions are documented as "The source text is not copied.". From the emphasis in the original document I think I can conclude that some of these functions copy and some do not.
Already Noted.
The reference section of the documentation should document the requirements on the CharType which many functions/classes are parameterized upon (can it be char/wchar_t only?).
Yes it can only be char and wchar_t, in experimental C++0x mode (not really supported due to compiler/std library bugs) it may be char16_t and char32_t. It is all over the code, I probably may mention it more explicitly.
The reference documentation for token_iterator and break_iterator should not have individual entries for operator++, operator--, operator== and other operators which are always required for iterators, unless the documentation adds information. Instead, it should just be stated to which iterator concepts these templates conform.
Actually it is how doxygen works... In any case good point I'll check it.
Documentation for the dereferening operators (operator*) of these templates states that the underlying sequences iterated over can not be modified with these operators. As this is the case, could a usage hint be incorporated stating that users should always use const_iterators as arguments to these templates? If a user sometimes employs for example std::string::iterator and sometimes std::string::const_iterator this would lead to unnecessary code bloat. Also, operator-> is missing from these templates.
Ok, noted I'll take a look on this.
[snip]
I played around with some of the components of the library and ran the supplied test cases. There is one inconsistency I noted, the following program generates output "Currency: $100.11" when the environment variable LANG is set to "en_US.UTF-8" and variable LC_MONETARY is set to "de_DE.UTF-8". Shouldn't the library honor LC_MONETARY? If LANG is set to "de_DE.UTF-8" "Currency: 100,11 €" is put out. Program: ------------------------------------------------- #include <ios> #include <iostream> #include <ostream> #include <locale>
#include "boost/locale.hpp"
int main() { namespace loc = boost::locale; using std::cout;
loc::generator gen; cout.imbue(gen(""));
cout << "Currency: " << loc::as::currency << 100.11 << '\n'; } -------------------------------------------------
Actually, it does not follows the POSIX separation to subclasses. It uses LANG, LC_ALL and LC_CTYPE in that order to get the current locale.
With best regards, Volker Lukas
Thank you very much for the review. Artyom
participants (2)
-
Artyom
-
Volker Lukas