[locale] Review of the proposed Boost.Locale library

Hi all,

this is my review of the proposed Boost.Locale library:

DISCLAIMER: I'm pretty familiar with the issues of application localization for the Central/East European region mainly with Slavic languages, mostly those using Latin characters, (Slovak, Czech, Polish), also a little bit with languages using some variant of Cyrillic characters (Russian, Bulgarian) and with West European languages like German, i.e. languages having somewhat different capitalization and collation rules than English has, some of which have also multiple plural forms in certain inflection cases. But I have no experience whatsoever with localization in other environments: Middle-East, Asian, etc. so I might have missed some important issues arising in such cases.

- Please explicitly state in your review whether the library should be accepted.
Yes, it should be accepted.

The general review checklist:

- What is your evaluation of the design?
Straightforward and easy to use, similar to other localization libraries I've worked with.

- What is your evaluation of the implementation?
Since most of the problems I've spotted (and many more) were already addressed in Steven's very thorough review of the code and in other reviews I'm not going to reiterate them.

AFAIK the Boost Coding Guidelines require max 80 characters per line in the source code. The sources of Boost.Locale do not follow this guideline which makes them slightly less readable, one needs to scroll a lot horizontally. Otherwise I leave the quality of the implementation to be judged by more competent people on this list.

Concerning the external dependencies, ICU, tools from GNU gettext (msgfmt, ...), etc.: I don't think that it would be a good idea to re-implement them. A lot of effort has been put into their development, they are well supported, maintained and available for every major platform so I don't see any advantage in reinventing the wheel.

- What is your evaluation of the documentation?
The documentation is readable and clear to someone familiar with l10n/i18n; however, things like collation, capitalization, case folding, etc. should be better explained for users who are not familiar with them.

The previous/next page links are missing, which is a little inconvenient.

In the Message Formatting/Message Translation section there is a typo:

------------------------------------------------------------------------------
std::cout << boost::locale::tanslate("Hello World!") << std::endl;
------------------------------------------------------------------------------

In the examples and code snippets in the docs I think that "using namespace std;" should be removed, and instead of "using namespace boost::locale;" there should be something like "namespace bl = boost::locale;", which would make it much clearer which parts of the sample code refer to Boost.Locale's code and which to the std library.

Also (but I might be mistaken here since I'm not a native German speaker) the example in the "Conversions" section names a variable 'gruben' (this variable stores the word "grüßen"); a German would probably call this variable 'grussen' since 'ß' != 'b'.

- What is your evaluation of the potential usefulness of the library?
This library is very useful for people working on portable applications that are to be localized for non-English environments.

- Did you try to use the library? With what compiler? Did you have any problems?
Yes (I built the library),

- on Debian Linux (64bit) with ICU 4.6
  - using cmake
    - with GCC 4.6.0 non-C++0x mode: no problems
    - with GCC 4.6.0 C++0x mode: auto_ptr-related warnings
  - using bjam (Boost 1.46.1)
    - with GCC 4.6.0 non-C++0x mode: no problems
- on Windows Vista Basic (32-bit) + cygwin + ICU 4.5
  - using bjam (Boost 1.46.1)
    - with GCC 4.3.4: no problems
- on Windows Vista (32-bit) + ICU 4.6 (bin package for Win32)
  - using bjam (Boost 1.46.1)
    - with MSVC 9: the library built, but for some reason bjam didn't use ICU even with -sICU_PATH specified, and I didn't have the time to investigate why.

Also I've built and linked the examples and several small apps of my own trying out some features of the library (message translation, collation), on Debian 64-bit + GCC.

- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I've read the documentation, some parts of the discussion and some of the other reviews on this list, built the library and examples (see above) + written several toy applications, I've had only a quick glance at the sources. Put together probably somewhere around 12 hours.

- Are you knowledgeable about the problem domain?
I've worked on several projects with some kind of l10n/i18n using several different utilities (GNU gettext, wxWidgets, an in-house solution). But I certainly don't consider myself an expert on localization for the reasons stated above.

Best regards,
Matus Chochlik

I'm glad to hear that.
A small point: there is a lot of code in Boost that does not follow the 80-character limit. Also, I understand the desire to put a limit somewhere (derived from the size of a punch card back in the 60s), but sometimes it just makes the code less readable rather than more.
Noted.
Noted.
Good point.
Yes, you are right, grussen is better.
It may be due to the fact that you used the provided ICU packages (they are for MSVC10 and release-only). It is always better to build ICU yourself; they provide Visual Studio projects. I probably need to state this more explicitly in the tutorial: how to build ICU with MSVC.
Thank you very much for the review. Artyom

On 19.04.2011 1:07, Artyom wrote:
I think this restriction should be relaxed (to 100-120 symbols per line), given how widespread wide-format displays are.
But sometimes it just makes the code less readable rather than more.

Artyom wrote:
Perhaps for you, but then for others that arrange windows to show 80 columns of fixed width characters, longer lines become less readable. Some choose to use a widescreen monitor's full width for a single file, so they want longer line lengths. I like the ability to see files side-by-side, while others won't find that compelling. It's hard to satisfy everyone. For now, the Boost standard *is* 80 columns (see <http://www.boost.org/development/requirements.html>). You're not alone in wanting it to be different, but until it changes, you should follow it. _____ Rob Stewart robert.stewart@sig.com Software Engineer using std::disclaimer; Dev Tools & Components Susquehanna International Group, LLP http://www.sig.com

On 19.04.2011, at 09:19, Max Sobolev wrote:
It is. At the same time, though, I find that 95% of the German programmers I know use English variable names, and the remaining 5% are generally students who I expect to lose the habit of using German names eventually. Sebastian
participants (5)
- Artyom
- Matus Chochlik
- Max Sobolev
- Sebastian Redl
- Stewart, Robert