[locale] Review: Tutorial

index.html:
"Correct case conversion, case folding and normalization." Please insert a link to a resource explaining the difference between case conversion and case folding.
std_locales.html:
In the first code snippet "std::use_facet<std::ctype>(some_locale)" is wrong. Should be: std::use_facet<std::ctype<char> >(some_locale)
"imbued into an iostream" doesn't look OK, especially with 'iostream' in fixed-width font. "imbued into an I/O stream object" perhaps. Such usage of 'iostream' is present on most other pages as well.
A link near the "cout.imbue(std::locale("en_US.UTF-8")); ..." example to the "Locale names are not standardized. ..." paragraph would help.
Would be good to mention std::has_facet.
What is paper_size::inches? I guess it should be measure::inches. The comment below needs to be corrected too.
class measure is missing the static std::locale::id id; member
print_distance example should use the 'out' object instead of std::cout.
"// Set default locale as global" This comment is misleading. The default locale in C++ programs is the 'classic' locale. std::locale("") is the user's preferred locale. Or perhaps we can use the system's default locale term, but I think "user's preferred" is better.
formatting_and_parsing.html:
Strictly speaking, "the iostream STL library" is not correct formulation. Should be e.g. "the standard I/O streams".
why "double now=time(0);" and not "time_t now=time(0);" ?
s/tommorrow/tomorrow/
s/display in local time format/display in local time zone/
s/as::locale_time/as::local_time/
s/Locale time is/Local time is/
conversions.html:
Add code example calling to_lower and fold_case on "grüßen" to help users understand the difference between the two.
messages_formatting.html:
s/Is change to/Is changed to/
Is there a default search path for dictionaries (/usr/share/locale under *nix), or should an application always call add_messages_path() ?
"The source text is not copied." Sometimes 'not' is highlighted, and sometimes 'is'. So what does it try to say after all? That the provided string argument must not die before the translated string is fetched?
s/tanslate/translate/
s/Japaneese/Japanese/
ja << msg missing semicolon
s/main()/int main()/
std::wstring msg = translate("Do you want to open the file?").str<wchar_t>(some_locale) missing semicolon
s/GNU Gettext catalogs has/GNU Gettext catalogs have/
"is \c n 1 or not" looks like a doxygen formatting problem. Such problems exist in some other places as well.
s/Occured/Occurred/
s/managment/management/
"Is there any reason to prefer the Boost.Locale implementation to the original GNU Gettext runtime library? In either case I would probably need some of the GNU tools." Maybe add a c. point about license
charset_handling.html:
s/does not provides/does not provide/
localized_text_formatting.html:
s/when we create a messages/when we create messages/
localized_text_formatting.html:
wcout << wformat(L"Today {1,date} I would meet {2} at home") % time(0) % name <<endl missing semicolon
int a_and_b = a+b; please change to int a_plus_b = a+b;
dates_times_timezones.html:
s/they fully aware/they are fully aware/ Or rather, what does 'they' refer to here?
s/than setting "current day of week"/then setting "current day of week"/
add << between some_point and " is the "
s/We recommended you look/We recommend you to look/
And what date-time range does the ICU backend support?
I'd like to echo other reviewers that printing date/time should not print a number by default.
s/alway/always/
s/and over that build dates/and over that builds dates/
working_with_multiple_locales.html:
s/get.locale_cache_enabled(true);/gen.locale_cache_enabled(true);/
s/std::locale ar=get("ar_EG.UTF-8");/std::locale ar=gen("ar_EG.UTF-8");/
s/as parameters in various functions/as parameters to various functions/
using_localization_backends.html:
s/Case handing/Case handling/
recommendations_and_myths.html:
s/it probably the most/it is probably the most/
s/and may other applications/and many other applications/
building_boost_locale.html:
s/If your icu build in placed/If your icu build is placed/
s/if you relay on auto-linking/if you rely on auto-linking/
appendix.html:
"is depend" substitute with "is dependent" or "depends"
s/not bring read performance advantage/not bring real performance advantage/
s/would work only of standard facets/would work only if standard facets/
s/Boost locale fully supports/Boost Locale fully supports/
s/may broke/may break/
s/tje/the/
"including all Western, Cyrillic, Hebrew, Thai, Arabic and CJK encodings." Shouldn't be "characters"?
s/std::sting/std::string/
s/not indented to be used in production/not intended to be used in production/
General remarks:
Usage of comments in code snippets is inconsistent in the sense that sometimes the comment comes after the code it addresses, the other times it comes before. One of those styles should be used all over the place - preferably comments coming first as that is the common practice in other Boost libraries.
Would like to see examples of using Boost.Locale facets directly.
Would like to see examples of user-defined I/O inserters/extractors that respect the current state of Boost.Locale formatting as altered by manipulators. E.g. a "class money" example which uses the Boost.Locale currency formatting info for its I/O, or a "class bigint" example which uses the Boost.Locale number formatting info for its I/O. Bonus points if they fallback to std facets if corresponding Boost.Locale info is missing from the stream!
Thanks,
Gevorg

From: Gevorg Voskanyan <v_gevorg@yahoo.com>
index.html:
"Correct case conversion, case folding and normalization." Please insert a link
to a resource explaining the difference between case conversion and case folding.
Noted.
std_locales.html:
[snip]
Noted, thanks
"// Set default locale as global" This comment is misleading. The default locale
in C++ programs is the 'classic' locale. std::locale("") is the user's preferred
locale. Or perhaps we can use the system's default locale term, but I think "user's preferred" is better.
Maybe "system locale" is better.
formatting_and_parsing.html:
Strictly speaking, "the iostream STL library" is not correct formulation. Should
be e.g. "the standard I/O streams".
why "double now=time(0);" and not "time_t now=time(0);" ?
No specific reason.
s/tommorrow/tomorrow/
s/display in local time format/display in local time zone/
s/as::locale_time/as::local_time/ s/Locale time is/Local time is/
Thanks
conversions.html:
Add code example calling to_lower and fold_case on "grüßen" to help users understand the difference between the two.
Very good point, thanks, very good example!
messages_formatting.html:
s/Is change to/Is changed to/
Is there a default search path for dictionaries (/usr/share/locale under *nix),
or should an application always call add_messages_path() ?
Always, because it depends on the actual installation prefix. Usually this is done with a compile-time constant.
"The source text is not copied." Sometimes 'not' is highlighted, and sometimes
'is'. So what does it try to say after all? That the provided string argument
must not die before the translated string is fetched?
Already noted; there is a small error. A std::string is copied, so there is no need to keep it alive, while a "char const *" must not die; usually it is static text.
s/tanslate/translate/
s/Japaneese/Japanese/
ja << msg missing semicolon
s/main()/int main()/
std::wstring msg = translate("Do you want to open the file?").str<wchar_t>(some_locale) missing semicolon
s/GNU Gettext catalogs has/GNU Gettext catalogs have/
"is \c n 1 or not" looks like a doxygen formatting problem. Such problems exist
in some other places as well.
s/Occured/Occurred/
s/managment/management/
Noted, will be fixed.
"Is there any reason to prefer the Boost.Locale implementation to the original
GNU Gettext runtime library? In either case I would probably need some of the
GNU tools." Maybe add a c. point about license
Good point. Strange that I forgot the license; I probably mentioned it in some other place.
charset_handling.html:
s/does not provides/does not provide/
localized_text_formatting.html:
s/when we create a messages/when we create messages/
localized_text_formatting.html:
wcout << wformat(L"Today {1,date} I would meet {2} at home") % time(0) % name
<<endl missing semicolon
int a_and_b = a+b; please change to int a_plus_b = a+b;
dates_times_timezones.html:
s/they fully aware/they are fully aware/ Or rather, what does 'they' refer to
here?
s/than setting "current day of week"/then setting "current day of week"/
add << between some_point and " is the "
s/We recommended you look/We recommend you to look/
Noted, thanks.
And what date-time range does the ICU backend support?
Actually, from times BC to long after; virtually limited only by the range of double.
I'd like to echo other reviewers that printing date/time should not print a number by default.
Yes, will be fixed.
s/alway/always/
s/and over that build dates/and over that builds dates/
working_with_multiple_locales.html:
s/get.locale_cache_enabled(true);/gen.locale_cache_enabled(true);/ s/std::locale ar=get("ar_EG.UTF-8");/std::locale ar=gen("ar_EG.UTF-8");/
s/as parameters in various functions/as parameters to various functions/
using_localization_backends.html:
s/Case handing/Case handling/
recommendations_and_myths.html:
s/it probably the most/it is probably the most/
s/and may other applications/and many other applications/
building_boost_locale.html:
s/If your icu build in placed/If your icu build is placed/
s/if you relay on auto-linking/if you rely on auto-linking/
appendix.html:
"is depend" substitute with "is dependent" or "depends"
s/not bring read performance advantage/not bring real performance advantage/
s/would work only of standard facets/would work only if standard facets/
s/Boost locale fully supports/Boost Locale fully supports/
s/may broke/may break/
s/tje/the/
Noted, will be fixed.
"including all Western, Cyrillic, Hebrew, Thai, Arabic and CJK encodings." Shouldn't be "characters"?
s/std::sting/std::string/
s/not indented to be used in production/not intended to be used in production/
Yes
General remarks:
Usage of comments in code snippets is inconsistent in the sense that sometimes
the comment comes after the code it addresses, the other times it comes before.
One of those styles should be used all over the place - preferably comments coming first as that is the common practice in other Boost libraries.
Ok, I'll try to make them more consistent
Would like to see examples of using Boost.Locale facets directly.
Yes, good point, yet most facets are quite low level and can be used indirectly more easily.
Would like to see examples of user-defined I/O inserters/extractors that respect
the current state of Boost.Locale formatting as altered by manipulators. E.g. a
"class money" example which uses the Boost.Locale currency formatting info for
its I/O,
Good point.
or a "class bigint" example which uses the Boost.Locale number formatting info for its I/O. Bonus points if they fallback to std facets if corresponding Boost.Locale info is missing from the stream!
Actually it is not quite possible, at least with the ICU backend, as it supports only the double, int32 and int64 types. It can be done somehow for non-ICU backends that provide std::numpunct (it is not enough for the ICU case, as it is too primitive to handle all required options).
Thanks, very good points.
Artyom

Artyom wrote:
Gevorg Voskanyan wrote:
why "double now=time(0);" and not "time_t now=time(0);" ?
No specific reason.
Looks kinda strange, not sure if it's appropriate to be kept that way in the tutorial. After all, nobody will ask "why time_t?", but some might ask "why double?".
"The source text is not copied." Sometimes 'not' is highlighted, and sometimes
'is'. So what does it try to say after all? That the provided string argument
must not die before the translated string is fetched?
Already noted; there is a small error. A std::string is copied, so there is no need to keep it alive, while a "char const *" must not die; usually it is static text.
Makes sense, but in addition to the "is *not* copied" statements, a big warning about this needs to be inserted, making it obvious that it applies to all the functions involved.
And what date-time range does the ICU backend support?
Actually, from times BC to long after; virtually limited only by the range of double.
Good, but needs to be mentioned in the doc.
or a "class bigint" example which uses the Boost.Locale number formatting info for its I/O. Bonus points if they fallback to std facets if corresponding Boost.Locale info is missing from the stream!
Actually it is not quite possible, at least with ICU backend as it supports only double, int32 and int64 types, so it wouldn't be possible to do it.
I'd consider it a severe limitation if that is really impossible. Can't I print my arbitrary big integers in the properly localized way? ICU obviously knows the rules for formatting numbers in the locales it supports, and uses that info to format double, int32, int64. Isn't there a way to query that necessary info from ICU to format numbers outside of it?
It can be done somehow for non-ICU backends that provide std::numpunct,
Are you saying here that a std::numpunct-like facet can't be implemented with the ICU backend?
(it is not enough for ICU case as it is too primitive to handle all required options)
Yes, I understand that std::numpunct interface is not good enough for all locales. What I would like to see in Boost.Locale is a facet much like numpunct but such that it is correct for all locales, and uses ICU to achieve that when that backend is enabled. Of course if that is possible at all?
Thanks, Very good points. Artyom
Thanks, Gevorg

From: Gevorg Voskanyan <v_gevorg@yahoo.com>
or a "class bigint" example which uses the Boost.Locale number formatting info for its I/O. Bonus points if they fallback to std facets if corresponding Boost.Locale info is missing from the stream!
Actually it is not quite possible, at least with ICU backend as it supports only double, int32 and int64 types, so it wouldn't be possible to do it.
I'd consider it a severe limitation if that is really impossible. Can't I print
my arbitrary big integers in the properly localized way? ICU obviously knows the
rules for formatting numbers in the locales it supports, and uses that info to
format double, int32, int64. Isn't there a way to query that necessary info from
ICU to format numbers outside of it?
It can be done somehow for non-ICU backends that provide std::numpunct,
Are you saying here that a std::numpunct-like facet can't be implemented with
the ICU backend?
(it is not enough for ICU case as it is too primitive to handle all required options)
Yes, I understand that std::numpunct interface is not good enough for all locales. What I would like to see in Boost.Locale is a facet much like numpunct
but such that it is correct for all locales, and uses ICU to achieve that when
that backend is enabled. Of course if that is possible at all?
The biggest problem is that ICU has rule-based number formatting, and unless you parse the rules specifically you don't get what you need. Basically, it is not obvious how to get the separator rules and other things. So doing it would require re-implementing icu::DecimalFormat.
Artyom

Artyom wrote:
The biggest problem is that ICU has rule-based number formatting, and unless you parse the rules specifically you don't get what you need.
Basically, it is not obvious how to get the separator rules and other things.
So doing it would require re-implementing icu::DecimalFormat.
Artyom
Hmm, I see. So that's possible to do with ICU but not easy. Thanks, that is useful information for me.
Best Regards,
Gevorg