[chrono] i/o to locale or not to locale

Hi all,

Currently Chrono provides a limited localization interface for duration and time_point I/O, which is not satisfactory. The current implementation, Boost.Chrono V1.2 (Boost 1.48), is based on alternative A1 (see below), in which hour, minute, second and the epoch can be localized. This could give the impression that chrono I/O is localized, but as Artyom has pointed out, it is not enough.

I'm working on redesigning the chrono I/O in order to make a proposal with Howard Hinnant to the C++ committee. There are two main alternatives:

A1- Follow the formatting of a recognized international body such as ISO.
A2- Add localization of specific units and epoch to the preceding format.

The localized alternative seems to have less chance of being adopted, as no other library in the standard provides this kind of facility, and the locale messages facility is not completely satisfactory. Things could change if we had a concrete proposal for improving the standard locale library (on which Artyom seems to be working, IIUC).

More detail on the two alternatives:

A1- Provide formatted I/O following ISO 80000 (http://en.wikipedia.org/wiki/ISO/IEC_80000, http://en.wikipedia.org/wiki/ISO_31-1).

The format for duration could be

<value> <unit>

with unit being day, hour, minute, second, decisecond, centisecond, ..., decasecond, hectosecond, ... Note that the unit is always in singular form and in English. Here are several valid inputs for duration:

3 day
9 hour
1 second
0 second
-1 second
3.77 millisecond
34 [1/30]second

time_points could be formatted as

<duration> <since epoch>

where <since epoch> is a string that identifies the starting point of the epoch's clock, e.g. since boot, since process startup, ... The string associated with each clock is fixed, and configurable by specializing the trait clock_string. The following is a valid input for time_point:

123456 second since boot

A1.a We can also consider ISO 8601.
IMO, an acceptable format for durations could be a variation of the extended format, independent of the date:

<hours>:<minutes>:<seconds>[.fractional_seconds]

Note that <hours> here is not limited to 24; it gives the total number of hours.

A1.b An additional alternative could be to represent a duration using a variant of the period representation

P[n]Y[n]M[n]DT[n]H[n]M[n]S

in which the year and month are represented by days, e.g.

P[n]DT[n]H[n]M[n]S[n]F
P1234DT12H30M5S555F

A2 - Make duration and time_point I/O locale dependent.

Based on the previous alternative, add the possibility to add specific locale facets that could format a duration following the patterns

<value> <unit>
<unit> <value>
<hours><:><minutes><:><seconds>[<.>fractional_seconds]

Here <value> and <unit> will be formatted following the specific locale, and in this case the unit could have specific plural forms.

For example, in French the user could specialize the facets so that she could get

1 jour
3 jours
9 heures
1 seconde
0 secondes
-1 seconde
3.77 milliseconds
34 [1/30] de seconde

and in Spanish

1 segundo
0 segundo
-1 segundo

Add the possibility to add specific locale facets that could format a time_point following the patterns

<duration> <epoch>
<epoch> <duration>

system_clock::time_point is a special case that should be formatted as a localized date, relative to the epoch 1900/01/01.

A3 - Add on top of A2 some _byname facets.

The first alternative is in line with the money standard locale category, which uses standard symbols such as $ or USD. It has the drawback that the <value> could be localized while the <unit> is not, e.g.

1,234 second

Note the use of ','.

The third alternative is IMO closer to user expectations, but needs a better locale messages/by_name facet.

The second alternative is a pragmatic one that leaves open the possibility of building A3 when a satisfactory messages/by_name facet is available.
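
To make the A1 `<value> <unit>` format concrete, here is a minimal sketch written against std::chrono rather than Boost.Chrono. The names `unit_name` and `format_a1` are hypothetical and are not part of any existing library; the fallback spelling for unnamed periods follows the "34 [1/30]second" example above.

```cpp
#include <chrono>
#include <locale>
#include <ratio>
#include <sstream>
#include <string>

// Hypothetical trait (not Boost.Chrono's API): maps a std::ratio period to
// the singular English unit name. The primary template falls back to the
// "[num/den]second" spelling for periods that have no dedicated name.
template <class Period>
struct unit_name {
    static std::string get() {
        return "[" + std::to_string(Period::num) + "/" +
               std::to_string(Period::den) + "]second";
    }
};
template <> struct unit_name<std::milli>        { static std::string get() { return "millisecond"; } };
template <> struct unit_name<std::ratio<1>>     { static std::string get() { return "second"; } };
template <> struct unit_name<std::ratio<60>>    { static std::string get() { return "minute"; } };
template <> struct unit_name<std::ratio<3600>>  { static std::string get() { return "hour"; } };
template <> struct unit_name<std::ratio<86400>> { static std::string get() { return "day"; } };

// Formats a duration as "<value> <unit>", always singular and in English.
template <class Rep, class Period>
std::string format_a1(const std::chrono::duration<Rep, Period>& d) {
    std::ostringstream os;
    os.imbue(std::locale::classic());  // in this sketch <value> is not localized either
    os << d.count() << ' ' << unit_name<typename Period::type>::get();
    return os.str();
}
```

Under these assumptions, `format_a1(std::chrono::hours(9))` yields "9 hour", and a duration with period `std::ratio<1, 30>` falls back to the bracketed form.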
Boost.Chrono 2.0 in the trunk is a prototype of A2 that doesn't provide facets by locale name, only a default English implementation. That means that, for the time being, the user needs to imbue a locale with her specific facet to get the expected localization (see the french.cpp example in the SVN repository).

What does the Boost community think about this? Towards which alternative should Boost.Chrono move? Should Boost.Chrono wait until a concrete standard proposal is accepted?

Thanks in advance,
Vicente
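
For the A1.a format in the message above, `<hours>:<minutes>:<seconds>[.fractional_seconds]` with `<hours>` as a running total, a sketch could look like the following. The helper `format_hms` is hypothetical, uses std::chrono, and fixes the fraction at millisecond precision purely for illustration.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Boost.Chrono): formats a duration as
// <hours>:<minutes>:<seconds>.<milliseconds>. <hours> is the total number
// of hours, so it is not capped at 24.
std::string format_hms(std::chrono::milliseconds d) {
    using namespace std::chrono;
    const bool negative = d < milliseconds::zero();
    if (negative) d = -d;  // format the magnitude, then prepend the sign

    const hours h = duration_cast<hours>(d);
    const minutes m = duration_cast<minutes>(d - h);
    const seconds s = duration_cast<seconds>(d - h - m);
    const milliseconds ms = d - h - m - s;

    char buf[64];
    std::snprintf(buf, sizeof buf, "%s%lld:%02lld:%02lld.%03lld",
                  negative ? "-" : "",
                  static_cast<long long>(h.count()),
                  static_cast<long long>(m.count()),
                  static_cast<long long>(s.count()),
                  static_cast<long long>(ms.count()));
    return buf;
}
```

Thirty hours, five minutes, seven seconds and 42 milliseconds come out as "30:05:07.042" with this sketch.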

----- Original Message -----
From: Vicente J. Botet Escriba <vicente.botet@wanadoo.fr> To: boost@lists.boost.org Cc: Sent: Friday, November 11, 2011 8:13 PM Subject: [boost] [chrono] i/o to locale or not to locale
Hi all,
currently Chrono provides a limited localization interface for duration and time_point I/O that is not satisfactory. The current implementation Boost.Chrono V1.2 (Boost 1.48) is based on the alternative A1 (see below) in which the hour, minute, second and the epoch can be localized. This could give the impression that chrono i/o is localized, but as Artyom has signaled it is not enough.
I would suggest you do a little study of two points:

1. Feasibility of localization (do you have the data?)
2. Correctness of localization (do you have enough information?)

1. Feasibility of localization (do you have the data?)
-------------------------------------------------

I'd suggest taking a look at CLDR and the data it provides. ICU supports this kind of measure. For example:

http://icu-project.org/apiref/icu4c/classTimeUnitFormat.html

And so does CLDR. It provides years, months, days, weeks, hours, minutes and seconds. However, it **does not** provide sub-second ranges (ms, us, ns and so on).

Generally, all the provided formatting is just simple plural-form formatting, and you can use it.

So I suggest starting from there and using them as a reference, as they probably do the right thing (mostly).

So if your localization support does at least what ICU does, you are probably heading in the right direction.

2. Correctness of localization (do you have enough information?)
--------------------------------------------------------------

The second is a little bit more problematic.

AFAIR you represent a time as a distance in time... So, for example,

30 days = 30 * 24 * 3600 seconds, in month units, can be

- 1 month and 2 days in non leap in February
- 1 month and 1 day in leap year in February
- 1 month in March
- 30 days in April.

So depending on the specific period, even if you display a single unit, it may depend on the specific time point.

More than that, it can be, for example, 1 month and 1 hour if daylight saving time changed during this period...

So... the question is what you are trying to do. Can you please explain it very explicitly:

- What is your measure?
- What are you trying to display?

Can you describe this please.
I'm working on redesigning the chrono i/o in order to make a proposal with Howard Hinnant to the C++ committee. Two main alternatives
A1- follow the formatting of recognized international body as ISO. A2- Add localization of specific units and epoch to the preceding format.
What about a 3rd option: display it as a number? Just as it is, and let the user localize it on their own?

std::cout << boost::format("%1% us") % period << std::endl;

Do you really need to print a unit, or maybe the developer should add the unit?
A2 - Make durations and time point i/o locale dependent.
Based on the previous alternative, add the possibility to add specific locale facets that could format a duration following the pattern
<value> <unit> <unit> <value> <hours><:><minutes><:><seconds>[<.>fractional_seconds]
You make it too specific. Order is generally not good enough. It should be a pattern, not a "rule":

1 second -> 1秒 in Chinese (no space)
Here <value> and <unit> will be formatted following the specific locale and in this case the unit could have specific plural forms.
For example in French the user could specialize the facets so that she could get
1 jour 3 jours 9 heures 1 seconde 0 secondes -1 seconde 3.77 milliseconds 34 [1/30] de seconde
and in Spanish
1 segundo 0 segundo -1 segundo
Add the possibility to add specific locale facets that could format a time_point following the pattern
<duration> <epoch> <epoch> <duration>
system_clock::time_point is a special case that should be formatted as a localized date, relative to the epoch 1900/01/01.
There already is time/date formatting even in existing standard (std::time_put facet) that BTW works fine in many compilers.
A3 - Add on top of A2 some _byname facets.
The first alternative is in line with the money standard locale category that uses standard symbols, as $ or USD. It has the drawback that the <value> could be localized while the <unit> is not, e.g.
1,234 second
Note the use of ','.
Please note that the current "money locale" is quite badly designed and quite broken. Don't look at the current facets design. Most of them are BROKEN. Take a look at CLDR as a reference.
The third alternative is IMO closer to the user expectations, but needs a better locale messages/by_name facet.
The second alternative is a pragmatic one that left the possibility to build A3 when a satisfying messages/by_name facet is available.
There are about ~50 languages, and for each of them you need about 8 patterns x about 2-3 plural forms, so you'll have a few KB of raw data. You can really put this in the code and use it as is. So you don't have to wait for a better messages proposal. Just do it right.
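
As an illustration of keeping such data in the code, here is a minimal sketch of an embedded pattern table with plural selection. The names (`plural_category`, `unit_patterns`, `second_patterns`, `format_seconds`) and the two-entry table are hypothetical; the plural rules are deliberately simplified (English: singular for 1 and -1; Chinese: one form only) and real data would be taken from CLDR.

```cpp
#include <map>
#include <string>

// Simplified plural categories; CLDR defines more (zero, two, few, many).
enum class plural_category { one, other };

// Hypothetical, simplified rule: English uses "one" for a magnitude of 1,
// every other language in this tiny table has a single form.
plural_category select_plural(const std::string& lang, long long n) {
    if (lang == "en" && (n == 1 || n == -1)) return plural_category::one;
    return plural_category::other;
}

struct unit_patterns {
    const char* one;    // pattern for the "one" category
    const char* other;  // pattern for everything else
};

// Embedded pattern data for the unit "second"; "{0}" stands for the value.
const std::map<std::string, unit_patterns>& second_patterns() {
    static const std::map<std::string, unit_patterns> table = {
        {"en", {"{0} second", "{0} seconds"}},
        {"zh", {"{0}\xE7\xA7\x92", "{0}\xE7\xA7\x92"}},  // U+79D2 in UTF-8, no space
    };
    return table;
}

std::string format_seconds(const std::string& lang, long long n) {
    const auto& table = second_patterns();
    auto it = table.find(lang);
    if (it == table.end()) it = table.find("en");  // fall back to English
    const plural_category cat = select_plural(it->first, n);
    std::string out = (cat == plural_category::one) ? it->second.one : it->second.other;
    const std::string token = "{0}";
    const std::string::size_type pos = out.find(token);
    if (pos != std::string::npos) out.replace(pos, token.size(), std::to_string(n));
    return out;
}
```

With roughly 50 languages and 8 units, a table of this shape stays within a few kilobytes, which is the point being made above.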
Boost.Chrono 2.0 in the trunk is a prototype of A2, that doesn't provide facets by locale name, only a default English implementation. That means that for the time being, the user needs to imbue a locale with her specific facet to get the expected localization (See french.cpp example in the SVN repository)
What does the Boost community think about this? Towards which alternative should Boost.Chrono move? Should Boost.Chrono wait until a concrete standard proposal is accepted?
Thanks in advance, Vicente
If you want to provide something, please base it on CLDR data and patterns; don't try to invent your own methods.

Artyom Beilis
--------------
CppCMS - C++ Web Framework: http://cppcms.sf.net/
CppDB - C++ SQL Connectivity: http://cppcms.sf.net/sql/cppdb/
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Le 12/11/11 12:01, Artyom Beilis a écrit :
----- Original Message -----
From: Vicente J. Botet Escriba<vicente.botet@wanadoo.fr> To: boost@lists.boost.org Cc: Sent: Friday, November 11, 2011 8:13 PM Subject: [boost] [chrono] i/o to locale or not to locale
Hi all,
currently Chrono provides a limited localization interface for duration and time_point I/O that is not satisfactory. The current implementation Boost.Chrono V1.2 (Boost 1.48) is based on the alternative A1 (see below) in which the hour, minute, second and the epoch can be localized. This could give the impression that chrono i/o is localized, but as Artyom has signaled it is not enough.
I would suggest you to do a little study about two points:
1. Feasibility of localization (do you have data) 2. Correctness of localization do you have enough information.
1. Feasibility of localization (do you have data) -------------------------------------------------
I'd suggest to take a look on CLDR and what data it provides. ICU supports this kind of measure. For example:
http://icu-project.org/apiref/icu4c/classTimeUnitFormat.html
And so CLDR. It provides, years, months, days, weeks, hours, minutes and seconds. However it **does not** provide sub second ranges (ms, us, ns) and so on.
Generally all provided formatting is just a simple plural form formatting and you can use it.
So I suggest to start from there and look on them as reference. As probably they do the right thing (mostly).
So if your localization support would do at least what ICU does you are probably in the right direction.
Thanks for these pointers. If the Boost community thinks that localized duration and time_point I/O is the way to go, I will take a look, for sure.
2. Correctness of localization do you have enough information. --------------------------------------------------------------
The second is little bit more problematic.
AFAIR you represent a time as a distance in time... So for example
30 days = 30 * 24 * 3600 seconds in months units can be
- 1 month and 2 days in non leap in February
- 1 month and 1 day in leap year in February
- 1 month in March
- 30 days in April.

Note that Boost.Chrono is not a Date library, so it is independent of the calendar. Weeks, months and years are out of scope.

So depending on specific period even if you display a single unit it may depend on specific time point.
More than that it can be for example 1 month and 1 hour if daylight savings time had changed during this period....
So... the question is what are you trying to do. Can you please explain it very explicitly:
- What is your measure? - What are you trying to display?
Can you describe this please.

Chrono clocks are related to an epoch associated to a calendar, with daylight savings and things like that.
Clocks in Chrono can measure the time a process has been running, the time the host has been up, the time spent by the CPU between two points in a program, ... I just want to be able to provide I/O that satisfies user expectations. What I'm expecting from this thread is to learn those expectations from the comments of the Boost community.
I'm working on redesigning the chrono i/o in order to make a proposal with Howard Hinnant to the C++ committee. Two main alternatives
A1- follow the formatting of recognized international body as ISO. A2- Add localization of specific units and epoch to the preceding format.
What about 3rd option - display it as a number? Just as is and let user localize it on his own?
std::cout << boost::format("%1% us") % period << std::endl;
Do you really need to print a unit or may be the developer should add the unit?
A duration is a quantity, and as such IMO it merits having its associated units. Of course, the simplest thing is not to provide I/O at all. However, I was expecting the user to get something more than a number, but maybe I'm wrong. I guess that if the standard provided a money class, the I/O of this class would use the money facets, don't you think?
A2 - Make durations and time point i/o locale dependent.
Based on the previous alternative, add the possibility to add specific locale facets that could format a duration following the pattern
<value> <unit> <unit> <value> <hours><:><minutes><:><seconds>[<.>fractional_seconds]
You make it too specific. Order is generally not good enough. It should be pattern not "rule"
1 second -> 1秒 in Chinese (no space)

You are right. The pattern will also contain <none> <space>, as is done for the moneypunct facet.

Here <value> and <unit> will be formatted following the specific locale and in this case the unit could have specific plural forms.
For example in French the user could specialize the facets so that she could get
1 jour 3 jours 9 heures 1 seconde 0 secondes -1 seconde 3.77 milliseconds 34 [1/30] de seconde
and in Spanish
1 segundo 0 segundo -1 segundo
Add the possibility to add specific locale facets that could format a time_point following the pattern
<duration> <epoch> <epoch> <duration>
system_clock::time_point is a special case that should be formatted as a localized date, relative to the epoch 1900/01/01.
There already is time/date formatting even in existing standard (std::time_put facet) that BTW works fine in many compilers.
I guess that you know I know that. Boost.Chrono V2 uses time_get/time_put for system_clock::time_point.
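
For readers unfamiliar with that facet, the following sketch shows one way to route a system_clock::time_point through std::time_put. It is written against std::chrono, not Boost.Chrono V2, and the helper name `format_date` is hypothetical. It formats in UTC via std::gmtime, which is not thread-safe.

```cpp
#include <chrono>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats a system_clock time_point with the
// std::time_put facet of the given locale, using a strftime-style format.
std::string format_date(std::chrono::system_clock::time_point tp,
                        const std::locale& loc,
                        const std::string& fmt) {
    const std::time_t t = std::chrono::system_clock::to_time_t(tp);
    const std::tm* utc = std::gmtime(&t);  // static buffer; fine for a sketch
    if (!utc) return std::string();

    std::ostringstream os;
    os.imbue(loc);
    const std::time_put<char>& facet = std::use_facet<std::time_put<char>>(loc);
    // put() writes the formatted fields through the output iterator.
    facet.put(std::ostreambuf_iterator<char>(os), os, os.fill(), utc,
              fmt.data(), fmt.data() + fmt.size());
    return os.str();
}
```

Passing a named locale instead of `std::locale::classic()` would localize month and weekday names, provided that locale is installed on the system.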
A3 - Add on top of A2 some _byname facets.
The first alternative is in line with the money standard locale category that uses standard symbols, as $ or USD. It has the drawback that the <value> could be localized while the <unit> is not, e.g.
1,234 second
Note the use of ','.
Please note current "money locale" is quite badly designed and quite broken. Don't look at current facets design. Most of them BROKEN. Take a look on CLDR as reference.
Why is it badly designed, and what is broken in the money category?
The third alternative is IMO closer to the user expectations, but needs a better locale messages/by_name facet.
The second alternative is a pragmatic one that left the possibility to build A3 when a satisfying messages/by_name facet is available.
There are about ~50 languages, and for each of them you need about 8 patterns x about 2-3 plural forms, so you'll have a few KB of raw data. You can really put this in the code and use it as is. So you don't have to wait for a better messages proposal. Just do it right.
Maybe you are right, and the dependency to a messages facet with external catalogs can and should be avoided. The design that I have used in V2 allows both approaches.
Boost.Chrono 2.0 in the trunk is a prototype of A2, that doesn't provide facets by locale name, only a default English implementation. That means that for the time being, the user needs to imbue a locale with her specific facet to get the expected localization (See french.cpp example in the SVN repository)
What does the Boost community think about this? Towards which alternative should Boost.Chrono move? Should Boost.Chrono wait until a concrete standard proposal is accepted?
Thanks in advance, Vicente
If you want to provide something please base it on CLDR data and patterns don't try to invent your own methods.
For the time being I try to use whatever is in the standard, and for the missing pieces I try to follow the same design. If the locale/facet design is bad, it should be changed, but it is clear to me that it is not reasonable for a single library to go in another direction. I know that you have another view of how locale should work. I expect that you will make a proposal to the standard committee so that a better standard locale will be available to the whole C++ community.

Best,
Vicente

----- Original Message -----
From: Vicente J. Botet Escriba <vicente.botet@wanadoo.fr>
[snip]
Do you really need to print a unit or may be the developer should add the unit?
A duration is a quantity, and as such IMO it merits having its associated units. Of course, the simplest thing is not to provide I/O at all. However, I was expecting the user to get something more than a number, but maybe I'm wrong.
I see your point.
You make it too specific. Order is generally not good enough. It should be pattern not "rule"
1 second -> 1秒 in Chinese (no space)
You are right. The pattern will contain also <none> <space> as it is done for the moneypunct facet.
No, you are still missing my point. What if it is NBSP? What if some locale requires a symbol before the value, like "(" in some money formats?

What I'm trying to say is that you limit yourself to a small set of predefined cases. This is correct for some cases but may not be correct for others.

The correct approach is to use a string pattern. This is how CLDR defines localization parameters. And CLDR is **The** reference.

See my further notes about facets.
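
A minimal sketch of what a string pattern buys over a fixed set of order/space options follows. The placeholder syntax ("{0}" for the value, "{1}" for the unit) and the function names are hypothetical, loosely modelled on CLDR-style patterns; this is not the Boost.Chrono implementation.

```cpp
#include <string>

// Replaces every occurrence of `token` in `text` with `value`.
std::string replace_all(std::string text, const std::string& token,
                        const std::string& value) {
    std::string::size_type pos = 0;
    while ((pos = text.find(token, pos)) != std::string::npos) {
        text.replace(pos, token.size(), value);
        pos += value.size();  // skip past the inserted text
    }
    return text;
}

// Hypothetical pattern application: the locale data supplies the whole
// pattern, so separators, ordering and surrounding symbols are all free-form.
// Note: a value that itself contains "{1}" would be substituted too; a real
// implementation would scan the pattern once instead.
std::string apply_pattern(const std::string& pattern, const std::string& value,
                          const std::string& unit) {
    return replace_all(replace_all(pattern, "{0}", value), "{1}", unit);
}
```

The same function then handles "{0} {1}" (space), "{0}{1}" (no space), "{0}\xC2\xA0{1}" (NBSP in UTF-8) and "({0} {1})" (leading symbol) without any new enumerator per case.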
1,234 second
Note the use of ','.
Please note that the current "money locale" is quite badly designed and quite broken. Don't look at the current facets design. Most of them are BROKEN. Take a look at CLDR as a reference.
Why is it badly designed, and what is broken in the money category?
First, number format and monetary:

1. A char can't describe the separator (it may be a multi-char sequence, as in UTF-8), so, for example, you can't define NBSP in UTF-8 in the current facets design. It has even led to bugs where Sun Studio and GCC created invalid UTF-8 in some locales!
2. It does not allow defining the digit format (not all cultures use ASCII digits).
3. It does not provide rounding. For example, in some countries the minimal unit is not "1 cent" but "5 cents".

However, there is much more.

Bottom line:

Use patterns as strings whenever you can. If you want to do the **right thing**, refer to CLDR. Don't refer to the current facets. They are linguistically broken and (badly) designed after the very old POSIX localization API.
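
Point 1 can be checked directly against the standard facet interfaces. The sketch below only asserts what the standard guarantees about return types; the helpers `nbsp_utf8` and `fits_in_char_separator` are hypothetical names introduced for illustration.

```cpp
#include <locale>
#include <string>
#include <type_traits>
#include <utility>

// In the narrow-character facets the thousands separator is exactly one char.
static_assert(
    std::is_same<decltype(std::declval<const std::numpunct<char>&>().thousands_sep()),
                 char>::value,
    "numpunct<char>::thousands_sep() returns a single char");
static_assert(
    std::is_same<decltype(std::declval<const std::moneypunct<char>&>().thousands_sep()),
                 char>::value,
    "moneypunct<char>::thousands_sep() returns a single char");

// U+00A0 NO-BREAK SPACE needs two bytes in UTF-8.
inline std::string nbsp_utf8() { return "\xC2\xA0"; }

// A separator can be expressed through these facets only if it is one char.
inline bool fits_in_char_separator(const std::string& sep) {
    return sep.size() == 1;
}
```

Since `nbsp_utf8()` is two bytes long, a `char`-based facet in a UTF-8 locale can return at most one of them, which is how a truncated, invalid UTF-8 sequence can end up in the output.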
If you want to provide something please base it on CLDR data and patterns don't try to invent your own methods.
For the time being I try to use whatever is in the standard, and for the missing pieces I try to follow the same design. If the locale/facet design is bad, it should be changed, but it is clear to me that it is not reasonable for a single library to go in another direction.
I know that you have another view of how locale should work. I expect that you will make a proposal to the standard committee so that a better standard locale will be available to the whole C++ community.
Best, Vicente
My point: if you do something new, then do it right; don't introduce old BAD designs. Make the design as flexible as possible.

Best,
Artyom

Le 12/11/11 16:58, Artyom Beilis a écrit :
----- Original Message -----
From: Vicente J. Botet Escriba<vicente.botet@wanadoo.fr>
You make it too specific. Order is generally not good enough. It should be pattern not "rule"
1 second -> 1秒 in Chinese (no space)
You are right. The pattern will contain also <none> <space> as it is done for the moneypunct facet.
No, you are still missing my point. What if it is NBSP? What if in some locale it requires some symbol before like "(" in some money?
What I'm trying to say that you limit your self to some small set of predefined cases. This is correct for some but may be not correct for other cases.
The correct is to use string pattern. This is how CLDR defines localization parameters. And CLDR is **The** reference.
The implementation currently uses a string pattern. I have not yet added byname facets, but they can be added on top of the current facets. The documentation is ongoing (see the .qbk file). Could you please take a look at the trunk to see what could be wrong?

Is anyone else interested in an evolution of chrono I/O?

Best,
Vicente

I have some deadlines, so I'll only be able to read it in depth next Saturday, and I'll gladly do it.

Artyom Beilis
--------------
CppCMS - C++ Web Framework: http://cppcms.sf.net/
CppDB - C++ SQL Connectivity: http://cppcms.sf.net/sql/cppdb/
________________________________ From: Vicente J. Botet Escriba <vicente.botet@wanadoo.fr> To: boost@lists.boost.org Sent: Saturday, December 10, 2011 11:22 AM Subject: Re: [boost] [chrono] i/o to locale or not to locale
participants (2): Artyom Beilis, Vicente J. Botet Escriba