Re: Lexical cast suggestion

Sorry for not keeping the thread. I got some strange errors when using followup in gmane.
I think that it's not such a big problem. If you want to use locales, then your application should start with:
std::locale::global(std::locale(""));
After that, I think lexical_cast will use the user's default locale. Or you can replace "" above with something else that your application wants. The only problem is that it's not possible to use several locales in one application. For program_options, it's not very important.
Why do you need to specify locales for a call to 'lexical_cast'?
There are two different cases here. The first is when you are communicating with the user (e.g. via a GUI). You will then of course use the system locale to format output and parse input. Since C++ defaults to the "C" locale, you need to add the line you mention above to get streams, sorting etc. to work properly.
But it is also common for an application to do things that don't follow the system locale, like parsing and writing XML or CSV files, generating SQL statements etc. (program options might fall into this category as well).
For the two cases above you need to use different locales. Since lexical_cast doesn't allow you to specify which locale to use, it can only be used for one of them. The current implementation uses the global locale, which basically means that it should only be used for communication with the user. Since there seems to be a strong opinion against specifying a locale for lexical_cast, I'm hoping that the documentation will be updated to warn against using lexical_cast for anything other than communicating with the user. I learned it myself the hard way.
When it comes to program options, the author has decided to use lexical_cast and the global locale to verify input. It might be a good choice, but it means that you can't move configuration files between computers with different locales. Better documentation might be sufficient here as well, but there might also be other problems (like a space used as the thousands separator in integers).
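To make the two cases concrete, here is a minimal sketch of a conversion helper that takes the locale as an argument instead of picking up the global one. The names parse_with_locale, comma_decimal and user_like_locale are hypothetical and are not part of Boost. The custom facet stands in for a user locale that writes "3,14", so the example does not depend on which system locales are installed.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical facet: uses ',' as the decimal point, standing in for a
// user locale without relying on installed system locales.
struct comma_decimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Hypothetical helper (not a Boost API): converts text to T like
// lexical_cast does, but the caller chooses the locale.
template <typename T>
T parse_with_locale(const std::string& text, const std::locale& loc) {
    std::istringstream in(text);
    in.imbue(loc);  // affects this stream only, no global state is touched
    T value{};
    in >> value;
    // Require a successful parse that consumed the whole input.
    if (in.fail() || in.peek() != std::istringstream::traits_type::eof()) {
        throw std::invalid_argument("cannot parse: " + text);
    }
    return value;
}

// Builds the stand-in "user" locale; the locale object owns the facet.
inline std::locale user_like_locale() {
    return std::locale(std::locale::classic(), new comma_decimal);
}
```

With this, file-like data can be read with parse_with_locale<double>(s, std::locale::classic()) and user input with the user's locale, side by side in the same program. Parsing "3,14" with the classic locale is rejected here because the trailing ",14" is left unconsumed.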

Martin Adrian wrote:
Why do you need to specify locales for a call to 'lexical_cast'? .... But it is also common for an application to do things that don't follow the system locale, like parsing and writing XML or CSV files, generating SQL statements etc. (program options might fall into this category as well).
I think in many cases only the encoding (utf8, koi8-r, etc.) of external sources matters; for example, email headers do not specify a locale, only an encoding. But anyway, encoding is part of the locale, too.
When it comes to program options the author
To clarify, "the author" is I.
has decided to use lexical_cast and the global locale to verify input. It might be a good choice but it means that you can't move configuration files between computers with different locales.
It's already the case, because config files do not contain information about the locale used. Further, even if they did, the set of locales on different computers is not the same, and on different OSes even the naming may differ, so moving configuration files is hard already. This might mean I need to set the classic locale for parsing, to make sure config files are really portable.
So far, I'm not sure how lexical_cast (or another function) can support locales. The way you suggest, an additional parameter, is fine, but then you need to pass the locale parameter through several levels of function calls: for program_options it's po::store, value_semantic::parse, po::validator and then finally lexical_cast. That's not convenient, so a global "lexical_cast_locale" is probably better. But for an MT environment you need to store that global state in TSS.
- Volodya

It's already the case, because config files do not contain information about the locale used. Further, even if they did, the set of locales on different computers is not the same, and on different OSes even the naming may differ, so moving configuration files is hard already. This might mean I need to set the classic locale for parsing, to make sure config files are really portable.
I am not familiar with the code layout of program_options, but from a quick look it seems you could do it the same way as the streams do: add a locale reference member to the value_semantic class and an imbue method to the options_description class which specifies the default. The user can then specify exactly how he wants the options parsed:
max portability: desc.imbue(std::locale::classic())
localized: desc.imbue(std::locale()) // should be default
desc.add_options()...
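A rough sketch of what such an imbue-style design could look like, written against a stand-in class rather than the real library. option_reader, read_double and getloc are hypothetical names here, and program_options does not offer this interface as far as this thread says. The locale member defaults to a copy of the global locale, matching the "should be default" remark above.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for an options container; not the program_options API.
class option_reader {
public:
    // Replaces the locale used for parsing and returns the previous one,
    // mirroring the behaviour of std::ios_base::imbue.
    std::locale imbue(const std::locale& loc) {
        std::locale previous = loc_;
        loc_ = loc;
        return previous;
    }

    std::locale getloc() const { return loc_; }

    // Parses a floating-point option value with the stored locale.
    // Returns false on malformed input or trailing characters.
    bool read_double(const std::string& text, double& out) const {
        std::istringstream in(text);
        in.imbue(loc_);
        double value = 0.0;
        if (!(in >> value)) return false;
        if (in.peek() != std::istringstream::traits_type::eof()) return false;
        out = value;
        return true;
    }

private:
    std::locale loc_;  // default-constructed: a copy of the current global locale
};
```

A caller who wants portable config files would call reader.imbue(std::locale::classic()) once and leave the global locale alone; two readers with different locales can then coexist in one program.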
That's not convenient, so a global "lexical_cast_locale" is probably better. But for an MT environment you need to store that global state in TSS.
The global locale can already be used, so there is no need for another global locale:
const std::locale old = std::locale::global(std::locale::classic());
double x = lexical_cast<double>(str);
std::locale::global(old);
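One gap in the snippet above: lexical_cast reports failure by throwing bad_lexical_cast, and in that case the restoring call is skipped. A small RAII guard closes that gap. This is only a sketch, and scoped_global_locale is a hypothetical name, not an existing Boost or standard class.

```cpp
#include <cassert>
#include <locale>

// Hypothetical helper: installs a global locale for the lifetime of the
// object and restores the previous one on scope exit, also when an
// exception propagates through the scope.
class scoped_global_locale {
public:
    explicit scoped_global_locale(const std::locale& loc)
        : previous_(std::locale::global(loc)) {}  // global() returns the old locale

    ~scoped_global_locale() { std::locale::global(previous_); }

    // Non-copyable: two copies would restore the same locale twice.
    scoped_global_locale(const scoped_global_locale&) = delete;
    scoped_global_locale& operator=(const scoped_global_locale&) = delete;

private:
    std::locale previous_;
};

// Usage (assuming Boost is available):
//   scoped_global_locale guard(std::locale::classic());
//   double x = boost::lexical_cast<double>(str);
```

The guard only makes the swap exception-safe. It still changes state that other threads may share, so it does not help in a multithreaded program.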

Martin wrote:
It's already the case, because config files do not contain information about the locale used. Further, even if they did, the set of locales on different computers is not the same, and on different OSes even the naming may differ, so moving configuration files is hard already. This might mean I need to set the classic locale for parsing, to make sure config files are really portable.
I am not familiar with the code layout of program_options, but from a quick look it seems you could do it the same way as the streams do: add a locale reference member to the value_semantic class and an imbue method to the options_description class which specifies the default.
So every library which uses lexical_cast would have to provide 'imbue' and store the locale everywhere? I think a per-thread global variable is better.
That's not convenient, so a global "lexical_cast_locale" is probably better. But for an MT environment you need to store that global state in TSS.
The global locale can already be used so there is no need for another global locale.
Nope, there's only one global locale; it's not per-thread. If it were per-thread, you could do:
std::locale::tss_global(std::locale("koi8-r"));
// do some work with program_options
std::locale::tss_global(std::locale(""));
const std::locale old = std::locale::global(std::locale::classic());
double x = lexical_cast<double>(str);
std::locale::global(old);
But that does not prevent other threads from changing the locale again one CPU tick before you call lexical_cast.
- Volodya
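The standard library has no std::locale::tss_global, but the per-thread idea can be sketched with a C++11 thread_local variable. Everything in the sketch namespace below is a hypothetical name chosen for illustration. A conversion function would have to read this variable explicitly, since streams and lexical_cast know nothing about it.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

namespace sketch {

// One locale object per thread, starting as the classic "C" locale.
inline std::locale& thread_locale() {
    thread_local std::locale loc = std::locale::classic();
    return loc;
}

// Rough analogue of the tss_global call imagined in the post: sets the
// calling thread's locale and returns the previous one.
inline std::locale set_thread_locale(const std::locale& loc) {
    std::locale previous = thread_locale();
    thread_locale() = loc;
    return previous;
}

// A conversion that uses the per-thread locale instead of the global one.
inline std::string to_text(double value) {
    std::ostringstream out;
    out.imbue(thread_locale());
    out << value;
    return out.str();
}

}  // namespace sketch
```

Because each thread has its own copy, another thread cannot change the locale between the set call and the conversion, which is the race described above. It still gives only one locale per thread at a time.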

So every library which uses lexical_cast would have to provide 'imbue' and store the locale everywhere? I think a per-thread global variable is better.
I think you missed my point. All operations that involve string handling (sorting, parsing etc.) need to be done within a certain context, which is called a locale. However, not all string handling necessarily works with the same locale. That is why the STL streams, string_algo etc. all allow you to specify the locale to use per instance or per function.
If you introduce another global (or per-thread global) variable, it will not solve anything, since lexical_cast might be used in different libraries where I want to use different locales. A (thread-)global variable called program_options_locale could work, but it would be just as easy to make it a class member.
You don't allow different locales in your library and neither does lexical_cast, so they work nicely together. But if you want to support different locales in your library you can't use lexical_cast, and that is my whole point in this discussion. Lexical_cast is supposed to make a common task easy and reduce the risk of errors, but instead it gives you a false sense of security and motivates others to also ignore locale support.

Martin wrote:
So every library which uses lexical_cast would have to provide 'imbue' and store the locale everywhere? I think a per-thread global variable is better.
I think you missed my point.
Yes, and you've partly missed mine ;-)
All operations that involve string handling (sorting, parsing etc.) need to be done within a certain context, which is called a locale. However, not all string handling necessarily works with the same locale. That is why the STL streams, string_algo etc. all allow you to specify the locale to use per instance or per function.
If you introduce another global (or per-thread global) variable, it will not solve anything, since lexical_cast might be used in different libraries where I want to use different locales.
The point of mine that you've missed is that if you're using one library at a time, then you can use a per-thread (TSS) locale, so that while you're using one library the locale is right. The point of yours that I've missed is that you might really want to use two libraries at a time. For example, program_options might have an on-demand LDAP source with different locales, so when the user tries to get an option, parsing should be done with the right locale, while the user has no idea which one.
A (thread-)global variable called program_options_locale could work, but it would be just as easy to make it a class member.
You don't allow different locales in your library and neither does lexical_cast, so they work nicely together. But if you want to support different locales in your library you can't use lexical_cast, and that is my whole point in this discussion.
Agreed.
Lexical_cast is supposed to make a common task easy and reduce the risk of errors but instead it gives you a false sense of security and motivates others to also ignore locale support.
:-( Ultimately, I might drop the use of lexical_cast.
- Volodya

Vladimir Prus wrote:
So every library which uses lexical_cast would have to provide 'imbue' and store the locale everywhere? I think a per-thread global variable is better.
Not if you want the libraries to use different locales.
Nope, there's only one global locale, it's not per-thread.
This is unspecified. Some implementations maintain a per-thread locale.
participants (4)
- Martin
- Martin Adrian
- Peter Dimov
- Vladimir Prus