
On 30.12.2011 14:48, Beman Dawes wrote:
Class path locale initialization has suffered from a data race for several releases. See https://svn.boost.org/trac/boost/ticket/6320 for an example of code that suffers as a result.
The problem was introduced when locale initialization was changed from namespace scope initialization to function scope initialization. For Windows and Mac OS X, the fix is simply to change back to namespace scope initialization.
For non-BSD based POSIX systems such as Linux, the problem is more complex. These systems need std::locale(""), "the locale-specific native environment". Considerations:
* std::locale("") will throw if environment variables are configured incorrectly. For example, setting LANG=foo on my Ubuntu system causes std::locale("") to throw.
* std::locale("") is only needed if conversions between wide and narrow character paths occur in the program, so it would be unfortunate to have programs throw that don't actually do any such conversion.
* With GCC, std::locale("") at namespace scope will throw before main() has started! That prevents catching the exception in the user code, and was what led to moving the initialization to a function scope static. Initialization as a function scope static also meant that the exception only occurred if user code actually performed wide - narrow conversions.
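The throwing behaviour described in the points above can be observed with a small probe. This is only a sketch: locale_is_constructible is a hypothetical helper name, not part of Boost.Filesystem, and it relies solely on the documented fact that the std::locale(const char*) constructor throws std::runtime_error for a name it cannot resolve.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper (not a Boost API): reports whether a locale with the
// given name can be constructed. Passing "" asks for the user's preferred
// locale, which is taken from the environment (LANG, LC_ALL, ...) on POSIX.
bool locale_is_constructible(const char* name)
{
    try {
        std::locale loc(name);   // throws std::runtime_error on an unknown name
        (void)loc;               // only the construction matters here
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Calling locale_is_constructible("") inside main() lets a program detect a misconfigured environment at a point where the failure can still be handled, which is not possible when the same construction happens at namespace scope before main() starts.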
I can see two possible fixes:
(A) Use function scope locale initialization, using boost/detail/lightweight_mutex.hpp to prevent data races.
(B) Use namespace scope locale initialization, defaulting the codecvt facet to UTF-8 if std::locale("") throws.
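Fix (A) might look roughly like the following. This is a sketch under assumptions, not the actual Boost.Filesystem code: the names path_locale and path_locale_mutex are hypothetical, and std::mutex (C++11) stands in for the boost/detail/lightweight_mutex.hpp mutex so that the example is self-contained.

```cpp
#include <locale>
#include <mutex>

namespace sketch {

// Stand-in for boost::detail::lightweight_mutex (assumption: C++11 is
// available). std::mutex has a constexpr constructor, so this object is
// ready before any dynamic initialization runs.
std::mutex path_locale_mutex;

// Hypothetical accessor: creates the locale on first use, under a lock, so
// two threads cannot race on the initialization.
const std::locale& path_locale()
{
    static std::locale* loc = nullptr;   // intentionally never deleted
    std::lock_guard<std::mutex> guard(path_locale_mutex);
    if (!loc) {
        // May throw std::runtime_error on a misconfigured environment.
        // In that case loc stays null and the next call tries again.
        loc = new std::locale("");
    }
    return *loc;
}

} // namespace sketch
```

With this shape the exception only surfaces in code that actually asks for the locale, and it surfaces after main() has started, so user code can catch it. The cost is one lock acquisition per call.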
The advantage of (B) is that path always initializes without throwing, and that's what users seem to expect. The initialization is correct for all those whose environments are configured correctly, and for those users who want UTF-8 even if their environments are misconfigured. The POSIX users who prefer an exception on a misconfigured environment can always add a std::locale("") at the start of main().
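A rough sketch of the shape of fix (B) follows. The names make_path_locale and default_path_locale are hypothetical. As a simplifying assumption, the fallback here is std::locale::classic(), used as a placeholder for the spot where the proposal would install a UTF-8 codecvt facet; the sketch shows only the non-throwing namespace scope initialization, not the facet itself.

```cpp
#include <locale>
#include <stdexcept>

namespace sketch {

// Hypothetical helper: returns the environment's locale, or a fallback if
// the environment is misconfigured and std::locale("") throws.
std::locale make_path_locale()
{
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        // Placeholder fallback (assumption). Fix (B) as proposed would
        // return a locale carrying a UTF-8 codecvt facet at this point.
        return std::locale::classic();
    }
}

// Namespace scope initialization: runs before main(), single-threaded on
// typical implementations, and cannot let an exception escape.
const std::locale default_path_locale = make_path_locale();

} // namespace sketch
```

A program that would rather fail on a misconfigured environment can still construct std::locale("") itself at the start of main(), as the paragraph above suggests.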
The problem with solution (B) is IMHO not that it lies, but that it /covers up/ a problem. The problem -- misconfiguration -- is still there but the user is made unaware of it. That's ungood.

So I would favor (A). The problem with that is then efficiency, or perceived inefficiency. But so what. I say, go for correctness, and don't fret about the nano-efficiency.

It could be different if the question was about some new clean thing, then it would warrant some redesign (mutable globals in the age of multi-processing isn't that bright an idea, really). But for just supporting the old unclean stuff -- don't fret about the nano-efficiency.

Cheers,

- Alf