[thread] Proposal for thread safe initialization of local static and global objects

The Problem: ============ Initialisation of local static objects is potentially not thread-safe. E.g.: class bar { public: bar() { i = 42; }; private: int i; }; void foo() { static bar abar; } The standard says (6.7/4): " ... Otherwise such an object is initialized the first time control passes through its declaration; ... If control re-enters the declaration (recursively) while the object is beeing initialzed, the behaviour is undefined." From this it follows that the constructor call poses a race condition, when a second thread is making a call to foo, while the first still is in the constructor of bar. Side Note: ========== My research was triggered by a bug in the spirit library: "boost/spirit/core/non_terminal/impl/grammar.ipp" Ref: http://lists.boost.org/Archives/boost/2006/04/102794.php Ref: http://sourceforge.net/tracker/index.php?func=detail&aid=1509978&group_id=7586&atid=107586 In a templated function template<typename DerivedT, typename ContextT, typename ScannerT> inline typename DerivedT::template definition<ScannerT> & get_definition(grammar<DerivedT, ContextT> const* self) { ... typedef grammar<DerivedT, ContextT> self_t; typedef impl::grammar_helper<self_t, DerivedT, ScannerT> helper_t; typedef typename helper_t::helper_weak_ptr_t ptr_t; static boost::thread_specific_ptr<ptr_t> tld_helper; if (!tld_helper.get()) tld_helper.reset(new ptr_t); ptr_t &helper = *tld_helper; ... } a local static thread_specific pointer is beeing used for thread safety. To make this really thread safe the pointer normally should be declared at global (namespace) scope. This is not possible in this case, since the pointer type is parameterized on a template parameter. Solution(s): ============ 1)-- The simplest, straight forward solution, would be, to guard the static declaration by a global mutex. I.e.: mutex::mx; template<C> void foo(C arg) { mutex::scoped_lock lk(mx); static bar<C> abar; lk.unlock(); ... } However this has some drawbacks: *) All templated (unrelated) functions share a single mutex. *) All function calls need to pass through a single point of synchronization, altough not necessary, once the object is initialized. 2)-- Make use of a statically initializeable mutex. template<C> void foo(C arg) { static_mutex mx = STATIC_MUTEX_INITIALIZER; mutex::scoped_lock lk(mx); static bar<C> abar; lk.unlock(); ... } While this removes the first of the above drawbacks, the second still applies. (A static_mutex can be found e.g. in boost/regex/pending/static_mutex.hpp) 3)-- Use a static pointer to the object and initialize on first use, by making use of boost::call_once. Based on this idea I wrote a little class wrapper that can be used as following: template<C> void foo(C arg) { static once_class<bar<C> > static_abar = ONCE_CLASS_INIT; bar<C>& abar(static_abar); ... abar.blah(); ... } Notes: Uses of this wrapper, not only cover the above case, but can be used to make any object statically initializeable. The only precondition is that it has a default ctor. In particular it makes it unnecessary to create static init versions of the various mutex types. I.e.: void foo() { static once_class<boost::mutex> mx = ONCE_CLASS_INIT; boost::mutex::scoped_lock lk(mx); .... } Or void foo() { static once_class<boost::recursive_mutex> mx = ONCE_CLASS_INIT; boost::mutex::scoped_lock lk(mx); .... } Conclusion: =========== Solution 3) does not suffer from the single point of synchronization drawback and can make any default constructible object statically initialzeable in a thread-safe manner. This of course can also be used to declare a (global) mutex that needs to be available from within global ctor's. I have attached a commented implementation of this class wrapper together with a sample usage. I would be glad to get feedback on this design, before adding it to the boost::thread library. Regards, Roland // Copyright (C) 2006 Roland Schwarz // // Distributed under the Boost Software License, Version 1.0. (See accompanying // file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) #ifndef ONCE_CLASS_HPP__ #define ONCE_CLASS_HPP__ #include <boost/thread/once.hpp> #include <boost/thread/mutex.hpp> #include <boost/thread/tss.hpp> #include <vector> #define ONCE_CLASS_INIT {BOOST_ONCE_INIT,0} // once_class _is_ an aggregate, so it can be statically // initialized. Since it also has no destructors, the // compiler need not schedule a call to it (which might // create a race condition). template<class A> struct once_class { // The conversion operator allows direct access to the // wrapped class, and constructs it on demand in a // thread safe manner. // N.B.: This does not make the class thread safe, // only construction is. You must supply your own // synchronization to make it thread safe. operator A&() { boost::call_once(&init_class, class_flag); // Using a thread global variable to pass arguments, // since call_once does not allow passing arguments // directly. //once_class_arg->reset(reinterpret_cast<void**>(&p)); arg->reset(&instance); boost::call_once(&init_instance,instance_flag); return *instance; }; boost::once_flag instance_flag; A* instance; private: static void init_instance() { //A** ppm = reinterpret_cast<A**>(once_class_arg->get()); A** ppa = arg->get(); *ppa = new A; boost::mutex::scoped_lock lk(*guard); instances->push_back(*ppa); }; static void init_class() { arg = new boost::thread_specific_ptr<A*>(no_cleanup); guard = new boost::mutex; instances = new std::vector<A*>; // schedule removal // NOTE: a once_class must not be used from any functions // that will run from an atexit handler! atexit(&cleanup); }; static void cleanup() { std::vector<A*>::size_type i; for (i=0; i<instances->size(); ++i) { delete (*instances)[i]; } delete instances; delete guard; delete arg; }; static void no_cleanup(A**) {}; static boost::once_flag class_flag; static boost::thread_specific_ptr<A*>* arg; static boost::mutex* guard; static std::vector<A*>* instances; }; // Making the following pointers, allows usage of // once_class from within "static" ctor calls. // (But I would not recommend such usage.) template<class A> boost::once_flag once_class<A>::class_flag = BOOST_ONCE_INIT; template<class A> boost::thread_specific_ptr<A*>* once_class<A>::arg = 0; template<class A> boost::mutex* once_class<A>::guard = 0; template<class A> std::vector<A*>* once_class<A>::instances = 0; #endif // ONCE_CLASS_HPP__ // Copyright (C) 2006 Roland Schwarz // // Distributed under the Boost Software License, Version 1.0. (See accompanying // file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) #include "once_class.hpp" #include <boost/thread/mutex.hpp> #include <boost/thread/thread.hpp> #include <iostream> using namespace std; class test { public: test() { boost::mutex::scoped_lock lk(m); cout << "constructor" << endl; }; ~test() { boost::mutex::scoped_lock lk(m); cout << "destructor" << endl; }; void operator()(){ boost::mutex::scoped_lock lk(m); cout << "called" << endl; }; private: boost::mutex m; }; void foo() { static once_class<test> static_bar = ONCE_CLASS_INIT; test& bar(static_bar); bar(); } int main(int argc, char *argv[]) { boost::thread th1(foo); boost::thread th2(foo); th1.join(); th2.join(); }

Roland Schwarz wrote:
Initialisation of local static objects is potentially not thread-safe.
From this it follows that the constructor call poses a race condition, when a second thread is making a call to foo, while the first still is in the constructor of bar.
indeed.
In a templated function
template<typename DerivedT, typename ContextT, typename ScannerT> inline typename DerivedT::template definition<ScannerT> & get_definition(grammar<DerivedT, ContextT> const* self) { ... typedef grammar<DerivedT, ContextT> self_t; typedef impl::grammar_helper<self_t, DerivedT, ScannerT> helper_t; typedef typename helper_t::helper_weak_ptr_t ptr_t;
static boost::thread_specific_ptr<ptr_t> tld_helper; if (!tld_helper.get()) tld_helper.reset(new ptr_t); ptr_t &helper = *tld_helper; ... }
a local static thread_specific pointer is beeing used for thread safety. To make this really thread safe the pointer normally should be declared at global (namespace) scope. This is not possible in this case, since the pointer type is parameterized on a template parameter.
What's wrong with simple: template <typename T> struct wrapper { static T data; }; template <typename T> T wrapper<T>::data; template<C> void foo(C arg) { bar<C>& abar = wrapper<bar<C> >::data; } ... which will trigger static (thread-safe) initialization. B.

Bronek Kozicki wrote:
What's wrong with simple:
template <typename T> struct wrapper { static T data; };
template <typename T> T wrapper<T>::data;
template<C> void foo(C arg) { bar<C>& abar = wrapper<bar<C> >::data; }
... which will trigger static (thread-safe) initialization.
Nothing. I like this very much! Funningly, I used this already under the hoods in my once_class without noticing its potential. I also vaguely remember having seen somethin similar from Slapeta who called the wrapper static_< >. Again without noticing that it is a potential solution to fixing the spirit bug. Some notes though: 1) Lazy initialization on first use is not possible with this approach. 2) It does not address the question of how to construct the static if it is being used from another static ctor run. (I know that this isn't recommended practice anyways, but a library implemntor might really need this behaviour.) Perhaps both wrappers should be available to give the user a choice? Would it be possible to wrap both under a common interface? 3) class_once also in my current implementation has a problem with exceptions, thrown by ctors. once_call as I found out shouldnt throw (from a comment of Bill Kempf). I am considering calling terminate in this case. Reasonable? Regards, Roland

Roland Schwarz wrote:
3) class_once also in my current implementation has a problem with exceptions, thrown by ctors. once_call as I found out shouldnt throw (from a comment of Bill Kempf). I am considering calling terminate in this case. Reasonable?
No. An exception thrown by the once function needs to be propagated outwards as-is, with the once flag unchanged.

Bronek Kozicki <brok@spamcop.net> writes:
What's wrong with simple:
template <typename T> struct wrapper { static T data; };
template <typename T> T wrapper<T>::data;
template<C> void foo(C arg) { bar<C>& abar = wrapper<bar<C> >::data; }
... which will trigger static (thread-safe) initialization.
Yes, but you only get one global instance for each type T, which defeats the point of making it a function-local static. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Yes, but you only get one global instance for each type T, which defeats the point of making it a function-local static.
Oops, I missed that one. You are correct. Thanks for having pointed this out! Roland

On 10/6/06, Roland Schwarz <roland.schwarz@chello.at> wrote:
Anthony Williams wrote:
Yes, but you only get one global instance for each type T, which defeats the point of making it a function-local static.
Oops, I missed that one. You are correct. Thanks for having pointed this out!
It doenst defeat the point when used in templated functions where the type depends on T, like the spirit case.
Roland
Best regards, -- Felipe Magno de Almeida

Felipe Magno de Almeida wrote:
It doenst defeat the point when used in templated functions where the type depends on T, like the spirit case.
If spirit implementors are careful enough to not accidentally reuse the type in another function template: yes. Roland

Bronek Kozicki wrote:
What's wrong with simple:
template <typename T> struct wrapper { static T data; };
template <typename T> T wrapper<T>::data;
template<C> void foo(C arg) { bar<C>& abar = wrapper<bar<C> >::data; }
... which will trigger static (thread-safe) initialization.
This doesn't solve the initialization order problem, the primary reason for using local statics in the first place.

Quoting Peter Dimov <pdimov@mmltd.net>:
... which will trigger static (thread-safe) initialization.
This doesn't solve the initialization order problem, the primary reason for using local statics in the first place.
yup; Sam Saariste pointed it out in private discussion. What about having two wrappers: 1. just like above, except for instead of direct data access I'd define inline static function data() returning reference, class name would be more meaningful (static_init_global) and it would accept additional two template parameters to help guarantee uniqueness : type and pointer to that type (users would use it eg. passing function type and address of that function) 2. another with identical signature (eg. two type parameters and a pointer, and inline static function data() returning reference to first type) called static_init_ordered that would implement singleton (presumably using double checked idiom on platforms where it works) The other one would guarantee initialization order at the extra performance cost and could also forward arguments to the constructor. Alternatively, one could use preprocessor and instead of delegating work to template, use some fancy macro with proper synchronization etc. B.

Roland, I've only scanned over your code - and I think this is a good idea, and a step in the right direction- but I did notice that you needed two calls to call_once? One of the gripes I've had about call_once was that it was rather a heavyweight call: is this still the case? Do you have any feeling for how efficient this approach is compared to say a statically initialised mutex (assuming that's what you want) ? Thanks, John.

John Maddock wrote:
... but I did notice that you needed two calls to call_once? One of the gripes I've had about call_once was that it was rather a heavyweight call: is this still the case?
Hmm, the once it has been running it does a simple if (compare_exchange(&flag, 1, 1) == 0) { } and returns. At least on windows I do not expect much overhead. Does this address your question? Also this is implementation specific, and might be optimized later. My main question was about interface and semanitics.
Do you have any feeling for how efficient this approach is compared to say a statically initialised mutex (assuming that's what you want) ?
If you mean boost/regex/pending/static_mutex.hpp, I would say it is the same. If you compare it to Broneks wrapper the comparison is not really appropriate, since that approach does initialization on application start up before main(). Roland

Roland Schwarz wrote:
John Maddock wrote:
... but I did notice that you needed two calls to call_once? One of the gripes I've had about call_once was that it was rather a heavyweight call: is this still the case?
Hmm, the once it has been running it does a simple if (compare_exchange(&flag, 1, 1) == 0) { }
OK, that's not so bad. Actually I think my main gripe is that compilers don't do this stuff automatically, which neither of us has much control over :-(
and returns. At least on windows I do not expect much overhead. Does this address your question? Also this is implementation specific, and might be optimized later. My main question was about interface and semanitics.
Understood, I'll try and take anothe look. John.

Roland Schwarz <roland.schwarz@chello.at> writes:
Initialisation of local static objects is potentially not thread-safe.
From this it follows that the constructor call poses a race condition, when a second thread is making a call to foo, while the first still is in the constructor of bar.
3)-- Use a static pointer to the object and initialize on first use, by making use of boost::call_once. Based on this idea I wrote a little class wrapper that can be used as following:
template<C> void foo(C arg) { static once_class<bar<C> > static_abar = ONCE_CLASS_INIT; bar<C>& abar(static_abar);
... abar.blah(); ...
}
Solution 3) does not suffer from the single point of synchronization drawback and can make any default constructible object statically initialzeable in a thread-safe manner. This of course can also be used to declare a (global) mutex that needs to be available from within global ctor's.
I have attached a commented implementation of this class wrapper together with a sample usage.
I would be glad to get feedback on this design, before adding it to the boost::thread library.
On Windows we don't need two calls to call_once, as my implementation allows passing an arbitrary functor, which can therefore have state (e.g. use bind to pass parameters). Your code demonstrates how we could provide the same facility with pthreads, as an extension to call_once. Incidentally, whilst we're thinking about this, the POSIX spec says: "The behavior of pthread_once() is undefined if once_control has automatic storage duration or is not initialized by PTHREAD_ONCE_INIT." So we need to document this restriction on the use of boost::once_flag, unless you can find a way round it. This restriction doesn't apply on Windows, at least not with my implementation. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
On Windows we don't need two calls to call_once, as my implementation allows passing an arbitrary functor, which can therefore have state (e.g. use bind to pass parameters).
Hmm, I remember there was a sound reason why Bill Kempf ommitted parameters deliberately. Not sure about this, I will need to try to find the respective discussion thread.
Your code demonstrates how we could provide the same facility with pthreads, as an extension to call_once. Incidentally, whilst we're thinking about this, the POSIX spec says:
Indeed I have shamelessly stolen the idea from looking at Bill Kempfs call_once implementation for pthreads. He is using this there also, altough I think, superfluously.
"The behavior of pthread_once() is undefined if once_control has automatic storage duration or is not initialized by PTHREAD_ONCE_INIT."
So we need to document this restriction on the use of boost::once_flag, unless you can find a way round it. This restriction doesn't apply on Windows, at least not with my implementation.
I suspect this is because you are relying on zero initialization, right? Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
On Windows we don't need two calls to call_once, as my implementation allows passing an arbitrary functor, which can therefore have state (e.g. use bind to pass parameters).
Hmm, I remember there was a sound reason why Bill Kempf ommitted parameters deliberately. Not sure about this, I will need to try to find the respective discussion thread.
They are hard to do on POSIX, since pthread_once doesn't allow it.
Your code demonstrates how we could provide the same facility with pthreads, as an extension to call_once. Incidentally, whilst we're thinking about this, the POSIX spec says:
Indeed I have shamelessly stolen the idea from looking at Bill Kempfs call_once implementation for pthreads. He is using this there also, altough I think, superfluously.
As it stands, this is superfluous, and doesn't actually get round the issue.
"The behavior of pthread_once() is undefined if once_control has automatic storage duration or is not initialized by PTHREAD_ONCE_INIT."
So we need to document this restriction on the use of boost::once_flag, unless you can find a way round it. This restriction doesn't apply on Windows, at least not with my implementation.
I suspect this is because you are relying on zero initialization, right?
Yes, my once_flag is just a long, BOOST_ONCE_INIT is 0. That works fine at any scope/storage duration. Obviously, if you exit the enclosing scope, so the once var is destroyed, and then reenter the scope, so it is reconstructed, then the once call will be made again. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Roland Schwarz wrote:
Anthony Williams wrote:
On Windows we don't need two calls to call_once, as my implementation allows passing an arbitrary functor, which can therefore have state (e.g. use bind to pass parameters).
Hmm, I remember there was a sound reason why Bill Kempf ommitted parameters deliberately. Not sure about this, I will need to try to find the respective discussion thread.
No, there is none. He just followed pthread_once without realizing (at first) that he won't be able to use it directly on all platforms.
Your code demonstrates how we could provide the same facility with pthreads, as an extension to call_once. Incidentally, whilst we're thinking about this, the POSIX spec says:
Indeed I have shamelessly stolen the idea from looking at Bill Kempfs call_once implementation for pthreads. He is using this there also, altough I think, superfluously.
Not superfluously. pthread_once accepts an extern "C" function. Test on a platform where extern "C" functions are incompatible with extern "C++" functions (Comeau C++) and you'll see why the second call is needed. Incidentally, that second call also enables arbitrary function objects in addition to ordinary C++ functions. :-)

Anthony Williams wrote:
On Windows we don't need two calls to call_once, as my implementation allows passing an arbitrary functor, which can therefore have state (e.g. use bind to pass parameters).
I have been rethinking this. But it won't help. To solve the initialization order problem we need the first call_one, to initialize the list of destructor calls. The second will initialize the instance. A call_once with parameters would avoid solely the "reset" call to pass the instance pointer. Note: The list of destructors is essential. Initially I tried to embed a destructor within the class_once. While this is a valid aggregate that can be statically initialized it still suffers from a race condition. The compiler somehow has to emit code that will schedule a call to the destructor of once_class. Obviously this call can be scheduled when the code passes to the declaration of the object. (I hit this in the MSVC case.) So the only remedy I can see is to use aggregates without any destructors and maintain a separate list that will be stepped at atexit time. With respect to your original suggestion, I found the post where Bill Kempf already mentioned the idea: http://lists.boost.org/Archives/boost/2002/08/33122.php
Your code demonstrates how we could provide the same facility with pthreads, as an extension to call_once. Incidentally, whilst we're thinking about this, the POSIX spec says:
"The behavior of pthread_once() is undefined if once_control has automatic storage duration or is not initialized by PTHREAD_ONCE_INIT."
So we need to document this restriction on the use of boost::once_flag, unless you can find a way round it. This restriction doesn't apply on Windows, at least not with my implementation.
On the second reading I am not any more sure which "restriction" you are talking about for Windows: 1) may not have automatic storage duration 2) must be explicitly initialized by PTHREAD_ONCE_INIT Please don't get me wrong, but I would prefer having a statically initializeable mutex, over having call_once. I'd rather like to purge call_once entirely from the interface. I suggested the wrapper primarily to make existing mutex types statically initializeable and accidentally noted that this can be extended to any default constuctible class. My intent was to avoid the need to come up with distinct mutex classes (which have to be written and maintained separately) for that purpose. I also do see absolutely no problem in requiring the user to explicitly initialize. This also underpins its static nature. Roland

Roland Schwarz wrote:
I have been rethinking this. But it won't help.
On third reading ;-) I can see, that indeed only one call_once will do. The second could be moved into the init_instance if parameters are available. Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Roland Schwarz wrote:
I have been rethinking this. But it won't help.
On third reading ;-) I can see, that indeed only one call_once will do. The second could be moved into the init_instance if parameters are available.
Yes. I was just about to point that out. I don't like the atexit call for registering the destructors, but I can't think of anything better just now. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
I don't like the atexit call for registering the destructors,
Why? You can't rely on the presence of any globally contructed objects during atexit. So why should once_class object be different? Also the MSVC compiler is doing just that for local static objects dtors: scheduling a call to atexit. Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
I don't like the atexit call for registering the destructors,
Why? You can't rely on the presence of any globally contructed objects during atexit. So why should once_class object be different?
Also the MSVC compiler is doing just that for local static objects dtors: scheduling a call to atexit.
The MSVC compiler will schedule a call for each static object, so they are destroyed in the correct reverse order, and correctly interleaved with other calls to atexit, as per 3.6.3p3 of the C++ Standard. Having a single call that destroys all the objects means they are not interleaved as one might expect. One could have a call for each object that just destroyed the last object constructed, and destroyed the list once it was empty; that would be better. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
The MSVC compiler will schedule a call for each static object, so they are destroyed in the correct reverse order, and correctly interleaved with other calls to atexit, as per 3.6.3p3 of the C++ Standard.
Strictly speaking we do not have static objects but dynamically created objects on the heap. So we simply cannot be not conformant to the standard. But I think your intent is: since we make the object appear like they are static, they should behave similar to real static objects. Correct? Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
The MSVC compiler will schedule a call for each static object, so they are destroyed in the correct reverse order, and correctly interleaved with other calls to atexit, as per 3.6.3p3 of the C++ Standard.
Strictly speaking we do not have static objects but dynamically created objects on the heap. So we simply cannot be not conformant to the standard.
But I think your intent is: since we make the object appear like they are static, they should behave similar to real static objects. Correct?
Yes, that's what I meant. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
But I think your intent is: since we make the object appear like they are static, they should behave similar to real static objects. Correct?
Yes, that's what I meant.
Umm, than as I would say in germanish "we get into devils kitchen". 1) We will need to assume that the call to atexit is thread safe 2) We will need to be able to have the compiler somehow generate a distinct function for every separate instance. And this only when the declaration has been done with the static keyword in front, else we would need to destroy when the object gets out of focus. And even if we could manage this, the whole concept of destruction order of static objects seems void in the presence of threads anyways. You can see this e.g. from the fact that local static is an illegal construct when seen from the perspective of the standard. Even when protected by a mutex the destruction order is arbitrary anyways, so I tend to say that even such a usage is very problematic. (Put aside the question whether atexit calls are thread safe.) So as I see it the only way out is to treat them as as heap objects with dynamic lifetime (as what they really are) and their own rules, of existence, while giving them certain properties that make them usable as if they were static. Call them lazy_static<> perhaps, or to make their dynamic nature more visible to the user: lazy_ptr<> ? Roland

Roland Schwarz wrote:
Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
But I think your intent is: since we make the object appear like they are static, they should behave similar to real static objects. Correct?
Yes, that's what I meant.
Umm, than as I would say in germanish "we get into devils kitchen".
There is one interesting idiom that gets around the problem: X* instance() { static X x; return &x; } void f() // where the static would have been { call_once( instance ); X * px = instance(); // use *px } It's made a bit easier if call_once is allowed to return a value... although last time this was brought up this appeared a bit problematic to implement. void f() // where the static would have been { X * px = call_once( instance ); // use *px } I wonder whether something similar can be "componentized" given an extended call_once.

Here's a win32-only static once_init implementation. This could be used as a general purpose static-init template, or as a basis for making boost::mutex work both as a static or non-static object, with a default constructor. Comments please, Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk ////////// #include <cstring> #include <cstddef> #include "boost/aligned_storage.hpp" #include "boost/thread/win32/interlocked_read.hpp" #include <boost/static_assert.hpp> #include <boost/thread/win32/thread_primitives.hpp> #include <boost/detail/interlocked.hpp> #include "boost/thread/mutex.hpp" #include <windows.h> #include <iostream> #include "boost/thread/thread.hpp" #ifdef BOOST_NO_STDC_NAMESPACE namespace std { using ::strlen; using ::memcpy; using ::ptrdiff_t; } #endif template<typename T> struct once_init { once_init() { init_once(); } ~once_init() { (*this)->~T(); CloseHandle(init_event); } void lazy_init() { if(!::boost::detail::interlocked_read(&init_event)) { init_once(); } } T* operator->() { lazy_init(); return reinterpret_cast<T*>(data.address()); } private: void* init_event; boost::aligned_storage<sizeof(T)> data; BOOST_STATIC_CONSTANT(unsigned,name_fixed_length=48); BOOST_STATIC_CONSTANT(unsigned,event_name_length=name_fixed_length+sizeof(void*)*2+sizeof(unsigned long)*2+1); template <class I> void int_to_string(I p, char* buf) { unsigned i=0; for(; i < sizeof(I)*2; ++i) { buf[i] = 'A' + static_cast<char>((p >> (i*4)) & 0x0f); } buf[i] = 0; } void create_event_name(char (&name)[event_name_length]) { static const char fixed_name[]="{F21E1417-EF70-4665-9758-5DBC70AE2FCC}-init-flag"; BOOST_STATIC_ASSERT(sizeof(fixed_name) == (name_fixed_length+1)); std::memcpy(name,fixed_name,sizeof(fixed_name)); BOOST_ASSERT(sizeof(name) == std::strlen(name) + sizeof(void*)*2 + sizeof(unsigned long)*2 + 1); BOOST_STATIC_ASSERT(sizeof(void*) == sizeof(std::ptrdiff_t)); int_to_string(reinterpret_cast<std::ptrdiff_t>(this), name + name_fixed_length); int_to_string(GetCurrentProcessId(), name + name_fixed_length + sizeof(void*)*2); BOOST_ASSERT(sizeof(name) == std::strlen(name) + 1); } void init_once() { char event_name[event_name_length]; create_event_name(event_name); void* const event=CreateEventA(NULL,true,false,event_name); BOOST_ASSERT(event); if(GetLastError()==ERROR_ALREADY_EXISTS) { WaitForSingleObject(event,BOOST_INFINITE); } else { new(reinterpret_cast<T*>(data.address())) T(); BOOST_INTERLOCKED_EXCHANGE_POINTER(&init_event,event); SetEvent(event); } } }; struct A { A() { std::cout<<GetCurrentThreadId()<<"initialize "<<this<<std::endl; Sleep(1000); } ~A() { std::cout<<GetCurrentThreadId()<<"destroy "<<this<<std::endl; } void f() { std::cout<<GetCurrentThreadId()<<"f "<<this<<std::endl; } int i; }; void f() { static once_init<A> a; a->f(); } int main() { boost::thread t1(f); boost::thread t2(f); t2.join(); t1.join(); }

Anthony Williams wrote:
Here's a win32-only static once_init implementation. This could be used as a general purpose static-init template, or as a basis for making boost::mutex work both as a static or non-static object, with a default constructor.
Comments please,
Anthony, I did not yet have had the time to walk through all details of your proposal, but the main problem why this isn't thread safe is: void f() { static once_init<A> a; a->f(); } class once_init does have a user declared constructor and a destructor! And this definitely is _not_thread safe. I am even unsure if void foo() { static int n = 42; m = n; } is thread safe, since: 6.7/4: " ... A local object of POD type with static storage duration initialized with constant-expressions is initialized before its block is first entered. ..." To me this reads, as if the compiler would be allowed to put the initialization into the function entry code. This would rule out local static objects completely for MT safe programs. Perhaps we should try to discuss this at comp.lang.c++.pthreads ? Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
Here's a win32-only static once_init implementation. This could be used as a general purpose static-init template, or as a basis for making boost::mutex work both as a static or non-static object, with a default constructor.
Comments please,
Anthony, I did not yet have had the time to walk through all details of your proposal, but the main problem why this isn't thread safe is:
void f() { static once_init<A> a; a->f();
}
class once_init does have a user declared constructor and a destructor! And this definitely is _not_thread safe.
Yes, agreed. That's why the operator-> also does a lazy_init(). This is completely platform- and implementation-dependent code, precisely the sort of thing that needs to be wrapped in a library, and not written by the general public. On MSVC 7.1, the compiler will call the constructor in the first thread that starts the function, and not any others. It will also not wait for the constructor to finish. For objects of static storage duration, the memory is initialized to 0 upon startup. Since the constructor doesn't read the data, and only sets it with the final InterlockedExchange, the constructor doesn't care what the initial data value is. However, the lazy_init does care --- it uses an interlocked read to ensure that it correctly reads the value in memory. If it is non-zero, then the constructor must have run already, else we're racing against the constructor, which is running in another thread. If the constructor has run, we're fine. If we're racing, we run the same init routine as the constructor, which is designed to only run once. The init routine tries to create a manual-reset event object, initially in the non-signalled state. This is a named object, so the same one is returned from every call, but GetLastError() returns ERROR_ALREADY_EXISTS if it already exists. If it already exists, we wait for it, else we're the first thread to run the init routine (we won the race), so we actually do the initialization, store the event handle for use by the destructor, and as the flag that construction is done, and signal the event, so the other racing threads can resume, safe in the knowledge that everything is fine. For the destructor, the thread that actually runs the constructor will schedule a destructor call with atexit (on MSVC), so the destructor runs once, too. I should have explained the logic when I posted it, but I was short of time. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
class once_init does have a user declared constructor and a destructor! And this definitely is _not_thread safe.
Yes, agreed. That's why the operator-> also does a lazy_init().
This is completely platform- and implementation-dependent code, precisely the sort of thing that needs to be wrapped in a library, and not written by the general public.
I am not convinced yet, that this code is thread safe even on MSVC: You can happen to have multiple ctor calls waiting on WaitForSingleObject(event,BOOST_INFINITE); in init_once, true? Then when the the event is signalled (after the object has been constructed) all pending ctors (possibly just having waited) return. Now if you look at the assembly code level, you will see that each will schedule a call to atexit, which makes your code end up with multiple calls to dtor. This is bad. isn't it? (Not to mention, that I think I found out that the call to atexit itself isn't thread safe :-(, at least for posix, don't know for MSVC yet.)
On MSVC 7.1, the compiler will call the constructor in the first thread that starts the function, and not any others.
Sorry I cannot see this. Why wouldn't the others beeing called while construction is underways? Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
class once_init does have a user declared constructor and a destructor! And this definitely is _not_thread safe.
Yes, agreed. That's why the operator-> also does a lazy_init().
This is completely platform- and implementation-dependent code, precisely the sort of thing that needs to be wrapped in a library, and not written by the general public.
I am not convinced yet, that this code is thread safe even on MSVC: You can happen to have multiple ctor calls waiting on WaitForSingleObject(event,BOOST_INFINITE); in init_once, true?
Then when the the event is signalled (after the object has been constructed) all pending ctors (possibly just having waited) return. Now if you look at the assembly code level, you will see that each will schedule a call to atexit, which makes your code end up with multiple calls to dtor. This is bad. isn't it? (Not to mention, that I think I found out that the call to atexit itself isn't thread safe :-(, at least for posix, don't know for MSVC yet.)
Only the thread(s) that runs the constructor will schedule atexit. The set of the "constructor has been run" flag happens before the constructor is run, so this should generally be avoided. It's not an atomic instruction, though, so it's possible that multiple constructor runs will happen. That's OK in itself, because the constructor runs the same init routine as the lazy_init, and will only actually do the init once, even if called multiple times concurrently. More concerning is the multiple atexit calls, since that means that the destructor also has to be capable of being run multiple times. They won't be concurrent, though, so there's no locking worry. Just setting the init_event handle to NULL after closing it will do fine. By checking the CRT source, I can see that atexit is thread-safe in MT builds on MSVC.
On MSVC 7.1, the compiler will call the constructor in the first thread that starts the function, and not any others.
Sorry I cannot see this. Why wouldn't the others beeing called while construction is underways?
The code looks like this: mov eax, DWORD PTR ?$S1@?1??f@@YAXXZ@4IA and eax, 1 jne SHORT $L86349 mov ecx, DWORD PTR ?$S1@?1??f@@YAXXZ@4IA or ecx, 1 mov DWORD PTR ?$S1@?1??f@@YAXXZ@4IA, ecx which sets the flag once it's been decided to call the constructor before it actually calls it, so it doesn't get called again. Of course, as I pointed out above, this is not atomic, so isn't reliable. Thanks for making me look closer. Anyway, I'm sure this is fine if the destructor is changed to: ~once_init() { if(init_event) { (*this)->~T(); CloseHandle(init_event); init_event=0; } } Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
class once_init does have a user declared constructor and a destructor! And this definitely is _not_thread safe. Yes, agreed. That's why the operator-> also does a lazy_init().
This is completely platform- and implementation-dependent code, precisely the sort of thing that needs to be wrapped in a library, and not written by the general public. I am not convinced yet, that this code is thread safe even on MSVC: You can happen to have multiple ctor calls waiting on WaitForSingleObject(event,BOOST_INFINITE); in init_once, true?
Then when the the event is signalled (after the object has been constructed) all pending ctors (possibly just having waited) return. Now if you look at the assembly code level, you will see that each will schedule a call to atexit, which makes your code end up with multiple calls to dtor. This is bad. isn't it? (Not to mention, that I think I found out that the call to atexit itself isn't thread safe :-(, at least for posix, don't know for MSVC yet.)
Only the thread(s) that runs the constructor will schedule atexit.
But at least in: void f() { static once_init<A> a; a->f(); } both threads (may) call the contructor of once_init<A>. So it is possible, that the destructor of once_init<A> is scheduled multiple times for atexit. Or even atexit may be called multiple times simultaneously. And with: ~once_init() { (*this)->~T(); CloseHandle(init_event); } ~T() also.
Anyway, I'm sure this is fine if the destructor is changed to:
~once_init() { if(init_event) { (*this)->~T(); CloseHandle(init_event); init_event=0; } }
Not really, the compiler is allowed to rearrange anything in it as long as the observable behavior (according to a single threaded machine) is the same. Observable are only the calls to "library I/O functions" and accesses to volatile data. This means, that the compiler is allowed to rearrange anything inside the if clause as long as CloseHandle is called with the right value (and the other IO functions inside ~T() are called in the right order and with the right parameters). And even if you make init_event volatile it does not change much: init_event=0 is after CloseHandle but not necessarily after ~T(). Raymond Häb

Raymond Haeb <ray.haeb@gmx.de> writes:
Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
class once_init does have a user declared constructor and a destructor! And this definitely is _not_thread safe. Yes, agreed. That's why the operator-> also does a lazy_init().
This is completely platform- and implementation-dependent code, precisely the sort of thing that needs to be wrapped in a library, and not written by the general public. I am not convinced yet, that this code is thread safe even on MSVC: You can happen to have multiple ctor calls waiting on WaitForSingleObject(event,BOOST_INFINITE); in init_once, true?
Then when the the event is signalled (after the object has been constructed) all pending ctors (possibly just having waited) return. Now if you look at the assembly code level, you will see that each will schedule a call to atexit, which makes your code end up with multiple calls to dtor. This is bad. isn't it? (Not to mention, that I think I found out that the call to atexit itself isn't thread safe :-(, at least for posix, don't know for MSVC yet.)
Only the thread(s) that runs the constructor will schedule atexit.
both threads (may) call the contructor of once_init<A>. So it is possible, that the destructor of once_init<A> is scheduled multiple times for atexit. Or even atexit may be called multiple times simultaneously.
Yes, that's true, which I accepted in the email that you're replying to. Running the constructor multiple times (even concurrently) is fine, given the way it's designed. The multiple destructor calls registered with atexit is more of a problem with my initial code, but I stand by my statement:
Anyway, I'm sure this is fine if the destructor is changed to:
~once_init() { if(init_event) { (*this)->~T(); CloseHandle(init_event); init_event=0; } }
Not really, the compiler is allowed to rearrange anything in it as long as the observable behavior (according to a single threaded machine) is the same. Observable are only the calls to "library I/O functions" and accesses to volatile data. This means, that the compiler is allowed to rearrange anything inside the if clause as long as CloseHandle is called with the right value (and the other IO functions inside ~T() are called in the right order and with the right parameters). And even if you make init_event volatile it does not change much: init_event=0 is after CloseHandle but not necessarily after ~T().
This destructor will only be run from one thread (the thread processing the atexit calls). It may be run multiple times, but the if(init_event) and the init_event=0 take care of that, even if they are reordered within the destructor. This code is intended to be platform- and implementation-dependent. I have investigated how MSVC 7.1 and MSVC 8.0 work in this instance, and believe that my code will work on those compilers. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Roland Schwarz <roland.schwarz@chello.at> writes:
Your code demonstrates how we could provide the same facility with pthreads, as an extension to call_once. Incidentally, whilst we're thinking about this, the POSIX spec says:
"The behavior of pthread_once() is undefined if once_control has automatic storage duration or is not initialized by PTHREAD_ONCE_INIT."
So we need to document this restriction on the use of boost::once_flag, unless you can find a way round it. This restriction doesn't apply on Windows, at least not with my implementation.
On the second reading I am not any more sure which "restriction" you are talking about for Windows: 1) may not have automatic storage duration 2) must be explicitly initialized by PTHREAD_ONCE_INIT
Sorry, I didn't make it clear. It is the automatic storage duration restriction which is not present on Windows. It's also hard to use a pthread_once_t as a class member, since you can't be guaranteed to use PTHREAD_ONCE_INIT in the member-init-list (e.g. if it's an aggregate initializer), and you're not allowed to leave it uninitialized, and initialize it later. You could only use it in a class which is itself an aggregate, with an appropriate aggregate initializer. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Sorry, I didn't make it clear. It is the automatic storage duration restriction which is not present on Windows.
I see. But what is your intent? Are you trying to come up with a single mutex, that is both statically (lazy) intializeable (i.e. of type POD) _and_ at the same time default constructible? This simply is not feasible. If your intent is to work around the pthread_once this is as I see it, much easier. Just use a static mutex and condition to emulate call_once. Aren't the ability to statically initialize a mutex, and availability of call_once just two sides of the same medal? Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
Sorry, I didn't make it clear. It is the automatic storage duration restriction which is not present on Windows.
I see.
But what is your intent?
I was just pointing out that the implementation of call_once for pthreads as a simple wrapper around pthread_once_t has a not-documented-by-boost restriction on usage which is not present for the Windows implementation.
Are you trying to come up with a single mutex, that is both statically (lazy) intializeable (i.e. of type POD) _and_ at the same time default constructible?
It would be nice.
This simply is not feasible.
Currently, I tend to agree.
If your intent is to work around the pthread_once this is as I see it, much easier. Just use a static mutex and condition to emulate call_once.
Aren't the ability to statically initialize a mutex, and availability of call_once just two sides of the same medal?
Sort of. You can use the same mechanisms for doing both. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
participants (7)
-
Anthony Williams
-
Bronek Kozicki
-
Felipe Magno de Almeida
-
John Maddock
-
Peter Dimov
-
Raymond Haeb
-
Roland Schwarz