
Le 12/01/13 18:29, Andrey Semashev a écrit :
On Saturday 12 January 2013 17:49:38 Vicente J. Botet Escriba wrote:
Le 12/01/13 17:09, Andrey Semashev a écrit :
On Saturday 12 January 2013 15:48:21 Vicente J. Botet Escriba wrote:
On Saturday 12 January 2013 12:08:10 Vicente J. Botet Escriba wrote:
Le 12/01/13 11:51, Andrey Semashev a écrit : > Anyway, can Boost.Thread be modified in such a way so that > Boost.Atomic > use is not exposed to the user? E.g. so that call_once invokes a > compiled > function implemented within Boost.Thread library that uses > Boost.Atomic > to modify the once flag. yes, this will be great. I don't know Boost.Atomic details to try to do this. Andrey do you mind to provide a patch that doesn't needs to link with boost_atomic? I attached the patch (for posix only). It appeared a bit hacky and I'm not sure if you're ok with it. Thanks for the quick patch. Shouldn't the patch concern only boost/thread/pthread/once.hpp and
Le 12/01/13 14:26, Andrey Semashev a écrit : libs/thread/src/pthread/once.hpp? Could you send the resulting files also? There is no libs/thread/src/pthread/once.hpp file as far as I can see. RIght. I meant
libs/thread/src/pthread/once.cpp It is patched.
The boost/thread/pthread/once.hpp file is no longer needed and can be removed (as well as boost/thread/win32/once.hpp when win32 version is implemented). The complete public code is platform-independent and resides in boost/thread/once.hpp after the patch is applied. IIRC the problem was on the Posix implementation, so a specific patch for the pthread files will be desirable. Ok, I intended to port both Windows and POSIX implementations but we could do it just for POSIX variant.
I attached the updated patch to the ticket. The Jamfile will also have to be updated to add the dependency on Boost.Atomic. However, the Jamfile in Boost.Thread is rather complicated so I didn't do that.
Anyway, could you send the resulting files to make easier the review? The files are attached. These should be boost/thread/posix/once.hpp and libs/thread/src/posix/once.cpp, other files are intact.
Hi, I have studied your patch and I like how you have separated the test on whether the function has been called, the commit and the rollback. I believe I will adopt this separation. I have studied also the current algorithm implementation and the algorithm on which it was based on [1]. The efficiency of the algorithm in [1], depends on, I quote " [thread_local] There is a way to access thread-specific data or thread-local storage. The technique is useful only if such an access is faster than a memory barrier. For the disassembly above, the compiler used its ability to access thread-local storage using variables declared with __thread, so the thread-local access is a normal "mov" instruction with a different segment register. [atomicity] There exists an integral type T (such as sig_atomic_t) such that loads and stores of a variable of type T are atomic. It is not possible to read from such a variable any value that was not written to the variable, even if multiple reads and writes occur on many processors. Another way to say this is that variables of type T do not exhibit word tearing when accessed with full-width accesses. [*once_bound] The number of pthread_once_t variables in a programme is bounded and can be counted in an instance of type T without overflow. If T is 32-bits, this implementation requires that the number of pthread_once_t variables in a programme not exceed 2**31-3. (Recall that pthread_once_t variables are intended to have static storage class, so it would be remarkable for such a huge number to exist.) [monotonicity] Accesses to a single variable V of type T by a single thread X appear to occur in programme order. For example, if V is initially 0, then X writes 1, and then 2 to V, no thread (including but not limited to X) can read a value from V and then subsequently read a lower value from V. (Notice that this does not prevent arbitrary load and store reordering; it constrains ordering only between actions on a single memory location. This assumption is reasonable on all architectures that I currently know about. I suspect that the Java and CLR memory models require this assumption also.) " The current implementation uses pthread_getspecific/pthread_setspecific which I'm not sure "is faster than a memory barrier". The type T is unsigned long which doesn't satisfy [atomicity] and [monotonicity] requirements on all platforms. The Boost.Thread should be changed so that the current algorithm is used only when the requirements are meet (this will mean at least to change the implementation of the thread specific data and that the datatype is adapted to the platform). So, to be portable we need an alternative algorithm, which could be the one you (Andrey) proposed in the patch. As the patch uses Boost.Atomic, this alternative will be really efficient only when atomic<uchar_t> is lock free. libc++ implements it using an algorithm that is close to the double checked locking. It reuse the mutex needed to wait on the condition variable to protect the whole data and in addition making a non protected test on the flag state (see ********) template<class _Callable> void call_once(once_flag& __flag, _Callable __func) { if (__flag.__state_ != ~0ul) // ********** { __call_once_param<_Callable> __p(__func); __call_once(__flag.__state_, &__p, &__call_once_proxy<_Callable>); // Here the flag is tested again with the mutex locked. } } This algorithm is efficient but I suspect that this algorithm is not always correct for the same reasons the Boost.Thread current algorithm is not. But on which platforms the libc++ implementation is correct? Howard, could you comment if you are reading this post. For the time been I will play with the Andrey, the Howard (libc++) implementation and update the current implementation for the thread specific data. I will add some compiler flags that will direct to one of these implementations and make the best choice depending on the platform/compiler. Many thank Andrey, Vicente [1] (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2444.html#Appendix). [2] http://llvm.org/svn/llvm-project/libcxx/trunk/include/mutex and http://llvm.org/svn/llvm-project/libcxx/trunk/src/mutex.cpp P.S. It is surprising that gcc implementation in libstdc++v3 needs to initialize a lot of data before using __gthread_once and seems to don't manage the case when the user function throw an exception. extern "C" { void __once_proxy() { #ifndef _GLIBCXX_HAVE_TLS function<void()> __once_call = std::move(__once_functor); if (unique_lock<mutex>* __lock = __get_once_functor_lock_ptr()) { // caller is using new ABI and provided lock ptr __get_once_functor_lock_ptr() = 0; __lock->unlock(); } else __get_once_functor_lock().unlock(); // global lock #endif __once_call(); } } /// call_once template<typename _Callable, typename... _Args> void call_once(once_flag& __once, _Callable&& __f, _Args&&... __args) { #ifdef _GLIBCXX_HAVE_TLS auto __bound_functor = std::__bind_simple(std::forward<_Callable>(__f), std::forward<_Args>(__args)...); __once_callable = &__bound_functor; __once_call = &__once_call_impl<decltype(__bound_functor)>; #else unique_lock<mutex> __functor_lock(__get_once_mutex()); auto __callable = std::__bind_simple(std::forward<_Callable>(__f), std::forward<_Args>(__args)...); __once_functor = [&]() { __callable(); }; __set_once_functor_lock_ptr(&__functor_lock); #endif int __e = __gthread_once(&(__once._M_once), &__once_proxy); #ifndef _GLIBCXX_HAVE_TLS if (__functor_lock) __set_once_functor_lock_ptr(0); #endif if (__e) __throw_system_error(__e); }