Re: [boost] [thread] Can Boost.Thread use Boost.Atomic without falling on a compatibility issue?

13 Jan 2013

      Le 12/01/13 18:29, Andrey Semashev a écrit :
...
On Saturday 12 January 2013 17:49:38 Vicente J. Botet Escriba wrote:
...
Le 12/01/13 17:09, Andrey Semashev a écrit :
...
On Saturday 12 January 2013 15:48:21 Vicente J. Botet Escriba wrote:
...
...
On Saturday 12 January 2013 12:08:10 Vicente J. Botet Escriba wrote:
...
Le 12/01/13 11:51, Andrey Semashev a écrit :
> Anyway, can Boost.Thread be modified in such a way so that
> Boost.Atomic
> use is not exposed to the user? E.g. so that call_once invokes a
> compiled
> function implemented within Boost.Thread library that uses
> Boost.Atomic
> to modify the once flag.
yes, this will be great. I don't know Boost.Atomic details to try to do
this. Andrey do you mind to provide a patch that doesn't needs to link
with boost_atomic?
I attached the patch (for posix only). It appeared a bit hacky and I'm
not
sure if you're ok with it.
Thanks for the quick patch.
Shouldn't the patch concern only boost/thread/pthread/once.hpp and
Le 12/01/13 14:26, Andrey Semashev a écrit :
libs/thread/src/pthread/once.hpp?
Could you send the resulting files also?
There is no libs/thread/src/pthread/once.hpp file as far as I can see.
RIght. I meant
libs/thread/src/pthread/once.cpp
It is patched.
...
...
The
boost/thread/pthread/once.hpp file is no longer needed and can be removed
(as well as boost/thread/win32/once.hpp when win32 version is
implemented). The complete public code is platform-independent and
resides in
boost/thread/once.hpp after the patch is applied.
IIRC the problem was on the Posix implementation, so a specific patch
for the pthread files will be desirable.
Ok, I intended to port both Windows and POSIX implementations but we could do
it just for POSIX variant.
I attached the updated patch to the ticket. The Jamfile will also have to be
updated to add the dependency on Boost.Atomic. However, the Jamfile in
Boost.Thread is rather complicated so I didn't do that.
...
Anyway, could you send the resulting files to make easier the review?
The files are attached. These should be boost/thread/posix/once.hpp and
libs/thread/src/posix/once.cpp, other files are intact.
Hi,

I have studied your patch and I like how you have separated the test on 
whether the function has been called, the commit and the rollback. I 
believe I will adopt this separation.
I have studied also the current algorithm implementation and the 
algorithm on which it was based on [1].

The efficiency of the algorithm in [1], depends on, I quote "
[thread_local]

    There is a way to access thread-specific data or thread-local
    storage. The technique is useful only if such an access is faster
    than a memory barrier. For the disassembly above, the compiler used
    its ability to access thread-local storage using variables declared
    with __thread, so the thread-local access is a normal "mov"
    instruction with a different segment register.
[atomicity]
    There exists an integral type T (such as sig_atomic_t) such that
    loads and stores of a variable of type T are atomic. It is not
    possible to read from such a variable any value that was not written
    to the variable, even if multiple reads and writes occur on many
    processors. Another way to say this is that variables of type T do
    not exhibit word tearing when accessed with full-width accesses. 
[*once_bound]
    The number of pthread_once_t variables in a programme is bounded and
    can be counted in an instance of type T without overflow. If T is
    32-bits, this implementation requires that the number of
    pthread_once_t variables in a programme not exceed 2**31-3. (Recall
    that pthread_once_t variables are intended to have static storage
    class, so it would be remarkable for such a huge number to exist.) 
[monotonicity]
    Accesses to a single variable V of type T by a single thread X
    appear to occur in programme order. For example, if V is initially
    0, then X writes 1, and then 2 to V, no thread (including but not
    limited to X) can read a value from V and then subsequently read a
    lower value from V. (Notice that this does not prevent arbitrary
    load and store reordering; it constrains ordering only between
    actions on a single memory location. This assumption is reasonable
    on all architectures that I currently know about. I suspect that the
    Java and CLR memory models require this assumption also.) 
"

The current implementation uses pthread_getspecific/pthread_setspecific 
which I'm not sure "is faster than a memory barrier".
The type T is unsigned long which doesn't satisfy [atomicity] and 
[monotonicity] requirements on all platforms.

The Boost.Thread should be changed so that the current algorithm is used 
only when the requirements are meet (this will mean at least to change 
the implementation of the thread specific data and that the datatype is 
adapted to the platform). So, to be portable we need an alternative 
algorithm, which could be the one you (Andrey) proposed in the patch. As 
the patch uses Boost.Atomic, this alternative will be really efficient 
only when atomic<uchar_t> is lock free.

libc++ implements it using an algorithm that is close to the double 
checked locking. It reuse the mutex needed to wait on the condition 
variable to protect the whole data and in addition making a non 
protected test on the flag state (see ********)

template<class _Callable> void call_once(once_flag& __flag, _Callable 
__func)
{
     if (__flag.__state_ != ~0ul) // **********
     {
         __call_once_param<_Callable> __p(__func);
         __call_once(__flag.__state_, &__p, 
&__call_once_proxy<_Callable>); // Here the flag is tested again with 
the mutex locked.
     }
}

This algorithm is efficient but I suspect that this algorithm is not 
always correct for the same reasons the Boost.Thread current algorithm 
is not. But on which platforms the libc++ implementation is correct? 
Howard, could you comment if you are reading this post.

For the time been I will play with the Andrey, the Howard (libc++) 
implementation and update the current implementation for the thread 
specific data. I will add some compiler flags that will direct to one of 
these implementations and make the best choice depending on the 
platform/compiler.

Many thank Andrey,
Vicente

[1] 
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2444.html#Appendix).
[2] http://llvm.org/svn/llvm-project/libcxx/trunk/include/mutex and 
http://llvm.org/svn/llvm-project/libcxx/trunk/src/mutex.cpp

P.S. It is surprising that gcc implementation in libstdc++v3 needs to 
initialize a lot of data before using __gthread_once and seems to don't 
manage the case when the user function throw an exception.

   extern "C"
   {
     void __once_proxy()
     {
#ifndef _GLIBCXX_HAVE_TLS
       function<void()> __once_call = std::move(__once_functor);
       if (unique_lock<mutex>* __lock = __get_once_functor_lock_ptr())
       {
         // caller is using new ABI and provided lock ptr
         __get_once_functor_lock_ptr() = 0;
         __lock->unlock();
       }
       else
         __get_once_functor_lock().unlock();  // global lock
#endif
       __once_call();
     }
   }

   /// call_once
   template<typename _Callable, typename... _Args>
     void
     call_once(once_flag& __once, _Callable&& __f, _Args&&... __args)
     {
#ifdef _GLIBCXX_HAVE_TLS
       auto __bound_functor = 
std::__bind_simple(std::forward<_Callable>(__f),
           std::forward<_Args>(__args)...);
       __once_callable = &__bound_functor;
       __once_call = &__once_call_impl<decltype(__bound_functor)>;
#else
       unique_lock<mutex> __functor_lock(__get_once_mutex());
       auto __callable = std::__bind_simple(std::forward<_Callable>(__f),
           std::forward<_Args>(__args)...);
       __once_functor = [&]() { __callable(); };
       __set_once_functor_lock_ptr(&__functor_lock);
#endif

       int __e = __gthread_once(&(__once._M_once), &__once_proxy);

#ifndef _GLIBCXX_HAVE_TLS
       if (__functor_lock)
         __set_once_functor_lock_ptr(0);
#endif

       if (__e)
     __throw_system_error(__e);
     }