Re: [boost] Boost.Threads, N2178, N2184, et al

23 Mar 2007

"Peter Dimov" <pdimov@mmltd.net> writes:
...
Anthony Williams wrote:
...
"Peter Dimov" <pdimov@mmltd.net> writes:
...
I don't see call_once in jss_thread.zip, by the way; maybe you
forgot to put it into the archive?
Oops. Thanks for spotting that. I've added it to the archive, and
updated it to take multiple arguments in passing.
Some comments on that:
Thanks for taking the time to look at this.
...
template<typename Function>
    void call_once(once_flag& flag,Function f)
    {
        // Try for a quick win: if the procedure has already been called
        // just skip through:
        long const function_complete_flag_value=0xc15730e2;
        if(::jss::detail::interlocked_read(&flag)!=function_complete_flag_value)
        {
            char mutex_name[::jss::detail::once_mutex_name_length];
            void* const mutex_handle(::jss::detail::create_once_mutex(mutex_name,&flag));
            BOOST_ASSERT(mutex_handle);
            detail::win32::handle_holder const closer(mutex_handle);
            detail::win32_mutex_scoped_lock const lock(mutex_handle);
            if(::jss::detail::interlocked_read(&flag)!=function_complete_flag_value)
            {
                f();
                JSS_INTERLOCKED_EXCHANGE(&flag,function_complete_flag_value);
            }
        }
    }
The first load needs to be a load_acquire; the second can be ordinary since 
it's done under a lock. The store needs to be store_release.
I didn't want to think about acquire/release semantics when I wrote that, so I
just went for "ordered" ops.

Agreed that the second read can be ordinary. Actually I think the store can be
ordinary too since it's also done under a lock, and the unlock has (or should
have, anyway) release semantics.

I agree that the first read needs to be load_acquire, though: without the
acquire, there's no synchronization in the case that the flag has been set,
and there's nothing to "release".
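
For comparison, here is a minimal sketch of the same double-checked pattern
written against C++11 std::atomic and std::mutex (neither existed when this
thread was written), following Peter's suggestion: an acquire load on the
fast path, a plain (relaxed) re-check under the lock, and a release store
once the function has run. It is not the jss_thread implementation: the names
my_once_flag, my_call_once, once_mutex and count_calls_from_threads are
hypothetical, and a single process-wide std::mutex stands in for the named
Win32 mutex that create_once_mutex builds.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical flag type: 0 = not yet run, 1 = completed.
struct my_once_flag {
    std::atomic<int> state{0};
};

// Hypothetical stand-in for the per-flag named mutex used in jss_thread.
inline std::mutex& once_mutex() {
    static std::mutex m;
    return m;
}

template <typename Function, typename... Args>
void my_call_once(my_once_flag& flag, Function f, Args&&... args) {
    // Fast path: this acquire load pairs with the release store below, so a
    // thread that sees 1 also sees everything f() wrote.
    if (flag.state.load(std::memory_order_acquire) != 1) {
        std::lock_guard<std::mutex> lock(once_mutex());
        // Re-check under the lock; the mutex already orders this read
        // against the store made by whichever thread ran f().
        if (flag.state.load(std::memory_order_relaxed) != 1) {
            f(std::forward<Args>(args)...);
            // Publish completion to fast-path readers that never take the lock.
            flag.state.store(1, std::memory_order_release);
        }
    }
}

// Usage: several threads race to run the same function; returns how many
// times it actually ran.
inline int count_calls_from_threads(int thread_count) {
    my_once_flag flag;
    int calls = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&] {
            my_call_once(flag, [&](int n) { calls += n; }, 1);
        });
    }
    for (auto& t : threads) t.join();
    return calls;
}
```

Note that the completion store is still an atomic (release) store here, even
though it happens under the lock: fast-path readers check the flag without
taking the lock, so in C++11 terms a plain store would race with them.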
...
An interlocked_read is stronger ('ordered') and more expensive than needed at 
the hardware level, but is 'relaxed' at the compiler level under MSVC 7.1 
(the optimizer moves code around it). Under 8.0 it is 'ordered' for the 
compiler too; the intrinsics have been changed to act as compiler barriers. 
InterlockedExchange is similar.
Have you got a reference for that? I would be interested to read about the
details; MSDN is sketchy.
...
A load_acquire can be implemented as a volatile read under 8.0, and a 
volatile read followed by _ReadWriteBarrier under 7.1.
Why don't you need the barrier on 8.0? You need something there in order to
prevent the CPU from doing out-of-order reads (and stores), even if the
compiler won't reorder things. In fact, looking at the assembly code
generated, I believe you need more than a _ReadWriteBarrier in both cases, as
it seems to be purely a compiler barrier, and not a CPU barrier.
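
The compiler-barrier versus CPU-barrier distinction has a direct counterpart
in C++11: std::atomic_signal_fence only constrains the compiler (roughly the
role _ReadWriteBarrier plays above), while std::atomic_thread_fence also
constrains the hardware. A minimal message-passing sketch, assuming C++11; the
names payload, ready, producer, consumer and run_message_passing are
illustrative only.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag used to publish payload

void producer() {
    payload = 42;
    // Release fence: orders the payload write before the flag store as seen
    // by other threads, not just by the compiler.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: pairs with the release fence above. Replacing either
    // fence with std::atomic_signal_fence (compiler-only) would leave the
    // read of payload racing with the write in producer().
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```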

On x86, I think a load_acquire needs to be either a simple load followed by an
MFENCE, or a fully ordered RMW operation. The compiler Interlocked intrinsics
will generate the latter, but I don't know how to do the former short of
writing inline assembly.
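
In C++11 terms (again not available at the time), the options discussed here
can all be spelled without inline assembly: a load with memory_order_acquire,
a relaxed load followed by an acquire fence, or a fully ordered
read-modify-write in the style of the Interlocked intrinsics. The helper names
below are hypothetical, and which instructions each one compiles to on x86
(plain MOV, MFENCE, or a LOCK-prefixed operation) is left to the compiler and
is not claimed here.

```cpp
#include <atomic>

// 1. Direct acquire load.
inline long load_acquire_direct(const std::atomic<long>& x) {
    return x.load(std::memory_order_acquire);
}

// 2. Relaxed load followed by an acquire fence; the fence gives the
//    preceding load acquire semantics.
inline long load_acquire_fence(const std::atomic<long>& x) {
    long v = x.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return v;
}

// 3. Fully ordered RMW that leaves the value unchanged, analogous to reading
//    through an Interlocked operation (stronger and costlier than needed).
inline long load_acquire_rmw(std::atomic<long>& x) {
    return x.fetch_add(0, std::memory_order_seq_cst);
}
```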

Anthony
-- 
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL