
"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
I don't see call_once in jss_thread.zip, by the way; maybe you forgot to put it into the archive?
Oops. Thanks for spotting that. I've added it to the archive, and updated it to take multiple arguments in passing.
Some comments on that:
Thanks for taking the time to look at this.
template<typename Function>
void call_once(once_flag& flag,Function f)
{
    // Try for a quick win: if the procedure has already been called
    // just skip through:
    long const function_complete_flag_value=0xc15730e2;

    if(::jss::detail::interlocked_read(&flag)!=function_complete_flag_value)
    {
        char mutex_name[::jss::detail::once_mutex_name_length];
        void* const mutex_handle(::jss::detail::create_once_mutex(mutex_name,&flag));
        BOOST_ASSERT(mutex_handle);
        detail::win32::handle_holder const closer(mutex_handle);
        detail::win32_mutex_scoped_lock const lock(mutex_handle);

        if(::jss::detail::interlocked_read(&flag)!=function_complete_flag_value)
        {
            f();
            JSS_INTERLOCKED_EXCHANGE(&flag,function_complete_flag_value);
        }
    }
}
The first load needs to be a load_acquire; the second can be ordinary since it's done under a lock. The store needs to be store_release.
I didn't want to think about acquire/release semantics when I wrote that, so I just went for "ordered" ops. Agreed that the second read can be ordinary. Actually I think the store can be ordinary too since it's also done under a lock, and the unlock has (or should have, anyway) release semantics. I agree that the first read needs to be load_acquire, though: without the acquire, there's no synchronization in the case that the flag has been set, and there's nothing to "release".
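For reference, the ordering under discussion can be written down with std::atomic, which postdates this thread; this is only a minimal sketch, not the jss_thread code. The names once_flag_sketch, call_once_sketch and function_complete are hypothetical, a std::mutex stands in for the named Win32 mutex, and the completion value is simplified to 1. It follows the suggestion of an acquire load on the fast path, an ordinary (relaxed) load under the lock, and a release store. In the C++11 model a fast-path reader never touches the mutex, so it can only synchronize with a release store to the flag itself, which is why the sketch keeps the release there rather than relying on the unlock.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for once_flag: the flag value plus an ordinary
// mutex (the code in this thread creates a named Win32 mutex instead).
struct once_flag_sketch {
    std::atomic<long> state{0};
    std::mutex guard;
};

// Assumption: a simplified completion marker, not the value used above.
constexpr long function_complete = 1;

template <typename Function>
void call_once_sketch(once_flag_sketch& flag, Function f) {
    // Fast path: the acquire load pairs with the release store below, so a
    // thread that sees the flag set also sees everything f() wrote.
    if (flag.state.load(std::memory_order_acquire) != function_complete) {
        std::lock_guard<std::mutex> lock(flag.guard);
        // Second check: performed under the lock, so a relaxed load suffices.
        if (flag.state.load(std::memory_order_relaxed) != function_complete) {
            f();
            // Release store: publishes the effects of f() to fast-path
            // readers, which never acquire the mutex.
            flag.state.store(function_complete, std::memory_order_release);
        }
    }
}
```

Calling call_once_sketch twice on the same flag with a lambda that increments a counter should leave the counter at 1; the second call returns from the fast path without taking the mutex.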
An interlocked_read is stronger ('ordered') and more expensive than needed on a hardware level, but is 'relaxed' on a compiler level under MSVC 7.1 (the optimizer moves code around it). It's 'ordered' for the compiler as well under 8.0; the intrinsics have been changed to be compiler barriers as well. InterlockedExchange is similar.
Have you got a reference for that? I would be interested to read about the details; MSDN is sketchy.
A load_acquire can be implemented as a volatile read under 8.0, and a volatile read followed by _ReadWriteBarrier under 7.1.
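As a point of comparison only, here is how the same primitives can be spelled in standard C++11, which did not exist when this was written. The helper names load_acquire, store_release and compiler_only_barrier mirror the terms used in this thread but are hypothetical wrappers, and message_box is an illustrative type. std::atomic_signal_fence constrains the compiler without emitting a hardware fence, which makes it roughly comparable in spirit to a compiler-only barrier; this sketch says nothing about what MSVC 7.1 or 8.0 actually generate.

```cpp
#include <atomic>

// Hypothetical wrappers named after the operations discussed in this thread.
inline long load_acquire(std::atomic<long> const& x) {
    return x.load(std::memory_order_acquire);
}

inline void store_release(std::atomic<long>& x, long value) {
    x.store(value, std::memory_order_release);
}

// Compiler-only barrier: restricts compiler reordering, emits no CPU fence.
inline void compiler_only_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Illustrative message-passing pair built on the wrappers above.
struct message_box {
    long payload = 0;
    std::atomic<long> ready{0};
};

inline void publish(message_box& box, long value) {
    box.payload = value;          // plain write
    store_release(box.ready, 1);  // makes the write above visible to acquirers
}

inline bool try_consume(message_box const& box, long& out) {
    if (load_acquire(box.ready) == 1) {
        out = box.payload;        // ordered after the acquire load
        return true;
    }
    return false;
}
```

With these wrappers the implementation picks whatever instructions the target needs for acquire and release, so the question of which barrier each compiler version requires is delegated to the library.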
Why don't you need the barrier on 8.0? You need something there in order to prevent the CPU from doing out-of-order reads (and stores), even if the compiler won't reorder things.

In fact, looking at the assembly code generated, I believe you need more than a _ReadWriteBarrier in both cases, as it seems to be purely a compiler barrier, and not a CPU barrier. On x86, I think a load_acquire needs to either be a simple load followed by an MFENCE, or a fully ordered RMW operation. The compiler Interlocked intrinsics will generate the latter, but I don't know how to do the former short of writing inline assembly.

Anthony
--
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL