
"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
BOOST_INTERLOCKED_READ doesn't really belong in interlocked.hpp (macro vs inline aside). The aim of this header is only to provide the Interlocked* functions as specified and documented by Microsoft without including <windows.h>; it is not meant to introduce new unspecified and undocumented functionality.
Fair enough. I'll move them elsewhere. I used macros rather than inline functions, for consistency with the rest of the INTERLOCKED stuff. Maybe inline functions are more appropriate, since these are users of the INTERLOCKED functions rather than direct mappings.
Moved to boost/thread/detail/interlocked_read_win32.hpp.
Where is BOOST_INTERLOCKED_READ being used, by the way? I don't follow the thread_rewrite branch closely, but a quick glance didn't reveal anything. The semantics of InterlockedRead are presumably those of a fully-fenced read? Few lock-free algorithms need that.
It's used in thread/detail/lightweight_mutex_win32.hpp, thread/detail/read_write_mutex_win32.hpp and thread/detail/condition_win32.hpp. I'm using it to ensure that a read from a variable happens either entirely before or entirely after an interlocked_exchange or interlocked_increment, never midway through. I figured that if one use of a variable was interlocked, the others had better be too. Maybe I'm wrong; I haven't thought about it *that* hard.
Finally, I believe that for correct double-checked locking you only need a load with an acquire barrier on the fast path - which maps to an ordinary load on x86(-64) and to ld.acq on IA-64 - and by using a fully locked cmpxchg you're introducing a performance penalty (the philosophical debate of whether InterlockedCompareExchange is guaranteed to enforce memory ordering when the comparison fails aside).
Is there an intrinsic function for that? I couldn't find one, which is why I left it at InterlockedCompareExchange. I guess it could use InterlockedCompareExchangeAcquire, which reduces the locking penalty.
No, there is no documented way to implement ld.acq using the Windows API. A volatile read appears to work properly on all Windows targets/compilers, and there are probably thousands of lines of existing code that depend on it, but this wasn't specified anywhere.
The newer MSVC 8 documentation finally promises that a volatile read has acquire semantics and that a volatile store has release semantics, even on IA-64, and the compiler also seems to understand these reordering constraints.
http://msdn2.microsoft.com/en-us/library/12a04hfd
The Intel compiler seems to have an option, serialize-volatile, that appears to be on by default; so it seems to also enforce acq/rel volatiles.
As I see it, the implementation options are (1) use a volatile read, live dangerously, be ridiculed by Alexander Terekhov, (2) use inline assembly (painful), (3) use a fully-locked implementation and suffer the performance consequences - my preference is InterlockedExchangeAdd with zero.
Either way, the actual helper function should be named atomic_load_acq and specified to promise acquire semantics, in my opinion.
Thank you for the details. I don't fancy either of the first two options, as I don't know IA-64 or AMD64 assembly, and I don't feel safe relying on volatile semantics unless it's really guaranteed correct on all supported compilers. Is InterlockedExchangeAdd faster/more reliable in some way than InterlockedCompareExchange?

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk