
"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
BOOST_INTERLOCKED_READ doesn't really belong in interlocked.hpp (macro vs inline aside). The aim of this header is only to provide the Interlocked* functions as specified and documented by Microsoft without including <windows.h>; it is not meant to introduce new unspecified and undocumented functionality.
Fair enough. I'll move them elsewhere. I used macros rather than inline functions, for consistency with the rest of the INTERLOCKED stuff. Maybe inline functions are more appropriate, since these are users of the INTERLOCKED functions rather than direct mappings.
Moved to boost/thread/detail/interlocked_read_win32.hpp.
Where is BOOST_INTERLOCKED_READ being used, by the way? I don't follow the thread_rewrite branch closely, but a quick glance didn't reveal anything. The semantics of InterlockedRead are presumably those of a fully-fenced read? Few lock-free algorithms need that.
It's used in thread/detail/lightweight_mutex_win32.hpp, thread/detail/read_write_mutex_win32.hpp and thread/detail/condition_win32.hpp. I'm using it to ensure that a read from a variable happens either entirely before or entirely after an interlocked_exchange or interlocked_increment, never midway through. I figured that if one use of a variable was interlocked, the others had better be too. Maybe I'm wrong; I haven't thought about it *that* hard.
Finally, I believe that for correct double-checked locking you only need a load with an acquire barrier on the fast path - which maps to an ordinary load on x86(-64) and to ld.acq on IA-64 - and by using a fully locked cmpxchg you're introducing a performance penalty (the philosophical debate of whether InterlockedCompareExchange is guaranteed to enforce memory ordering when the comparison fails aside).
Is there an intrinsic function for that? I couldn't find one, which is why I left it at InterlockedCompareExchange. I guess it could use InterlockedCompareExchangeAcquire, which reduces the locking penalty.
No, there is no documented way to implement ld.acq using the Windows API. A volatile read appears to work properly on all Windows targets/compilers, and there are probably thousands of lines of existing code that depend on it, but this wasn't specified anywhere.
The newer MSVC 8 documentation finally promises that a volatile read has acquire semantics and that a volatile store has release semantics, even on IA-64, and the compiler also seems to understand these reordering constraints.
http://msdn2.microsoft.com/en-us/library/12a04hfd
The Intel compiler seems to have an option, serialize-volatile, that appears to be on by default; so it seems to also enforce acq/rel volatiles.
As I see it, the implementation options are (1) use a volatile read, live dangerously, be ridiculed by Alexander Terekhov, (2) use inline assembly (painful), (3) use a fully-locked implementation and suffer the performance consequences - my preference is InterlockedExchangeAdd with zero.
Either way, the actual helper function should be named atomic_load_acq and specified to promise acquire semantics, in my opinion.
Thank you for the details. I don't fancy either of the first two options, as I don't know IA-64 or AMD64 assembly, and I don't feel safe relying on volatile semantics unless it's really guaranteed correct on all supported compilers. Is InterlockedExchangeAdd faster/more reliable in some way than InterlockedCompareExchange?

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk