
Anthony Williams wrote:
> "Peter Dimov" <pdimov@mmltd.net> writes:
>> An interlocked_read is stronger ('ordered') and more expensive than needed on a hardware level, but is 'relaxed' on a compiler level under MSVC 7.1 (the optimizer moves code around it). It's 'ordered' for the compiler as well under 8.0; the intrinsics have been changed to be compiler barriers as well. InterlockedExchange is similar.
>
> Have you got a reference for that? I would be interested to read about the details; MSDN is sketchy.
The documentation for the intrinsics now states that, as of 8.0, they act as a compiler barrier:

http://msdn2.microsoft.com/en-us/library/1s26w950.aspx

The documentation that shipped with VC 7.1 did not say this, and I did in fact observe the optimizer moving code across an interlocked intrinsic while developing the prototype of N2195.
>> A load_acquire can be implemented as a volatile read under 8.0, and a volatile read followed by _ReadWriteBarrier under 7.1.
>
> Why don't you need the barrier on 8.0? You need something there in order to prevent the CPU from doing out-of-order reads (and stores), even if the compiler won't reorder things. In fact, looking at the assembly code generated, I believe you need more than a _ReadWriteBarrier in both cases, as it seems to be purely a compiler barrier, and not a CPU barrier.
On x86 all loads already have acquire semantics by default, and all stores have release semantics. MSVC 8.0 extends a volatile load/store to have acquire/release semantics (both hardware and compiler) on every platform, including IA64.

http://msdn2.microsoft.com/en-us/library/12a04hfd.aspx
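
For illustration only, here is a minimal portable sketch of the load_acquire/store_release pair discussed above. It uses the std::atomic facilities that were later standardized in C++11 rather than the MSVC-specific volatile semantics or _ReadWriteBarrier, so it is a stand-in for the technique and not the code from the N2195 prototype. The helper names load_acquire and store_release and the Mailbox type are hypothetical. On x86 a compiler would typically emit a plain load and a plain store for these two operations, with the ordering constraint applying mainly to what the compiler itself may reorder.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a load with acquire ordering. Reads and writes that
// follow it in program order may not be moved before it.
template <class T>
T load_acquire(const std::atomic<T>& a) {
    return a.load(std::memory_order_acquire);
}

// Hypothetical helper: a store with release ordering. Reads and writes that
// precede it in program order may not be moved after it.
template <class T>
void store_release(std::atomic<T>& a, T v) {
    a.store(v, std::memory_order_release);
}

// Hypothetical example type: publish a plain int through an atomic flag.
struct Mailbox {
    int payload = 0;             // ordinary, non-atomic data
    std::atomic<int> ready{0};   // flag guarding the payload

    // Writer side: fill in the payload, then publish with a release store.
    void publish(int v) {
        payload = v;
        store_release(ready, 1);
    }

    // Reader side: an acquire load of the flag; if it is set, the payload
    // written before the matching release store is visible.
    bool try_consume(int& out) const {
        if (load_acquire(ready) == 0) {
            return false;
        }
        out = payload;
        return true;
    }
};
```

A reader thread that sees ready == 1 through load_acquire is guaranteed to also see the payload written before the matching store_release, which is the property the volatile read (plus barrier, under 7.1) is meant to provide.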