Stephan T. Lavavej wrote:
> [Niall Douglas]
>> I think the Intel paper was referring to MSVC only, which is an unusual compiler in that its atomics all turn into InterlockedXXX functions irrespective of what you ask for. In other words, all seq_cst.
> Absolutely untrue for VC-ARM. Also untrue for VC 2015 x86/x64, where <atomic> avoids Interlocked machinery when an ordinary load/store with a compiler barrier will suffice due to the strength of the architecture's guarantees. (The versions are blurring together, so I forget when we made this change.)

If I remember correctly, on x86/x64 all atomic loads can be a plain MOV, even seq_cst. The acquire is implicit, and the sequential consistency is guaranteed by seq_cst stores being XCHG. (Relaxed/release stores are also a plain MOV.)

The paper doesn't mean that though. It says that the typical spinlock acquisition is:

    // atomic_flag f_;
    while( f_.test_and_set( std::memory_order_acquire ) ) { /* maybe yield */ }

and that it's better to do this:

    // atomic_bool f_;
    do
    {
        while( f_.load( std::memory_order_relaxed ) ) { /* maybe yield */ }
    }
    while( f_.exchange( true, std::memory_order_acquire ) );

instead.
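
For anyone who wants to try the second form, here is a minimal sketch of it wrapped into a complete lock. The names ttas_spinlock and count_with_lock are made up for illustration and do not come from the paper or from any library; the release store in unlock() and the use of std::this_thread::yield() for the "maybe yield" part are my assumptions as well. The idea is that waiters spin on a read-only relaxed load and only attempt the more expensive exchange once the flag looks free.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical name, for illustration only.
class ttas_spinlock
{
    std::atomic<bool> f_{ false };

public:
    void lock()
    {
        do
        {
            // Wait with a plain relaxed load while the lock looks taken,
            // so waiters do not issue a read-modify-write on every spin.
            while( f_.load( std::memory_order_relaxed ) )
            {
                std::this_thread::yield(); // the "maybe yield" part (assumption)
            }
        }
        // The flag looked free: now try to take it. If exchange returns
        // true, another thread got there first, so go back to waiting.
        while( f_.exchange( true, std::memory_order_acquire ) );
    }

    void unlock()
    {
        // Assumed counterpart to the acquire above.
        f_.store( false, std::memory_order_release );
    }
};

// Hypothetical helper: several threads increment a plain (non-atomic)
// counter under the lock and the final value is returned.
long count_with_lock( int threads, int iterations )
{
    ttas_spinlock lock;
    long counter = 0;

    std::vector<std::thread> workers;
    for( int t = 0; t < threads; ++t )
    {
        workers.emplace_back( [&]
        {
            for( int i = 0; i < iterations; ++i )
            {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        } );
    }

    for( auto& w : workers ) w.join();
    return counter;
}
```

If the lock provides mutual exclusion, count_with_lock( 4, 10000 ) should return 40000. This only checks correctness; whether it is actually faster than the test_and_set loop depends on the hardware and the amount of contention, and would have to be measured.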