Re: [boost] [lockfree] review

29 Aug 2011

      Peter Dimov <pdimov <at> pdimov.com> writes:
...
Alexander Terekhov wrote:
...
Consider also that
"Load Seq_Cst: MOV (from memory)
Store Seq_Cst: (LOCK) XCHG // alternative: MOV (into memory),MFENCE"
is overkill for typical use cases...

But that's not a problem because everyone who understands should use 
explicit constraints, even if they happen to be memory_order_seq_cst. 
Relying on the SC default is bad practice because it can (and, to be on the 
safe side, should) be interpreted to mean that the author just hasn't 
figured out the minimum requirements.

I would have stated this differently, though probably with the same result.  
At least when writing application-level code, I would always rely on the 
default initially, and not worry about ordering.  I would explicitly specify 
the ordering only when it turns out that memory_order_seq_cst introduces a 
performance problem.

If nothing else, this would allow me to separate out debugging of memory model 
issues.
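
A minimal sketch of that workflow, in case it helps. The names (hits, 
record_hit_default, record_hit_relaxed, run_counter) are made up for 
illustration and are not from any library.  The first version relies on the 
default; the second is what I might write later, and only if profiling 
pointed at this counter:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical event counter; all names here are illustrative only.
std::atomic<long> hits{0};

// First version: no ordering argument, so this is memory_order_seq_cst.
void record_hit_default() {
    hits.fetch_add(1);
}

// Later version, written only once the default is shown to be a problem.
// Relaxed keeps the increment atomic but imposes no ordering on the
// surrounding memory accesses.
void record_hit_relaxed() {
    hits.fetch_add(1, std::memory_order_relaxed);
}

// Runs `threads` threads, each calling `record` `per_thread` times, and
// returns the final count.
long run_counter(void (*record)(), int threads, int per_thread) {
    hits.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([=] {
            for (int i = 0; i < per_thread; ++i) record();
        });
    }
    for (auto& th : pool) th.join();
    return hits.load();
}
```

Both versions give the same count; the difference is only in what ordering 
guarantees the surrounding code may rely on, which is exactly the part I 
would rather debug separately.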

My experience is that very few people manage to get memory ordering right.  My 
PPoPP 07 and MSPC 11 papers both have examples of commonly used mutex 
implementations getting it wrong in various interesting ways.  We didn't 
understand what the specs actually required, but on top of that some of the 
implementations got it wrong in ways that were clearly independent of any 
misunderstanding of the spec.  Given that the experts can't figure it out for 
what should be the easy cases, I'd much rather most people just stick to the 
sequentially consistent default.

This is entirely consistent with Peter's claim that using the sequentially 
consistent default means I haven't thought about it.  But in many cases I 
really don't want to think about it, and that may be a fine state of affairs.  
For example, if I use an atomic counter, it's very likely that either:

1. It's not performance critical: I'm using atomics because they're more 
direct than mutexes in this case, or because I need the signal 
handler/interrupt safety, and the SC version is fine, or
2. It is performance critical, and I probably want to think hard about 
alternate solutions that keep thread-local counts.

In both cases, it's unlikely that memory ordering will significantly impact 
application performance.  Of course this doesn't apply to all use cases.
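
For case 2, one possible shape of such an alternate solution is sketched 
below.  This is only an illustration under stated assumptions: the names 
(Slot, ShardedCounter, count_in_parallel) are hypothetical, the 64-byte 
padding is an assumed cache-line size, and C++17 is assumed so that the 
over-aligned slots are allocated correctly.  Each thread increments its own 
slot and a reader sums the slots:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// One slot per thread, padded to 64 bytes (an assumed cache-line size) so
// that different threads' slots do not share a cache line.
struct alignas(64) Slot {
    std::atomic<long> value{0};
};

// Hypothetical counter that keeps per-thread counts and sums them on read.
class ShardedCounter {
public:
    explicit ShardedCounter(std::size_t threads) : slots_(threads) {}

    // Each thread passes its own index and touches only its own slot.
    // The increment still uses the default memory_order_seq_cst.
    void add(std::size_t thread_index, long n = 1) {
        slots_[thread_index].value.fetch_add(n);
    }

    // Sum of all slots; exact once the writer threads have been joined.
    long total() const {
        long sum = 0;
        for (const Slot& s : slots_) sum += s.value.load();
        return sum;
    }

private:
    std::vector<Slot> slots_;
};

// Illustrative driver: `threads` threads each add 1 `per_thread` times.
long count_in_parallel(std::size_t threads, long per_thread) {
    ShardedCounter counter(threads);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, t, per_thread] {
            for (long i = 0; i < per_thread; ++i) counter.add(t);
        });
    }
    for (auto& th : pool) th.join();
    return counter.total();
}
```

Note that the sketch leaves the ordering at the default.  The restructuring 
is aimed at contention on a single shared location, not at the ordering 
constraint.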

Hans

Hans Boehm