
Peter Dimov <pdimov <at> pdimov.com> writes:
Alexander Terekhov wrote:
Consider also that
"Load Seq_Cst: MOV (from memory) Store Seq Cst: (LOCK) XCHG // alternative: MOV (into memory),MFENCE"
is an overkill for typical use cases...
But that's not a problem because everyone who understands should use explicit constraints, even if they happen to be memory_order_seq_cst. Relying on the SC default is bad practice because it can (and, to be on the safe side, should) be interpreted to mean that the author just hasn't figured out the minimum requirements.
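To make the "explicit constraints" style concrete, here is a minimal C++11 sketch of a flag-based handoff in which every atomic operation names its ordering, including the one that happens to be memory_order_seq_cst. The names `payload`, `ready`, `producer`, `consumer`, and `run_handoff` are hypothetical and chosen only for illustration; they do not come from the thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain data published through the flag
std::atomic<bool> ready{false};  // handoff flag

void producer() {
    payload = 42;
    // Release: the write to payload becomes visible to any thread that
    // observes ready == true with an acquire load.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire: pairs with the release store in producer().
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

int run_handoff() {
    payload = 0;
    // Spelled out even though it matches the default, to signal that the
    // ordering was chosen rather than left unexamined.
    ready.store(false, std::memory_order_seq_cst);

    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    assert(ready.load(std::memory_order_seq_cst));  // producer has run
    return seen;
}
```

Calling `run_handoff()` from `main` returns the value the consumer observed. Whether release/acquire is really the minimum required here is exactly the kind of judgment the discussion is about.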
I would have stated this differently, though probably with the same result. At least when writing application-level code, I would always rely on the default initially, and not worry about ordering. I would explicitly specify the ordering only when it turns out that memory_order_seq_cst introduces a performance problem. If nothing else, this would allow me to separate out debugging of memory model issues.

My experience is that very few people manage to get memory ordering right. My PPoPP 07 and MSPC 11 papers both have examples of commonly used mutex implementations getting it wrong in various interesting ways. We didn't understand what the specs actually required, but on top of that some of the implementations got it wrong in ways that were clearly independent of any misunderstanding of the spec. Given that the experts can't figure it out for what should be the easy cases, I'd much rather most people just stick with the sequentially consistent default.

This is entirely consistent with Peter's claim that using the sequentially consistent default means I haven't thought about it. But in many cases I really don't want to think about it, and that may be a fine state of affairs. For example, if I use an atomic counter, it's very likely that either:

1. It's not performance critical, I'm using atomics because they're more direct than mutexes in this case, or because I need the signal handler/interrupt safety, and the SC version is fine, or

2. It is performance critical, and I probably want to think hard about alternate solutions that keep thread-local counts.

In both cases, it's unlikely that memory ordering will significantly impact application performance. Of course this doesn't apply to all use cases.

Hans
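The two counter cases described above can be sketched roughly as follows in C++11. This is an illustrative sketch only: `shared_hits`, `count_shared`, and `count_local` are hypothetical names, and the "thread-local" variant here simply gives each thread its own local variable and sums the results after joining, which is one of several possible designs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Case 1: a shared atomic counter using the default ordering.
std::atomic<long> shared_hits{0};

long count_shared(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    shared_hits = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                ++shared_hits;  // atomic increment, memory_order_seq_cst
            }
        });
    }
    for (auto& th : pool) th.join();
    return shared_hits.load();
}

// Case 2: each thread counts privately; the totals are combined once.
long count_local(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::vector<long> partial(static_cast<std::size_t>(threads), 0L);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&partial, t, per_thread] {
            long local = 0;
            for (int i = 0; i < per_thread; ++i) ++local;
            // One write per thread to its own slot; join() makes it
            // visible to the thread that sums the slots.
            partial[static_cast<std::size_t>(t)] = local;
        });
    }
    for (auto& th : pool) th.join();
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

Both functions return the same total for the same arguments. The second avoids contended atomic operations in the hot loop altogether, which is why the choice of memory ordering on the shared counter is unlikely to be the deciding factor in either case.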