
[Dave Abrahams pointed me at this thread.] Alexander Terekhov <terekhov <at> web.de> writes:
Dave Abrahams wrote:
on Thu Aug 25 2011, Alexander Terekhov <terekhov-AT-web.de> wrote:
Ilya Sokolov wrote:
AFAIK, in the example above another thread can't observe reordering without introducing UB.
I meant "observed" as in "check the sequence of writes in the generated code".
Do we care about what the generated code looks like? IIUC what counts is the sequence of memory states that another another thread can see.
But int can be simply changed to relaxed atomic<int> so that another thread can observe reordering without UB.
Sure.
I meant that for the example upthread, with an int instead of relaxed atomic<int>, another thread's code triggering UB does not prevent us from observing reordered writes in the writing thread's code (in generated code), by looking at it.
Changing int to relaxed atomic<int> simply removes UB but does not change the relevant code in the writing thread.
regards, alexander.
I agree with Alexander and most of the recent postings. If I write mx.lock(); x = 1; mx.unlock(); my.lock(); y = 1; my.unlock(); There is nothing to prevent the stores to x and y from becoming visible out of order at the hardware level. Enforcing ordering here would add needless and substantial cost. But a data-race-free program cannot observe such reordering. (One reference for that is Sarita Adve's and my PLDI 08 paper. Unofficial TR version at http://www.hpl.hp.com/techreports/2008/HPL-2008- 56.html. I believe Mark Batty et al have a much more complete and formal proof.) If we change x and y to be atomic variables, and the stores to be memory_order_relaxed, then an observer can tell the difference. I.e. if x and y are initially zero, and another thread does r1 = y; r2 = x; (or the acquire versions), it can see r1 = 1 and r2 = 0, since there are no synchronizes-with relationships between threads, and thus nothing to enforce ordering. If you use memory_order_relaxed, or even the acquire/release forms, things do get MUCH more complicated. The equivalent rules for other languages are an interesting story. I believe Posix implentors assume the above rules. I have a PPoPP 2007 paper that explains why that's not exactly what Posix actually states. OpenMP 3.0 has stricter rules which are also not consistently obeyed, and are somewhat fixed in OpenMP 3.1. (See my MSPC 11 paper for some details.) Java has arguably the same rules as C++ in this respect. In my mind, the main way in which sequential consistency differs from the cheaper acquire/release semantics that Alexander wants is that a) Dekker's algorithm (which by itself isn't that practically interesting), plus a number of real lock-free algorithms (which do matter) don't work. b) In my mind, acquire/release is a far more difficult model to explain to non- experts than the model for C++ with only sequentially consistent atomics. You can no longer reason about interleaving thread actions, either for purposes of determining the existence of a data-race, nor when defining what data-race- free programs mean. This was extensively discussed in C++ meetings starting in about 2005. The current solution with default-sequentially-consistent atomics, but explicit support for weaker ordering, was the resulting committee compromise. Hans