Re: [boost] [lockfree] review

27 Aug 2011

      [Dave Abrahams pointed me at this thread.]

Alexander Terekhov <terekhov <at> web.de> writes:
...
Dave Abrahams wrote:
...
on Thu Aug 25 2011, Alexander Terekhov <terekhov-AT-web.de> wrote:
...
Ilya Sokolov wrote:
...
AFAIK, in the example above another thread can't observe reordering
without introducing UB.
I meant "observed" as in "check the sequence of writes in the generated
code".
Do we care about what the generated code looks like?  IIUC what counts
is the sequence of memory states that another another thread can see.
...
But int can be simply changed to relaxed atomic<int> so that another
thread can observe reordering without UB.
Sure.
I meant that for the example upthread, with an int instead of relaxed
atomic<int>, another thread's code triggering UB does not prevent us
from observing reordered writes in the writing thread's code (in
generated code), by looking at it.
Changing int to relaxed atomic<int> simply removes UB but does not
change the relevant code in the writing thread.
regards,
alexander.
I agree with Alexander and most of the recent postings.  If I write

mx.lock();
x = 1;
mx.unlock();
my.lock();
y = 1;
my.unlock();

There is nothing to prevent the stores to x and y from becoming visible out of 
order at the hardware level.  Enforcing ordering here would add needless and 
substantial cost.  But a data-race-free program cannot observe such 
reordering.  (One reference for that is Sarita Adve's and my PLDI 08 paper. 
Unofficial TR version at http://www.hpl.hp.com/techreports/2008/HPL-2008-
56.html.  I believe Mark Batty et al have a much more complete and formal 
proof.)

If we change x and y to be atomic variables, and the stores to be 
memory_order_relaxed, then an observer can tell the difference.  I.e. if x and 
y are initially zero, and another thread does

r1 = y;
r2 = x;

(or the acquire versions), it can see r1 = 1 and r2 = 0, since there are no 
synchronizes-with relationships between threads, and thus nothing to enforce 
ordering.  If you use memory_order_relaxed, or even the acquire/release forms, 
things do get MUCH more complicated.

The equivalent rules for other languages are an interesting story.  I believe 
Posix implentors assume the above rules.  I have a PPoPP 2007 paper that 
explains why that's not exactly what Posix actually states.  OpenMP 3.0 has 
stricter rules which are also not consistently obeyed, and are somewhat fixed 
in OpenMP 3.1.  (See my MSPC 11 paper for some details.)  Java has arguably 
the same rules as C++ in this respect.

In my mind, the main way in which sequential consistency differs from the 
cheaper acquire/release semantics that Alexander wants is that

a) Dekker's algorithm (which by itself isn't that practically interesting), 
plus a number of real lock-free algorithms (which do matter) don't work.

b) In my mind, acquire/release is a far more difficult model to explain to non-
experts than the model for C++ with only sequentially consistent atomics.  You 
can no longer reason about interleaving thread actions, either for purposes of 
determining the existence of a data-race, nor when defining what data-race-
free programs mean.

This was extensively discussed in C++ meetings starting in about 2005.  The 
current solution with default-sequentially-consistent atomics, but explicit 
support for weaker ordering, was the resulting committee compromise.

Hans

Re: [boost] [lockfree] review

Boehm, Hans