
on Sat Aug 06 2011, Gottlob Frege <gottlobfrege-AT-gmail.com> wrote:
On Mon, Aug 1, 2011 at 10:49 AM, Dave Abrahams <dave@boostpro.com> wrote:
on Mon Aug 01 2011, Gordon Woodhull <gordon-AT-woodhull.com> wrote:
On Jul 31, 2011, at 1:05 PM, Dave Abrahams wrote:
I don't get it. Please, show me an example.
I am definitely not an expert. I'll quote a diagram from this fascinating paper by Adve and Boehm [1], using Dekker's mutual exclusion algorithm as an example:
Initially X = Y = 0

Thread 1        Thread 2
X = 1;          Y = 1;
r1 = Y;         r2 = X;

Can r1 = r2 = 0?
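(As a concrete C++11 sketch of the same example - with relaxed atomics the 0/0 outcome is explicitly permitted, while the default sequentially consistent ordering rules it out:)

    #include <atomic>
    #include <thread>

    std::atomic<int> X{0}, Y{0};
    int r1, r2;

    void thread1() {
        X.store(1, std::memory_order_relaxed);
        r1 = Y.load(std::memory_order_relaxed);
    }

    void thread2() {
        Y.store(1, std::memory_order_relaxed);
        r2 = X.load(std::memory_order_relaxed);
    }

    int main() {
        std::thread t1(thread1), t2(thread2);
        t1.join(); t2.join();
        // r1 == r2 == 0 is allowed here: each store may be reordered with
        // (or simply not yet visible to) the other thread's load. Replacing
        // the relaxed orderings with the default std::memory_order_seq_cst
        // rules the 0/0 outcome out: some store must come first in the
        // single total order, so at least one load sees a 1.
    }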
Sure they can, that's a race condition: thread1 modifies X while thread2 reads it. Anything could happen. Why would anyone expect otherwise?
Most people are surprised that *both* r1 and r2 can be 0 at the same time.
"At the same time" is ill-defined until you put in some kind of synchronization. If you synchronize those two threads at the end I expect all their effects to have taken place. But they would still have a race condition so I wouldn't expect any particular result.
We have been trained that our code (single-threaded) happens sequentially. So

    Y = 1; r2 = X;

means "set Y **THEN** read X" - *in that order*.
But it doesn't mean that!
Now, really, we don't know or care about the true order (from the point of view of this CPU) until we later read it (from this CPU), but that's how we imagine it, in the single-threaded world we grew up in.
Who is "we," Kimosabe? I *really* don't think about code that way. I generally try to avoid writing code that makes me think about the order in which things happen.
So imagine the race you mention - thread1 is writing X while thread2 is reading it. In the traditional view - ie assuming a thread of code happens in the same order that it was written - this means:
A. thread1 has not yet read Y (thread1 is currently writing X, and reading Y happens next)
B. thread2 has finished setting Y (because it is currently reading X, which happens after setting Y)
----
C. therefore thread1 *must* read Y after it was set, therefore it will see Y == 1, therefore r1 == 1.
But my point is that even if this reasoning held, it is a hopelessly un-scalable way to think about parallel execution. I can't reason that way about the behavior of a complex system without having a nervous breakdown, and I don't believe it's tractable for most other people either. That's why I'm saying that sequential consistency is not a game-changer in terms of reasoning about parallelism.
If you grew up coding in different circumstances (you once mentioned doing lots of music/sound stuff early on - which tends to be very lock-free-ish), then maybe you don't make the traditional "N happens before N+1" assumptions.
I don't think I ever had to confront the issue.
Having said all that, it doesn't mean thinking about re-orderings helps anything. That I agree with. The key isn't "here's more to think about" (reordering), it is "here's less to think about" - ie less to assume - stop thinking that a single thread happens exactly one line after the other. But that seems to be hard to remember NOT to think about, since we've been thinking that way for years.
It is better to think about how writing shared variables exposes state, and whether that correctly exposes only invariants. You may already be thinking this way.
Yes, that's exactly what I'm saying.
eg: Imagine the invariant is assert(y >= x). Imagine p is local, and the invariant only needs to hold once p is exposed globally.
    p->x = p->y = 0;  // invariant OK
    ...
    p->x = 10;        // invariant temporarily broken, but local, so OK
    p->y = 20;        // invariant restored
    // can now expose:
    globalP = p;      // globally expose state here, after invariant is ensured
Thinking of "invariant exposure" makes it simple to realize that the globalP = p is the point of exposure (or "publication"). For threading and our memory model, this is also exactly where the barrier is needed (if you want to think about barriers) and exactly where the atomic operation is needed (or where the mutex can be unlocked).
Yes.
So just make globalP atomic. No need to think about reordering.
Only if the atomic comes with the barriers that ensure the writes to *p have happened before globalP is modified... which is what sequential consistency (the default for atomics in C++11) provides.
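(As a sketch of that pattern in C++11 - the Point struct here is just an assumed shape for the x/y fields in the example above:)

    #include <atomic>
    #include <cassert>

    struct Point { int x, y; };         // invariant once published: y >= x

    std::atomic<Point*> globalP{nullptr};

    void writer() {
        Point* p = new Point;
        p->x = p->y = 0;                // invariant OK
        p->x = 10;                      // invariant temporarily broken, but local
        p->y = 20;                      // invariant restored
        globalP = p;                    // publication point; plain assignment to a
                                        // std::atomic is a seq_cst store by default
    }

    void reader() {
        Point* p = globalP;             // seq_cst load by default
        if (p)                          // may not be published yet
            assert(p->y >= p->x);       // if we saw p, we see the writes to *p
    }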
But if people ask why the point of exposure is so important, or why globalP needs to be atomic or how it could possibly fail if globalP was just a pointer, you need to explain that p->y = 20 can happen (or be seen) after globalP = p. Which is like the r1 == r2 == 0 example - it goes against tradition.
No you don't. All you need to say is that *every* data structure (even a pointer like globalP) has an invariant that is broken during modification. The implication is that unless globalP is atomic, the other thread might see a broken value of it, and then all bets are off.
Most people would think that once globalP has been set, then globalP->y == 20, since that's the order the lines of code are written in. But that's only true if globalP is atomic.
You don't even need to think as far as globalP->y if you assume globalP can be totally broken unless access is synchronized.
Once they believe and/or understand that "invariant exposure happens on shared variables, therefore shared variables must be atomic", then reordering can be forgotten. You can then think of your code as blocks that happen between reads/writes of atomics (ie atomics are the sequence points) - just like critical sections are blocks of code. You have blocks defined by published-invariant sequence points. Or something like that.
Something like that. *If* programming with atomics is for mortals.
So I agree "sequential consistency" doesn't help much. Nor does reordering. Think about publishing invariants.
I think about "not exposing broken invariants."
(Actually what "sequential consistency" of the atomics means is that you can now reason about the large code blocks defined by those sequence points, ignoring their details. So that is useful.)
Tony
(For the pedantic: in this example, sequential consistency was overkill. We only needed "release" semantics on globalP = p.)
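(In C++11 spelling, reusing the names from the sketch above, that weaker version would be roughly:)

    // writer: publish with release ordering instead of the seq_cst default
    globalP.store(p, std::memory_order_release);

    // reader: a matching acquire load still guarantees the writes to *p
    // are visible once the pointer is seen
    Point* q = globalP.load(std::memory_order_acquire);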
Right.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com