Re: [boost] [lockfree] review

13 Sep 2011

      Alexander Terekhov <terekhov <at> web.de> writes:
...
Hans Boehm wrote:
[...]
...
Although the existence of such an algorithm is interesting, the point is
that
...
...
avoiding the equivalent of the sequentuially consistent store here is
incredibly complicated, appears to require scheduler help and, at least for
me, would require careful study of the proof before I were willing to
implement it.  On the other hand, the algorithm using the equivalent of
sequentially consistent atomics (which I think is still far more widely 
used)
is orders of magnitude simpler.  If you understand the complex algorithm, 
we
assume you can add the ordering constraints corectly.
With waiters bit (0x80000000) and atomic RMW, mutex is simply:
(pseudo-code below)
WHILE atomic_bit_test_set_hlb(&lock, 1) // efficient acquire MO
    lock_queue_wait(&lock, 1) // wait if locked bit is set
for lock() and
uintptr_t lock_queue
  IF atomic_decrement_rel(lock_queue = &lock) // release MO
    THEN lock_queue_wake(lock_queue, 1) 
I'm not sure I understand your syntax here.
I don't see the correctness argument.  What prevents the sequence:

1. In thread T0, performing unlock, atomic_decr_rel update completes, sees no 
waiters, puts updated (released) lock word in store buffer.  (I'm informally 
viewing memory_order_release as just putting a value in a store buffer, which 
I think is reasonable here.)
2. Thread T1 tries to acquire the lock, sees it still held, since store buffer 
hasn't drained.  Enqueues itself.
3. Thread T0s store buffer drains.  Thread T1 is now waiting on an unlocked 
lock.

The real point here again is that such arguments are subtle.  I don't want to 
have to deal with them unless performance considerations warrant that I do, 
especially since we have lots of evidence that even experts get these things 
wrong.

Hans