
Alexander Terekhov <terekhov <at> web.de> writes:
Hans Boehm wrote: [...]
Although the existence of such an algorithm is interesting, the point is
that
avoiding the equivalent of the sequentuially consistent store here is incredibly complicated, appears to require scheduler help and, at least for me, would require careful study of the proof before I were willing to implement it. On the other hand, the algorithm using the equivalent of sequentially consistent atomics (which I think is still far more widely used) is orders of magnitude simpler. If you understand the complex algorithm, we assume you can add the ordering constraints corectly.
With waiters bit (0x80000000) and atomic RMW, mutex is simply: (pseudo-code below)
WHILE atomic_bit_test_set_hlb(&lock, 1) // efficient acquire MO lock_queue_wait(&lock, 1) // wait if locked bit is set
for lock() and
uintptr_t lock_queue IF atomic_decrement_rel(lock_queue = &lock) // release MO THEN lock_queue_wake(lock_queue, 1) I'm not sure I understand your syntax here.
I don't see the correctness argument. What prevents the sequence: 1. In thread T0, performing unlock, atomic_decr_rel update completes, sees no waiters, puts updated (released) lock word in store buffer. (I'm informally viewing memory_order_release as just putting a value in a store buffer, which I think is reasonable here.) 2. Thread T1 tries to acquire the lock, sees it still held, since store buffer hasn't drained. Enqueues itself. 3. Thread T0s store buffer drains. Thread T1 is now waiting on an unlocked lock. The real point here again is that such arguments are subtle. I don't want to have to deal with them unless performance considerations warrant that I do, especially since we have lots of evidence that even experts get these things wrong. Hans