
Hans Boehm wrote: [...]
Although the existence of such an algorithm is interesting, the point is that avoiding the equivalent of the sequentuially consistent store here is incredibly complicated, appears to require scheduler help and, at least for me, would require careful study of the proof before I were willing to implement it. On the other hand, the algorithm using the equivalent of sequentially consistent atomics (which I think is still far more widely used) is orders of magnitude simpler. If you understand the complex algorithm, we assume you can add the ordering constraints corectly.
With waiters bit (0x80000000) and atomic RMW, mutex is simply: (pseudo-code below) WHILE atomic_bit_test_set_hlb(&lock, 1) // efficient acquire MO lock_queue_wait(&lock, 1) // wait if locked bit is set for lock() and uintptr_t lock_queue IF atomic_decrement_rel(lock_queue = &lock) // release MO THEN lock_queue_wake(lock_queue, 1) for unlock() (lock_queue uintptr_t pointer may become indeterminate by the time of lock_queue_wake() and implementation should handle it (spurious wakes are okay)). Similarly, for semaphore: sema_lock: WHILE !atomic_decrement_if_binand_7FFFFFFF_is_not_zero_hlb(&lock) lock_queue_wait(&lock, 0) // wait if sema value is zero sema_unlock: uintptr_t lock_queue IF atomic_increment_rel(lock_queue = &lock) > 0x80000000 THEN lock_queue_wake(lock_queue, 1) regards, alexander.