
On Monday 30 November 2009 18:48:21, Phil Endecott wrote:
> Helge Bahmann wrote:
> > Why would you ever want to use a mutex that does not properly tell the system scheduler how long the calling thread should be suspended (in case of contention) for "true" inter-thread (or even inter-process) coordination?
> When you believe that the probability of contention is very small, you care only about average rather than worst-case performance, and you wish to avoid the overhead (e.g. in time, code size or data size) of "properly informing" the system scheduler.
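For concreteness, here is a minimal sketch of the kind of lock under discussion: a compare-and-swap loop that, on contention, simply calls sched_yield instead of asking the kernel to sleep until the lock is free. The names (yield_lock, yield_lock_acquire, yield_lock_release) are hypothetical, and the use of sched_yield on the contended path is an assumption based on the reply that follows.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical CAS-based lock: 0 = unlocked, 1 = locked. */
typedef struct {
    volatile int state;
} yield_lock;

static void yield_lock_acquire(yield_lock *l)
{
    /* GCC builtin: atomically sets state to 1 if it is 0 and returns the
       previous value; it also acts as a full memory barrier. */
    while (__sync_val_compare_and_swap(&l->state, 0, 1) != 0) {
        /* Contention: the scheduler is only told "run someone else",
           not "wake me when the lock is free". */
        sched_yield();
    }
}

static void yield_lock_release(yield_lock *l)
{
    assert(l->state == 1);          /* caller must hold the lock */
    __sync_lock_release(&l->state); /* writes 0 with release semantics */
}
```

A waiting thread here keeps bouncing through the scheduler for as long as the holder keeps the lock, with no way to learn when it is released; that is the behaviour the reply objects to.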
In the non-contention case, a properly implemented platform mutex will (unsurprisingly):

1. compare_exchange with acquire semantics
2. detect contention with a single compare of the value obtained in step 1
   < your protected code here >
3. compare_exchange with release semantics
4. detect contention with a single compare of the value obtained in step 3 and, if needed, wake suspended threads

A clever implementation of course handles contention out-of-line. If you don't believe me, just disassemble pthread_mutex_lock/unlock on any Linux system.

FWIW, I just benchmarked a CAS-based mutex as you proposed (using the __sync_val_compare_and_swap intrinsic) in a tight lock/unlock loop on a Linux/PPC32 system and... it is 25% *slower* than the one based on glibc's pthread_mutex_lock/unlock! And this is the "no contention" case you aim to optimize for. (A "proper" CAS-based mutex using inline assembly and weaker memory barriers is 10% faster, mainly because it eliminates the function-call overhead.) BTW, we are talking about gains of ~5-10 clock cycles per operation here...

As for the space overhead of a pthread_mutex_t: if you cannot afford it, just use hashed locks.

Last note: calling sched_yield on contention is about the *worst* thing you can do. On contention, Linux/glibc calls futex(..., FUTEX_WAIT, ...) instead, which suspends the thread exactly until the lock is released, *and* costs only about 1/2 to 2/3 as much as sched_yield when the lock is released before the thread can be put to sleep.

Whatever gains you think you may achieve, you won't. There is justification for rolling your own locking scheme for user-space scheduling (as the fiber library does), but otherwise, just don't do it.

Helge
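The four-step uncontended path described above can be illustrated with a small futex-based mutex. This is a minimal, Linux-only sketch in the style of the well-known three-state futex mutex (0 = unlocked, 1 = locked, 2 = locked with possible waiters) described in Ulrich Drepper's "Futexes Are Tricky"; it is not glibc's actual implementation, and futex_mutex, futex_wait and futex_wake are hypothetical helper names. glibc provides no futex() wrapper, so the sketch invokes the system call through syscall(SYS_futex, ...), and it assumes atomic_int has the same representation as int, as it does with GCC on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked (no waiters), 2 = locked (waiters possible). */
typedef struct {
    atomic_int state;
} futex_mutex;

/* Sleep only if *addr still equals expected (checked atomically by the kernel). */
static void futex_wait(atomic_int *addr, int expected)
{
    syscall(SYS_futex, (int *)addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

/* Wake at most n threads sleeping on addr. */
static void futex_wake(atomic_int *addr, int n)
{
    syscall(SYS_futex, (int *)addr, FUTEX_WAKE_PRIVATE, n, NULL, NULL, 0);
}

static void futex_mutex_lock(futex_mutex *m)
{
    int c = 0;
    /* Steps 1-2: one CAS with acquire semantics, one compare. */
    if (atomic_compare_exchange_strong_explicit(&m->state, &c, 1,
            memory_order_acquire, memory_order_relaxed))
        return;

    /* Contended path: mark the lock as having waiters, then sleep in the
       kernel until the holder wakes us; retry on every wakeup. */
    if (c != 2)
        c = atomic_exchange_explicit(&m->state, 2, memory_order_acquire);
    while (c != 0) {
        futex_wait(&m->state, 2);
        c = atomic_exchange_explicit(&m->state, 2, memory_order_acquire);
    }
}

static void futex_mutex_unlock(futex_mutex *m)
{
    int expected = 1;
    /* Steps 3-4: one CAS with release semantics, one compare. */
    if (atomic_compare_exchange_strong_explicit(&m->state, &expected, 0,
            memory_order_release, memory_order_relaxed))
        return;

    /* The CAS failed, so the state must be 2 (0 would mean unlocking an
       unlocked mutex): release the lock and wake one sleeper. */
    assert(expected == 2);
    atomic_store_explicit(&m->state, 0, memory_order_release);
    futex_wake(&m->state, 1);
}
```

Without contention, lock and unlock each cost one atomic instruction plus one comparison; the system call is reached only when the state is 2. A waiter blocked in FUTEX_WAIT stays asleep until it is woken, instead of being rescheduled over and over as with sched_yield.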
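The "hashed locks" suggestion for avoiding a pthread_mutex_t per object can be sketched as follows: a fixed table of mutexes is shared, and an object's address selects the mutex that guards it. The table size, the hash (dropping the low address bits) and the names LOCK_TABLE_SIZE, hashed_lock_for and hashed_increment are hypothetical choices for illustration. The range initializer `[0 ... N]` is a GCC/Clang extension. Unrelated objects may end up sharing a mutex, which can add contention but does not affect correctness.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define LOCK_TABLE_SIZE 64 /* hypothetical; must be a power of two */

/* Statically initialized table of shared mutexes (GCC/Clang range syntax). */
static pthread_mutex_t lock_table[LOCK_TABLE_SIZE] = {
    [0 ... LOCK_TABLE_SIZE - 1] = PTHREAD_MUTEX_INITIALIZER
};

/* Map an object's address to one of the shared mutexes. */
static pthread_mutex_t *hashed_lock_for(const void *obj)
{
    uintptr_t a = (uintptr_t)obj;
    /* Skip the low bits, which are mostly zero for aligned objects. */
    return &lock_table[(a >> 4) & (LOCK_TABLE_SIZE - 1)];
}

/* Example: increment a counter that has no mutex of its own. */
static void hashed_increment(long *counter)
{
    pthread_mutex_t *mu = hashed_lock_for(counter);
    int rc = pthread_mutex_lock(mu);
    assert(rc == 0);
    ++*counter;
    pthread_mutex_unlock(mu);
}
```

The per-object cost drops to zero bytes, while every lock operation still goes through the platform mutex and therefore keeps its futex-based handling of contention.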