Re: [boost] [fiber] new version in vault

30 Nov 2009

On Monday 30 November 2009 18:48:21, Phil Endecott wrote:
> ...
> Helge Bahmann wrote:
>> ...
>> why would you ever want to use a mutex that does not properly inform the
>> system scheduler until when the calling thread should be suspended (in
>> case of contention) for "true" inter-thread (or even inter-process)
>> coordination?
>
> When you believe that the probability of contention is very small, and
> you care only about average and not worst-case performance, and you
> wish to avoid the overhead (e.g. time or code or data size) of
> "properly informing" the system scheduler.

In the uncontended case, a properly implemented platform mutex will
(unsurprisingly):

1. compare_exchange with acquire semantics
2. detect contention with a single compare of the value obtained in step 1

< your protected code here >

3. compare_exchange with release semantics
4. detect contention with a single compare of the value obtained in step 3 and,
   if there was contention, wake suspended threads

A clever implementation of course handles contention out-of-line. If you don't
believe me, just disassemble pthread_mutex_lock/unlock on any Linux system.
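
For illustration, the four steps above might look roughly like the following.
This is a sketch and not glibc's code: the class name SketchMutex, the
three-state encoding of the lock word and the helper names are all assumptions
made for the example. It assumes Linux and a GCC-compatible compiler (the
__atomic builtins and the raw futex syscall), and the contended paths are
marked noinline so that only the two compare-exchange operations end up in the
caller.

```cpp
#include <cassert>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Lock word states (an assumption of this sketch, not glibc's encoding).
constexpr int kUnlocked = 0;
constexpr int kLocked = 1;         // held, nobody waiting
constexpr int kLockedWaiters = 2;  // held, at least one thread may be asleep

class SketchMutex {
public:
    void lock() {
        int seen = kUnlocked;
        // Steps 1 and 2: one CAS with acquire ordering; its success flag is
        // the single compare that detects contention.
        if (__atomic_compare_exchange_n(&state_, &seen, kLocked, false,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            return;
        lock_contended(seen);
    }

    bool try_lock() {
        int seen = kUnlocked;
        return __atomic_compare_exchange_n(&state_, &seen, kLocked, false,
                                           __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
    }

    void unlock() {
        int seen = kLocked;
        // Steps 3 and 4: one CAS with release ordering; it fails only if
        // some thread recorded itself as a waiter.
        if (__atomic_compare_exchange_n(&state_, &seen, kUnlocked, false,
                                        __ATOMIC_RELEASE, __ATOMIC_RELAXED))
            return;
        unlock_contended();
    }

    int state() const { return __atomic_load_n(&state_, __ATOMIC_RELAXED); }

private:
    // Out-of-line slow path: announce a waiter, then sleep in the kernel
    // for as long as the word still says "locked, with waiters".
    __attribute__((noinline)) void lock_contended(int seen) {
        if (seen != kLockedWaiters)
            seen = __atomic_exchange_n(&state_, kLockedWaiters,
                                       __ATOMIC_ACQUIRE);
        while (seen != kUnlocked) {
            syscall(SYS_futex, &state_, FUTEX_WAIT_PRIVATE, kLockedWaiters,
                    nullptr, nullptr, 0);
            seen = __atomic_exchange_n(&state_, kLockedWaiters,
                                       __ATOMIC_ACQUIRE);
        }
    }

    // Out-of-line slow path: release the lock and wake one sleeper.
    __attribute__((noinline)) void unlock_contended() {
        int prev = __atomic_exchange_n(&state_, kUnlocked, __ATOMIC_RELEASE);
        assert(prev == kLockedWaiters);  // anything else is a caller bug
        (void)prev;
        syscall(SYS_futex, &state_, FUTEX_WAKE_PRIVATE, 1,
                nullptr, nullptr, 0);
    }

    int state_ = kUnlocked;
};
```

In the uncontended case neither syscall is ever reached, which is the whole
point: the scheduler is only informed when there is something to tell it.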

FWIW, I just exercised a CAS-based mutex as you proposed (using the
__sync_val_compare_and_swap intrinsic) in a tight lock/unlock cycle on
a Linux/PPC32 system and... it is 25% *slower* than the glibc
pthread_mutex_lock/unlock based one! This is the "no contention case" you aim
to optimize for... (A "proper" CAS-based mutex, using inline assembly and
weaker memory barriers, is 10% faster, mainly because it eliminates the
function call overhead.) BTW we are talking about gains of ~5-10 clock cycles
per operation here...
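
A tight lock/unlock cycle of this kind can be timed with something like the
sketch below. It is not the test program used for the numbers above; CasLock,
ns_per_cycle and the two bench_* functions are hypothetical names, and the
CasLock shown is only a guess at the kind of lock under discussion (a CAS via
the GCC __sync builtins, yielding on contention). Results depend heavily on
CPU, compiler and glibc version, so no figures are given.

```cpp
#include <cassert>
#include <chrono>
#include <pthread.h>
#include <sched.h>

// Hypothetical CAS-based lock using the GCC __sync builtins.
struct CasLock {
    int word = 0;
    void lock() {
        // Full-barrier CAS; on contention fall back to sched_yield().
        while (__sync_val_compare_and_swap(&word, 0, 1) != 0)
            sched_yield();
    }
    void unlock() { __sync_lock_release(&word); }  // store 0 with release
};

// Time `iterations` uncontended lock/unlock pairs, in nanoseconds per pair.
template <typename LockFn, typename UnlockFn>
double ns_per_cycle(long iterations, LockFn lock, UnlockFn unlock) {
    assert(iterations > 0);
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        lock();
        unlock();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

double bench_pthread(long iterations) {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    double ns = ns_per_cycle(iterations,
                             [&] { pthread_mutex_lock(&m); },
                             [&] { pthread_mutex_unlock(&m); });
    pthread_mutex_destroy(&m);
    return ns;
}

double bench_cas(long iterations) {
    CasLock l;
    return ns_per_cycle(iterations, [&] { l.lock(); }, [&] { l.unlock(); });
}
```

Calling bench_pthread(n) and bench_cas(n) with the same large n from a single
thread gives the "no contention" comparison; the CasLock calls are likely to be
inlined while the pthread calls are not, which is the function call overhead
mentioned above.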

As for the space overhead of a pthread_mutex_t... if you cannot pay that, just
use hashed locks (a small fixed pool of mutexes, indexed by a hash of the
address of the object being protected).
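
A minimal sketch of hashed locks, for anyone who has not met the idea: instead
of one mutex per object, a fixed pool of mutexes is shared, and each object
picks its mutex by hashing its own address. The class name HashedLocks, the
pool size of 64 and the shift by 4 bits are arbitrary choices for the example,
and std::mutex stands in for pthread_mutex_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

class HashedLocks {
public:
    static constexpr std::size_t kPoolSize = 64;  // arbitrary for this sketch

    // Map an object address to a slot in the pool. The low bits are dropped
    // because they are usually zero due to alignment.
    static std::size_t index_of(const void* object) {
        auto bits = reinterpret_cast<std::uintptr_t>(object);
        return static_cast<std::size_t>(bits >> 4) % kPoolSize;
    }

    // The mutex guarding `object`. Different objects may share a mutex,
    // which costs some false contention but no correctness.
    std::mutex& lock_for(const void* object) {
        return pool_[index_of(object)];
    }

private:
    std::mutex pool_[kPoolSize];
};
```

The per-object space cost drops to zero; the price is that two unrelated
objects can hash to the same mutex, so a thread must never hold the lock for
one object while acquiring the lock for another without an ordering rule.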

Last note: calling "sched_yield" on contention is about the *worst* thing you
can do. Linux/glibc will call futex(..., FUTEX_WAIT, ...) on contention
instead, which properly suspends the thread exactly until the lock is
released *and* costs about 1/2 to 2/3 as much as sched_yield in the case where
the lock was released before the thread could be put to sleep.
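
The "released before the thread could be put to sleep" case works because
FUTEX_WAIT re-checks the lock word inside the kernel and refuses to sleep if
it no longer holds the expected value. A small sketch of that behaviour,
assuming Linux; sketch_futex_wait and sketch_futex_wake are hypothetical thin
wrappers around the raw syscall, since glibc does not export one:

```cpp
#include <cerrno>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Sleep only if *word still equals `expected`. Returns 0 when woken, or -1
// with errno == EAGAIN if the value had already changed (no sleep happens).
inline long sketch_futex_wait(int* word, int expected) {
    return syscall(SYS_futex, word, FUTEX_WAIT_PRIVATE, expected,
                   nullptr, nullptr, 0);
}

// Wake up to `count` threads sleeping on `word`; returns how many were woken.
inline long sketch_futex_wake(int* word, int count) {
    return syscall(SYS_futex, word, FUTEX_WAKE_PRIVATE, count,
                   nullptr, nullptr, 0);
}
```

If a waiter calls sketch_futex_wait(&word, 2) after the owner has already
stored 0, the call fails at once with EAGAIN and the waiter simply retries the
lock, whereas a sched_yield loop has no idea when the lock becomes free.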

Whatever gains you think you may achieve, you won't. There is justification 
for rolling your own locking scheme for user-space scheduling (as the fiber 
library does), but otherwise just don't do it.

Helge