[boost] Re: [Boost.Thread] mutex Win32 implementation changes

20 Jul 2004

      Peter Dimov wrote:
[...]
...
Here's TryEnterCriticalSection (-recursivity) for reference:
bool try_lock( critical_section * p )
{
    return atomic_compare_exchange( p->LockCount, -1, 0 ) == 0;
}
The point of "swap based mutex" exercise was to avoid CAS because 
i386 doesn't have it. With CAS (and absent timedlock operation),
it can be made safe and fast (better than MS "critical" stuff).
...
CRITICAL_SECTIONs have no timed_lock.
That's because timedlock() inevitable introduces a race.
...
Your version can be made posix-safe at the expense of three atomic ops
instead of one in unlock:
void unlock()
{
    atomic_increment( refs_ );
// as before
if( atomic_decrement( refs_ ) == 0 )
    {
        CloseHandle( event_ );
    }
}
void destroy()
{
    if( atomic_decrement( refs_ ) == 0 )
    {
        CloseHandle( event_ );
    }
}
Yes. Or something along the lines of pthreads-win32 mutex (also 
two atomic ops... MS critical section lock/unlock ;-) ). I did it 
in order to support timedlock (it's used to solve the race with
respect to timed out waiters, but the same scheme would work
to synchronize the destruction of "swap based" thing -- simply 
lock/unlock that critsect in the destructor).
...
That's too slow, I guess?
Yes. AFAICS, the best solution (i386 does have atomic_bit_test_set):

mutex_trylock:

  RETURN !atomic_bit_test_set_ddacq(&lock, 1) 

mutex_lock:

  WHILE 
    atomic_bit_test_set_ddacq(&lock, 1) 
      lock_queue_wait(&lock, 1) // wait if locked bit is set

mutex_timedlock:

  WHILE 
    atomic_bit_test_set_ddacq(&lock, 1) 
      IF lock_queue_wait(&lock, 1, timeout) RETURN TRUE
  RETUN FALSE

mutex_unlock:

  uintptr_t lock_queue;
  IF atomic_decrement_rel(lock_queue = &lock)
    THEN lock_queue_wake(lock_queue, 1) 

or something like that with kernel lock_queue interface. It would 
be safe, i386-friendly, memory efficient, and fast.
...
But maybe you'd be able to think of a way to
somehow combine the refs_ manipulation with the lock_ manipulation?
It would need CAS and would still be "slow", I'm afraid.
...
The event pool solution would also work, but the problem there is that the
pool would need its own synchronization.
But it can be slow (MS "native" critical or mutex or event/same 
would work just fine). You don't usually need bizillions lock 
create/destroy per nanosecond.
...
Unless it's an "interlocked slist".
Yep.

regards,
alexander.

[boost] Re: [Boost.Thread] mutex Win32 implementation changes

Alexander Terekhov