Re: [boost] [lock-free] CDS -yet another lock-free library

31 Mar 2010


Helge Bahmann wrote:
...
...
Second solution: use 64-bit CAS to load/store 64-bit values on x86. It
seems too heavy just for loading/storing, doesn't it?
...
This is actually what I do in Boost.Atomic; I *think* it is cheaper than
shuffling the values around between SSE and general-purpose registers (it
certainly is cheaper than MMX, considering you also have to issue emms).
It's easy to test!
Quick test: CDS's RecursiveSpinLock<atomic64_t> (its TATAS algorithm calls load64 heavily, busy-waiting on it whenever the CAS that acquires the lock fails).
Equipment: Windows XP, Intel Core 2 (3 GHz, 2 cores, no HT), MSVC++ 2008, release build with full optimization.
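For readers unfamiliar with TATAS (test-and-test-and-set), here is a minimal sketch of such a lock over a 64-bit word, written with C++11 std::atomic rather than CDS's own primitives. TatasSpinLock and run_lock_test are hypothetical names; this is not CDS's RecursiveSpinLock (owner tracking and recursion are omitted). It only shows where the busy-wait load sits and the general shape of a lock test like the one below (N threads, each locking and unlocking M times).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical TATAS (test-and-test-and-set) spin lock over a 64-bit word.
// 0 = free, 1 = held. Not recursive, unlike CDS's RecursiveSpinLock.
class TatasSpinLock {
public:
    void lock() {
        for (;;) {
            std::uint64_t expected = 0;
            // Test-and-set: try to take the lock with a single CAS.
            if (state_.compare_exchange_weak(expected, 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return;
            // Test: the CAS failed, so busy-wait on plain 64-bit loads until
            // the lock looks free, then retry the CAS. This is the loop where
            // the cost of load64 matters.
            while (state_.load(std::memory_order_relaxed) != 0)
                std::this_thread::yield();  // stand-in for a CPU pause hint
        }
    }

    void unlock() {
        assert(state_.load(std::memory_order_relaxed) == 1 && "unlock() without lock()");
        state_.store(0, std::memory_order_release);
    }

private:
    std::atomic<std::uint64_t> state_{0};
};

// Hypothetical harness in the spirit of the "Lock test": `threads` threads each
// lock/unlock `loops` times, incrementing a shared counter under the lock.
// Returns the wall-clock duration in seconds.
double run_lock_test(int threads, long loops, long& counter) {
    TatasSpinLock lock;
    counter = 0;
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (long i = 0; i < loops; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

For example, run_lock_test(8, 1000000, counter) uses the same thread and loop counts as the test below, and counter should end at 8000000. The CAS-first ordering follows the description above: the load-based spin only starts after a CAS has failed.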

SSE2 load64:
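#include <emmintrin.h>  // SSE2 intrinsics (_mm_loadl_epi64)

static inline atomic64_t load64( atomic64_t volatile const * pMem ) {
  // MOVQ: a single 64-bit load into the low half of an XMM register
  __m128i volatile v = _mm_loadl_epi64( (__m128i const *) pMem );
  // m128i_i64 is MSVC's view of __m128i as two 64-bit integers
  return v.m128i_i64[0];
}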

Result (a representative run):
Spinlock_MT::recursiveSpinLock64
           Lock test, thread count=8 loop per thread=1000000...
             Duration=2.21852

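CAS64 load64 (no CAS loop):
#include <intrin.h>  // MSVC interlocked intrinsics

static inline atomic64_t load64( atomic64_t volatile const * pMem ) {
  // Compare with 0 and "swap" 0 for 0: the stored value never changes, and the
  // intrinsic returns the value it found (lock cmpxchg8b on 32-bit x86)
  atomic64_t cur = 0;
  return _InterlockedCompareExchange64( const_cast<atomic64_t volatile *>(pMem), cur, cur );
}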

Result (a representative run):
Spinlock_MT::recursiveSpinLock64
           Lock test, thread count=8 loop per thread=1000000...
             Duration=2.79662

SSE2 wins by about 20%: its run took (2.79662 − 2.21852) / 2.79662 ≈ 21% less time. Not bad, though I expected more :)
Unfortunately, I don't have access to a multi-processor Win32 server for testing right now.
Note that Boost.Atomic uses a CAS-based loop for load64, so I think the gain over it could be even larger.
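The difference between a single CAS and a CAS loop can be sketched portably with C++11 std::atomic<std::int64_t> standing in for _InterlockedCompareExchange64. This is an illustration under that assumption, not Boost.Atomic's actual code; load64_single_cas, load64_cas_loop and demo_loads are hypothetical names, and in C++11 code std::atomic<std::int64_t>::load() would be the normal way to read the value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical single-CAS load, the portable analogue of the CAS64 load64 in
// the post: compare against a guess (0) and swap in the same guess. If memory
// holds 0, 0 is written back (no change); otherwise the CAS fails and `guess`
// is overwritten with the current value. Either way, memory keeps its value
// and `guess` holds an atomically read snapshot.
std::int64_t load64_single_cas(std::atomic<std::int64_t>& a) {
    std::int64_t guess = 0;
    a.compare_exchange_strong(guess, guess);
    return guess;
}

// Hypothetical CAS-loop load: retry until a CAS of the value onto itself
// succeeds. compare_exchange_weak may fail spuriously, so it needs the loop;
// each failure refreshes `guess` with the current value.
std::int64_t load64_cas_loop(std::atomic<std::int64_t>& a) {
    std::int64_t guess = 0;
    while (!a.compare_exchange_weak(guess, guess)) {
        // retry with the refreshed guess
    }
    return guess;
}

// Usage check: both loads return the stored value and leave it unchanged.
void demo_loads() {
    std::atomic<std::int64_t> x(0x0123456789ABCDEF);
    assert(load64_single_cas(x) == 0x0123456789ABCDEF);
    assert(load64_cas_loop(x) == 0x0123456789ABCDEF);
    assert(x.load() == 0x0123456789ABCDEF);
}
```

A strong compare-exchange fails only when memory differs from the guess, and in that case it already reports the current value, so one locked instruction is enough; the loop form may issue several. On x86 both forms typically still need the cache line in exclusive state, unlike the plain SSE2 load, which is one plausible reason the SSE2 variant is faster under contention.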

Regards, Max

Khiszinsky, Maxim