Re: [boost] [atomic_count] Discrepancy between gcc and solaris implementation

9 Sep 2007


At 11:11 AM +0200 9/9/07, Corrado Zoccolo wrote:
> ...
> when performance tuning a really tight loop in a simple program, I
> found that reading the value of an atomic count is unexpectedly slow
> on a gcc platform.
> [...]
> Is there a compelling reason to use the locked operation with gcc, or
> can a simpler volatile access serve the same purpose?
> [...]
> Do you see any drawback in changing the access to the counter to a
> simple volatile access, at least when the platform is known to be an
> IA32?

Don't do that. It won't work properly on a multi-processor
system. Memory barriers are needed to ensure correct operation on such
systems, and gcc (x86) does not generate a memory barrier for a
volatile load.

> ...
> [... quoting existing implementation for gcc ...]
>      operator long() const
>      {
>        return __exchange_and_add(&value_, 0);
>      }

The use of __exchange_and_add here is a way to perform a load-acquire
operation. It is a somewhat clumsy way, presumably necessary in the
absence of a more direct (and possibly better-performing) mechanism.
The "acquire" qualifier indicates the kind of memory barrier needed:
memory accesses that follow the load in program order must not be
performed before the load itself.

> ...
> I checked other implementations, and, for example, Solaris has the
> much lighter:
>      operator uint32_t() const
>      {
>          return static_cast<uint32_t const volatile &>( value_ );
>      }

Because the (current, C++03) standard does not address threads and such
at all, different implementations have associated different semantics
with "volatile" in the presence of threads. I expect that *on Solaris*
one would find a memory barrier generated for this code sequence.

Kim Barrett