Re: [boost] Shared pointer atomic assembler for ARM?

6 Sep 2007

Hi Peter,

Peter Dimov wrote:
...
Phil Endecott:
...
I note that shared_ptr uses architecture-specific assembler for the
atomic operations needed for thread safe operations, on x86, ia64 and
ppc; it falls back to pthreads for other architectures.  Has anyone
quantified the performance benefit of the assembler?
Assuming that the benefit is significant, I'd like to implement it for
ARM.  Has anyone else looked at this?
ARM has a swap instruction.  I have a (very vague) recollection that
perhaps some of the newer chips have some other locked instructions
e.g. test-and-set, but I would want to code to the lowest common
denominator i.e. swap only.  Is this sufficient for what shared_ptr wants?
I note that since 4.1, gcc has provided built-in functions for atomic
operations.  But it says that "Not all operations are supported by all
target processors", and the list doesn't include swap; so maybe this
isn't so useful after all.
Can you try the SVN trunk version of shared_ptr and look at the assembly? 
detail/sp_counted_base.hpp should choose sp_counted_base_sync.hpp for g++ 
4.1 and higher and take advantage of the built-ins.
Well it's quicker for me to try this:

int x;

int main(int argc, char* argv[])
{
   __sync_fetch_and_add(&x,1);
}

$ arm-linux-gnu-g++ --version
arm-linux-gnu-g++ (GCC) 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)

$ arm-linux-gnu-g++ -W -Wall check_sync_builtin.cc
check_sync_builtin.cc:3: warning: unused parameter ‘argc’
check_sync_builtin.cc:3: warning: unused parameter ‘argv’
/tmp/ccwWxfsT.o: In function `main':
check_sync_builtin.cc:(.text+0x20): undefined reference to `__sync_fetch_and_add_4'
collect2: ld returned 1 exit status

(It does compile on x86, and the disassembly includes a "lock addl" instruction.)

As I mentioned before, gcc doesn't implement these atomic builtins on 
all platforms, i.e. it doesn't implement them on platforms where the 
hardware doesn't provide them.  I don't fully understand how this all 
works in libstdc++ (there are too many levels of #include and #if for 
me to follow) but there seems to be a __gnu_cxx::__mutex that they can 
use in those cases.
...
To answer your question: no, a mere swap instruction is not enough for 
shared_ptr, it needs atomic increment, decrement and compare and swap.
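
For reference, the three operations named above can be sketched with the gcc >= 4.1 built-ins. This is only an illustration, not the actual Boost code: the helper names are made up for this example, and it will only link on targets where gcc implements the built-ins (so x86, but not the ARM toolchain tried above).

```cpp
#include <cassert>

// Hypothetical helper names; only the __sync_* calls are real gcc built-ins.

// Atomically add 1 to *pw.
inline void atomic_increment(int* pw) {
    __sync_fetch_and_add(pw, 1);
}

// Atomically subtract 1 from *pw and return the new value.
inline int atomic_decrement(int* pw) {
    return __sync_fetch_and_add(pw, -1) - 1;
}

// Increment *pw only if it is nonzero, using a compare-and-swap loop.
// Returns the new value, or 0 if the count had already dropped to zero.
inline int atomic_conditional_increment(int* pw) {
    int r = __sync_fetch_and_add(pw, 0);   // atomic read of the current value
    for (;;) {
        if (r == 0) return 0;              // object is already gone
        int seen = __sync_val_compare_and_swap(pw, r, r + 1);
        if (seen == r) return r + 1;       // our CAS won
        r = seen;                          // someone else changed it; retry
    }
}
```

The conditional increment is the part that a plain swap cannot express directly: it must leave a zero count untouched, which is why compare and swap is on the list.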
Well, I think you can implement a spin-lock mutex with swap:

int mutex=0;  // 0 = unlocked, 1 = locked

void lock() {
   int n;            // declared outside the loop so the while test can see it
   do {
     n=1;
     swap(mutex,n);  // atomic swap instruction
   } while (n==1);   // if n is 1 after the swap, the mutex was already locked
}

void unlock() {
   mutex=0;
}

So you could use something like that to protect the reference counts, 
rather than falling back to the pthread method.  Or alternatively, 
could you use a sentinel value (say -1) in the reference count to 
indicate that it's locked:

int refcount;

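int read_refcount() {
   int r;            // declared outside the loop so it is still in scope below
   do {
     r = refcount;
   } while (r==-1);  // -1 means another thread is updating; read again
   return r;
}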

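int adj_refcount(int adj) {
   int r=-1;
   do {
     swap(refcount,r);   // atomic swap: leaves -1 (locked) in refcount
   } while (r==-1);      // got -1 back: someone else holds it, so retry
   refcount = r+adj;     // storing the new value also unlocks
   return r+adj;
}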
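
To make the first suggestion (a swap-based spin lock guarding a plain count) a bit more concrete, here is a rough sketch. It assumes gcc's __sync_lock_test_and_set as a stand-in for the ARM swap instruction, since gcc documents that built-in as an atomic exchange returning the previous value; on a real ARM target the exchange would have to be inline asm instead. The struct and member names are invented for this example.

```cpp
#include <cassert>

// Hypothetical type: a count protected by a one-word spin lock.
struct spin_counted {
    int lock_;    // 0 = unlocked, 1 = locked
    int count_;   // the reference count being protected

    explicit spin_counted(int initial) : lock_(0), count_(initial) {}

    void lock() {
        // Atomic exchange: write 1, get the old value back.
        // If the old value was 1, somebody else holds the lock; spin.
        while (__sync_lock_test_and_set(&lock_, 1) == 1) {
        }
    }

    void unlock() {
        // Writes 0 to lock_ with release semantics.
        __sync_lock_release(&lock_);
    }

    // Add adj to the count under the lock and return the new value.
    int adjust(int adj) {
        lock();
        count_ += adj;
        int r = count_;
        unlock();
        return r;
    }
};
```

A conditional increment could be done the same way, by testing count_ for zero while the lock is held. The obvious cost is that a thread preempted while holding the lock makes every other thread spin until it runs again.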

(BTW, for gcc>=4.1 on x86 would you plan to use the gcc builtins or the 
existing Boost asm?)

Regards,

Phil.