
Hi Peter,

Peter Dimov wrote:
> Phil Endecott:
>> I note that shared_ptr uses architecture-specific assembler for the atomic operations needed for thread safe operations, on x86, ia64 and ppc; it falls back to pthreads for other architectures. Has anyone quantified the performance benefit of the assembler?
>>
>> Assuming that the benefit is significant, I'd like to implement it for ARM. Has anyone else looked at this?
>>
>> ARM has a swap instruction. I have a (very vague) recollection that perhaps some of the newer chips have some other locked instructions e.g. test-and-set, but I would want to code to the lowest common denominator i.e. swap only. Is this sufficient for what shared_ptr wants?
>>
>> I note that since 4.1, gcc has provided built-in functions for atomic operations. But it says that "Not all operations are supported by all target processors", and the list doesn't include swap; so maybe this isn't so useful after all.
>
> Can you try the SVN trunk version of shared_ptr and look at the assembly? detail/sp_counted_base.hpp should choose sp_counted_base_sync.hpp for g++ 4.1 and higher and take advantage of the built-ins.

Well it's quicker for me to try this:

    int x;

    int main(int argc, char* argv[])
    {
      __sync_fetch_and_add(&x,1);
    }

    $ arm-linux-gnu-g++ --version
    arm-linux-gnu-g++ (GCC) 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)
    $ arm-linux-gnu-g++ -W -Wall check_sync_builtin.cc
    check_sync_builtin.cc:3: warning: unused parameter ‘argc’
    check_sync_builtin.cc:3: warning: unused parameter ‘argv’
    /tmp/ccwWxfsT.o: In function `main':
    check_sync_builtin.cc:(.text+0x20): undefined reference to `__sync_fetch_and_add_4'
    collect2: ld returned 1 exit status

(It does compile on x86, and the disassembly includes a "lock addl" instruction.)

As I mentioned before, gcc doesn't implement these atomic builtins on all platforms, i.e. it doesn't implement them on platforms where the hardware doesn't provide them. I don't fully understand how this all works in libstdc++ (there are too many levels of #include and #if for me to follow) but there seems to be a __gnu_cxx::__mutex that they can use in those cases.

> To answer your question: no, a mere swap instruction is not enough for shared_ptr, it needs atomic increment, decrement and compare and swap.
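
For reference, the three primitives named above could be sketched with the gcc builtins roughly as follows. This is only an illustration under assumptions: the function names are hypothetical rather than taken from Boost, the compare-and-swap is shown in an "increment only if nonzero" loop as one plausible use, and it will only link on targets where gcc actually provides the `__sync` builtins (so not on the ARM toolchain shown above).

```cpp
#include <cassert>

// Sketch only: hypothetical helper names, built on GCC's __sync builtins.

// Atomic increment of a reference count.
inline void atomic_increment(int* pw)
{
    __sync_fetch_and_add(pw, 1);
}

// Atomic decrement; returns the new value so the caller can test for zero.
inline int atomic_decrement(int* pw)
{
    return __sync_sub_and_fetch(pw, 1);
}

// Compare-and-swap loop: increment *pw only if it is nonzero.
// Returns the value observed before the increment (0 means "not incremented").
inline int atomic_conditional_increment(int* pw)
{
    int r = *pw;
    for (;;)
    {
        if (r == 0) return 0;
        int seen = __sync_val_compare_and_swap(pw, r, r + 1);
        if (seen == r) return r;  // the swap succeeded
        r = seen;                 // another thread changed it; retry
    }
}

// Single-threaded usage check of the helpers above.
inline void sync_ops_self_check()
{
    int count = 1;
    atomic_increment(&count);
    assert(count == 2);
    assert(atomic_decrement(&count) == 1);
    assert(atomic_conditional_increment(&count) == 1);
    assert(count == 2);
}
```

The conditional increment is the operation that a plain swap cannot express directly, since it has to leave the count untouched when it is already zero.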

Well, I think you can implement a spin-lock mutex with swap:

    int mutex=0;  // 0 = unlocked, 1 = locked

    void lock() {
      int n;  // declared outside the loop so the while condition can see it
      do {
        n=1;
        swap(mutex,n);  // atomic swap instruction
      } while (n==1);  // if n is 1 after the swap, the mutex was already locked
    }

    void unlock() {
      mutex=0;
    }

So you could use something like that to protect the reference counts, rather than falling back to the pthread method. Or alternatively, could you use a sentinel value (say -1) in the reference count to indicate that it's locked?

    int refcount;

    int read_refcount() {
      int r;
      do {
        r = refcount;
      } while (r==-1);  // spin while another thread holds the count
      return r;
    }

    int adj_refcount(int adj) {
      int r=-1;
      do {
        swap(refcount,r);  // atomic swap: take the count, leave the sentinel
      } while (r==-1);
      refcount = r+adj;  // storing the new count also releases the "lock"
      return r+adj;
    }

(BTW, for gcc>=4.1 on x86 would you plan to use the gcc builtins or the existing Boost asm?)

Regards,

Phil.
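
A compilable variant of the swap-based spin lock above might look like the sketch below. It assumes that gcc's `__sync_lock_test_and_set` (documented as an atomic exchange with acquire semantics) can stand in for the ARM swap instruction, and that `__sync_lock_release` is an acceptable unlock; the `spin_lock` and `locked_count` names are hypothetical. It has the same caveat as the test program earlier: it only links where gcc provides these builtins for the target.

```cpp
#include <cassert>

// Sketch only: a spin lock built on an atomic exchange, guarding a plain int.

struct spin_lock
{
    int state;  // 0 = unlocked, 1 = locked

    void lock()
    {
        // Atomically store 1 and fetch the previous value;
        // if the previous value was 1, someone else holds the lock, so spin.
        while (__sync_lock_test_and_set(&state, 1) == 1) { }
    }

    void unlock()
    {
        __sync_lock_release(&state);  // stores 0 with release semantics
    }
};

struct locked_count
{
    spin_lock guard;
    int count;

    // Adjust the count under the lock and return the new value.
    int adjust(int adj)
    {
        guard.lock();
        int r = (count += adj);
        guard.unlock();
        return r;
    }
};

// Single-threaded usage check.
inline void spin_lock_self_check()
{
    locked_count c = { { 0 }, 1 };
    assert(c.adjust(1) == 2);
    assert(c.adjust(-2) == 0);
    assert(c.guard.state == 0);  // lock is released after each adjust
}
```

Whether spinning like this beats the pthread fallback would depend on contention and on how long a preempted lock holder can stall the other threads, so it would need measuring.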