
Phil Endecott:
Dear All,
I note that shared_ptr uses architecture-specific assembler for the atomic operations it needs for thread safety on x86, ia64 and ppc; it falls back to pthreads for other architectures. Has anyone quantified the performance benefit of the assembler?
Assuming that the benefit is significant, I'd like to implement it for ARM. Has anyone else looked at this?
ARM has a swap instruction. I have a (very vague) recollection that some of the newer chips may have other locked instructions, e.g. test-and-set, but I would want to code to the lowest common denominator, i.e. swap only. Is this sufficient for what shared_ptr wants?
I note that since 4.1, gcc has provided built-in functions for atomic operations. But its documentation says that "Not all operations are supported by all target processors", and the list doesn't include swap; so maybe this isn't so useful after all.
Can you try the SVN trunk version of shared_ptr and look at the assembly? detail/sp_counted_base.hpp should choose sp_counted_base_sync.hpp for g++ 4.1 and higher and take advantage of the built-ins. To answer your question: no, a mere swap instruction is not enough for shared_ptr; it needs atomic increment, decrement and compare-and-swap.
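To make the three required operations concrete, here is a rough sketch of a reference count built on the gcc __sync built-ins. It is only an illustration under stated assumptions: the class name sketch_ref_count and its member names are hypothetical, and this is not the actual contents of sp_counted_base_sync.hpp. It assumes g++ 4.1 or later on a target where these built-ins are supported.

```cpp
#include <cassert>

// Hypothetical illustration class; NOT Boost's sp_counted_base.
// Assumes the gcc __sync built-ins are available for the target.
class sketch_ref_count {
    long count_;

public:
    sketch_ref_count() : count_(1) {}

    // Atomic read, expressed as "add zero and return the old value".
    long use_count() const {
        return __sync_fetch_and_add(const_cast<long*>(&count_), 0);
    }

    // Atomic increment: another owner shares the object.
    void add_ref() {
        __sync_fetch_and_add(&count_, 1);
    }

    // Atomic decrement. Returns true if the caller released the last
    // reference and is therefore responsible for cleanup.
    bool release() {
        long prev = __sync_fetch_and_add(&count_, -1);
        assert(prev > 0);  // releasing an already-dead count is a bug
        return prev == 1;
    }

    // Compare-and-swap loop: increment only if the count is still
    // nonzero. This is the kind of operation a weak-to-shared
    // conversion needs, and a plain increment cannot express it.
    bool add_ref_lock() {
        long seen = use_count();
        while (seen != 0) {
            long prev = __sync_val_compare_and_swap(&count_, seen, seen + 1);
            if (prev == seen) {
                return true;   // our increment was installed
            }
            seen = prev;       // another thread changed it; retry
        }
        return false;          // count already reached zero
    }
};
```

Compiling something like this with g++ -S on ARM and reading the output should show whether the built-ins expand to inline instructions or to library calls on that target.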