
On 28/03/2011 14:03, Peter Dimov wrote:
Hi everyone,
Ticket #5372:
https://svn.boost.org/trac/boost/ticket/5372
says that shared_ptr's ARM spinlock implementation, which uses the swp instruction, doesn't work properly on the iPad 2, which has a dual-core ARM processor. The sample program in the ticket compares it to a loop using __sync_fetch_and_add, which means that the compiler the submitter is using implements the __sync intrinsics. These didn't work with GCC for ARM when we tested them, but support may have been added since. (I can see some code samples that test for GCC 4.4, but the official documentation states that, before 4.6, which was released yesterday, the __sync intrinsics on ARM are supported only on Linux.)
So we have two questions: first, why does the swp-based spinlock fail, and second, how can we detect support for the __sync intrinsics and use them?
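On the second question, one possible approach is sketched below. This is not Boost's actual code. It checks the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 macro, which GCC and Clang predefine when the target supports a 4-byte compare-and-swap natively, and then builds a spinlock on __sync_lock_test_and_set (documented by GCC as an acquire barrier) and __sync_lock_release (a release barrier). The class name sync_spinlock is hypothetical, and the #error branch stands in for whatever fallback implementation the library would actually select.

```cpp
#include <cassert>

// Compile-time detection: GCC and Clang predefine this macro only when the
// target can perform a 4-byte compare-and-swap inline, so the __sync builtins
// below will not turn into calls to missing library functions.
#if !defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
#error "no native __sync support detected; select a different lock implementation"
#endif

// Hypothetical spinlock built on the GCC __sync builtins.
class sync_spinlock {
public:
    sync_spinlock() : v_(0) {}

    // Atomically writes 1 and returns the previous value (acquire barrier).
    // Storing the constant 1 matters: GCC notes that some targets support
    // only that value for __sync_lock_test_and_set.
    bool try_lock() { return __sync_lock_test_and_set(&v_, 1) == 0; }

    void lock() {
        while (!try_lock()) {
            // Busy-wait; production code would back off or yield here.
        }
    }

    // Writes 0 with release semantics.
    void unlock() { __sync_lock_release(&v_); }

private:
    int v_;  // 0 = free, 1 = held

    sync_spinlock(const sync_spinlock&) = delete;
    sync_spinlock& operator=(const sync_spinlock&) = delete;
};
```

Detecting support through the predefined macro rather than a compiler version check sidesteps the question of which GCC release added ARM support on which platform.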
Anybody with ARM knowledge and iPad 2 development access?
It appears that SWP does not work correctly across multiple cores because it doesn't perform a memory barrier. Wrap the code with DMB barrier instructions or, better yet, rewrite it to use LDREX/STREX. I'll test on my dual-core Cortex-A9 when I have time.
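The LDREX/STREX suggestion could look roughly like the sketch below, assuming an ARMv7-A core such as the Cortex-A9 and GCC-style inline assembly. It follows the usual load-exclusive/store-exclusive pattern, with a DMB after a successful acquire and another before the releasing store, which supplies the ordering that a bare SWP loop lacks. It has not been run on ARM hardware. The class name ldrex_spinlock and the LDREX_SPINLOCK_USE_ASM macro are hypothetical, and other targets fall back to the __sync builtins so the file still compiles.

```cpp
#include <cassert>

// Use the inline-assembly path only for ARMv7-A with a GCC-compatible compiler.
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_ARCH_7A__)
#define LDREX_SPINLOCK_USE_ASM 1
#else
#define LDREX_SPINLOCK_USE_ASM 0
#endif

class ldrex_spinlock {  // hypothetical name; untested sketch
public:
    ldrex_spinlock() : v_(0) {}

    // May fail spuriously if STREX loses the exclusive reservation;
    // lock() simply retries in that case.
    bool try_lock() {
#if LDREX_SPINLOCK_USE_ASM
        int tmp;
        __asm__ __volatile__(
            "ldrex  %0, [%1]\n\t"    // tmp = v_, opening the exclusive monitor
            "cmp    %0, #0\n\t"      // is the lock already held?
            "bne    1f\n\t"          // yes: give up with tmp nonzero
            "strex  %0, %2, [%1]\n"  // no: try v_ = 1; tmp = 0 on success, 1 on failure
            "1:"
            : "=&r"(tmp)
            : "r"(&v_), "r"(1)
            : "cc", "memory");
        if (tmp != 0)
            return false;
        // Acquire barrier: keep critical-section accesses from being
        // reordered before the lock is taken.
        __asm__ __volatile__("dmb ish" ::: "memory");
        return true;
#else
        return __sync_lock_test_and_set(&v_, 1) == 0;
#endif
    }

    void lock() {
        while (!try_lock()) {
            // Busy-wait.
        }
    }

    void unlock() {
#if LDREX_SPINLOCK_USE_ASM
        // Release barrier: critical-section accesses complete before the
        // store that frees the lock becomes visible to other cores.
        __asm__ __volatile__("dmb ish" ::: "memory");
        *static_cast<volatile int*>(&v_) = 0;
#else
        __sync_lock_release(&v_);
#endif
    }

private:
    int v_;  // 0 = free, 1 = held

    ldrex_spinlock(const ldrex_spinlock&) = delete;
    ldrex_spinlock& operator=(const ldrex_spinlock&) = delete;
};
```

The conditional branch is used instead of a conditional STREXEQ so that the same assembly should also assemble in Thumb-2 mode without an IT instruction.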