
Hi Peter,

Peter Dimov wrote:
Ticket #5372:
https://svn.boost.org/trac/boost/ticket/5372
says that shared_ptr's ARM spinlock implementation (which uses the swp instruction) doesn't work properly on iPad2 (which has a dual core ARM processor). The sample program in the ticket compares it to a loop using __sync_fetch_and_add, which means that the __sync intrinsics are implemented by the compiler the submitter is using. These didn't work on gcc for ARM when we tested them, but may have been added meanwhile. (I can see some code samples that test for 4.4, but the official docs state that ARM intrinsics are only supported on Linux before 4.6, which was released yesterday.)
So, we have two questions; first, why does the swp-based spinlock fail, and second, how can we detect support for __sync intrinsics and use them.
Anybody with ARM knowledge and iPad2 development access?
First let me say that the "right way" to fix this is surely to get Boost.Atomic finished and to use that as the basis of shared_ptr. I've contributed ARM code for Boost.Atomic that knows about the different architecture versions and will use ldrex/strex on ARMv7 (though it needs some attention from someone who knows more than I do about memory barriers, and it has had very little testing). I also have a trivial sp_counted_base_atomic.hpp that uses it. These are in use in a number of iPad apps and I've not yet had any reports of problems on the iPad 2 (fingers crossed).

It seems that perhaps Helge doesn't have enough free time to finish this off. In that case, I think it's a sufficiently important library that we should consider how we can help to progress it. I could certainly contribute a modest amount of time and testing resource to it.

In the meantime, my understanding is that SWP is "deprecated" in ARMv7, except that it is a peculiarly strong kind of deprecation where you have to turn on a bit in a control register to enable it. I have asked Apple what they do with this bit on the iPad 2 (where of course the lockdown means individual apps cannot change it) and I await an answer.

One other issue is that even when enabled, SWP might not have the required memory barrier semantics on multi-processor systems, i.e. you might need to put explicit barrier instructions either side of it. I'm uncertain about this; it doesn't help that the ARMv7 architecture documents are still only available under NDA. Does anyone here have copies?

I don't yet have an iPad 2, but will eventually. I do have another dual-core ARM box with an Nvidia Tegra 2 chip, but I'm not sure if anything useful can be learnt from testing on it.
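To make the barrier concern concrete, here is a minimal sketch of a spinlock written with the GCC __sync builtins instead of SWP. It is an illustration only, not the Boost.SmartPtr spinlock: the class name sync_spinlock is hypothetical, and it assumes a compiler that actually implements the builtins for the target. GCC documents __sync_lock_test_and_set as an acquire barrier and __sync_lock_release as a release barrier, so in principle no hand-written barrier instructions are needed around them; whether the generated code behaves correctly on the iPad 2 is exactly what would need testing.

```cpp
#include <cassert>

// Hypothetical illustration; not the spinlock shipped with Boost.
class sync_spinlock
{
public:
    sync_spinlock() : v_(0) {}

    // Atomically writes 1 and returns true if the previous value was 0.
    // Documented by GCC as an acquire barrier.
    bool try_lock()
    {
        return __sync_lock_test_and_set(&v_, 1) == 0;
    }

    // Busy-wait until the lock is obtained.
    void lock()
    {
        while (!try_lock())
        {
            // spin
        }
    }

    // Writes 0 to release the lock. Documented by GCC as a release barrier.
    void unlock()
    {
        __sync_lock_release(&v_);
    }

private:
    int v_;

    // Non-copyable (C++03 style: declared, never defined).
    sync_spinlock(sync_spinlock const&);
    sync_spinlock& operator=(sync_spinlock const&);
};
```

If a stronger guarantee than acquire/release turned out to be necessary, __sync_synchronize() is the builtin that requests a full barrier.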
how can we detect support for __sync intrinsics and use them.
I believe that there is a macro something like __GCC_HAVE_SYNC_COMPARE_AND_SWAP__. The difficulty is that it was introduced well after the actual intrinsics were added, so there are gcc versions that do have the intrinsics but not the macro. Last time I checked, this was too much of an issue to ignore. Maybe things have moved on enough that this macro could now be used.

Regards,

Phil.
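For reference, a sketch of what macro-based detection might look like. In the GCC documentation the macros are spelled __GCC_HAVE_SYNC_COMPARE_AND_SWAP_N, where N is the operand size in bytes (1, 2, 4, 8 or 16). The names EXAMPLE_HAS_SYNC_INTRINSICS and example_atomic_increment are hypothetical, and the fallback branch is only a placeholder: as noted above, a compiler may provide the intrinsics without defining the macro, so a real configuration header would probably need version checks as well.

```cpp
#include <cassert>

// Hypothetical configuration macro; not an existing Boost macro.
// __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 is predefined by GCC when a
// 4-byte compare-and-swap is available for the target.
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
#  define EXAMPLE_HAS_SYNC_INTRINSICS 1
#else
#  define EXAMPLE_HAS_SYNC_INTRINSICS 0
#endif

#if EXAMPLE_HAS_SYNC_INTRINSICS

// Atomically adds 1 to *p and returns the value *p held beforehand.
inline int example_atomic_increment(int* p)
{
    return __sync_fetch_and_add(p, 1);
}

#else

// Placeholder only: NOT thread-safe. It stands in for whatever
// platform-specific implementation would otherwise be selected.
inline int example_atomic_increment(int* p)
{
    int old = *p;
    *p = old + 1;
    return old;
}

#endif
```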