
Helge Bahmann wrote:
Hi Phil!
Thanks for your interest. I appreciate any help with ARM, as I don't have this architecture available.
Currently my ARM v4 (XScale) dev system is a bit broken, but I might be able to fix it. I have working v6/v7 systems.
On Monday 30 November 2009 17:02:14, Phil Endecott wrote: [snip]
Architecture v6 introduced 32-bit load-locked/store-conditional instructions (LDREX/STREX). The 8- and 16-bit versions (LDREXB/LDREXH) arrived with v6K and are available throughout v7.
The library already has infrastructure in place to emulate 8- and 16-bit atomics by "embedding" them into a properly aligned 32-bit atomic (created "on the fly" through appropriate pointer casts). FWIW ppc and Alpha require this already, as they do not have 8/16-bit ll/sc. This is of course slower than native 8-/16-bit versions, but is workable.
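As a rough illustration of the embedding approach (a sketch, not the library's actual code), the function below emulates an 8-bit fetch_add with a 32-bit compare-and-swap on the aligned word that contains the byte. It assumes the GCC/Clang `__atomic` builtins and 4-byte words; `emulated_fetch_add_u8` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the library's API): atomically add `delta` to the
// byte at `p` using a compare-and-swap on the aligned 32-bit word holding it.
// Assumes GCC/Clang __atomic builtins; seq_cst ordering is used for simplicity.
inline std::uint8_t emulated_fetch_add_u8(std::uint8_t* p, std::uint8_t delta) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    // Round the address down to the enclosing 4-byte-aligned word.
    std::uint32_t* word = reinterpret_cast<std::uint32_t*>(addr & ~std::uintptr_t(3));
    const unsigned offset = static_cast<unsigned>(addr & 3);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    const unsigned shift = (3 - offset) * 8;  // byte 0 is the most significant
#else
    const unsigned shift = offset * 8;        // byte 0 is the least significant
#endif
    const std::uint32_t mask = std::uint32_t(0xFF) << shift;

    std::uint32_t old_word = __atomic_load_n(word, __ATOMIC_RELAXED);
    for (;;) {
        const std::uint8_t old_byte = static_cast<std::uint8_t>((old_word & mask) >> shift);
        const std::uint8_t new_byte = static_cast<std::uint8_t>(old_byte + delta);
        // Splice the new byte into the word, leaving the other three bytes untouched.
        const std::uint32_t new_word = (old_word & ~mask) | (std::uint32_t(new_byte) << shift);
        // On failure, old_word is reloaded with the word's current value and we retry.
        if (__atomic_compare_exchange_n(word, &old_word, new_word, /*weak=*/true,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
            return old_byte;
        }
    }
}
```

Concurrent updates to neighbouring bytes in the same word only make the CAS fail and retry; no update is lost, which is the property the embedding relies on.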
I will shortly be adding a small howto on adding platform support to the library.
That will be useful.
ARM Linux has kernel support that provides compare-and-swap even on processors that don't support it natively, by guaranteeing that code in certain address ranges executes atomically (if it is interrupted, it is restarted). This has the cost of a function call, i.e. it's slower than inline assembler but a lot faster than a system call. Kernels that don't support this are now sufficiently old that I think they can be ignored. Newer versions of gcc may use this mechanism when the atomic builtins are used, but versions of gcc that don't do this are still widespread enough that they should be supported efficiently.
Are these functions part of libc/glibc, or of the vdso?
It's something provided by the kernel in a vdso-like way; I'm not sure if it's actually vdso. For the details google for __kernel_cmpxchg and/or look at entry-armv.S in the kernel source.
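A sketch of how a CAS primitive might wrap the kernel helper, assuming the fixed address 0xffff0fc0 and the calling convention documented for __kernel_cmpxchg in the kernel sources (it returns zero if the exchange happened). `cas_int`, `fetch_add_via_cas` and `kernel_cmpxchg_t` are hypothetical names; non-ARM builds fall back to gcc's `__sync_bool_compare_and_swap` so the sketch also compiles elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the kernel's documented helper type: returns 0 if *ptr held oldval
// and was replaced by newval, non-zero if no exchange happened.
typedef int (kernel_cmpxchg_t)(int oldval, int newval, volatile int* ptr);

// Hypothetical wrapper: true if *ptr was atomically changed from expected to desired.
inline bool cas_int(volatile int* ptr, int expected, int desired) {
#if defined(__arm__) && defined(__linux__)
    // The kernel maps the helper at a fixed address in the vector page.
    kernel_cmpxchg_t* const kernel_cmpxchg =
        reinterpret_cast<kernel_cmpxchg_t*>(std::uintptr_t(0xffff0fc0u));
    return kernel_cmpxchg(expected, desired, ptr) == 0;
#else
    // Elsewhere, fall back to the GCC/Clang builtin (a full barrier).
    return __sync_bool_compare_and_swap(ptr, expected, desired);
#endif
}

// fetch_add built on the CAS primitive: retry until no other thread intervened.
inline int fetch_add_via_cas(volatile int* ptr, int delta) {
    for (;;) {
        const int old = *ptr;
        if (cas_int(ptr, old, old + delta)) {
            return old;
        }
    }
}
```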
I believe that OS X on ARM (i.e. the iPhone) always runs on architecture v6 or newer. However, Apple supply a version of gcc that is too old to support ARM atomics via the builtins. The "recommended" way to do atomics is via a set of function calls described here:
http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/atomic.3.html
I have not looked at what these functions do or tried to benchmark them. They are also available on other OS X platforms.
These should be easily usable, but:
- the *Barrier versions are still stronger than what is required (see below)
- there are no "Load with Barrier" and "Store with Barrier" operations; these would have to be emulated with compare_exchange
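To illustrate the second point, a sketch of how "Load with Barrier" and "Store with Barrier" could be emulated on top of a full-barrier compare-and-swap. It assumes `OSAtomicCompareAndSwap32Barrier` from `<libkern/OSAtomic.h>` on Apple platforms and gcc's `__sync_bool_compare_and_swap` elsewhere; `cas32_barrier`, `load_with_barrier` and `store_with_barrier` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__APPLE__)
#include <libkern/OSAtomic.h>
#endif

// Full-barrier 32-bit CAS: the OSAtomic call on Apple platforms, the
// GCC/Clang __sync builtin elsewhere. Returns true on success.
inline bool cas32_barrier(volatile std::int32_t* p, std::int32_t expected, std::int32_t desired) {
#if defined(__APPLE__)
    return OSAtomicCompareAndSwap32Barrier(expected, desired, p);
#else
    return __sync_bool_compare_and_swap(p, expected, desired);
#endif
}

// Hypothetical "load with barrier": CAS the observed value onto itself. It only
// succeeds if the value was still current, and it carries the CAS's barrier.
inline std::int32_t load_with_barrier(volatile std::int32_t* p) {
    for (;;) {
        const std::int32_t v = *p;
        if (cas32_barrier(p, v, v)) {
            return v;
        }
    }
}

// Hypothetical "store with barrier": swap in the new value from whatever is current.
inline void store_with_barrier(volatile std::int32_t* p, std::int32_t v) {
    for (;;) {
        const std::int32_t old = *p;
        if (cas32_barrier(p, old, v)) {
            return;
        }
    }
}
```

Note that the emulated load performs a write to the cache line, so besides being stronger than needed it turns readers into writers, which is part of why such emulation is costly.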
Since these devices are (currently) all uniprocessor, many of these issues are (currently) unimportant.
I note that you don't seem to use the gcc atomic builtins even on platforms where they have worked for a while e.g. x86. Any reason for that?
On x86 it would not matter; on all other platforms, the intrinsics have the unfortunate side effect of always acting as (usually bidirectional) memory barriers. There are, however, legitimate use cases for weaker ordering. For example, the following operation (roughly equivalent to __sync_fetch_and_add):
atomic<int>::fetch_add(1, memory_order_acq_rel)
is 2 to 3 times slower on ppc than the version not enforcing memory ordering:
atomic<int>::fetch_add(1, memory_order_relaxed)
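A sketch, using C++11 `std::atomic` (whose interface the `atomic<int>` calls above appear to follow), of a use case where `memory_order_relaxed` is legitimate: a pure event counter that no other data depends on. `count_events` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical example: a statistics counter bumped by several threads.
// Nothing else is published through it, so no ordering is needed; the
// read-modify-write is still atomic, so no increments are lost.
inline int count_events(int threads, int per_thread) {
    std::atomic<int> counter(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() synchronizes with each thread's completion, so the final
    // relaxed load observes every increment.
    for (auto& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```

If the counter instead guarded other data (for example a reference count whose final decrement frees an object), the decrement would need stronger ordering such as `memory_order_acq_rel`.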
If you always use fully-fenced versions, then any lock-free algorithm will usually be noticeably *slower* than the platform's native mutex lock/unlock operations (which use only the weakest barriers necessary), making the whole exercise rather pointless.
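To make the point about barrier strength concrete, a minimal spinlock sketch (not any platform's actual mutex) in which `lock()` needs only acquire ordering and `unlock()` only release ordering. `SpinLock` and `guarded_increments` are hypothetical names, and a real mutex would block rather than spin.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Acquire on lock and release on unlock are sufficient: no full two-way
// fence is needed on either side.
class SpinLock {
public:
    void lock() {
        // Spin until we change the flag from false to true.
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Several threads increment a plain counter under the lock; each release in
// unlock() pairs with the next acquire in lock(), making the update visible.
inline long guarded_increments(int threads, int per_thread) {
    SpinLock lock;
    long total = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &total, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++total;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return total;
}
```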
Right. Cheers, Phil.