Re: [boost] Notice: Boost.Atomic (atomic operations library)

30 Nov 2009

Helge Bahmann wrote:
...
Hi Phil!
Thanks for your interest. I appreciate any help with ARM, as I don't have
access to this architecture.
Currently my ARM v4 (XScale) dev system is a bit broken, but I might be 
able to fix it.  I have working v6/v7 systems.
...
On Monday 30 November 2009 17:02:14, Phil Endecott wrote:
[snip]
...
Architecture v6 introduced 32-bit load-locked/store-conditional
instructions. Architecture v6K (and therefore v7) added 16- and 8-bit versions.
The library already has infrastructure in place to emulate 8- and 16-bit 
atomics by "embedding" them into a properly aligned 32-bit atomic 
(created "on the fly" through appropriate pointer casts). FWIW ppc and Alpha 
require this already, as they do not have 8/16-bit ll/sc. This is of course 
slower than native 8-/16-bit versions, but is workable.
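As a rough illustration of the embedding technique, here is a minimal sketch of an 8-bit fetch_add built on a CAS loop over the enclosing 32-bit word. It uses C++11 std::atomic rather than Boost.Atomic's internals, and the helper name emulated_fetch_add_u8 and its `lane` parameter are hypothetical. `lane` counts bytes from the least significant bit, so mapping a byte's address to a lane would depend on endianness.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically add `delta` to the byte stored in bit lane
// `lane` (0 = least significant byte, 3 = most significant) of `word`, and
// return the byte's previous value. Only the 32-bit word is ever accessed
// atomically, as on a CPU that has 32-bit ll/sc but no 8-bit variant.
std::uint8_t emulated_fetch_add_u8(std::atomic<std::uint32_t>& word,
                                   unsigned lane, std::uint8_t delta) {
    const unsigned shift = lane * 8;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;
    std::uint32_t expected = word.load(std::memory_order_relaxed);
    for (;;) {
        const auto old_byte = static_cast<std::uint8_t>((expected & mask) >> shift);
        // Wraps modulo 256 and never carries into the neighbouring bytes.
        const auto new_byte = static_cast<std::uint8_t>(old_byte + delta);
        const std::uint32_t desired =
            (expected & ~mask) | (std::uint32_t{new_byte} << shift);
        // On failure, compare_exchange_weak stores the current word value in
        // `expected`, so the loop retries against fresh data.
        if (word.compare_exchange_weak(expected, desired,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed))
            return old_byte;
    }
}
```

A concurrent update to any other byte of the same word also makes the CAS fail and retry, which is one reason this is slower than native byte-sized ll/sc.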
I will shortly be adding a small howto on adding platform support to the 
library.
That will be useful.
...
...
ARM Linux has kernel support that provides compare-and-swap even on
processors that don't support it by guaranteeing to not interrupt code
in certain address ranges.  This has the cost of a function call, i.e.
it's slower than inline assembler but a lot faster than a system call.
Kernels that don't support this are now sufficiently old that I think
they can be ignored.  Newer versions of gcc may use this mechanism when
the atomic builtins are used, but versions of gcc that don't do this
are sufficiently widespread that they should still be supported
efficiently.
Are these functions part of libc/glibc, or of the vDSO?
It's something provided by the kernel in a vdso-like way; I'm not sure 
if it's actually vdso.  For the details google for __kernel_cmpxchg 
and/or look at entry-armv.S in the kernel source.
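A minimal sketch of how a library might sit on top of that helper, assuming the interface documented in the kernel's Documentation/arm/kernel_user_helpers.txt: __kernel_cmpxchg lives at the fixed address 0xffff0fc0 in the high vector page and returns zero when the exchange happened. The wrapper names cas_int and cas_fetch_add are hypothetical; a robust version would also check the kernel's helper version first. On targets other than ARM Linux the sketch falls back to GCC's __sync_bool_compare_and_swap so it still builds.

```cpp
// Sketch: fetch_add built on a compare-and-swap primitive.
#if defined(__arm__) && defined(__linux__)
// ARM Linux kernel user helper: returns 0 if *ptr was changed from oldval
// to newval, non-zero otherwise.
typedef int (kernel_cmpxchg_t)(int oldval, int newval, volatile int* ptr);
#define kernel_cmpxchg (*(kernel_cmpxchg_t*)0xffff0fc0)

static bool cas_int(volatile int* ptr, int oldval, int newval) {
    return kernel_cmpxchg(oldval, newval, ptr) == 0;
}
#else
// Fallback for other targets: GCC/Clang builtin (acts as a full barrier).
static bool cas_int(volatile int* ptr, int oldval, int newval) {
    return __sync_bool_compare_and_swap(ptr, oldval, newval);
}
#endif

// Retry until no other thread modified *ptr between our read and the CAS.
// The caller must keep old + delta in range (signed overflow is undefined).
int cas_fetch_add(volatile int* ptr, int delta) {
    for (;;) {
        int old = *ptr;
        if (cas_int(ptr, old, old + delta))
            return old;
    }
}
```

Every operation costs at least one indirect call into the helper, which matches the "slower than inline assembler, faster than a system call" trade-off described above.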
...
...
I believe that OS X on ARM (i.e. the iPhone) always runs on
architecture v6 or newer.  However Apple supply a version of gcc that
is too old to support ARM atomics via the builtins.  The "recommended"
way to do atomics is via a set of function calls described here: 
http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/atomic.3.html
I have not looked at what these functions do or tried
to benchmark them.  They are also available on other OS X platforms.
These should be easy to use, but
- the *Barrier versions are still stronger than what is required (see below)
- there are no "Load with Barrier" and "Store with Barrier" operations; these
would have to be emulated with compare_exchange
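The emulation mentioned in the second point could look roughly like this. It is a sketch only: the names cas32_barrier, barrier_load and barrier_store are hypothetical, OSAtomicCompareAndSwap32Barrier is taken from <libkern/OSAtomic.h> (the interface documented at the man page above, deprecated in later macOS releases), and other targets fall back to GCC's __sync_bool_compare_and_swap, which is also fully fenced.

```cpp
#include <cstdint>
#if defined(__APPLE__)
#include <libkern/OSAtomic.h>
static bool cas32_barrier(std::int32_t oldval, std::int32_t newval,
                          volatile std::int32_t* ptr) {
    return OSAtomicCompareAndSwap32Barrier(oldval, newval, ptr);
}
#else
static bool cas32_barrier(std::int32_t oldval, std::int32_t newval,
                          volatile std::int32_t* ptr) {
    return __sync_bool_compare_and_swap(ptr, oldval, newval);
}
#endif

// "Load with barrier": a CAS that replaces the value with itself changes
// nothing but executes the barrier; retry if another thread raced us.
std::int32_t barrier_load(volatile std::int32_t* ptr) {
    for (;;) {
        std::int32_t v = *ptr;
        if (cas32_barrier(v, v, ptr))
            return v;
    }
}

// "Store with barrier": swap in the new value via CAS, retrying on contention.
void barrier_store(volatile std::int32_t* ptr, std::int32_t newval) {
    for (;;) {
        std::int32_t old = *ptr;
        if (cas32_barrier(old, newval, ptr))
            return;
    }
}
```

Each emulated load thus becomes a read-modify-write, which generally needs exclusive ownership of the cache line and is part of why such emulations are costly.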
Since these devices are (currently) all uniprocessor, many of these 
issues are (currently) unimportant.
...
...
I note that you don't seem to use the gcc atomic builtins even on
platforms where they have worked for a while e.g. x86.  Any reason for
that?
On x86 it would not matter; on all other platforms, the intrinsics have the
unfortunate side effect of always acting as (usually bidirectional) memory
barriers. There are, however, legitimate use cases for weaker ordering. For
example, the following operation (equivalent to __sync_fetch_and_add):
atomic<int>::fetch_add(1, memory_order_acq_rel)
is 2 to 3 times slower on PPC than the version that enforces no memory ordering:
atomic<int>::fetch_add(1, memory_order_relaxed)
If you always use fully-fenced versions, then any lock-free algorithm will
usually be noticeably *slower* than the platform's native mutex lock/unlock
operations (which use only the weakest barriers necessary), making the whole
exercise rather pointless.
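To make the distinction concrete, here is a minimal sketch using C++11 std::atomic, whose fetch_add and memory_order names match those used above (boost::atomic itself is not used here). A pure event counter publishes no other data, so relaxed ordering suffices; a reference-count decrement that may lead to freeing an object needs acquire/release ordering. The function names count_events and release_ref are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread bumps a shared counter; the count is the only shared state,
// so atomicity is needed but no ordering: memory_order_relaxed suffices.
int count_events(int threads, int per_thread) {
    std::atomic<int> hits{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&hits, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                hits.fetch_add(1, std::memory_order_relaxed);
        });
    // join() synchronizes with each thread's completion, so every
    // increment is visible to the load below.
    for (auto& th : pool) th.join();
    return hits.load(std::memory_order_relaxed);
}

// Dropping a reference is different: the thread that drops the last one
// goes on to destroy the object, so it must see all writes other owners
// made before releasing theirs. acq_rel provides the release half on every
// decrement and the acquire half for the final one.
bool release_ref(std::atomic<int>& refs) {
    return refs.fetch_sub(1, std::memory_order_acq_rel) == 1;  // true: last owner
}
```

A common refinement decrements with memory_order_release and issues an acquire fence only when the count reaches zero, which weakens the common case further.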
Right.

Cheers,  Phil.