Re: [boost] Notice: Boost.Atomic (atomic operations library)

30 Nov 2009

Hi Phil!

Thanks for your interest. I appreciate any help with ARM, as I don't have 
this architecture available.

On Monday 30 November 2009 17:02:14, Phil Endecott wrote:
[snip]
...
Architecture v6 introduced 32-bit load-locked/store-conditional
instructions. Architecture v7 introduced 16- and 8-bit versions.
The library already has infrastructure in place to emulate 8- and 16-bit 
atomics by "embedding" them into a properly aligned 32-bit atomic 
(created "on the fly" through appropriate pointer casts). FWIW ppc and Alpha 
require this already, as they do not have 8/16-bit ll/sc. This is of course 
slower than native 8-/16-bit versions, but is workable.
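
A minimal sketch of the "embedding" idea, not the library's actual code: 
an 8-bit fetch-and-add built from nothing but a 32-bit compare-exchange loop. 
The name fetch_add_u8_in_word and the explicit byte-lane index are 
assumptions made for illustration. The library locates the enclosing aligned 
word through pointer casts, whereas this sketch takes the word and the lane 
directly, and std::atomic stands in for the 32-bit primitive.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically add `delta` to byte lane `lane` (0..3) of a
// 32-bit word, using only a 32-bit compare-exchange. Returns the old byte.
inline std::uint8_t fetch_add_u8_in_word(std::atomic<std::uint32_t>& word,
                                         unsigned lane, std::uint8_t delta)
{
    assert(lane < 4);
    const unsigned shift = lane * 8;
    const std::uint32_t mask = std::uint32_t(0xFF) << shift;

    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint8_t old_byte =
            static_cast<std::uint8_t>((old_word & mask) >> shift);
        const std::uint8_t new_byte =
            static_cast<std::uint8_t>(old_byte + delta);  // wraps modulo 256
        // Replace only the target lane; the three neighbouring bytes are
        // written back unchanged.
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t(new_byte) << shift);
        if (word.compare_exchange_weak(old_word, new_word,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed))
            return old_byte;
        // On failure old_word now holds the current value; retry. A change
        // to a neighbouring byte also forces a retry, which is part of why
        // this is slower than a native 8-bit operation.
    }
}
```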

I will shortly be adding a small howto on adding platform support to the 
library.
...
ARM Linux has kernel support that provides compare-and-swap even on
processors that don't support it by guaranteeing to not interrupt code
in certain address ranges.  This has the cost of a function call, i.e.
it's slower than inline assembler but a lot faster than a system call.
Kernels that don't support this are now sufficiently old that I think
they can be ignored.  Newer versions of gcc may use this mechanism when
the atomic builtins are used, but versions of gcc that don't do this
are sufficiently widespread that they should still be supported
efficiently.
Are these functions part of libc, glibc or the vdso?
...
I believe that OS X on ARM (i.e. the iPhone) always runs on
architecture v6 or newer.  However Apple supply a version of gcc that
is too old to support ARM atomics via the builtins.  The "recommended"
way to do atomics is via a set of function calls described here:
http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/atomic.3.html
I have not looked at what these functions do or tried
to benchmark them.  They are also available on other OS X platforms.
These should be easy to use, but:
- the *Barrier versions are still stronger than what is required (see below)
- there are no "Load with Barrier" and "Store with Barrier" operations; these 
would have to be emulated with compare_exchange
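
A rough sketch of how such an emulation could look, assuming the platform 
offers only a fully fenced compare-and-swap. The function cas_barrier is a 
hypothetical stand-in for that primitive (modelled here with std::atomic so 
the example is self-contained), and load_with_barrier / store_with_barrier 
are illustrative names, not part of any real API.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a platform primitive: a fully fenced
// compare-and-swap that returns true if the swap happened.
inline bool cas_barrier(std::int32_t old_value, std::int32_t new_value,
                        std::atomic<std::int32_t>& target)
{
    return target.compare_exchange_strong(old_value, new_value,
                                          std::memory_order_seq_cst);
}

// Emulated "load with barrier": read a candidate value without ordering,
// then confirm it by swapping the value with itself. The successful swap
// supplies the barrier.
inline std::int32_t load_with_barrier(std::atomic<std::int32_t>& target)
{
    for (;;) {
        const std::int32_t seen = target.load(std::memory_order_relaxed);
        if (cas_barrier(seen, seen, target))
            return seen;
    }
}

// Emulated "store with barrier": retry the swap until it succeeds against
// whatever value is currently stored.
inline void store_with_barrier(std::atomic<std::int32_t>& target,
                               std::int32_t value)
{
    for (;;) {
        const std::int32_t seen = target.load(std::memory_order_relaxed);
        if (cas_barrier(seen, value, target))
            return;
    }
}
```

Note that the emulated load performs a write to the cache line, so it is 
considerably more expensive than a native load followed by a barrier.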
...
I note that you don't seem to use the gcc atomic builtins even on
platforms where they have worked for a while e.g. x86.  Any reason for
that?
On x86 it would not matter; on all other platforms, the intrinsics have the 
unfortunate side-effect of always acting as (usually bi-directional) memory 
barriers. There are however legitimate use cases for weaker ordering. For 
example, the following operation (equivalent to __sync_fetch_and_add):

	atomic<int>::fetch_add(1, memory_order_acq_rel)

is 2 to 3 times slower on ppc than the version not enforcing memory ordering:

	atomic<int>::fetch_add(1, memory_order_relaxed)

If you always use fully-fenced versions, then any lock-free algorithm will 
usually be noticeably *slower* than the platform's native mutex lock/unlock 
operations (which use only the weakest barriers necessary), making the whole 
exercise rather pointless.
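
To illustrate the kind of use case meant here, a small sketch of an event 
counter whose increments need atomicity but no ordering against other 
memory accesses. The hit_counter type and its member names are made up for 
this example, and std::atomic is used in place of the library's atomic<int>, 
assuming both expose the same fetch_add(value, memory_order) interface.

```cpp
#include <atomic>

// Hypothetical statistics counter: only the count itself must be consistent.
struct hit_counter {
    std::atomic<long> hits{0};

    // No barrier requested: on ll/sc architectures this can compile to the
    // bare load-linked/store-conditional loop.
    void record_relaxed()
    {
        hits.fetch_add(1, std::memory_order_relaxed);
    }

    // Same arithmetic, but also ordered against surrounding memory
    // accesses, which costs extra barrier instructions on weakly ordered
    // hardware.
    void record_ordered()
    {
        hits.fetch_add(1, std::memory_order_acq_rel);
    }

    long total() const
    {
        return hits.load(std::memory_order_relaxed);
    }
};
```

Both variants produce the same count; they differ only in what they 
guarantee about other memory accesses around the increment.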

Cheers, Helge

Helge Bahmann