Re: [boost] [thread] Some thoughts about "atomic"

1 Dec 2006

Roland Schwarz <roland.schwarz@chello.at> writes:
...
> I would be glad if we could (re)start a discussion about the topic.
> Perhaps I am not the only one to benefit from this.

Sounds sensible.
...
> Following are some things I learned, but this might be wrong, and I
> would appreciate clarification. Also some questions:
> 1) atomicity (in this specialized context) is about optimizing the
> pattern: enter_critical_section; do_something; leave_critical_section;
> by making use of processor/platform specific means.

Essentially, yes. Other CPUs/threads will either see the state before the
atomic op, or after, but never a "partial" effect.

On x86, normal reads and writes to suitably-aligned 32-bit values are atomic
in this sense.
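
To make the "never a partial effect" point concrete, here is a minimal
sketch. It assumes C++11 std::atomic, which is not something this thread had
available, and the function name observe_no_torn_values is purely
illustrative. A writer flips a 32-bit value between two bit patterns using
relaxed operations (no ordering requested, only atomicity) while a reader
checks that it never sees a mixture of the two.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical demo: returns true if the reader only ever observed one of
// the two complete patterns, never a "partial" write.
inline bool observe_no_torn_values() {
    constexpr std::uint32_t kA = 0x00000000u;
    constexpr std::uint32_t kB = 0xFFFFFFFFu;
    std::atomic<std::uint32_t> value{kA};
    std::atomic<bool> stop{false};

    // Writer: alternates between the two patterns until told to stop.
    std::thread writer([&] {
        bool flip = false;
        while (!stop.load(std::memory_order_relaxed)) {
            value.store(flip ? kA : kB, std::memory_order_relaxed);
            flip = !flip;
        }
    });

    // Reader: every observed value must be exactly kA or exactly kB.
    bool ok = true;
    for (int i = 0; i < 100000; ++i) {
        const std::uint32_t seen = value.load(std::memory_order_relaxed);
        if (seen != kA && seen != kB) {
            ok = false;
        }
    }

    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return ok;
}
```

A caller could check it with assert(observe_no_torn_values()). On x86 I would
expect the relaxed load and store to compile down to plain MOV instructions,
which matches the statement above about aligned 32-bit accesses.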
...
> In particular in
> presence of multiple processors. I.e. an atomic lib is primarily about
> performance.

Not just about performance. It also enables the construction of the
higher-level primitives.

Atomic instructions also affect visibility, which is addressed below.
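
As an illustration of building a higher-level primitive from an atomic
operation, here is a sketch of a spinlock built on a single atomic exchange.
It assumes C++11 std::atomic and std::thread; the names spin_lock and
count_with_spin_lock are hypothetical, and this is not a proposal for how a
Boost lock should be implemented.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spinlock: the only primitive it needs is atomic exchange.
class spin_lock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        // exchange() returns the previous value: true means another thread
        // already holds the lock, so keep trying.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        // Release: writes made while holding the lock become visible to the
        // next thread that acquires it.
        locked_.store(false, std::memory_order_release);
    }
};

// Increments a plain (non-atomic) counter from several threads, using the
// spinlock as the critical section.
inline long count_with_spin_lock(int thread_count, int iterations) {
    spin_lock guard;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < thread_count; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                guard.lock();
                ++counter;
                guard.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

With this sketch, count_with_spin_lock(4, 10000) returns 40000, because every
increment happens inside the critical section.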
...
> 2) atomicity better would be addressed by the compiler, given a
> suitable memory model, than as a library.

Yes.
...
> 3) Despite 2) it would be possible to write a library, but it will
> be hard to get processor independent semantics. E.g. there is one
> concept of read/write/full memory barriers or another of acquire/release
> semantics for SMP.

I think that the memory barrier and acquire/release semantics are just two
ways of talking about the same thing.

As I understand it, on x86, the SFENCE instruction is a "Store Fence", which
is a "Write Barrier", and has "Release Semantics" as far as stores are
concerned. Any store instructions which happen before it on this CPU are made
globally visible before any store which follows it. No store instruction
which occurs afterwards on this CPU is permitted to become globally visible
beforehand.

Again on x86, the LFENCE instruction is a "Load Fence", which is a "Read
Barrier", and has "Acquire Semantics" as far as loads are concerned. Any load
instructions which happen before it on this CPU must have completed before
any load which follows it. No load instruction which occurs afterwards on
this CPU is permitted to be executed beforehand.

A full memory barrier, the MFENCE instruction on x86, does both.
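
The write-barrier/read-barrier pairing described above can be sketched as
follows. This assumes C++11 std::atomic_thread_fence rather than the raw
SFENCE/LFENCE instructions, and the function name pass_message_with_fences is
illustrative only. The producer writes ordinary data, issues a release fence,
then sets a flag; the consumer waits for the flag, issues an acquire fence,
then reads the data.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: returns the payload value seen by the consumer.
inline int pass_message_with_fences() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag announcing that payload is set

    std::thread producer([&] {
        payload = 42;
        // "Write barrier" side: the payload store may not be reordered
        // after the flag store below.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    // Consumer: spin until the flag is observed.
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // "Read barrier" side: the payload load below may not be reordered
    // before the flag load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    const int seen = payload;

    producer.join();
    return seen;
}
```

Given the fence pairing, the consumer is guaranteed to read 42. As far as I
know, current x86 compilers emit no fence instruction at all for the release
and acquire fences here, since ordinary x86 loads and stores are already
ordered strongly enough; only a std::memory_order_seq_cst fence would need
something like MFENCE.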

There is also the concept of a "raw" atomic operation, which has no impact on
the visibility of other memory operations: the only guarantee is that the
operation itself is either done or not done. As described above, on x86 this
applies to all suitably-aligned 32-bit reads and writes.

Some atomic operations also incorporate a full memory barrier. On x86, these
are those ops that assert the LOCK# signal, which include XCHG (with or
without the LOCK prefix), LOCK CMPXCHG, LOCK INC and LOCK ADD, amongst others.
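
For reference, here is a single-threaded sketch of what these
read-modify-write operations do, written against C++11 std::atomic (an
assumption; the struct rmw_results and function demo_rmw are illustrative
names). With the default sequentially consistent ordering, I would expect x86
compilers to emit XCHG, LOCK CMPXCHG and LOCK XADD or similar for these calls,
but that is a property of the compiler, not something the code guarantees.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical record of what each operation returned.
struct rmw_results {
    int old_from_exchange;
    bool first_cas_succeeded;
    bool second_cas_succeeded;
    int expected_after_failed_cas;
    int old_from_fetch_add;
    int final_value;
};

inline rmw_results demo_rmw() {
    std::atomic<int> v{1};
    rmw_results r{};

    // exchange: unconditionally store 5, return the old value (1).
    r.old_from_exchange = v.exchange(5);

    // compare-and-swap that succeeds: v is 5, so it becomes 7.
    int expected = 5;
    r.first_cas_succeeded = v.compare_exchange_strong(expected, 7);

    // compare-and-swap that fails: v is 7, not 5, so v is unchanged and
    // 'expected' is overwritten with the value actually found (7).
    expected = 5;
    r.second_cas_succeeded = v.compare_exchange_strong(expected, 9);
    r.expected_after_failed_cas = expected;

    // fetch_add: add 3, return the old value (7); v is now 10.
    r.old_from_fetch_add = v.fetch_add(3);
    r.final_value = v.load();
    return r;
}
```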
...
> 4) Does there exist a canonical set of atomic primitives, from which
> others can be built?

Yes, I'm sure there is, but I'd have to think hard to work out what the
minimal set is. I expect that there are several possible such sets.
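
One candidate building block is compare-and-swap. The sketch below, assuming
C++11 std::atomic, derives two other operations from it with a retry loop.
The names fetch_add_via_cas and fetch_max_via_cas are hypothetical, and this
is not a claim that compare-and-swap is the minimal set, only that other
operations can be built from it.

```cpp
#include <atomic>
#include <cassert>

// Atomic add built from compare-and-swap. Returns the previous value.
template <typename T>
T fetch_add_via_cas(std::atomic<T>& target, T delta) {
    T observed = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads 'observed' with the current
    // value, so the loop recomputes the sum and tries again.
    while (!target.compare_exchange_weak(observed, observed + delta)) {
    }
    return observed;
}

// Atomic "store the maximum" built the same way. Returns the previous value.
template <typename T>
T fetch_max_via_cas(std::atomic<T>& target, T candidate) {
    T observed = target.load(std::memory_order_relaxed);
    // Stop as soon as the stored value is already at least 'candidate'.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
    }
    return observed;
}
```

For example, starting from std::atomic<int> a{10}, fetch_add_via_cas(a, 5)
returns 10 and leaves 15, and a following fetch_max_via_cas(a, 20) returns 15
and leaves 20.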
...
> 5) Is it worth the effort to create a library with processor
> independent semantics, at the price of not being optimal? E.g. by doing
> away with the various kinds of barriers, instead simply requiring
> atomicity and full memory barrier semantics for the operation? Which
> operations, besides load and store would be essential?

I think it's worth the effort. For processor independence, you could just
specify that the barriers are "at least" what is specified: if you specify
a read barrier, you might get a store barrier too, and vice versa.
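
The "at least" rule can be sketched like this, assuming C++11 fences. All
names here (exact_barriers, full_only_barriers, publish_and_read) are
hypothetical. One backend provides exactly the barrier asked for; the other
stands in for a platform that only has a full barrier and so over-delivers.
Client code written against the interface is correct with either.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Backend for a platform with distinct read and write barriers.
struct exact_barriers {
    static void read_barrier() {
        std::atomic_thread_fence(std::memory_order_acquire);
    }
    static void write_barrier() {
        std::atomic_thread_fence(std::memory_order_release);
    }
};

// Backend for a platform that only offers a full barrier: each request gets
// "at least" what was asked for.
struct full_only_barriers {
    static void read_barrier() {
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
    static void write_barrier() {
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
};

// Client code: publishes a value to another thread using only the
// read_barrier/write_barrier interface.
template <typename Barriers>
int publish_and_read() {
    int payload = 0;
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;
        Barriers::write_barrier();
        ready.store(true, std::memory_order_relaxed);
    });

    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    Barriers::read_barrier();
    const int seen = payload;

    producer.join();
    return seen;
}
```

Both publish_and_read<exact_barriers>() and
publish_and_read<full_only_barriers>() return 42; the second is merely
allowed to be slower.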
...
> Sorry if this is not the perfect list to discuss the topic, but I think
> boost could possibly benefit from such a library, as previous
> discussions let me believe.

The details of the memory model, atomics, and visibility, and how it applies
to C++, are under discussion amongst C++ standards committee members. I would
imagine that you'd be welcome to join such discussions.

Anyway, this is important to Boost if we're going to provide an atomics
library.

Anthony
-- 
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk