
Roland Schwarz <roland.schwarz@chello.at> writes:
I would be glad if we could (re)start a discussion about the topic. Perhaps I am not the only one to benefit from this.
Sounds sensible.
Following are some things I learned, but this might be wrong, and I would appreciate clarification. Also some questions:
1) atomicity (in this specialized context) is about optimizing the pattern: enter_critical_section; do_something; leave_critical_section; by making use of processor/platform specific means.
Essentially, yes. Other CPUs/threads will either see the state before the atomic op, or after, but never a "partial" effect. On x86, normal reads and writes to suitably-aligned 32-bit values are atomic in this sense.
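To make the "before or after, never partial" point concrete, here is a minimal sketch. It assumes the `std::atomic` facility that later became standard C++ (C++11), which is not something this thread had available. The function name `count_with_atomics` is made up for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical demo: several threads increment one shared counter.
// fetch_add is an indivisible read-modify-write, so every other thread
// sees the counter either before or after an increment, never halfway.
int count_with_atomics(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Relaxed ordering: atomicity only, no visibility guarantees
                // for other memory.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();  // join() makes the final value visible here
    return counter.load();
}
```

Calling `count_with_atomics(4, 10000)` should give 40000. With a plain `int` and `++counter` the program would have a data race and could lose increments.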
In particular in presence of multiple processors. I.e. an atomic lib is primarily about performance.
Not just about performance. It also enables the construction of the higher-level primitives. Atomic instructions also affect visibility, which is addressed below.
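As an illustration of building a higher-level primitive on top of an atomic operation, here is a rough sketch of a spin lock. It assumes the C++11 `std::atomic_flag` type; `SpinLock` and `sum_under_spinlock` are hypothetical names, and this is not a recommendation for production locking.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical spin lock built from a single atomic test-and-set flag.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test_and_set returns the previous value: loop until we are the
        // thread that changed it from clear to set.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }
    void unlock() {
        // Release: writes made while holding the lock become visible to
        // the next thread that acquires it.
        flag_.clear(std::memory_order_release);
    }
};

// Protects a plain (non-atomic) counter with the spin lock.
long sum_under_spinlock(int num_threads, int increments_per_thread) {
    SpinLock lock;
    long total = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &total, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++total;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

The lock relies on both properties mentioned above: the test-and-set must be atomic, and the acquire/release ordering is what makes the protected data visible between threads.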
2) atomicity better would be addressed by the compiler, given a suitable memory model, than as a library.
Yes.
3) Despite 2) it would be possible to write a library, but it will be hard to get processor independent semantics. E.g. there is one concept of read/write/full memory barriers or another of acquire/release semantics for SMP.
I think that the memory barrier and acquire/release semantics are just two ways of talking about the same thing.
As I understand it, on x86, the SFENCE instruction is a "Store Fence", which is a "Write Barrier", and has "Release Semantics". Any store instructions which happen before it on this CPU are made globally visible before any stores that follow it. No store instructions which occur afterwards on this CPU are permitted to be globally visible beforehand.
Again on x86, the LFENCE instruction is a "Load Fence", which is a "Read Barrier", and has "Acquire Semantics". Any load instructions which happen before it on this CPU must have completed before any loads that follow it. No load instructions which occur afterwards on this CPU are permitted to be executed beforehand. A full memory barrier, the MFENCE instruction on x86, does both.
There is also the concept of a "raw" atomic operation, which has no impact on memory visibility beyond the operation itself: it is either done or not done. As described above, on x86 this applies to all suitably-aligned 32-bit reads and writes.
Some atomic operations also incorporate a full memory barrier. On x86, these are the ops that assert the LOCK# signal, which include XCHG (with or without the LOCK prefix), LOCK CMPXCHG, LOCK INC and LOCK ADD, amongst others.
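The release/acquire pairing can be sketched in portable terms as follows. This assumes the C++11 `std::memory_order` vocabulary rather than raw x86 fences, and `publish_and_consume` is a made-up name. It is a minimal sketch of the idea, not a statement of what any particular CPU instruction does.

```cpp
#include <atomic>
#include <thread>

// Hypothetical demo: one thread publishes a plain value, another consumes it.
int publish_and_consume() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag used to hand the data over
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload may not become visible after this store.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: the read of payload may not be performed before this load.
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the flag is set
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If both operations on `ready` used `std::memory_order_relaxed` (the "raw" atomic case), the flag itself would still be updated atomically, but nothing would guarantee that the consumer sees `payload == 42`.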
4) Does there exist a canonical set of atomic primitives, from which others can be built?
Yes, I'm sure there is, but I'd have to think hard to work out what the minimal set is. I expect that there are several possible such sets.
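As one example of building an operation from a smaller set, here is a sketch of an atomic add constructed from compare-and-swap alone. It assumes C++11 `std::atomic`; the name `fetch_add_via_cas` is hypothetical, and I am not claiming that compare-and-swap is the minimal set, only that it is enough for this case.

```cpp
#include <atomic>

// Hypothetical helper: atomic add built only from load + compare-and-swap.
// Returns the value held before the addition, like a native fetch-and-add.
int fetch_add_via_cas(std::atomic<int>& target, int delta) {
    int observed = target.load(std::memory_order_relaxed);
    // Try to swap in observed + delta. On failure, compare_exchange_weak
    // writes the current value into observed, so the loop retries with
    // fresh data.
    while (!target.compare_exchange_weak(observed, observed + delta)) {
        // retry
    }
    return observed;
}
```

The same retry loop works for exchange, increment, bitwise ops and so on, which is one reason compare-and-swap shows up in most candidate "canonical" sets.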
5) Is it worth the effort to create a library with processor independent semantics, at the price of not being optimal? E.g. by doing away with the various kinds of barriers, instead simply requiring atomicity and full memory barrier semantics for the operation? Which operations, besides load and store would be essential?
I think it's worth the effort. For processor independence, you could just specify that the barriers are "at least" what is specified --- if you specify a read barrier, you might get a store barrier too, and vice versa.
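A rough sketch of the "at least" idea: the interface offers read and write barriers, but on a hypothetical platform that only has a full fence, both map onto it. The `portable` namespace and its function names are invented for illustration, and the sketch assumes C++11 `std::atomic_thread_fence`.

```cpp
#include <atomic>
#include <thread>

namespace portable {
// Assumed platform: only a full fence is available.
inline void full_barrier() { std::atomic_thread_fence(std::memory_order_seq_cst); }
// Callers ask for a read or write barrier and get at least that much.
inline void read_barrier()  { full_barrier(); }
inline void write_barrier() { full_barrier(); }
}  // namespace portable

// Hands a plain value from one thread to another using only the
// portable barriers plus a "raw" (relaxed) atomic flag.
int handoff_with_barriers() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread writer([&] {
        payload = 7;
        portable::write_barrier();  // payload is ordered before the flag store
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            // spin
        }
        portable::read_barrier();   // flag load is ordered before the payload read
        seen = payload;
    });

    writer.join();
    reader.join();
    return seen;
}
```

Code written against this interface stays correct if another platform implements `read_barrier` and `write_barrier` with cheaper, more specific instructions. The cost on the weaker platform is performance, not correctness.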
Sorry if this is not the perfect list to discuss the topic, but I think boost could possibly benefit from such a library, as previous discussions led me to believe.
The details of the memory model, atomics, and visibility, and how it applies to C++, are under discussion amongst C++ standards committee members. I would imagine that you'd be welcome to join such discussions. Anyway, this is important to boost, if we're going to provide a library that does it.
Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk