
Helge Bahmann wrote:
On Thu, 3 Dec 2009, Phil Endecott wrote:
1. Linux kernel provided memory-barrier and CAS operations (only);
Do any of these ARM platforms (this is pre-v6, probably?) actually support SMP? If not, then the barriers will probably be NOPs.
I think the barrier is a DMB instruction, but in principle the kernel could put nothing there on uniprocessors. There's still the small overhead of the call, which we could consider omitting if we were certain that it was a uniprocessor.
Out of curiosity: does DMB also enforce ordered MMIO access? That would be stronger than required.
I don't know.
If this is always an "emulated" CAS
It could be an ll/sc sequence on systems that have those instructions. I don't think that counts as "emulated" in this sense, so memory barriers are needed - right?
then I don't think DMB would be required under any circumstances -- if the system is uni-processor, then obviously no barrier is required. If it is multi-processor, then the emulation requires an internal spin-lock in the kernel, which must itself already include sufficient memory barriers.
Anyway, in this case I think I need to implement load, store and compare_exchange_weak using the kernel-provided functions and add your __build_atomic_from_minimal and __build_atomic_from_larger_type on top.
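To make that layering concrete, here is a minimal sketch of the idea, not the actual library code. The names `kernel_cas`, `kernel_memory_barrier`, `minimal_atomic_int` and the free function `fetch_add` are hypothetical, and the two "kernel" functions are emulated with GCC/Clang `__sync` builtins so that the sketch compiles on any platform; the real ARM Linux entry points are not shown. The barriers around the CAS are deliberately conservative, since whether they are needed is exactly the open question above.

```cpp
#include <cassert>

namespace minimal_sketch {

// ASSUMPTION: stand-in for the kernel-provided CAS. Returns true if *ptr
// equalled `expected` and was replaced by `desired`.
inline bool kernel_cas(volatile int* ptr, int expected, int desired) {
    return __sync_bool_compare_and_swap(ptr, expected, desired);
}

// ASSUMPTION: stand-in for the kernel-provided memory barrier.
inline void kernel_memory_barrier() {
    __sync_synchronize();
}

// The minimal interface: load, store and compare_exchange_weak only.
class minimal_atomic_int {
public:
    explicit minimal_atomic_int(int v = 0) : value_(v) {}

    int load() const {
        int v = value_;
        kernel_memory_barrier();   // later accesses stay after the load
        return v;
    }

    void store(int v) {
        kernel_memory_barrier();   // earlier accesses stay before the store
        value_ = v;
        kernel_memory_barrier();
    }

    // May fail even if the value matched a moment later; callers must loop.
    bool compare_exchange_weak(int& expected, int desired) {
        kernel_memory_barrier();
        if (kernel_cas(&value_, expected, desired)) {
            kernel_memory_barrier();
            return true;
        }
        expected = value_;         // report the value that was observed
        return false;
    }

private:
    volatile int value_;
};

// What a builder layered on top might add: a read-modify-write operation
// expressed purely in terms of load and compare_exchange_weak.
template <typename Atomic>
int fetch_add(Atomic& a, int delta) {
    int old = a.load();
    while (!a.compare_exchange_weak(old, old + delta)) {
        // `old` now holds the freshly observed value; retry with it.
    }
    return old;
}

} // namespace minimal_sketch
```

With this in place, `minimal_sketch::fetch_add(a, 1)` returns the previous value and leaves `a` incremented, without the platform layer having to supply anything beyond the three primitives.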
I'm not sure if the kernel-provided CAS is restarted or aborted on interruption
I'm pretty sure that currently it's restarted, but that may not be guaranteed.
, if it is restarted then it will not fail spuriously and qualifies as compare_exchange_strong. In that case I would recommend additionally implementing "exchange" by hand, having compare_exchange_weak call compare_exchange_strong, and using __build_atomic_from_exchange (yes, it's not that well-named).
I believe you, but I'm getting out of my depth here.
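For anyone else following along, the recommendation above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: `kernel_cas` is a hypothetical stand-in for the kernel-provided CAS (emulated here with a GCC/Clang `__sync` builtin, which also supplies the only memory ordering in this sketch), and `strong_atomic_int` is an invented name. The strong variant retries whenever the CAS fails although the value still matches, so it can only report failure when the value really differs.

```cpp
#include <cassert>

namespace strong_sketch {

// ASSUMPTION: stand-in for a kernel-provided CAS that is restarted rather
// than aborted on interruption.
inline bool kernel_cas(volatile int* ptr, int expected, int desired) {
    return __sync_bool_compare_and_swap(ptr, expected, desired);
}

class strong_atomic_int {
public:
    explicit strong_atomic_int(int v = 0) : value_(v) {}

    int load() const { return value_; }

    // Fails only if the stored value really differs from `expected`.
    bool compare_exchange_strong(int& expected, int desired) {
        for (;;) {
            if (kernel_cas(&value_, expected, desired)) {
                return true;
            }
            int current = value_;
            if (current != expected) {
                expected = current;   // genuine mismatch: report it
                return false;
            }
            // The value matches again, so the failure was not genuine; retry.
        }
    }

    // The weak form simply forwards to the strong one.
    bool compare_exchange_weak(int& expected, int desired) {
        return compare_exchange_strong(expected, desired);
    }

    // Hand-written exchange: replace the value and return the old one.
    int exchange(int desired) {
        int old = value_;
        while (!compare_exchange_strong(old, desired)) {
            // `old` was refreshed by the failed attempt; try again.
        }
        return old;
    }

private:
    volatile int value_;
};

} // namespace strong_sketch
```

The remaining operations would then be generated from `exchange` and the two compare-exchange functions by a builder along the lines of `__build_atomic_from_exchange`; that part is not reproduced here.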
(BTW, why do you use leading __s ? I was under the impression that such identifiers were reserved.)
It's a habit of mine to name really internal stuff that way; I can change it if it conflicts with the Boost coding style.
I think that would be a good idea - "namespace detail" sufficiently identifies these things as being internal. While you're at it, I suggest adding some license/copyright headers.
2. Asm load-locked/store-conditional for words (only);
3. As 2 but also for smaller types.
sounds like this is going to be one of the most complicated platforms, so I really appreciate your experience here...
Hmmm....
Would it be possible to add another set of builders that could use load-locked and store-conditional functions from a lower layer? This could reduce the amount of assembler needed.
The problem is that ll/sc are quite constrained on the architectures that I know of -- most processors will clear the reservation established by ll when there is a memory reference to the same cacheline before the sc, some will do this for _any_ memory reference, so that the ll/sc loop could effectively live-lock. I don't think it is possible to constrain the compiler sufficiently to prevent it from accidentally inserting such memory references if you allow C++ code between these instructions (either -O0 builds not inlining the wrapper functions, or -O2 with very aggressive inlining moving code in between), so I fear that exposing ll/sc will be rather brittle.
I suppose that's true, but it's unfortunate. Maybe someone can think of a trick to help us.

Phil.