
shared memory support: the fallback implementation relies on the spinlock pool that is also used by the smart pointers. however, this pool is per-process, so the fallback implementation won't work in shared memory. can this be changed/fixed?
fixing this would require a per-variable lock... depending on the platform, this can have enormous overhead.
I would suggest using the compile-time macros BOOST_ATOMIC_*_LOCK_FREE to pick an alternate code path.
then we need some kind of interprocess-specific atomic ... maybe as part of boost.interprocess ... in any case, maybe we should provide an implementation which somehow matches the behavior of c++11 compilers ...
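
a minimal sketch of the compile-time selection suggested above. it uses the standard C++11 macro ATOMIC_INT_LOCK_FREE so that it compiles without boost; BOOST_ATOMIC_INT_LOCK_FREE is assumed to play the same role. `shared_counter` is a hypothetical type, and the alternate path embeds a per-variable spinlock next to the data, which is the overhead mentioned above. whether std::atomic_flag is actually usable across processes is platform-dependent (the standard only recommends address-free lock-free operations), so treat this as an illustration, not a tested interprocess primitive.

```cpp
#include <atomic>

#if ATOMIC_INT_LOCK_FREE == 2
// int atomics are always lock-free: no lock is needed, so the object
// can be placed in shared memory without relying on a per-process pool.
struct shared_counter {
    std::atomic<int> value{0};

    int increment() {
        return value.fetch_add(1, std::memory_order_acq_rel) + 1;
    }
    static constexpr bool uses_lock() { return false; }
};
#else
// Alternate code path: the lock lives inside the object itself,
// so it travels with the data instead of sitting in a process-local pool.
struct shared_counter {
    std::atomic_flag lock = ATOMIC_FLAG_INIT;
    int value = 0;

    int increment() {
        while (lock.test_and_set(std::memory_order_acquire)) {
            // spin until the flag is released
        }
        int result = ++value;
        lock.clear(std::memory_order_release);
        return result;
    }
    static constexpr bool uses_lock() { return true; }
};
#endif
```

the price of the alternate path is one extra flag per variable plus the spinning, which is why it is only selected when the lock-free macro says there is no better option.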
atomic::is_lock_free(): is_lock_free is set to either `true' or `false'. however, in some cases there are alignment constraints (iirc, 64-bit atomics on ia32/x86_64 require 64-bit alignment). afaict there are no precautions to take care of this, are there?
for x86_64 there is nothing to do, the ABI already requires 8-byte alignment
there used to be an __align__(8) to cover ia32, but it got lost... I *think* the "lock" prefix will cover this case nevertheless (at a hefty performance cost, though...)
i see
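
a small sketch of how the alignment requirement could be pinned down at compile time. it uses the C++11 `alignas` specifier instead of the compiler-specific attribute mentioned above; `aligned_uint64`, `holder` and `is_aligned_for_64bit_atomic` are hypothetical names for illustration only and are not part of the library.

```cpp
#include <cstddef>
#include <cstdint>

// Storage for a 64-bit atomic, forced to 8-byte alignment even on ia32,
// where the natural alignment of a 64-bit integer may be only 4 bytes.
struct alignas(8) aligned_uint64 {
    std::uint64_t value;
};

static_assert(alignof(aligned_uint64) >= 8, "64-bit storage must be 8-byte aligned");
static_assert(sizeof(aligned_uint64) == 8, "no padding expected");

// A struct that would misalign the member without the alignas above.
struct holder {
    char tag;
    aligned_uint64 counter;
};

// Run-time check, e.g. for memory handed in from elsewhere.
inline bool is_aligned_for_64bit_atomic(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 8 == 0;
}
```

with the static_assert in place, a lost alignment attribute would show up as a compile error rather than as a silent performance penalty.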
compile-time vs run-time dispatching: some instructions are not available on every CPU of a specific architecture. e.g. cmpxchg8b or cmpxchg16b are not available on all ia32/x86_64 cpus. i would appreciate it if these instructions were not used without first performing a CPUID check to see whether they are really available (at least in a legacy mode)
the correct way to do that is to have different libraries for sub-architectures and have the runtime linker decide... this requires infrastructure not present in boost
it would be equally correct to have something like:

    static bool has_cmpxchg16b = query_cpuid_for_cmpxchg16b();
    if (has_cmpxchg16b)
        use_cmpxchg16b();
    else
        use_fallback();

less bloat and probably only a minor performance hit ;)
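
a slightly more complete sketch of that run-time check, assuming GCC or Clang on ia32/x86_64, where `<cpuid.h>` provides `__get_cpuid`. the CMPXCHG16B feature flag is bit 13 of ECX for CPUID leaf 1. `dcas_path` and `select_dcas_path` are hypothetical names; on other compilers or architectures the sketch simply reports the fallback.

```cpp
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

// Returns true if the CPU reports support for cmpxchg16b.
inline bool query_cpuid_for_cmpxchg16b() {
#if SKETCH_HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;  // leaf 1 not supported
    }
    return (ecx & (1u << 13)) != 0;  // CPUID.01H:ECX bit 13
#else
    return false;  // unknown platform: be conservative
#endif
}

enum class dcas_path { cmpxchg16b, fallback };

// The CPUID query runs once; C++11 guarantees thread-safe
// initialization of the function-local static.
inline dcas_path select_dcas_path() {
    static const bool has_cmpxchg16b = query_cpuid_for_cmpxchg16b();
    return has_cmpxchg16b ? dcas_path::cmpxchg16b : dcas_path::fallback;
}
```

after the first call, the cost per operation is one load of the cached flag and a predictable branch, which is the "minor performance hit" referred to above.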
cmpxchg16b: currently cmpxchg16b doesn't seem to be supported. this instruction is required for some lock-free data structures (e.g. there is a dequeue algorithm that requires a pair of tagged pointers).
could do, but cmpxchg16b is dog-slow; the fallback path is going to be faster anyway
on average, but not in the worst case. for real-time systems it is not acceptable that the os preempts a real-time thread while it is holding a spinlock.

cheers, tim