Re: [boost] [atomic] comments

21 Oct 2011

      ...
...
shared memory support:
the fallback implementation relies on the spinlock pool that is also used
by the smart pointers. however, this pool is per-process, so the fallback
implementation won't work in shared memory. can this be changed/fixed?
fixing this would require a per-variable lock... depending on the platform
this can have enormous overheads.
I would suggest using the compile-time macros BOOST_ATOMIC_*_LOCK_FREE to
pick an alternate code path.
then we need some kind of interprocess-specific atomic ... maybe as part of 
boost.interprocess ... iac, maybe we should provide an implementation which 
somehow matches the behavior of c++11 compilers ...
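
a minimal sketch of the compile-time selection suggested above. it uses the c++11 macro ATOMIC_INT_LOCK_FREE from <atomic> as a stand-in for the BOOST_ATOMIC_*_LOCK_FREE family, and the class name shared_counter is hypothetical. the std::mutex in the fallback branch only illustrates a per-variable lock; it is not process-shared, so a real interprocess version would need a different lock type.

```cpp
#include <atomic>
#include <mutex>

#if ATOMIC_INT_LOCK_FREE == 2
// int atomics are always lock-free on this target: no lock pool involved,
// so the object's state lives entirely inside the variable itself.
class shared_counter {
    std::atomic<int> value_{0};
public:
    int increment() { return value_.fetch_add(1) + 1; }
};
#else
// Alternate code path: the lock is stored next to the value (per-variable)
// instead of in a per-process pool. std::mutex is a placeholder here.
class shared_counter {
    std::mutex lock_;
    int value_ = 0;
public:
    int increment() {
        std::lock_guard<std::mutex> guard(lock_);
        return ++value_;
    }
};
#endif
```

the per-variable lock in the second branch is where the overhead mentioned above comes from: every object grows by the size of the lock.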
...
...
atomic::is_lock_free():
is_lock_free is set to either `true' or `false'. however in some cases,
there are alignment constraints (iirc, 64bit atomics on ia32/x86_64
require a 64bit alignment). afaict there are no precautions to take
care of this, are there?
for x86_64 there is nothing to do, ABI requires 8 byte alignment already
there used to be an __align__(8) to cover ia32, but it got lost... I *think*
the "lock" prefix will cover this case nevertheless (at a hefty performance
cost, though...)
i see
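
for illustration, a sketch of how the lost alignment guarantee could be expressed with c++11 alignas instead of the compiler-specific attribute. the type names aligned_u64 and holder are hypothetical and not part of the library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical storage type: force 8-byte alignment even where the ABI
// would only guarantee 4 bytes for a 64-bit integer (as on ia32).
struct alignas(8) aligned_u64 {
    std::uint64_t value;
};

// Hypothetical enclosing struct: the 32-bit member in front would leave
// the 64-bit member at offset 4 under a 4-byte alignment rule.
struct holder {
    std::uint32_t a;
    aligned_u64   b;
};

static_assert(alignof(aligned_u64) == 8, "storage must be 8-byte aligned");
static_assert(offsetof(holder, b) % 8 == 0, "member must not sit at offset 4");
```

an 8-byte object at an 8-byte boundary cannot straddle a cache line, which is the case that makes a locked access expensive.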
...
...
compile-time vs run-time dispatching:
some instructions are not available on every CPU of a specific
architecture. e.g. cmpxchg8b or cmpxchg16b are not available on all
ia32/x86_64 cpus. i would appreciate if these instructions would not be
used before performing a CPUID check, whether these instructions are
really available (at least in a legacy mode)
the correct way to do that is to have different libraries for
sub-architectures and have the runtime linker decide... this requires
infrastructure not present in boost
it would be equally correct to have something like:
static bool has_cmpxchg16b = query_cpuid_for_cmpxchg16b();

if (has_cmpxchg16b)
    use_cmpxchg16b();
else
    use_fallback();

less bloat and probably only a minor performance hit ;)
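
a possible sketch of the run-time check, assuming gcc or clang on x86, where <cpuid.h> provides __get_cpuid. CPUID leaf 1 reports CMPXCHG16B support in ECX bit 13. the enum cas_path and the function select_cas_path are hypothetical names; on other compilers or architectures the sketch simply reports the fallback.

```cpp
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define SKETCH_HAS_CPUID 1
#else
#define SKETCH_HAS_CPUID 0
#endif

// Returns true if the CPU advertises CMPXCHG16B (CPUID.01H:ECX bit 13).
inline bool query_cpuid_for_cmpxchg16b() {
#if SKETCH_HAS_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 // leaf 1 not supported
    return (ecx & (1u << 13)) != 0;
#else
    return false;                     // unknown platform: assume unavailable
#endif
}

enum class cas_path { cmpxchg16b, fallback };

// The CPUID query runs once; later calls only read the cached flag.
inline cas_path select_cas_path() {
    static const bool has_cmpxchg16b = query_cpuid_for_cmpxchg16b();
    return has_cmpxchg16b ? cas_path::cmpxchg16b : cas_path::fallback;
}
```

the cost per operation is one load of the cached flag plus a well-predicted branch, which is the minor performance hit referred to above.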
...
...
cmpxchg16b:
currently cmpxchg16b doesn't seem to be supported. this instruction is
required for some lock-free data structures (e.g. there is a dequeue
algorithm, that requires a pair of tagged pointers).
could do, but cmpxchg16b is dog-slow, the fallback path is going to be
faster anyways
on average, but not in the worst case. for real-time systems it is not
acceptable that the os preempts a real-time thread while it is holding a
spinlock.

cheers, tim

Tim Blechmann