
On Saturday 22 October 2011 20:32:44 Tim Blechmann wrote:
then we need some kind of interprocess-specific atomic ... maybe as part of Boost.Interprocess ... in any case, maybe we should provide an implementation which somehow matches the behavior of C++11 compilers ...
well, if the atomics are truly atomic, then BOOST_ATOMIC_*_LOCK_FREE == 2, and I find it difficult to imagine a platform where you cannot use them safely between processes (not that something like that could not exist)
one would have to do the dispatching logic in the preprocessor, so one cannot dispatch depending on a typedef.
it's certainly possible to build a helper template to map types to these macro values (e.g. map all types T with sizeof(T) == sizeof(int) to the value of BOOST_ATOMIC_INT_LOCK_FREE)
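a minimal sketch of such a size-based dispatch helper; the DEMO_* macro values and all names here are made up for illustration (Boost.Atomic defines the real BOOST_ATOMIC_*_LOCK_FREE macros), and it assumes short, int and long long have distinct sizes:

```cpp
#include <cstddef>

// illustrative stand-ins for BOOST_ATOMIC_*_LOCK_FREE
// (0 = never lock-free, 1 = sometimes, 2 = always)
#define DEMO_ATOMIC_SHORT_LOCK_FREE 2
#define DEMO_ATOMIC_INT_LOCK_FREE   2
#define DEMO_ATOMIC_LLONG_LOCK_FREE 1

// primary template: sizes we know nothing about -> not lock-free
template<std::size_t Size>
struct lock_free_by_size { static const int value = 0; };

// one specialization per distinct integer size
template<> struct lock_free_by_size<sizeof(short)>
{ static const int value = DEMO_ATOMIC_SHORT_LOCK_FREE; };
template<> struct lock_free_by_size<sizeof(int)>
{ static const int value = DEMO_ATOMIC_INT_LOCK_FREE; };
template<> struct lock_free_by_size<sizeof(long long)>
{ static const int value = DEMO_ATOMIC_LLONG_LOCK_FREE; };

// map any type T to the macro value of the same-sized integer
template<typename T>
struct lock_free_hint : lock_free_by_size<sizeof(T)> {};
```

so, for example, lock_free_hint<float>::value picks up the int macro value on platforms where sizeof(float) == sizeof(int).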
if they are not atomic, then you are going to hit a fallback-via-locking path, in which case you are almost certainly better off picking an interprocess communication mechanism that just uses locking directly
true, but at the cost of increasing the program logic. however, there are cases when you are happy that you don't have to change the program, at the cost of performance on legacy hardware.
okay, that's a valid point -- I'm not sure how common this use case is, but I do not think it deserves penalizing the process-local path. doing it in Boost.Interprocess might be something to consider, however
it would be equally correct to have something like:

    static bool has_cmpxchg16b = query_cpuid_for_cmpxchg16b();

    if (has_cmpxchg16b)
        use_cmpxchg16b();
    else
        use_fallback();
less bloat, and probably only a minor performance hit ;)
problematic because the compiler must insert a lock to ensure thread-safe initialization of the "static bool" (thus it is by definition not "lock-free" any more)
well, one could also set a static variable with a function called before main (e.g. via __attribute__((constructor)))
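purely a sketch of that idea, assuming GCC/Clang on an x86 target (the variable and function names are made up, not from Boost.Atomic):

```cpp
#include <cpuid.h> // GCC/Clang helper for the x86 cpuid instruction

static bool has_cmpxchg16b = false;

// runs before main(), so readers of has_cmpxchg16b need no locking
__attribute__((constructor))
static void detect_cmpxchg16b()
{
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        // CPUID leaf 1, ECX bit 13 = CMPXCHG16B
        has_cmpxchg16b = (ecx & (1u << 13)) != 0;
}
```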
might be possible, but this will then cost everyone the cpuid at load time. I am currently trying out something different, namely a tristate variable ("unknown", "has_cmpxchg8b", "lacks_cmpxchg8b") with a benign race, where (in bad cases) multiple threads might end up doing "cpuid" concurrently until all threads "see" a state other than "unknown"
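roughly like this (again just a sketch with made-up names, GCC/Clang on x86 assumed; the actual implementation may differ):

```cpp
#include <cpuid.h> // GCC/Clang helper for the x86 cpuid instruction

enum cmpxchg8b_state { state_unknown, state_present, state_absent };

// racy but benign global: several threads may probe cpuid
// concurrently until all of them observe a non-"unknown" value
static volatile cmpxchg8b_state g_state = state_unknown;

static bool check_cmpxchg8b()
{
    cmpxchg8b_state s = g_state;
    if (s == state_unknown) {
        unsigned eax, ebx, ecx, edx;
        // CPUID leaf 1, EDX bit 8 = CMPXCHG8B
        bool present = __get_cpuid(1, &eax, &ebx, &ecx, &edx) != 0
                       && (edx & (1u << 8)) != 0;
        s = present ? state_present : state_absent;
        g_state = s; // worst case: redundant cpuid, same final value
    }
    return s == state_present;
}
```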
on average, but not in the worst case. for real-time systems it is not acceptable for the OS to preempt a real-time thread while it is holding a spinlock.
prio-inheriting mutexes are usually much faster than cmpxchg16b -- use these for hard real-time (changing the fallback path to use PI mutexes as well might even be something to consider)
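for reference, setting up such a priority-inheriting mutex with POSIX threads could look like this (a sketch with an illustrative helper name; returns 0 on success):

```cpp
#include <pthread.h>

int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    // with PTHREAD_PRIO_INHERIT the kernel boosts the owner's priority
    // to that of the highest-priority waiter, bounding priority inversion
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

in the uncontended fast path, lock/unlock of such a mutex boils down to a single CAS on the lock word, which is why it can compete with a cmpxchg16b-based protocol.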
do you have some numbers on what latencies can be achieved with PI mutexes?
no, I don't, but the literature measuring wakeup latencies in operating systems is plentiful. I only have throughput numbers, and these peg a double-word CAS operation as slightly less than twice as expensive as a single-word CAS. considering that most protocols need one pair of (either single- or double-word) CAS operations, and considering that a PI mutex lock/unlock can essentially be just a CAS on the lock variable (to store/clear the owner id) in the fast path, PI mutexes usually end up faster. nevertheless, I will add cmpxchg16b for experimentation.

Best regards
Helge