
I think having a mutex per atomic instance is overkill. However, a spinlock per instance might just be the silver bullet. The size overhead should be quite modest (1 to 4 bytes, I presume) and the performance would still be decent. After all, atomic<> is intended to be used with relatively small types with simple operations, such as copying and arithmetic. In other cases it is natural to use explicit mutexes, and we could emphasise that in the docs.
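To make the spinlock-per-instance idea concrete, here is a rough sketch of what such a fallback could look like. It is not the library's actual implementation: the name spinlocked_atomic and its interface are hypothetical, it uses std::atomic_flag as the one lock-free primitive assumed to be available, and it assumes T is small, trivially copyable and (for compare_exchange_strong) comparable with ==.

```cpp
#include <atomic>
#include <type_traits>

// Hypothetical fallback wrapper: every operation takes a tiny spinlock
// stored next to the value. Illustration only, not Boost.Atomic code.
template <typename T>
class spinlocked_atomic {
    static_assert(std::is_trivially_copyable<T>::value,
                  "intended for small, trivially copyable types");

public:
    spinlocked_atomic() noexcept : value_() {}
    explicit spinlocked_atomic(T v) noexcept : value_(v) {}
    spinlocked_atomic(const spinlocked_atomic&) = delete;
    spinlocked_atomic& operator=(const spinlocked_atomic&) = delete;

    T load() const noexcept {
        guard g(lock_);
        return value_;
    }

    void store(T v) noexcept {
        guard g(lock_);
        value_ = v;
    }

    // Returns the previous value.
    T exchange(T v) noexcept {
        guard g(lock_);
        T old = value_;
        value_ = v;
        return old;
    }

    // Returns the previous value; requires T to support operator+.
    T fetch_add(T delta) noexcept {
        guard g(lock_);
        T old = value_;
        value_ = old + delta;
        return old;
    }

    // On failure, writes the current value back into 'expected'.
    bool compare_exchange_strong(T& expected, T desired) noexcept {
        guard g(lock_);
        if (value_ == expected) {
            value_ = desired;
            return true;
        }
        expected = value_;
        return false;
    }

private:
    // RAII guard: spins on test_and_set until the flag is acquired,
    // clears it again on scope exit.
    struct guard {
        std::atomic_flag& flag;
        explicit guard(std::atomic_flag& f) noexcept : flag(f) {
            while (flag.test_and_set(std::memory_order_acquire)) {
                // busy-wait; a real implementation might pause or yield here
            }
        }
        ~guard() { flag.clear(std::memory_order_release); }
    };

    mutable std::atomic_flag lock_ = ATOMIC_FLAG_INIT;
    T value_;
};
```

The real per-object cost is sizeof(std::atomic_flag) plus whatever padding T's alignment forces, so it can come out larger than the flag itself. The critical sections are only a copy or an addition, which is why spinning rather than blocking seems acceptable for this kind of use.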
might be possible. the problem is that this assumes there is some atomic<something> available to build the spinlock on -- as soon as you hit a platform where everything hits the fallback, you just have to use a mutex, and the cost becomes unbearable
True. But are there realistic platforms without any support for atomic ops whatsoever today? If there are, I'm not sure the library should support these platforms in the first place.
well, it is quite a chicken-and-egg problem: we need atomics to implement the fallback for atomics, which is only used when atomics are not available. but in the real world i guess all platforms will provide some kind of atomic operation that is sufficient to implement a basic spinlock. it would also be fine with me to delegate the implementation to boost::detail::spinlock in the smart_ptr library (assuming that it will never be implemented via atomic<>)

cheers, tim