
On Wed, 9 Dec 2009, Anthony Williams wrote:
I have not yet met any 64-bit platform that provides a full 64 bits of (user-space!) address space (the same cannot be said of 32-bit platforms), so I would say that any use of spare bits on 64-bit is safe (for user-space!).
As Chris also noted, I would be wary of using them for ABA checking though: it's easy to overflow a 16-bit counter quite quickly, and that's all you've got even if the OS only uses 48 of your 64 bits for address space.
This concern is certainly valid for basically any algorithm that uses "generation counters" or similar constructs for disambiguation (it could be argued that even 32 bits may not be enough). Using spare bits at all should however be considered a valid technique.
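Just to have it in front of us, the spare-bit variant looks roughly like this -- a minimal sketch only, assuming user-space addresses fit into the low 48 bits, written against the std::atomic interface (boost.atomic exposes the same operations), with all names made up for illustration, and with nodes assumed to come from a type-stable freelist so that reading n->next of a concurrently popped node is merely stale, never a use-after-free:

#include <atomic>
#include <cstdint>

struct node { node* next; };

// Pointer in bits 0..47, generation counter in bits 48..63.
static const std::uint64_t ptr_mask = (std::uint64_t(1) << 48) - 1;

struct tagged_head
{
    std::uint64_t bits;

    static tagged_head make(node* p, std::uint16_t tag)
    {
        tagged_head h;
        h.bits = (std::uint64_t(tag) << 48)
               | (reinterpret_cast<std::uint64_t>(p) & ptr_mask);
        return h;
    }
    node*         ptr() const { return reinterpret_cast<node*>(bits & ptr_mask); }
    std::uint16_t tag() const { return static_cast<std::uint16_t>(bits >> 48); }
};

// Fits in one 64-bit word, so a plain CAS suffices -- no DCAS needed.
std::atomic<tagged_head> stack_top(tagged_head::make(0, 0));

node* pop()
{
    tagged_head old_top = stack_top.load();
    for (;;)
    {
        node* n = old_top.ptr();
        if (!n)
            return 0;
        // The tag bump is what defeats ABA: even if n is popped and pushed
        // again at the same address in the meantime, the 16-bit counter
        // differs -- until it wraps, which is exactly the concern above.
        tagged_head new_top = tagged_head::make(n->next, old_top.tag() + 1);
        if (stack_top.compare_exchange_weak(old_top, new_top))
            return n;
    }
}

Only pop needs to bump the counter; push's CAS is harmless even if the head it saw was recycled in the meantime, since the node it links in is fresh anyway.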
FWIW there are other ways to avoid ABA besides tagging pointers, and without needing DCAS on 32-bit; see e.g. this sample implementation of a lock-free stack:
http://www.chaoticmind.net/~hcb/projects/boost.atomic/stack.hpp
Yup. Incidentally, you don't even need to count the nodes in push, provided you're careful when reclaiming nodes in pop.
Hm, I'm not sure what you mean; I don't think I am counting anything but the number of threads in the critical section?
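In compressed form, the idea is roughly the following (a sketch of the technique only, not the actual contents of stack.hpp; made-up names, and written against the std::atomic interface, which boost.atomic mirrors):

#include <atomic>

struct node { node* next; int value; };

std::atomic<node*>    head(0);     // top of the stack
std::atomic<node*>    pending(0);  // unlinked nodes awaiting reclamation
std::atomic<unsigned> in_pop(0);   // threads inside the critical section

void push(int v)
{
    node* n = new node;
    n->value = v;
    n->next  = head.load();
    while (!head.compare_exchange_weak(n->next, n)) {}
}

static void defer(node* n)         // park one node on the pending list
{
    n->next = pending.load();
    while (!pending.compare_exchange_weak(n->next, n)) {}
}

bool pop(int& out)
{
    ++in_pop;
    node* n = head.load();
    while (n && !head.compare_exchange_weak(n, n->next)) {}
    if (n)
        out = n->value;

    if (in_pop == 1)                            // apparently alone in pop()
    {
        node* claimed = pending.exchange(0);    // grab the pending list
        if (!--in_pop)                          // really alone: free it
        {
            while (claimed) { node* nx = claimed->next; delete claimed; claimed = nx; }
        }
        else if (claimed)                       // someone slipped in: put it back
        {
            node* last = claimed;
            while (last->next) last = last->next;
            last->next = pending.load();
            while (!pending.compare_exchange_weak(last->next, claimed)) {}
        }
        if (n) delete n;                        // nobody else could have seen n
    }
    else
    {
        if (n) defer(n);                        // another pop() may still read n
        --in_pop;
    }
    return n != 0;
}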
The downside of this technique is that if the traffic is too heavy then the ref count never reaches zero, so the nodes can never be reclaimed.
Right, and although I would argue that such a highly contended hotspot is a design flaw in itself, there are ways to do reliable reclamation even in that case (e.g. allowing an element to be reclaimed once the last thread that could actually have "seen" it has left the critical section -- this can be done with checkpoints similar to RCU, so it can even be more efficient than the atomic counter I used). I am not sure this is really worth it, though; I would rather make an entirely different point that has been nagging me:

I am not certain that portable, generally useful "symmetric" [1] wait-free (or obstruction-free) data structures for unbounded numbers of threads are both a) feasible and b) required.

As to a), implementations overwhelmingly end up being more expensive than locking (as well as less deterministic than locking with priority inheritance), and there is also some theoretical basis for suspecting this to be an intrinsic property [2] [3].

As to b), the cases in which I have needed and successfully used wait-/obstruction-free data structures are roughly:

- bounded numbers of threads with static roles (e.g. producer/consumer with a single reader and multiple writers, or vice versa)

- strongly asymmetric access patterns (e.g. the number of read accesses vastly outweighs the number of write accesses; or the converse: a producer in "interrupt" context, with many consumers in the slow path)

In the former case, often all operations can/must be wait-/obstruction-free. In the latter case the goal is just to make a subset of the operations "less expensive" or wait-/obstruction-free, while the other subset typically becomes "more expensive". Both cases are easier to address, cover a large number of use cases and still pay off tremendously.

Personally I would therefore consider less generic data structures more useful -- e.g. hash tables/trees with a wait-free read-side path, or fast, fully wait-free 1:n and n:1 producer/consumer data structures (see the P.S. below for a minimal sketch of the 1:1 case) -- instead of not very portable (DCAS), quite expensive (compared to locking) n:m producer/consumer data structures that strive to be egg-laying woolly dairy pigs. (I don't consider my stack implementation terribly useful, for that matter.)

Helge

[1] By "symmetric" I mean: all types of access (e.g. both insertion and deletion) are wait-/obstruction-free.
[2] http://portal.acm.org/citation.cfm?id=1146427
[3] http://portal.acm.org/citation.cfm?id=197975
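P.S.: to make the "static roles" case above a bit more concrete, here is a minimal wait-free single-producer/single-consumer ring buffer -- the names, the fixed capacity and the plain std::atomic interface are just illustrative choices, not a proposal:

#include <atomic>
#include <cstddef>

template <typename T, std::size_t Capacity>
class spsc_ring
{
public:
    spsc_ring() : head_(0), tail_(0) {}

    // Producer side only; completes in a bounded number of steps.
    bool push(const T& v)
    {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire))
            return false;                       // full
        buf_[tail] = v;
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side only; completes in a bounded number of steps.
    bool pop(T& out)
    {
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false;                       // empty
        out = buf_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    T buf_[Capacity];
    std::atomic<std::size_t> head_;             // consumer's read index
    std::atomic<std::size_t> tail_;             // producer's write index
};

Neither side ever retries a CAS or waits for the other, which is exactly what makes the bounded-threads/static-roles case so much easier than the general n:m one.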