
Helge Bahmann <hcb@chaoticmind.net> writes:
> On Wed, 9 Dec 2009, Anthony Williams wrote:
>>> FWIW there are other ways to avoid ABA than tagging pointers, and without DCAS on 32-bit, see e.g. this sample implementation of a lock-free stack:
>>> http://www.chaoticmind.net/~hcb/projects/boost.atomic/stack.hpp
>>
>> Yup. Incidentally, you don't even need to count the nodes in push, provided you're careful when reclaiming nodes in pop.
>
> hm, I'm not sure what you mean, I don't think I am counting anything but the number of threads in the critical section?
Sorry, that's what I meant: you're counting threads in both push and pop, but with a bit of care you only need to count the ones in pop. I haven't examined your algorithm closely enough to know if your code takes the necessary "bit of care".
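To illustrate the idea, here is a rough sketch (hypothetical names, not your stack.hpp) of a stack that counts only the threads inside pop(): a removed node is deleted immediately if the popper can see it is alone, and otherwise parked on a pending list until some later popper finds itself alone.

#include <atomic>
#include <memory>

template <typename T>
class lock_free_stack {
    struct node {
        std::shared_ptr<T> data;
        node* next;
        node(T const& d) : data(std::make_shared<T>(d)), next(nullptr) {}
    };

    std::atomic<node*> head{nullptr};
    std::atomic<unsigned> threads_in_pop{0};   // poppers currently active
    std::atomic<node*> to_be_deleted{nullptr}; // removed nodes awaiting a quiet moment

    static void delete_nodes(node* n) {
        while (n) { node* const next = n->next; delete n; n = next; }
    }

    void chain_pending(node* first, node* last) {
        last->next = to_be_deleted.load();
        while (!to_be_deleted.compare_exchange_weak(last->next, first)) {}
    }

    void try_reclaim(node* old_head) {
        if (threads_in_pop.load() == 1) {       // we appear to be the only popper
            node* pending = to_be_deleted.exchange(nullptr);
            if (--threads_in_pop == 0) {
                delete_nodes(pending);          // nobody else could still see them
            } else if (pending) {
                node* last = pending;           // someone entered pop() meanwhile:
                while (last->next) last = last->next;
                chain_pending(pending, last);   // put the pending nodes back
            }
            delete old_head;                    // safe: we were alone when we checked
        } else {
            if (old_head) chain_pending(old_head, old_head);
            --threads_in_pop;
        }
    }

public:
    void push(T const& value) {                 // note: no counting here
        node* const n = new node(value);
        n->next = head.load();
        while (!head.compare_exchange_weak(n->next, n)) {}
    }

    std::shared_ptr<T> pop() {
        ++threads_in_pop;                       // announce ourselves before touching any node
        node* old_head = head.load();
        while (old_head &&
               !head.compare_exchange_weak(old_head, old_head->next)) {}
        std::shared_ptr<T> result;
        if (old_head) result.swap(old_head->data);
        try_reclaim(old_head);                  // decrements threads_in_pop
        return result;
    }
};

push() never dereferences a node that another thread might free -- it only swings the head pointer -- so it doesn't need to take part in the count at all.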
>> The downside of this technique is that if the traffic is too heavy then the ref count never reaches zero, so the nodes can never be reclaimed.
>
> Right, and although I would argue that such a highly contended hotspot is a design flaw in itself,
Agreed. Such a highly contended queue is going to take a huge performance hit due to all the cache ping-pong, and a blocking queue might actually perform better.
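By "blocking queue" I don't mean anything clever -- just the obvious lock-based version, something like this sketch:

#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class blocking_queue {
    std::queue<T> data_;
    std::mutex mutex_;
    std::condition_variable not_empty_;

public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            data_.push(std::move(value));
        }
        not_empty_.notify_one();   // wake one waiting consumer
    }

    T pop() {                      // blocks until an element is available
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !data_.empty(); });
        T value = std::move(data_.front());
        data_.pop();
        return value;
    }
};

Under heavy contention the waiters sleep instead of all hammering the same cache line with failed CASes, which is part of why it can come out ahead.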
> there are ways to do reliable reclaim even in that case (e.g. ones that allow reclaiming an element after the last thread that could actually have "seen" it leaves the critical section -- this can be done with checkpoints similar to RCU, so it can even be more efficient than the atomic counter I used), yet I am not sure this is really worth it
Agreed. You need to use a strategy that works for your use case. RCU-style checkpoints are one technique that does work, but there may be better solutions at the architecture level that avoid the heavy contention.
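For completeness, the checkpoint idea can be sketched roughly like this (hypothetical names; it assumes a fixed set of reader threads, each registered -- by calling quiescent_state once -- before the structure is shared, and it spin-waits where a real implementation would defer nodes to a retire list):

#include <atomic>
#include <cstddef>

constexpr std::size_t max_readers = 8;                 // fixed, known up front

std::atomic<unsigned long> global_epoch{1};
std::atomic<unsigned long> reader_epoch[max_readers];  // 0 = slot permanently unused

// Reader-side checkpoint: "at this point I hold no pointers into the
// shared structure".  Called once to register, then periodically.
void quiescent_state(std::size_t reader_id)
{
    reader_epoch[reader_id].store(
        global_epoch.load(std::memory_order_acquire),
        std::memory_order_release);
}

// Writer side: call only after 'unlinked' has been removed from the shared
// structure.  Waits until every registered reader has passed a checkpoint
// later than the unlink; after that no reader can still reach the node.
template <typename Node>
void synchronize_and_delete(Node* unlinked)
{
    unsigned long const target =
        global_epoch.fetch_add(1, std::memory_order_acq_rel) + 1;
    for (std::size_t r = 0; r < max_readers; ++r) {
        for (;;) {
            unsigned long const seen =
                reader_epoch[r].load(std::memory_order_acquire);
            if (seen == 0 || seen >= target)   // unused slot, or checkpoint passed
                break;
        }
    }
    delete unlinked;
}

A reader whose announced epoch has passed 'target' can no longer reach the unlinked node: anything it held before its last checkpoint has been dropped, and any traversal after that checkpoint starts from a structure that no longer contains the node.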
> -- I would like to make an entirely different point that has been nagging me:
>
> I'm not certain that portable, generally useful "symmetric" [1] wait-free (or obstruction-free) data structures for unbounded numbers of threads are both a) feasible and b) required. As to a), implementations overwhelmingly end up being more expensive than locking (as well as less deterministic than locking with priority inheritance), and there is also some theoretical basis for suspecting this to be an intrinsic property [2] [3]. As to b), the cases where I have needed and successfully used wait-/obstruction-free data structures are roughly:
>
> - bounded numbers of threads with static roles (e.g. producer/consumer with single reader/multiple writers, or vice versa)
> - strongly asymmetric access patterns (e.g. the number of read accesses vastly outweighs the number of write accesses; or the converse: a producer in "interrupt" context, with many consumers in the slow path)
>
> In the former case, often all operations can/must be wait-/obstruction-free. In the latter case the goal is just to make a subset of the operations "less expensive" or wait-/obstruction-free, while the other subset typically becomes "more expensive". Both cases are easier to address, cover a large number of use cases, and still pay off tremendously.
Yes. There are many ways to implement a FIFO queue, each making a different set of trade-offs.
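For instance, the bounded, static-role case you mention can be extremely cheap: a single-producer/single-consumer ring buffer is wait-free with nothing more than two atomic indices. A sketch (hypothetical names; assumes exactly one producer thread and one consumer thread for the lifetime of the queue):

#include <atomic>
#include <cstddef>

template <typename T, std::size_t Capacity>
class spsc_ring {
    T buffer_[Capacity + 1];                 // one slot left empty to tell full from empty
    std::atomic<std::size_t> read_pos_{0};   // written only by the consumer
    std::atomic<std::size_t> write_pos_{0};  // written only by the producer

    static std::size_t next(std::size_t pos) { return (pos + 1) % (Capacity + 1); }

public:
    // Called by the producer only.  Returns false if the buffer is full.
    bool push(T const& value) {
        std::size_t const wp = write_pos_.load(std::memory_order_relaxed);
        std::size_t const nwp = next(wp);
        if (nwp == read_pos_.load(std::memory_order_acquire))
            return false;                    // full
        buffer_[wp] = value;
        write_pos_.store(nwp, std::memory_order_release);   // publish the element
        return true;
    }

    // Called by the consumer only.  Returns false if the buffer is empty.
    bool pop(T& out) {
        std::size_t const rp = read_pos_.load(std::memory_order_relaxed);
        if (rp == write_pos_.load(std::memory_order_acquire))
            return false;                    // empty
        out = buffer_[rp];
        read_pos_.store(next(rp), std::memory_order_release);  // release the slot
        return true;
    }
};

Each index is written by exactly one thread, so neither operation ever has to retry -- but that only works because the roles are fixed.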
> Personally I would therefore consider less generic data structures more useful, e.g. hash tables/trees with a wait-free read-side path, or fast, fully wait-free 1:n and n:1 producer/consumer data structures -- instead of not very portable (DCAS), quite expensive (compared to locking) n:m p/c data structures that strive to be egg-laying woolly dairy pigs. (I don't consider my stack implementation terribly useful, for that matter.)
Totally agreed. The egg-laying woolly dairy pigs are fun to design, but are hard to get right, and often have a much higher performance cost than a specialized data structure.

Anthony
--
Author of C++ Concurrency in Action | http://www.stdthread.co.uk/book/
just::thread C++0x thread library   | http://www.stdthread.co.uk
Just Software Solutions Ltd         | http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976