
On Wed, 9 Dec 2009, Anthony Williams wrote:
I have not yet met any 64-bit platform that provides a full 64 bits of (user-space!) address space (the same cannot be said of 32-bit platforms), so I would say that any use of spare bits on 64-bit is safe (for user-space!).
As Chris also noted, I would be wary of using them for ABA checking though: it's easy to overflow a 16-bit counter quite quickly, and that's all you've got even if the OS only uses 48 of your 64 bits for address space.
This concern is certainly valid for basically any algorithm that uses "generation counters" or similar constructs for disambiguation (it could be argued that even 32 bits may not be enough). Using spare bits at all should however be considered a valid technique.
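Just to have it in front of us, the spare-bit variant looks roughly like this -- a minimal sketch only, assuming user-space addresses fit into the low 48 bits, written against the std::atomic interface (boost.atomic exposes the same operations), with all names made up for illustration, and with nodes assumed to come from a type-stable freelist so that reading n->next of a concurrently popped node is merely stale, never a use-after-free:

#include <atomic>
#include <cstdint>

struct node { node* next; };

// Pointer in bits 0..47, generation counter in bits 48..63.
static const std::uint64_t ptr_mask = (std::uint64_t(1) << 48) - 1;

struct tagged_head
{
    std::uint64_t bits;

    static tagged_head make(node* p, std::uint16_t tag)
    {
        tagged_head h;
        h.bits = (std::uint64_t(tag) << 48)
               | (reinterpret_cast<std::uint64_t>(p) & ptr_mask);
        return h;
    }
    node*         ptr() const { return reinterpret_cast<node*>(bits & ptr_mask); }
    std::uint16_t tag() const { return static_cast<std::uint16_t>(bits >> 48); }
};

// Fits in one 64-bit word, so a plain CAS suffices -- no DCAS needed.
std::atomic<tagged_head> stack_top(tagged_head::make(0, 0));

node* pop()
{
    tagged_head old_top = stack_top.load();
    for (;;)
    {
        node* n = old_top.ptr();
        if (!n)
            return 0;
        // The tag bump is what defeats ABA: even if n is popped and pushed
        // again at the same address in the meantime, the 16-bit counter
        // differs -- until it wraps, which is exactly the concern above.
        tagged_head new_top = tagged_head::make(n->next, old_top.tag() + 1);
        if (stack_top.compare_exchange_weak(old_top, new_top))
            return n;
    }
}

Only pop needs to bump the counter; push's CAS is harmless even if the head it saw was recycled in the meantime, since the node it links in is fresh anyway.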
FWIW there are other ways to avoid ABA besides tagging pointers, and without needing DCAS on 32-bit; see e.g. this sample implementation of a lock-free stack:
http://www.chaoticmind.net/~hcb/projects/boost.atomic/stack.hpp
Yup. Incidentally, you don't even need to count the nodes in push, provided you're careful when reclaiming nodes in pop.
Hm, I'm not sure what you mean; I don't think I am counting anything but the number of threads in the critical section?
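In compressed form, the idea is roughly the following (a sketch of the technique only, not the actual contents of stack.hpp; made-up names, and written against the std::atomic interface, which boost.atomic mirrors):

#include <atomic>

struct node { node* next; int value; };

std::atomic<node*>    head(0);     // top of the stack
std::atomic<node*>    pending(0);  // unlinked nodes awaiting reclamation
std::atomic<unsigned> in_pop(0);   // threads inside the critical section

void push(int v)
{
    node* n = new node;
    n->value = v;
    n->next  = head.load();
    while (!head.compare_exchange_weak(n->next, n)) {}
}

static void defer(node* n)         // park one node on the pending list
{
    n->next = pending.load();
    while (!pending.compare_exchange_weak(n->next, n)) {}
}

bool pop(int& out)
{
    ++in_pop;
    node* n = head.load();
    while (n && !head.compare_exchange_weak(n, n->next)) {}
    if (n)
        out = n->value;

    if (in_pop == 1)                            // apparently alone in pop()
    {
        node* claimed = pending.exchange(0);    // grab the pending list
        if (!--in_pop)                          // really alone: free it
        {
            while (claimed) { node* nx = claimed->next; delete claimed; claimed = nx; }
        }
        else if (claimed)                       // someone slipped in: put it back
        {
            node* last = claimed;
            while (last->next) last = last->next;
            last->next = pending.load();
            while (!pending.compare_exchange_weak(last->next, claimed)) {}
        }
        if (n) delete n;                        // nobody else could have seen n
    }
    else
    {
        if (n) defer(n);                        // another pop() may still read n
        --in_pop;
    }
    return n != 0;
}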
The downside of this technique is that if the traffic is too heavy then the ref count never reaches zero, so the nodes can never be reclaimed.
Right, and although I would argue that such a highly contended hotspot is a design flaw in itself, there are ways to do reliable reclamation even in that case (e.g. allowing an element to be reclaimed once the last thread that could actually have "seen" it has left the critical section -- this can be done with checkpoints similar to RCU, so it can even be more efficient than the atomic counter I used). I am not sure this is really worth it, though; I would rather make an entirely different point that has been nagging me:

I am not certain that portable, generally useful "symmetric" [1] wait-free (or obstruction-free) data structures for unbounded numbers of threads are both a) feasible and b) required.

As to a), implementations overwhelmingly end up being more expensive than locking (as well as less deterministic than locking with priority inheritance), and there is also some theoretical basis for suspecting this to be an intrinsic property [2] [3].

As to b), the cases in which I have needed and successfully used wait-/obstruction-free data structures are roughly:

- bounded numbers of threads with static roles (e.g. producer/consumer with a single reader and multiple writers, or vice versa)

- strongly asymmetric access patterns (e.g. the number of read accesses vastly outweighs the number of write accesses; or the converse: a producer in "interrupt" context, with many consumers in the slow path)

In the former case, often all operations can/must be wait-/obstruction-free. In the latter case the goal is just to make a subset of the operations "less expensive" or wait-/obstruction-free, while the other subset typically becomes "more expensive". Both cases are easier to address, cover a large number of use cases and still pay off tremendously.

Personally I would therefore consider less generic data structures more useful -- e.g. hash tables/trees with a wait-free read-side path, or fast, fully wait-free 1:n and n:1 producer/consumer data structures (see the P.S. below for a minimal sketch of the 1:1 case) -- instead of not very portable (DCAS), quite expensive (compared to locking) n:m producer/consumer data structures that strive to be egg-laying woolly dairy pigs. (I don't consider my stack implementation terribly useful, for that matter.)

Helge

[1] By "symmetric" I mean: all types of access (e.g. both insertion and deletion) are wait-/obstruction-free.
[2] http://portal.acm.org/citation.cfm?id=1146427
[3] http://portal.acm.org/citation.cfm?id=197975
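P.S.: to make the "static roles" case above a bit more concrete, here is a minimal wait-free single-producer/single-consumer ring buffer -- the names, the fixed capacity and the plain std::atomic interface are just illustrative choices, not a proposal:

#include <atomic>
#include <cstddef>

template <typename T, std::size_t Capacity>
class spsc_ring
{
public:
    spsc_ring() : head_(0), tail_(0) {}

    // Producer side only; completes in a bounded number of steps.
    bool push(const T& v)
    {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire))
            return false;                       // full
        buf_[tail] = v;
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side only; completes in a bounded number of steps.
    bool pop(T& out)
    {
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false;                       // empty
        out = buf_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    T buf_[Capacity];
    std::atomic<std::size_t> head_;             // consumer's read index
    std::atomic<std::size_t> tail_;             // producer's write index
};

Neither side ever retries a CAS or waits for the other, which is exactly what makes the bounded-threads/static-roles case so much easier than the general n:m one.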