
"Helge Bahmann" <hcb@chaoticmind.net> wrote in message news:alpine.DEB.1.10.0912101115460.456@m65s28.vlinux.de...
On Wed, 9 Dec 2009, Anthony Williams wrote:
So far I have not yet met any 64-bit platform that provides 64 bits of (user-space!) address space (the same cannot be said of 32-bit platforms), so I would say that any use of spare bits on 64 bit is safe (for user-space!).
As Chris also noted, I would be wary of using them for ABA checking though: it's easy to overflow a 16-bit counter quite quickly, and that's all you've got even if the OS only uses 48 of your 64 bits for address space.
This concern is certainly valid for basically any algorithm that uses "generation counters" or similar constructs for disambiguation (it could be argued that even 32 bits may not be enough). Using spare bits at all should however be considered a valid technique.
FWIW there are other ways to avoid ABA than tagging pointers, and without DCAS on 32-bit, see e.g. this sample implementation of a lock-free stack:
http://www.chaoticmind.net/~hcb/projects/boost.atomic/stack.hpp
Yup. Incidentally, you don't even need to count the nodes in push, provided you're careful when reclaiming nodes in pop.
hm, I'm not sure what you mean, I don't think I am counting anything but the number of threads in the critical section?
The downside of this technique is that if the traffic is too heavy then the ref count never reaches zero, so the nodes can never be reclaimed.
Right, and although I would argue that such a highly contended hotspot is a design flaw in itself,
You have created a proxy garbage collector that reminds me of one I did a long time ago which also suffered from this exact same problem. I was using it for memory reclamation: http://groups.google.com/group/comp.programming.threads/browse_frm/thread/3d... it went something like, membars aside for a moment: _____________________________________________________________ struct node { struct node* next; }; struct proxy { struct node* head; // = NULL uint32_t count; // = 0 }; void proxy_acquire(struct proxy* const self) { struct proxy cmp, xchg; cmp.head = self->head; cmp.count = self->count; do { xchg.head = cmp.head; xchg.count = cmp.count + 1; } while (! DWCAS(self, &cmp, &xchg)); } struct node* proxy_release(struct proxy* const self) { struct proxy cmp, xchg; cmp.head = self->head; cmp.count = self->count; do { xchg.count = cmp.count - 1; if (! xchg.count) { xchg.head = NULL; } else { xchg.head = cmp.head; } } while (! DWCAS(self, &cmp, &xchg)); if (! xchg.count) { return cmp.head; } return NULL; } struct node* proxy_defer(struct proxy* const self, struct node* node) { struct proxy cmp, xchg; cmp.head = self->head; cmp.count = self->count; do { node->next = cmp.head; xchg.count = cmp.count; if (! xchg.count) { xchg.head = NULL; } else { xchg.head = node; } } while (! DWCAS(self, &cmp, &xchg)); if (! xchg.count) { return node; } return NULL; } _____________________________________________________________ This solves memory reclamation, but will pile up nodes during times of sustained load...
there are ways to do reliable reclaim even in that case (e.g. that allow reclaiming an element after the last thread that could actually have "seen" it leaves the critical section
Yes. My next incarnation of the proxy collector above was coded to explicitly handle this situation. It simply breaks off collector objects during deferment and backlinks them all together: http://webpages.charter.net/appcore/misc/pc_sample_h_v1.html
-- can be done with checkpoints similar RCU, so it can even be more efficient than the atomic counter I used), yet I am not sure this is really worth it -- I would like to make an entirely different point that has been nagging me:
I don't think it's worth using RCU just to get around ABA in a lock-free stack. Heck, it does not even fit in with the general usage pattern which does not expect a lot of writes. Humm, well, of course you can take tremendous pressure off of RCU by combining it with version counters and only using them to determine when to defer nodes. For instance, you don't need to defer a node into RCU until a version counter is just about to rollover.