Lock-free atomic_shared_ptr<T>?

Porting lock-free data structures from Java is complicated by the fact that the implementations assume garbage collection. There are a few techniques available for dealing with manual memory management without locks, but they complicate the implementation of algorithms that are complex enough already. So I'm thinking about trying to implement an atomic_shared_ptr<T> along the lines of the interface for other atomics defined in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2427.html but using shared_ptr<T> as a value type. The implementation I'm thinking of uses hazard pointers (http://erdani.org/publications/cuj-2004-12.pdf) and double-wide-cas (cmpxchg16b on x86-64). So I have three questions for this list: 1) Would there be interest in adding such a type to the Boost smart pointers? 2) Has someone already done this? 3) Is anyone already working on an implementation of atomic<T> from N2427? I'm unlikely to need the whole thing, so even a half-completed public repository would be helpful. Thanks, Jeffrey Yasskin

Jeffrey Yasskin:
Porting lock-free data structures from Java is complicated by the fact that the implementations assume garbage collection. There are a few techniques available for dealing with manual memory management without locks, but they complicate the implementation of algorithms that are complex enough already. So I'm thinking about trying to implement an atomic_shared_ptr<T> along the lines of the interface for other atomics defined in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2427.html but using shared_ptr<T> as a value type. The implementation I'm thinking of uses hazard pointers (http://erdani.org/publications/cuj-2004-12.pdf) and double-wide-cas (cmpxchg16b on x86-64). So I have three questions for this list:
1) Would there be interest in adding such a type to the Boost smart pointers?
I (and many others) will be very, very interested in your implementation, doubly so if you implement it using the interface proposed in http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2297.html#atomic I can't guarantee that your implementation will make it into Boost since hazard pointers are rumored to be patent-encumbered.
2) Has someone already done this?
If you google :-) for Joe Seigh's atomic_ptr, you'll find his implementation. It doesn't implement the entire shared_ptr interface though, so your implementation will be unique in this regard.

On Jan 18, 2008 10:34 AM, Peter Dimov <pdimov@pdimov.com> wrote:
Jeffrey Yasskin:
Porting lock-free data structures from Java is complicated by the fact that the implementations assume garbage collection. There are a few techniques available for dealing with manual memory management without locks, but they complicate the implementation of algorithms that are complex enough already. So I'm thinking about trying to implement an atomic_shared_ptr<T> along the lines of the interface for other atomics defined in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2427.html but using shared_ptr<T> as a value type. The implementation I'm thinking of uses hazard pointers (http://erdani.org/publications/cuj-2004-12.pdf) and double-wide-cas (cmpxchg16b on x86-64). So I have three questions for this list:
1) Would there be interest in adding such a type to the Boost smart pointers?
I (and many others) will be very, very interested in your implementation, doubly so if you implement it using the interface proposed in
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2297.html#atomic
Thanks for the support! I currently plan to extrapolate from the design of atomic<T> from N2427 instead of adding operations to shared_ptr itself. The requirement that "A program is allowed to access the same shared_ptr instance concurrently from multiple threads if the access is done exclusively via the member functions in this section." implies that instances accessed through those three methods are logically a separate type from instances accessed through other methods. Overloading an existing type to include two distinct logical types sounds like a recipe for trouble.
I can't guarantee that your implementation will make it into Boost since hazard pointers are rumored to be patent-encumbered.
*sigh*
2) Has someone already done this?
If you google :-) for Joe Seigh's atomic_ptr, you'll find his implementation. It doesn't implement the entire shared_ptr interface though, so your implementation will be unique in this regard.
Thanks, that looks like a good place to start, although by not implementing shared_ptr<>'s casting ability, he gets around one significant difficulty. On Jan 18, 2008 12:43 PM, Cory Nelson <phrosty@gmail.com> wrote:
Athlon 64s (pre-dual core) do not have cmpxchg16b - how do you plan to get around that?
The same way atomic<struct {int* a, int* b}> gets around it in N2427: return false from .is_lock_free() and use a spin lock. That's why I was looking for someone else's implementation of atomic<T>; I'd rather not implement that switching myself if I can avoid it. :) Tim's solution is tempting, but I think incorrect, because the two words are actually used for two whole pointers (object+refcount_object). Hazard pointers for the refcount object prevent the ABA problem and ensure that it stays alive long enough for me to increment its refcount, although there may be other ways to ensure that too. I'm not certain this actually works; I just wanted to check that nobody else had already finished before starting on the implementation. -- Namasté, Jeffrey Yasskin

Jeffrey Yasskin wrote:
On Jan 18, 2008 12:43 PM, Cory Nelson <phrosty@gmail.com> wrote:
Athlon 64s (pre-dual core) do not have cmpxchg16b - how do you plan to get around that?
The same way atomic<struct {int* a, int* b}> gets around it in N2427: return false from .is_lock_free() and use a spin lock. That's why I was looking for someone else's implementation of atomic<T>; I'd rather not implement that switching myself if I can avoid it. :)
Tim's solution is tempting, but I think incorrect, because the two words are actually used for two whole pointers (object+refcount_object). Hazard pointers for the refcount object prevent the ABA problem and ensure that it stays alive long enough for me to increment its refcount, although there may be other ways to ensure that too. I'm not certain this actually works; I just wanted to check that nobody else had already finished before starting on the implementation.
Remember also that merging pointer and refcount in the same variable falls afoul of the pointer masking rules in N2481. Of course, if an implementation announces that it supports garbage collection (which the masking rules are about), the implementation could switch to a far easier method anyway. Sebastian Redl

Sorry for the belated response. Jeffrey Yasskin:
On Jan 18, 2008 10:34 AM, Peter Dimov <pdimov@pdimov.com> wrote: ...
I (and many others) will be very, very interested in your implementation, doubly so if you implement it using the interface proposed in
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2007/n2297.html#atomic
Thanks for the support! I currently plan to extrapolate from the design of atomic<T> from N2427 instead of adding operations to shared_ptr itself. The requirement that "A program is allowed to access the same shared_ptr instance concurrently from multiple threads if the access is done exclusively via the member functions in this section." implies that instances accessed through those three methods are logically a separate type from instances accessed through other methods.
There are use cases in which you really do want to access the same variable in both atomic and non-atomic ways. Something like: rwm_.rdlock(); // use atomic accesses on px rwm_.unlock(); //... rwm_.wrlock(); // use non-atomic accesses on px, we're exclusive rwm_.unlock();

On Jan 18, 2008 9:56 AM, Jeffrey Yasskin <jyasskin@google.com> wrote:
using shared_ptr<T> as a value type. The implementation I'm thinking of uses hazard pointers (http://erdani.org/publications/cuj-2004-12.pdf) and double-wide-cas (cmpxchg16b on x86-64). So I have three questions for this list:
Athlon 64s (pre-dual core) do not have cmpxchg16b - how do you plan to get around that? -- Cory Nelson

On Fri, 18 Jan 2008 12:43:39 -0800, Cory Nelson wrote:
On Jan 18, 2008 9:56 AM, Jeffrey Yasskin <jyasskin@google.com> wrote:
using shared_ptr<T> as a value type. The implementation I'm thinking of uses hazard pointers (http://erdani.org/publications/cuj-2004-12.pdf) and double-wide-cas (cmpxchg16b on x86-64). So I have three questions for this list:
Athlon 64s (pre-dual core) do not have cmpxchg16b - how do you plan to get around that?
iirc, the x86_64 doesn't actually use a 64-bit virtual address space, but only 48-bit ... the remaining 16 bit could be used to store the tag ... best, tim -- tim@klingt.org http://tim.klingt.org Question: Then what is the purpose of this "experimental" music? Answer: No purposes. Sounds. John Cage

Well, according to http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details it might be dangerous to rely on the fact that top 16 bits are not currently used. On Fri, 18 Jan 2008 15:03:21 -0700, Tim Blechmann <tim@klingt.org> wrote:
On Fri, 18 Jan 2008 12:43:39 -0800, Cory Nelson wrote:
On Jan 18, 2008 9:56 AM, Jeffrey Yasskin <jyasskin@google.com> wrote:
using shared_ptr<T> as a value type. The implementation I'm thinking of uses hazard pointers (http://erdani.org/publications/cuj-2004-12.pdf) and double-wide-cas (cmpxchg16b on x86-64). So I have three questions for this list:
Athlon 64s (pre-dual core) do not have cmpxchg16b - how do you plan to get around that?
iirc, the x86_64 doesn't actually use a 64-bit virtual address space, but only 48-bit ... the remaining 16 bit could be used to store the tag ...
best, tim

On Jan 18, 2008 11:57 PM, Andrey Tcherepanov <moyt63c02@sneakemail.com> wrote:
Well, according to http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details it might be dangerous to rely on the fact that top 16 bits are not currently used.
You need only to rely on it on older machines that do not have DCAS. Microsoft has a lock free list that chooses at runtime whether to use plain CAS with 48 ptr + 16 bit ABA counter (actually it is probably something like 44/20 by relying on minimum alignment), or DCAS on more recent machines. -- gpd
participants (7)
-
Andrey Tcherepanov
-
Cory Nelson
-
Giovanni Piero Deretta
-
Jeffrey Yasskin
-
Peter Dimov
-
Sebastian Redl
-
Tim Blechmann