[proposal] Atomically Thread-Safe Reference Counting Algorithm...

Currently, Boost doesn't provide support for atomic reference counting; shared_ptr<T> falls under 'basic' thread-safety. I propose a reference counting algorithm that falls under 'strong' thread-safety. Here is an experimental prototype I created:

http://appcore.home.comcast.net/vzoom/refcount/

A SPARC 32-64 version is underway. Here is some more information on my algorithm:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/4e8717d3bcdedfe9 (initial idea; pseudo-code)
http://groups.google.com/group/comp.programming.threads/msg/2f21a151d3916592 (mostly lock-free...)
http://groups.google.com/group/comp.programming.threads/msg/0022ef08ae26e2f3 (async-signal-safe aspects of my algorithm)
http://groups.google.com/group/comp.programming.threads/msg/667b1867c19e6288 (async-signal...)
http://groups.google.com/group/comp.programming.threads/msg/9ee468f341a33ee2 (adding more async-signal-safety characteristics...)
http://groups.google.com/group/comp.programming.threads/msg/64a46f3ef24b786a
http://groups.google.com/group/comp.programming.threads/msg/e363f874241bcaf4 (possible improvements...)

Does anybody think that Boost could possibly benefit from this level of thread-safety? Any thoughts?

Thank you all for your time!

--
Chris Thomasson
http://appcore.home.comcast.net/
(portable lock-free data-structures)

"loufoque" <mathias.gaunard@etu.u-bordeaux1.fr> wrote in message news:eh89to$hev$2@sea.gmane.org...
Chris Thomasson wrote:
Does anybody think that Boost could possibly benefit from this level of thread-safety? Any thoughts?
I think a whole lock-free data structures library would be even better.
Well, I have a full-blown commercial library called vZOOM (only released to Sun so far, via the CoolThreads Contest; it's one of the finalists), but it contains some of my patented algorithms:

https://coolthreads.dev.java.net/
http://groups.google.com/group/comp.programming.threads/search?group=comp.programming.threads&q=vzoom&qt_g=1

I have another full-blown library, but it contains an IBM patent application (e.g., SMR):

http://appcore.home.comcast.net/

However, after I strip SMR from AppCore (this is planned), it will only consist of MY prior art... I wonder how the Boost community would feel about incorporating AppCore into Boost...

Hey Chris, Chris Thomasson wrote:
Currently, Boost doesn't provide support for atomic reference counting; shared_ptr<T> falls under 'basic' thread-safety. I propose a reference counting algorithm that falls under 'strong' thread-safety. Here is an experimental prototype I created:
This doesn't contain any documentation, only source. You can't expect people to understand it without at least a brief reference that specifies what the various functions are supposed to do.
A SPARC 32-64 version is underway. Here is some more information on my algorithm:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/4e... (initial idea; pseudo-code)
Imagine that I add two member functions to shared_ptr:

shared_ptr<T> shared_ptr<T>::copy() const; // return *this
void shared_ptr<T>::replace( shared_ptr<T> const & p ); // *this = p

that are synchronized with a mutex/spinlock/rwlock/rwspinlock, and I declare that "strong" thread safety is offered only when everything goes through these two accessors and nothing else. Is this inferior to your proposed scheme? In what scenarios? Can you implement the copy/replace interface in a "more lock-free way" using your scheme (you are allowed to add to sp_counted_base whatever members you need)?
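Peter's proposed interface could be sketched roughly as follows. This is only an illustration: it uses a standalone wrapper with std::mutex standing in for the mutex/spinlock/rwlock, whereas the actual proposal would add the accessors to shared_ptr itself.

```cpp
#include <memory>
#include <mutex>

// A thin wrapper illustrating the proposed copy/replace accessors.
// "Strong" thread safety holds only while every cross-thread access
// goes through these two functions and nothing else.
template<class T>
class locked_sp {
    std::shared_ptr<T> p_;
    mutable std::mutex m_;   // stand-in for the mutex/spinlock/rwlock
public:
    locked_sp() = default;
    explicit locked_sp(std::shared_ptr<T> p) : p_(std::move(p)) {}

    // shared_ptr<T> copy() const; // return *this, under the lock
    std::shared_ptr<T> copy() const {
        std::lock_guard<std::mutex> lk(m_);
        return p_;
    }

    // void replace( shared_ptr<T> const & p ); // *this = p, under the lock
    void replace(std::shared_ptr<T> const& p) {
        std::lock_guard<std::mutex> lk(m_);
        p_ = p;
    }
};
```

Any thread may call copy() or replace() concurrently; the lock serializes the reference-count and pointer updates, which is exactly the "strong" guarantee being discussed.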

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:009a01c6f39f$c72f3ff0$6507a8c0@pdimov2...
Hey Chris,
[...]
Here is an experimental prototype I created:
This doesn't contain any documentation, only source. You can't expect people to understand it without at least a brief reference that specifies what the various functions are supposed to do.
Yeah... Sorry. I plan on posting docs soon. [...]
Imagine that I add two member functions to shared_ptr:
shared_ptr<T> shared_ptr<T>::copy() const; // return *this
void shared_ptr<T>::replace( shared_ptr<T> const & p ); // *this = p
that are synchronized with a mutex/spinlock/rwlock/rwspinlock and I declare that "strong" thread safety is offered only when everything goes through these two accessors and nothing else.
That should work...
Is this inferior to your proposed scheme?
Maybe... Currently, all of my pointer-ops are 100% lock-free. My reference count adjustments are 100% lock-free for everything except strong competing accesses (e.g., it only *takes a spinlock for strong competing accesses, **and when the count drops to zero). My counter objects can be swapped using normal word-based atomic operations (e.g., XCHG and CAS, no DWCAS required)... Also, I can use parts of my algorithm in the context of a signal handler. Can you use any part of shared_ptr<T> in a signal handler? I think the answer is going to be NO; however, I would love to be corrected if I am wrong... ;)
In what scenarios?
I think my scheme could possibly be more efficient because all of the pointer-ops are 100% lock-free, and most of the reference counts are 100% lock-free... Humm... I would have to see a sketch of the algorithm you have in mind Peter...
Can you implement the copy/replace interface in a "more lock-free way" using your scheme (you are allowed to add to sp_counted_base whatever members you need)?
Let me define some of the previous asterisks:

*, **: The hashed locking scheme can be completely replaced with one of my PDR schemes. I posted one here:

http://groups.google.com/group/comp.arch/browse_frm/thread/b2f751d5dad6c07b (DWCAS: no)

or this one here:

http://groups.google.com/group/comp.programming.threads/msg/f443b38cf7bbca8a
http://groups.google.com/group/comp.programming.threads/msg/a4a796e25a157ca1 (DWCAS: yes)

This stuff is not in my prototype yet... I guess I should augment it with PDR. Once I do that, I can implement copy/replace in a 100% lock-free way... Humm... Does any of this begin to address your questions, Peter?
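The "spinlock associated with a pointer" used throughout this thread is typically implemented as a hashed lock table: the pointer's address selects one of a fixed number of spinlocks, so no per-object lock storage is needed. A minimal sketch (the table size and hash function are assumptions for illustration, not the prototype's actual values):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hashed spinlock table: each pointer value maps to one of a fixed
// number of spinlocks. Distinct objects may share a lock (a hash
// collision), which is harmless for correctness.
static const std::size_t kTableSize = 64;       // assumed size
static std::atomic<bool> g_locks[kTableSize];   // zero-initialized: unlocked

static std::size_t lock_index(const void* p) {
    // Shift away low alignment bits before taking the modulus (assumed hash).
    std::uintptr_t v = reinterpret_cast<std::uintptr_t>(p);
    return (v >> 4) % kTableSize;
}

static void lock_for(const void* p) {
    std::atomic<bool>& f = g_locks[lock_index(p)];
    bool expected = false;
    while (!f.compare_exchange_weak(expected, true,
                                    std::memory_order_acquire,
                                    std::memory_order_relaxed))
        expected = false;   // spin until the slot is free
}

static void unlock_for(const void* p) {
    g_locks[lock_index(p)].store(false, std::memory_order_release);
}
```

The same pointer always hashes to the same slot, so "lock the spinlock associated with ptr" is simply lock_for(ptr).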

Chris Thomasson wrote:
Is this inferior to your proposed scheme?
Maybe... Currently, all of my pointer-ops are 100% lock-free. My reference count adjustments are 100% lock-free for everything except strong competing accesses (e.g., it only *takes a spinlock for strong competing accesses, **and when the count drops to zero). My counter objects can be swapped using normal word-based atomic operations (e.g., XCHG and CAS, no DWCAS required)...
How do you distinguish strong competing accesses from noncompeting accesses? You have two pointers, global and local; what levels of thread safety do these offer? Is global the atomic pointer and local the "as safe as an int" equivalent? Or is local to be kept local to a thread? If I have ptr::global<X> px; and I copy px into another global, does this take a spinlock? If I assign to it? Sorry for asking so many questions, but lock-free source code is hard. :-)

"David Abrahams" <dave@boost-consulting.com> wrote in message news:87k62u1p63.fsf@pereiro.luannocracy.com...
"Peter Dimov" <pdimov@mmltd.net> writes:
lock-free source code is hard.
Not to mention doc-free source.
- I should have some fairly crude documentation that covers the actual refcount algorithm posted on my site within the next couple of days.
- I will have some documentation that covers the entire refcount C Abstraction API posted on my site within the next couple of days.
- The documentation for the C++ Abstraction API will appear shortly after...

Sorry for the delay! :O

Sorry for not answering this sooner! "Peter Dimov" <pdimov@mmltd.net> wrote in message news:001c01c6f491$0d3e2690$6507a8c0@pdimov2...
Chris Thomasson wrote:
Is this inferior to your proposed scheme? Maybe... Currently, all of my pointer-ops are 100% lock-free. My reference count adjustments are 100% lock-free for everything except strong competing accesses (e.g., it only *takes a spinlock for strong competing accesses, **and when the count drops to zero). My counter objects can be swapped using normal word-based atomic operations (e.g., XCHG and CAS, no DWCAS required)...
How do you distinguish strong competing accesses from noncompeting accesses?
ptr::global loads ptr::global = strong
ptr::global loads ptr::local = weak
ptr::global stores/swaps are atomic
ptr::global refcount updates are atomic

ptr::local loads ptr::global = strong
ptr::local loads ptr::local = weak
ptr::local stores/swaps are not atomic
ptr::local refcount updates are atomic
You have two pointers, global and local; what levels of thread safety do these offer?
ptr::global = strong
Is global the atomic pointer and local the "as safe as an int" equivalent? Or is local to be kept local to a thread?
ptr::local should only have one thread loading or storing into it at any one time; local does not have atomic pointer swaps. It makes use of impl_base::swap_naked(...). There is no reason to have atomic pointer swaps for a ptr::local. However, the count updates are still atomic. For example:

- You can contain ptr::local<foo> in a shared collection that is protected by a mutex. If the collection was not protected by a mutex, then it would have to contain ptr::global<foo> instead.
- One thread can create a ptr::local<foo> and transfer it to another thread via a queue. This can be accomplished without using ptr::global<foo>.
- The only time you make use of ptr::global<foo> is when you wish to make use of strong thread-safety. Otherwise, use ptr::local<foo>.
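The same division exists with today's standard components: 'basic' thread safety is sufficient whenever an outer lock serializes all access, which is the role ptr::local plays in the examples above. A small illustration using shared_ptr under a mutex (standard C++ only; this does not use the proposed ptr::local/ptr::global API):

```cpp
#include <memory>
#include <mutex>
#include <vector>

// A shared collection of "basic"-safety smart pointers: the outer mutex
// serializes every access, so the pointers themselves need no extra
// atomicity -- analogous to holding ptr::local<foo> in a locked container.
struct foo { int value = 0; };

std::mutex g_mutex;
std::vector<std::shared_ptr<foo>> g_collection;

void add_foo(std::shared_ptr<foo> p) {
    std::lock_guard<std::mutex> lk(g_mutex);
    g_collection.push_back(std::move(p));
}

std::shared_ptr<foo> take_last() {
    std::lock_guard<std::mutex> lk(g_mutex);
    if (g_collection.empty()) return nullptr;
    std::shared_ptr<foo> p = g_collection.back();
    g_collection.pop_back();
    return p;
}
```

Only if the collection itself were lock-free would the elements need the 'strong' (ptr::global-style) guarantee.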
If I have
ptr::global<X> px;
and I copy px into another global, does this take a spinlock?
Yes.
If I assign to it?
If you assign a ptr::local to a ptr::global, then you have a weak increment. If you assign a ptr::global to a ptr::local, then you have a strong increment.
Sorry for asking so many questions,
No Problem! :^)
but lock-free source code is hard. :-)
This is mostly lock-free... It's still kind of tricky though... ;^)

Chris Thomasson wrote:
You have two pointers, global and local; what levels of thread safety do these offer?
ptr::global = strong
Is global the atomic pointer and local the "as safe as an int" equivalent? Or is local to be kept local to a thread?
ptr::local should only have one thread loading or storing into it at any one time; local does not have atomic pointer swaps. It makes use of impl_base::swap_naked(...). There is no reason to have atomic pointer swaps for a ptr::local. However, the count updates are still atomic.
So ptr::local = basic.
- You can contain ptr::local<foo> in a shared collection that is protected by a mutex. If the collection was not protected by a mutex then it would have to contain ptr::global<foo> instead.
I don't see how this can work; the collection itself needs to be atomic, not its elements (it could demand element atomicity, I guess, but it still needs to be atomic on top of that)...

Having two separate pointer types is a legitimate option, but (as you might've deduced by now) I've been exploring having only one and exposing the 'strong thread safety' as separate member functions. The reason I haven't added copy/replace to shared_ptr is that I'm not sure whether it won't be better to generalize the idea even further and implement something along the lines of

atomic_cell< shared_ptr<X> > px;

which would be useful for adding atomicity on top of any (thread-safe as int) type, not just shared_ptr. What do you think of that? Is a "mostly lock-free" atomic_cell possible?
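A lock-based baseline for the atomic_cell idea might look like the sketch below. The name comes from Peter's post; everything else is an assumption. This is just a mutex wrapper, i.e. the trivially-correct version, not the hoped-for "mostly lock-free" one:

```cpp
#include <mutex>
#include <utility>

// atomic_cell<T>: adds atomic load/store on top of any type that is
// "as thread-safe as an int" (e.g. shared_ptr). Lock-based baseline.
template<class T>
class atomic_cell {
    T value_;
    mutable std::mutex m_;
public:
    atomic_cell() = default;
    explicit atomic_cell(T v) : value_(std::move(v)) {}

    T load() const {
        std::lock_guard<std::mutex> lk(m_);
        return value_;   // a copy is made under the lock
    }

    void store(T v) {
        std::lock_guard<std::mutex> lk(m_);
        // Swap so the previous value is destroyed after the lock is
        // released (when the parameter v goes out of scope).
        using std::swap;
        swap(value_, v);
    }
};
```

Instantiated as atomic_cell< std::shared_ptr<X> >, load() is the 'strong' copy and store() the 'strong' replace; the open question in the thread is whether the mutex can be (mostly) eliminated.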

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00e901c6f613$06ed8a00$6607a8c0@pdimov2...
Chris Thomasson wrote:
You have two pointers, global and local; what levels of thread safety do these offer?
ptr::global = strong
Is global the atomic pointer and local the "as safe as an int" equivalent? Or is local to be kept local to a thread?
ptr::local should only have one thread loading or storing into it at any one time; local does not have atomic pointer swaps. It makes use of impl_base::swap_naked(...). There is no reason to have atomic pointer swaps for a ptr::local. However, the count updates are still atomic.
So ptr::local = basic.
Yup.
- You can contain ptr::local<foo> in a shared collection that is protected by a mutex. If the collection was not protected by a mutex then it would have to contain ptr::global<foo> instead.
I don't see how this can work; the collection itself needs to be atomic, not its elements (it could demand element atomicity, I guess, but it still needs to be atomic on top of that)...
Are you referring to the case in which the collection was not protected by a mutex? If so, I was basically referring to some sort of lock-free, or mostly lock-free, collection. You can use ptr::global in a lock-free collection; it isn't very practical because of the spinlock, but it is compatible. It was just a simple example to show some of the algorithm's flexibility...
Having two separate pointer types is a legitimate option but (as you might've deduced by now) I've been exploring having only one and exposing the 'strong thread safety' as separate member functions. The reason I haven't added copy/replace to shared_ptr is that I'm not sure whether it won't be better to generalize the idea even further and implement something along the lines of
atomic_cell< shared_ptr<X> > px;
which would be useful for adding atomicity on top of any (thread-safe as int) type, not just shared_ptr.
What do you think of that?
Interesting...
Is a "mostly lock-free" atomic_cell possible?
Probably, but it might be more expensive than a more direct solution... Humm... Need to put my thinking cap back on...

"Chris Thomasson" <cristom@comcast.net> wrote in message news:ehgjr6$iru$1@sea.gmane.org...
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00e901c6f613$06ed8a00$6607a8c0@pdimov2...
Chris Thomasson wrote:
[...]
Are you referring to the case in which a collection was not protected by a mutex? If so, I was basically referring to some sort of lock-free, or mostly lock-free collection. You can use ptr::global in a lock-free collection, it isn't very practical because of the spinlock, but it is compatible. Its was just a simple example to show some of the algorithms flexibility...
To clarify, you can use ptr::global in the anchor structure of a lock-free collection. For example:

template<typename T_state>
struct node {
  ptr::global<node> m_next;
  T_state *m_state;
};

// packed structure
template<typename T_node>
struct lflifo_anchor {
  ptr::global<T_node> m_node;
  uintword_t m_aba;
};

typedef lflifo_anchor<node<foo> > my_anchor_foo_t;

my_anchor_foo_t can be modified with DWCAS, because (sizeof(my_anchor_foo_t) == (sizeof(void*) * 2)).

Chris Thomasson wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:009a01c6f39f$c72f3ff0$6507a8c0@pdimov2...
Hey Chris,
[...]
Here is a experimental prototype I created:
This doesn't contain any documentation, only source. You can't expect people to understand it without at least a brief reference that specifies what the various functions are supposed to do.
Yeah... Sorry. I plan on posting docs soon.
You should probably post a C implementation of refcount-ia32 (for example, using Interlocked* intrinsics as pseudocode for XCHG and XADD).

Chris Thomasson wrote:
I think my scheme could possibly be more efficient because all of the pointer-ops are 100% lock-free, and most of the reference counts are 100% lock-free... Humm... I would have to see a sketch of the algorithm you have in mind Peter...
Basically... using copy/replace on a shared_ptr would make it behave atomically ('global') and using the other ops would make it non-atomic ('local'/semi-local).

shared_ptr copy() const
  take read (rw)spinlock for pointer to count
  make a copy of *this
  release spinlock
  return copy

void replace( shared_ptr const & rhs )
  take write (rw)spinlock for pointer to count
  *this = rhs
  release spinlock

As I understand your scheme, you take a spinlock in your equivalent of the copy operation (in atomic_add_load_strong). However, you are able to avoid the write lock in the replace operation by deferring it until the refcount reaches zero, something like (best guess):

void replace( ptr const & rhs )
  atomic_decrement( this->count ) // #1
  if reached 0
    take write lock // #2
    re-check for zero
    if true, invoke destructor
    else ???, someone sneaked a 'copy' between #1 and #2
  swap this and rhs

except I don't see the "re-check for zero" part in your assembly, so I might be missing something; how do you protect against that scenario?

I can't easily do that for shared_ptr, since its swap is not atomic (shared_ptr is double-wide), but it could be possible... I'm not sure whether it's worth it, though, since in a typical application the copy ops will outnumber the replace ops, so avoiding only the write lock may not be that important.
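The deferred-zero decrement Peter is guessing at can be sketched against a plain atomic counter. This is an illustration of the guess, not of the actual IA-32 prototype: std::atomic stands in for XADD, std::mutex for the spinlock, and a flag for the destructor call:

```cpp
#include <atomic>
#include <mutex>

// Deferred-zero release: the lock is taken only when the count appears
// to have dropped to zero, and the count is re-checked under the lock
// to catch a concurrent "copy" that raced in between (#1 vs #2).
struct refcount {
    std::atomic<int> count{1};
    std::mutex lock;           // stand-in for the associated spinlock
    bool destroyed = false;    // stand-in for "destructor was invoked"
};

bool release(refcount& rc) {
    if (rc.count.fetch_sub(1, std::memory_order_acq_rel) != 1)   // #1
        return false;                        // other references remain
    std::lock_guard<std::mutex> lk(rc.lock); // #2: wait out racing copiers
    if (rc.count.load(std::memory_order_relaxed) != 0)
        return false;                        // someone sneaked a 'copy' in
    rc.destroyed = true;                     // would invoke the destructor
    return true;
}
```

The interesting property is that the common path (count not reaching zero) never touches the lock at all.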

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:002a01c6f498$3e7a0bf0$6507a8c0@pdimov2...
Chris Thomasson wrote:
I think my scheme could possibly be more efficient
[...]
Humm... I would have to see a sketch of the algorithm you have in mind Peter...
Basically... using copy/replace on a shared_ptr would make it behave atomically ('global') and using the other ops would make it non-atomic ('local'/semi-local).
[...]
As I understand your scheme, you take a spinlock in your equivalent of the copy operation (in atomic_add_load_strong).
Yes.
However you are able to avoid the write lock in the replace operation by deferring it until the refcount reaches zero,
Exactly.
something like (best guess):
[...]
except I don't see the "re-check for zero" part in your assembly, so I might be missing something; how do you protect against that scenario?
Here is some more detailed pseudo-code I posted in response to Joe Seigh:

http://groups.google.com/group/comp.programming.threads/msg/224aa9d097f4300e

The basic technique for 'copying' is as follows:

rc* copy(rc **sloc, int count) {
 1: load ptr from *sloc
    if ptr is null goto 2
    lock spinlock associated with ptr
    re-load ptr from *sloc
    compare w/ previous load
    if not equal unlock and goto 1
    XADD ptr's refs with count
    * if the previous value was less than 1
      unlock and goto 1
    unlock
 2: return ptr
}

It is also good to keep the following in mind:

*: If the value was less than 1, that means that we detected a drop-to-zero condition on the 'ptr'. Keep in mind that we are locked on the spinlock that is associated with 'ptr'. The decrement thread always locks before it destroys, so it will have to wait for us. It also means that all of the shared locations that previously contained a pointer value equal to 'ptr' are now either null or have **changed to another value; we fail, and try again.

The decrement logic looks like this:

bool dec(rc *ptr, int count) {
    XADD ptr's refs with negated count;
    if new value is greater than 0 then return false;
D1: lock spinlock associated with ptr;
    unlock spinlock associated with ptr;
    call ptr dtor;
    return true;
}

If D1 is reached, that means we have to lock the spinlock for 'ptr'. This must be done because there could be increment thread(s) trying to increment; if any of them has it locked, we have to wait until they fail-and-unlock. If there is an increment thread that has locked, that means it will fail, because it will notice that it tried to increment a value that was less than 1. If there happens to be an increment thread that has loaded a pointer value equal to 'ptr' and is waiting, or just about to lock the same spinlock, then it will also fail; its re-load and compare will ensure that.
**: If there happens to be an increment thread that has loaded a pointer value equal to 'ptr' and is waiting, or just about to lock the same spinlock, and the decrement thread runs the dtor and the user application reuses the 'ptr' and swaps it into the exact location that the increment thread loaded its pointer from, then all is well. The increment thread's lock-and-compare will succeed, and the increment will attempt to proceed. This algorithm is ABA-proof.

My documentation is almost complete; it should help clear things up. Did my description address some of your initial concerns?
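The copy-side pseudo-code above can be transcribed into standard C++ for experimentation. This is a sketch, not the IA-32 implementation: std::atomic stands in for the raw loads and XADD, a single global mutex stands in for the hashed per-pointer spinlock, and (per the follow-up discussion in this thread) the failed XADD is undone before retrying:

```cpp
#include <atomic>
#include <mutex>

struct rc {
    std::atomic<int> refs{1};
};

std::mutex g_lock;   // stand-in for the spinlock hashed from the pointer

// Strong copy from a shared location: load, lock, re-load and compare,
// then XADD-and-validate; retry on a mismatch or on a count below 1.
// Note: the retry after undo relies on the invariant that swaps happen
// before decrements, so a dead object is no longer reachable via sloc.
rc* copy(std::atomic<rc*>& sloc, int count) {
    for (;;) {
        rc* p = sloc.load(std::memory_order_acquire);   // 1: load ptr
        if (!p) return nullptr;                          // 2: null, done
        std::lock_guard<std::mutex> lk(g_lock);
        if (sloc.load(std::memory_order_acquire) != p)
            continue;                                    // swapped out: retry
        if (p->refs.fetch_add(count, std::memory_order_acq_rel) < 1) {
            // drop-to-zero detected: undo the add and retry
            p->refs.fetch_sub(count, std::memory_order_acq_rel);
            continue;
        }
        return p;                                        // reference acquired
    }
}
```

The re-load-and-compare under the lock is the "coherent snapshot" step described above; any swap that lands between the raw load and the lock forces a retry.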
I can't easily do that for shared_ptr since its swap is not atomic (shared_ptr is double-wide) but it could be possible...
Yes. IIRC, you mentioned that you could use DWCAS to modify a shared location that contains a shared_ptr. However, as we know, DWCAS will introduce your algorithm to a rather annoying portability problem: http://groups.google.com/group/comp.arch/browse_frm/thread/71f8e0094e353e5 ;^)
but I'm not sure whether it's worth it, since in a typical application the copy ops will outnumber the replace ops, so avoiding only the write lock may not be that important.
IMHO, anytime you can reduce the number of atomic operations and/or memory barriers is a plus... :^)

Chris Thomasson wrote:
The basic technique for 'copying' is as follows:
rc* copy(rc **sloc, int count) {
 1: load ptr from *sloc
    if ptr is null goto 2
    lock spinlock associated with ptr
    re-load ptr from *sloc
    compare w/ previous load
    if not equal unlock and goto 1
    XADD ptr's refs with count
    * if the previous value was less than 1
      unlock and goto 1
    unlock
 2: return ptr
}
I thought that that might be the case. Won't you need to undo the XADD before unlocking and retrying, though? That, or use CAS. The decrement thread may not care, but if another increment thread gets in first, bad things happen; or am I missing something else?
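The CAS alternative Peter mentions is the classic "increment only if nonzero" loop; it needs no undo step because the add never happens on a count that has already dropped below one:

```cpp
#include <atomic>

// Acquire 'count' references only if the count is still positive.
// Returns false if a drop-to-zero has already been observed, in which
// case the count is left untouched -- no undo is required.
bool try_acquire(std::atomic<int>& refs, int count) {
    int old = refs.load(std::memory_order_relaxed);
    while (old >= 1) {
        // On failure, compare_exchange_weak reloads 'old' for the retry.
        if (refs.compare_exchange_weak(old, old + count,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed))
            return true;
    }
    return false;   // count was below 1: the object is being destroyed
}
```

The trade-off being debated here: XADD is a single unconditional atomic op but can overshoot and need an undo, while the CAS loop is conditional and self-contained but can retry under contention.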
IMHO, anytime you can reduce the number of atomic operations and/or memory barriers is a plus...
Maybe... Given a choice between the two, I would prefer to somehow eliminate the spinlock in 'copy', though. :-)

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:000601c6f582$10c92890$6507a8c0@pdimov2...
Chris Thomasson wrote:
The basic technique for 'copying' is as follows:
[...]
I thought that that might be the case. Won't you need to undo the XADD before unlocking and retrying, though? That, or use CAS.
I could use CAS; however, I wanted to try and come up with an algorithm that can avoid CAS...
The decrement thread may not care, but if another increment thread gets in first, bad things happen; or am I missing something else?
This is where the reload-and-compare logic comes into play. It basically boils down to a simple coherent lock-based snapshot algorithm: load before lock, reload after lock, compare; if equal, you know that the previous load is coherent with the value that you grabbed under the protection of the lock; if not, unlock and retry... This will detect ALL swaps from shared locations that jump in just before we lock. I guess it's kind of similar logic to the double-check in SMR... If SMR did not perform that double-check, then it would be broken beyond repair... Just like my algorithm would be.

The snapshot logic does not touch any part of a refcount object. It operates on shared locations that contain pointers to refcount objects. It is not possible for an increment thread to load a pointer to a refcount object after it drops to zero, except in the ABA condition I described earlier; the algorithm is compatible with ABA. A refcount object's drop-to-zero condition means that there are no shared locations that contain any pointers to it. Any increment threads that have loaded pointers will fail when they reload-and-compare.

If one gets through, and its compare succeeds, it doesn't matter, because it has already locked the associated spinlock, so it has exclusive access to the refcount. Since its reload-and-compare succeeded, that means it happened before any decrement thread got to execute; swaps happen before decrements (e.g., swap shared loc with 0, dec the old ptr). A decrement thread will always lock the spinlock before it calls the destructor, so it will wait for the one that got through the compare logic...

One thing I forgot to mention in my last post, nit-picking: your algorithm sketch invoked the user-provided destructor in the context of the critical section provided by the lock that is associated with the count ptr. Can't call function pointers while you hold a lock? ;) Humm...
I have to admit that if I used CAS, the algorithm logic would be easier to understand... ;)
IMHO, anytime you can reduce the number of atomic operations and/or memory barriers is a plus...
Maybe... Given a choice between the two, I would prefer to somehow eliminate the spinlock in 'copy', though. :-)
You can get rid of it by using PDR... I have a hybrid version of my algorithm that uses it. Things get tricky here because of all of the RCU and SMR patents, not to mention my vZOOM patent... I think Boost could make use of one of the following collectors:

http://groups.google.com/group/comp.programming.threads/msg/f443b38cf7bbca8a
http://groups.google.com/group/comp.programming.threads/msg/a4a796e25a157ca1

One of them uses DWCAS... I have a workaround: you can make use of the offset trick explained here:

http://groups.google.com/group/comp.arch/msg/a3ebfe80363a4399

The other one uses CAS; however, it operates on a fixed number of threads. I also have a workaround using an offset trick, or another technique that allows for X number of threads to acquire a proxy reference without blocking... This is good because reads and writes can be lock-free, and reads and writes are no longer mutually excludable... Humm... I don't have docs on the algorithm... I wonder if I should post it; it's experimental...

"Chris Thomasson" <cristom@comcast.net> wrote in message news:ehenuk$if2$1@sea.gmane.org...
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:000601c6f582$10c92890$6507a8c0@pdimov2...
Chris Thomasson wrote:
The basic technique for 'copying' is as follows:
[...] I need to clarify one thing here
A refcount object drop to zero condition means that there are no shared locations that contain any pointers to it. Any increment threads that have loaded pointers, will fail when they reload-and-compare.
[...]
If one gets through, and their compare succeeds, it doesn't matter because it has already locked the associated spinlock, so it has exclusive access to the refcount. Since its reload and compare succeeded, that means it happened before any decrement thread got to execute; swaps happen before decrements (e.g., swap shared loc with 0, dec the old ptr). A decrement thread will always lock the spinlock before it calls the destructor, so it will wait for the one that got through the compare logic... ^^^^^^^^^^^^^^^^^
I forgot to mention that the thread that got through the gap between the decrement to 0 and the lock will be subject to the compare logic and/or the logic that prevents a copy when a refcount value gets incremented and its previous value was less than 1. I know this is a bit of tricky logic; however, I believe that if you examine it some more you will find it to be correct... ;^)

"Chris Thomasson" <cristom@comcast.net> wrote in message news:ehenuk$if2$1@sea.gmane.org...
"Peter Dimov" <pdimov@mmltd.net> wrote in message
Chris Thomasson wrote:
[...]
Maybe... Given a choice between the two, I would prefer to somehow eliminate the spinlock in 'copy', though. :-)
You can get rid of it by using PDR...
There may be a way without using PDR... Need to find and tinker with an older algorithm I was experimenting with a couple of years ago... Humm...

Chris Thomasson wrote:
One thing I forgot to mention in my last post, nit-picking, your algorithm sketch invoked the user-provided destructor in the context of the critical-section provided by the lock that is associated with the count ptr. Can't call function pointers while you hold a lock? ;)
Yes, I "oversimplified" a bit. *this = rhs is shorthand for shared_ptr( rhs ).swap( *this ), and the copy/destroy parts of it need to be outside the lock:

void replace( shared_ptr rhs )
  take write (rw)spinlock for this
  this->swap( rhs )
  release spinlock

Another mistake was that the spinlock should be associated with 'this', not with the pointer to count as in your scheme.
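This corrected replace can be written directly against today's shared_ptr. The by-value parameter plus swap guarantees that the incoming copy is made before the lock is taken and the old object's destructor runs after the lock is released, so no user code ever executes inside the critical section (the wrapper class and names below are illustrative):

```cpp
#include <memory>
#include <mutex>

template<class T>
class guarded_sp {
    std::shared_ptr<T> p_;
    std::mutex m_;           // stand-in for the per-instance (rw)spinlock
public:
    // Only the swap happens under the lock. The copy of rhs was made by
    // the caller (pass-by-value) before locking, and the old value dies
    // when rhs goes out of scope, after the lock is released.
    void replace(std::shared_ptr<T> rhs) {
        {
            std::lock_guard<std::mutex> lk(m_);
            p_.swap(rhs);
        }
        // rhs, now holding the old value, is destroyed here: outside the lock
    }

    std::shared_ptr<T> copy() {
        std::lock_guard<std::mutex> lk(m_);
        return p_;
    }
};
```

Keeping the destructor call outside the lock is exactly the "can't call function pointers while you hold a lock" nit raised earlier in the thread.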

DOH!!! I found and killed a bug. It only existed in the C++ Abstraction API. It has to do with the fact that pointers to user objects are contained in the userstate<T>::m_state member. I was returning a pointer to the userstate<T>, rather than the userstate<T>::m_state member, in the following functions:

inline T* load_ptr() throw() {
  return static_cast<T*>(atomic_state_load_depends());
}

inline T const* load_ptr() const throw() {
  return static_cast<T const*>(atomic_state_load_depends());
}

They have to be changed to this:

inline T* load_ptr() throw() {
  return static_cast<userstate<T>*>(atomic_state_load_depends())->m_state;
}

inline T const* load_ptr() const throw() {
  return static_cast<userstate<T>*>(atomic_state_load_depends())->m_state;
}

Also, you need to add this to the userstate<T> class:

friend class ptr_base<T>;

The code is fixed on my site. Sorry for any confusion! ;^(... Luckily, the bug was in the C++ layer, and not in the assembly! ;^)

"Chris Thomasson" <cristom@comcast.net> wrote in message news:ehhbkk$gr4$1@sea.gmane.org...
DOH!!!
I found and killed a bug. It only existed in the C++ Abstraction API. It has to do with the fact that pointers to user objects are contained in the userstate<T>::m_state member. I was returning a pointer to the userstate<T>, rather than the userstate<T>::m_state member, in the following functions:
Did anyone else notice (e.g., seg-fault) this little bugger? Man, it slipped under my radar!

I found and killed another bug (BUG-FIX #2): it only existed in the Low-Level IA-32 Abstraction C API/ABI. It had to do with the fact that the g_spinlock_ia32_table_mem and g_spinlock_ia32_table global variables, and the spinlock_ia32_libinit(void) function, were declared static. This doesn't work well with multiple translation units; you can get multiple copies of the table per process. I added the 'spinlock-ia32.c' file to the library in order to keep a single global copy of the locking table. Of course, this particular bug didn't show up before, because all of my initial test applications consisted of a single translation unit!

I had to add this file:

http://appcore.home.comcast.net/vzoom/refcount/spinlock-ia32.c

And alter this one:

http://appcore.home.comcast.net/vzoom/refcount/spinlock-ia32.h

You should probably re-download:

http://appcore.home.comcast.net/vzoom/refcount/refcount-ia32-0-0-0-2.zip

Thank You.
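The bug pattern here is general: a `static` table defined in a header gives every translation unit its own private copy, so two TUs can end up spinning on different "copies" of the same lock slot. The fix is declare-in-header, define-in-exactly-one-source. A sketch of the pattern (the section comments mirror the original file names; the contents are assumptions, and both sections are shown in one listing):

```cpp
#include <atomic>
#include <cstddef>

// --- spinlock-ia32.h (sketch): declaration only, safe to include in
// --- every translation unit. NOT 'static', so there is ONE table.
const std::size_t kSlots = 64;
extern std::atomic<bool> g_spinlock_table[kSlots];
void spinlock_libinit();

// --- spinlock-ia32.c (sketch): the single definition for the process.
std::atomic<bool> g_spinlock_table[kSlots];

void spinlock_libinit() {
    for (std::size_t i = 0; i < kSlots; ++i)
        g_spinlock_table[i].store(false, std::memory_order_relaxed);
}
```

With `static` in the header, each .c file that included it would have compiled its own table and its own init function, exactly the multiple-copies-per-process symptom described above.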
participants (5)
-
Chris Thomasson
-
Chris Thomasson
-
David Abrahams
-
loufoque
-
Peter Dimov