
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:002a01c6f498$3e7a0bf0$6507a8c0@pdimov2...
Chris Thomasson wrote:
I think my scheme could possibly be more efficient
[...]
Humm... I would have to see a sketch of the algorithm you have in mind Peter...
Basically... using copy/replace on a shared_ptr would make it behave atomically ('global') and using the other ops would make it non-atomic ('local'/semi-local).
[...]
As I understand your scheme, you take a spinlock in your equivalent of the copy operation (in atomic_add_load_strong).
Yes.
However you are able to avoid the write lock in the replace operation by deferring it until the refcount reaches zero,
Exactly.
something like (best guess):
[...]
except I don't see the "re-check for zero" part in your assembly, so I might be missing something; how do you protect against that scenario?
Here is some more detailed pseudo-code I posted in response to Joe Seigh: http://groups.google.com/group/comp.programming.threads/msg/224aa9d097f4300e The basic technique for 'copying' is as follows: rc* copy(rc **sloc, int count) { 1: load ptr from *sloc if ptr is null goto 2 lock spinlock associated with ptr re-load ptr from *sloc compare w/ previous load if not equal unlock and goto 1 XADD ptr's with count * if the previous value was less than 1 unlock and goto 1 unlock 2: return ptr } It also good to keep the following in mind: *: If the value was less than 1, that means that we detected a drop to zero condition on the 'ptr'. Keep in mind that we are locked on the spinlock that is associated with 'ptr'. The decrement thread always locks before it destroys, so it will have to wait for us. It also means that all of the shared locations that previously contained a pointer value equal to 'ptr' are now either null or have **changed to another value; we fail, and try again. The decrement logic looks like this: bool dec(rc *ptr, int count) { XADD ptr's refs with negated count; if new value is greater than 0 then return false; D1: lock spinlock associated with ptr; unlock spinlock associated with ptr; call ptr dtor; return true; } If D1 is reached, that means we have to lock the spinlock for 'ptr'. This must be done because there could be an increment thread(s) trying to increment; if any of them has it locked, we have to wait until they fail-and-unlock: If there is an increment thread that has locked, that means that it will fail because it will notice that it tried to decrement a value that was less than 1. If there happens to be an increment thread that has loaded a pointer value that is equal to 'ptr' and is waiting or just about to lock the same spinlock, then it will also fail; it's re-load and compare will ensure that. **: If there happens to be an increment thread that has loaded a pointer value that is equal to 'ptr' and is waiting or just about to lock the same spinlock, and the decrement thread runs the dtor and the user-application reuses the 'ptr' and swaps it into the exact location that the increment thread loaded its pointer from, then all is well. The increment threads lock-and-compare will succeed, and the increment will attempt to proceed. This algorithm is ABA proof. My documentation is almost complete; it should help clear things up. Did my description address some of your initial concerns?
I can't easily do that for shared_ptr since its swap is not atomic (shared_ptr is double-wide) but it could be possible...
Yes. IIRC, you mentioned that you could use DWCAS to modify a shared location that contains a shared_ptr. However, as we know, DWCAS will introduce your algorithm to a rather annoying portability problem: http://groups.google.com/group/comp.arch/browse_frm/thread/71f8e0094e353e5 ;^)
but I'm not sure whether it's worth it, since in a typical application the copy ops will outnumber the replace ops, so avoiding only the write lock may not be that important.
IMHO, anytime you can reduce the number of atomic operations and/or memory barriers is a plus... :^)