Re: [boost] [proposal] AtomicallyThread-SafeReferenceCountingAlgorithm...

22 Oct 2006

      "Peter Dimov" <pdimov@mmltd.net> wrote in message 
news:002a01c6f498$3e7a0bf0$6507a8c0@pdimov2...
...
Chris Thomasson wrote:
...
I think my scheme could possibly be more efficient
[...]
...
...
Humm... I would have to see a sketch of the algorithm you have in mind 
Peter...
...
Basically... using copy/replace on a shared_ptr would make it behave
atomically ('global') and using the other ops would make it non-atomic
('local'/semi-local).
[...]
...
As I understand your scheme, you take a spinlock in your equivalent of the
copy operation (in atomic_add_load_strong).
Yes.
...
However you are able to avoid
the write lock in the replace operation by deferring it until the refcount
reaches zero,
Exactly.
...
something like (best guess):
[...]
...
except I don't see the "re-check for zero" part in your assembly, so I 
might
be missing something; how do you protect against that scenario?
Here is some more detailed pseudo-code I posted in response to Joe Seigh:

http://groups.google.com/group/comp.programming.threads/msg/224aa9d097f4300e

The basic technique for 'copying' is as follows:

rc* copy(rc **sloc, int count) {
  1: load ptr from *sloc
  if ptr is null goto 2
  lock spinlock associated with ptr
  re-load ptr from *sloc
  compare w/ previous load
  if not equal unlock and goto 1
  XADD ptr's with count
  * if the previous value was less than 1 unlock and goto 1
  unlock
2: return ptr
}

It also good to keep the following in mind:

*: If the value was less than 1, that means that we detected a drop to zero 
condition on the 'ptr'. Keep in mind that we are locked on the spinlock that 
is associated with 'ptr'. The decrement thread always locks before it 
destroys, so it will have to wait for us. It also means that all of the 
shared locations that previously contained a pointer value equal to 'ptr' 
are now either null or have **changed to another value; we fail, and try 
again.

The decrement logic looks like this:

bool dec(rc *ptr, int count) {
  XADD ptr's refs with negated count;
  if new value is greater than 0 then return false;
  D1: lock spinlock associated with ptr;
  unlock spinlock associated with ptr;
  call ptr dtor;
  return true;
}

If D1 is reached, that means we have to lock the spinlock for 'ptr'. This 
must be done because there could be an increment thread(s) trying to 
increment; if any of them has it locked, we have to wait until they 
fail-and-unlock: If there is an increment thread that has locked, that means 
that it will fail because it will notice that it tried to decrement a value 
that was less than 1. If there happens to be an increment thread that has 
loaded a pointer value that is equal to 'ptr' and is waiting or just about 
to lock the same spinlock, then it will also fail; it's re-load and compare 
will ensure that.

**: If there happens to be an increment thread that has loaded a pointer 
value that is equal to 'ptr' and is waiting or just about to lock the same 
spinlock, and the decrement thread runs the dtor and the user-application 
reuses the 'ptr' and swaps it into the exact location that the increment 
thread loaded its pointer from, then all is well. The increment threads 
lock-and-compare will succeed, and the increment will attempt to proceed. 
This algorithm is ABA proof.

My documentation is almost complete; it should help clear things up. Did my 
description address some of your initial concerns?
...
I can't easily do that for shared_ptr since its swap is not atomic
(shared_ptr is double-wide) but it could be possible...
Yes. IIRC, you mentioned that you could use DWCAS to modify a shared 
location that contains a shared_ptr. However, as we know, DWCAS will 
introduce your algorithm to a rather annoying portability problem:

http://groups.google.com/group/comp.arch/browse_frm/thread/71f8e0094e353e5

;^)
...
but I'm not sure whether it's worth it, since in a typical application the 
copy ops will
outnumber the replace ops, so avoiding only the write lock may not be that
important.
IMHO, anytime you can reduce the number of atomic operations and/or memory 
barriers is a plus...

:^)