Re: [boost] Re: Lock-free shared_ptr on g++/x86, test/review needed

4 Apr 2005

Alexander Terekhov wrote:
...
Peter Dimov wrote:
[...]
...
If you are interested, please take a look at the file
Looks correct, but still not quite optimal to my taste.
A) There's no need to hinder the compiler's ability to cache/reorder
across increments. So you need neither __volatile__ nor a "memory"
clobber in the increment case (the lock prefix is still needed to ensure
MP safety of competing read-modify-write operations).
Yep. This doesn't make a difference in my tests here (single Athlon).
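
For reference, point A can be sketched roughly as below. This is only an illustration under assumptions, not the code from the CVS: the names atomic_increment_sketch and increment_demo are made up, g++ (or clang) is assumed, and the non-x86 branch falls back to a compiler builtin.

```cpp
#include <cassert>

// Hypothetical helper, not the code from the CVS. Assumes g++ or clang.
inline void atomic_increment_sketch(int* pw)
{
#if defined(__i386__) || defined(__x86_64__)
    // Plain __asm__ (no __volatile__) and no "memory" clobber: the compiler
    // is only told that *pw and the flags change, so it stays free to cache
    // or reorder unrelated memory accesses around the increment.
    // The lock prefix keeps the read-modify-write atomic across processors.
    __asm__("lock\n\tincl %0" : "+m"(*pw) : : "cc");
#else
    // Fallback for other targets: a relaxed atomic add via a g++ builtin.
    __atomic_fetch_add(pw, 1, __ATOMIC_RELAXED);
#endif
}

// Small self-check: three increments starting from zero.
inline int increment_demo()
{
    int count = 0;
    for (int i = 0; i < 3; ++i)
        atomic_increment_sketch(&count);
    assert(count == 3);
    return count;
}
```

The "+m" constraint names *pw as both input and output, which is what lets the statement go without a full "memory" clobber.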
...
B) Something branchless is better for unconditional increments.
xadd is branchless; it just returns the old value, whereas inc doesn't. MSVC 
always generates lock xadd, even for _InterlockedIncrement, BTW. So there's 
probably no difference between the two. But I don't have a P4 or an Athlon 
64 here to verify that. If someone wants to play, the version in the CVS now 
has atomic_increment; uncomment the one-liner at the top and comment out the 
__asm__ statement to compare the performance of the two versions.
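
A rough sketch of the two variants being compared, for anyone who wants to time them. The function names are hypothetical, g++ (or clang) is assumed, and this is not necessarily what the CVS version looks like.

```cpp
#include <cassert>

// Hypothetical sketch; assumes g++ or clang.
// Adds dv to *pw atomically and returns the previous value.
inline int exchange_and_add_sketch(int* pw, int dv)
{
#if defined(__i386__) || defined(__x86_64__)
    int r = dv;
    // lock xadd: the register receives the old value of *pw,
    // *pw receives old value + register. No branch involved.
    __asm__ __volatile__("lock\n\txaddl %0, %1"
                         : "+r"(r), "+m"(*pw)
                         :
                         : "memory", "cc");
    return r;
#else
    return __atomic_fetch_add(pw, dv, __ATOMIC_SEQ_CST);
#endif
}

// Variant 1: increment expressed through xadd, old value discarded.
inline void increment_via_xadd(int* pw)
{
    exchange_and_add_sketch(pw, 1);
}

// Variant 2: increment through lock inc, which returns nothing.
inline void increment_via_inc(int* pw)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock\n\tincl %0" : "+m"(*pw) : : "memory", "cc");
#else
    __atomic_fetch_add(pw, 1, __ATOMIC_SEQ_CST);
#endif
}
```

Both variants leave the same value in the counter; only the instruction used (and whether the old value comes back) differs, so any measured difference would come from the hardware.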
...
C) In the case of decrements on weak_count, there's no need to
make all clients pay the price of a rather expensive interlocked
operation even if they don't use weak pointers. I'd use a "may
not store zero" decrement. You'll need __volatile__ and "memory"
as a compiler fence, and as for hardware, that initial load does
have acquire semantics and lock cmpxchg does have "msync::hsb"
which we need here.
I wanted to get it to work first ;-)

    void release() // nothrow
    {
        if( atomic_exchange_and_add( &use_count_, -1 ) == 1 )
        {
            dispose();

            if( (long volatile&)weak_count_ == 1 ) // no weak ptrs
            {
                destroy();
            }
            else
            {
                weak_release();
            }
        }
    }

?
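
The release() above can be modelled in a self-contained way roughly as follows. This is a toy sketch under assumptions: std::atomic stands in for the hand-written primitives, the struct name and the disposed/destroyed counters are made up for illustration, and weak_count_ is assumed to start at 1 on behalf of all shared owners.

```cpp
#include <atomic>
#include <cassert>

// Toy model with hypothetical names; not the actual counted base class.
struct counted_base_sketch
{
    std::atomic<long> use_count_{1};
    std::atomic<long> weak_count_{1}; // the 1 is held by the shared owners

    int disposed = 0;  // how many times dispose() ran
    int destroyed = 0; // how many times destroy() ran

    void dispose() { ++disposed; }
    void destroy() { ++destroyed; }

    void weak_release()
    {
        // Interlocked decrement, needed only when weak pointers exist.
        if (weak_count_.fetch_sub(1, std::memory_order_acq_rel) == 1)
            destroy();
    }

    void release()
    {
        if (use_count_.fetch_sub(1, std::memory_order_acq_rel) == 1)
        {
            dispose();

            // "May not store zero": if only the owners' reference is
            // left, skip the interlocked decrement and never write 0.
            if (weak_count_.load(std::memory_order_acquire) == 1)
                destroy();
            else
                weak_release();
        }
    }
};
```

With no weak pointers, release() pays for one interlocked operation (on use_count_) and a plain load. With weak pointers present, it falls back to the full interlocked decrement in weak_release().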
...
P.S. When are you going to kick start an incarnation for Itanic
with value dependent cmpxchg.rel-vs-cmpxchg.acq? ;-)
IA64 assembly by hand? No thanks. I'll probably use _Interlocked* on Intel 
and __sync_* on g++. But x86 and PPC (CW and g++ versions) have priority 
over IA64.
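
For illustration only, the builtin-based route might look something like this on g++. The wrapper name is hypothetical, and it assumes a g++ that provides the __sync_* builtins; the Intel-side counterpart would be an _Interlocked* intrinsic such as _InterlockedExchangeAdd, which is not shown here.

```cpp
#include <cassert>

// Hypothetical wrapper; assumes a g++ that provides __sync_fetch_and_add.
// Adds dv to *pw atomically and returns the previous value, leaving the
// choice of instructions to the compiler instead of hand-written assembly.
inline long exchange_and_add_builtin(long* pw, long dv)
{
    return __sync_fetch_and_add(pw, dv);
}
```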