
Alexander Terekhov wrote:
Peter Dimov wrote: [...]
If you are interested, please take a look at the file
Looks correct, but still not quite optimal to my taste.
A) There's no need to hinder the compiler's ability to cache/reorder across increments, so you need neither __volatile__ nor a "memory" clobber in the increment case (the lock prefix is still needed to ensure MP safety of competing read-modify-write operations).
Yep. This doesn't make a difference in my tests here (single Athlon).
B) Something branchless is better for unconditional increments.
xadd is branchless; it just returns the old value, whereas inc doesn't. MSVC always generates lock xadd, even for _InterlockedIncrement, BTW. So there's probably no difference between the two. But I don't have a P4 or an Athlon 64 here to verify that. If someone wants to play, the version in the CVS now has atomic_increment; uncomment the one-liner at the top and comment out the __asm__ statement to compare the performance of the two versions.
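For readers without the CVS file at hand, the difference between the two kinds of increment in A) and B) can be sketched with C++11 std::atomic rather than the inline assembly under discussion. This is only an illustration, not the code in CVS: the names atomic_increment and atomic_exchange_and_add come from this thread, but the signatures, the use of std::atomic<long>, and the chosen memory orders are assumptions. A relaxed fetch_add whose result is discarded leaves the compiler free to reorder ordinary accesses around it, while the hardware operation itself stays atomic.

```cpp
#include <atomic>
#include <cassert>

// Assumed signatures; the real helpers in the thread operate on plain longs.

// Increment whose result is not needed. Relaxed ordering: still an atomic
// read-modify-write, but it imposes no ordering on surrounding accesses.
inline void atomic_increment(std::atomic<long>& v) {
    v.fetch_add(1, std::memory_order_relaxed);
}

// Adds dv and returns the value held before the addition.
// acq_rel ordering is an assumption suited to a release-style decrement.
inline long atomic_exchange_and_add(std::atomic<long>& v, long dv) {
    return v.fetch_add(dv, std::memory_order_acq_rel);
}

// Small usage check: one extra reference is added, then dropped again.
inline long demo() {
    std::atomic<long> use_count(1);
    atomic_increment(use_count);                        // count is now 2
    long old = atomic_exchange_and_add(use_count, -1);  // returns 2, leaves 1
    assert(old == 2);
    return use_count.load();
}
```

Which instruction a compiler picks for each form (for example a locked add versus a locked xadd on x86) depends on the compiler and its version, so any performance comparison still has to be measured.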
C) In the case of decrements on weak_count, there's no need to make all clients pay the price of a rather expensive interlocked operation even if they don't use weak pointers. I'd use a "may not store zero" decrement. You'll need __volatile__ and "memory" as a compiler fence, and as for hardware, that initial load does have acquire semantics and lock cmpxchg does have "msync::hsb", which we need here.
I wanted to get it to work first ;-)

    void release() // nothrow
    {
        if( atomic_exchange_and_add( &use_count_, -1 ) == 1 )
        {
            dispose();

            if( (long volatile&)weak_count_ == 1 ) // no weak ptrs
            {
                destroy();
            }
            else
            {
                weak_release();
            }
        }
    }

?
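One possible reading of the "may not store zero" decrement from C), sketched with C++11 std::atomic instead of hand-written lock cmpxchg. The helper name decrement_may_not_store_zero, its signature, and the memory orders are assumptions made for illustration; the idea is only that a count of 1 is detected with a plain load and never written back as 0, so the last owner skips the interlocked operation entirely.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper. Returns true when the caller held the last reference.
// In that case the stored value stays at 1; zero is never written.
inline bool decrement_may_not_store_zero(std::atomic<long>& count) {
    long old = count.load(std::memory_order_acquire);
    assert(old >= 1);  // the caller must own a reference

    while (old != 1) {
        // Try to replace old with old - 1. On failure, compare_exchange_weak
        // reloads the current value into old and the loop re-checks it.
        if (count.compare_exchange_weak(old, old - 1,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
            return false;  // other references remain
        }
    }
    return true;  // we saw 1: nobody else can touch the count any more
}
```

The fast path relies on the usual reference-counting invariant: once the count is observed to be 1 by its only owner, no other thread holds a reference from which a new one could be created.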
P.S. When are you going to kick start an incarnation for Itanic with value dependent cmpxchg.rel-vs-cmpxchg.acq? ;-)
IA64 assembly by hand? No thanks. I'll probably use _Interlocked* on Intel and __sync_* on g++. But x86 and PPC (CW and g++ versions) have priority over IA64.
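A rough sketch of what such an intrinsic-based version might look like. The compiler dispatch and the wrapper's signature are assumptions; _InterlockedExchangeAdd (from <intrin.h>) and __sync_fetch_and_add are existing compiler intrinsics, both of which return the value held before the addition.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical wrapper: atomically adds dv to *pw and returns the old value.
inline long atomic_exchange_and_add(long volatile* pw, long dv) {
    assert(pw != 0);
#if defined(_MSC_VER)
    return _InterlockedExchangeAdd(pw, dv);   // MSVC-style intrinsic
#elif defined(__GNUC__)
    return __sync_fetch_and_add(pw, dv);      // g++ builtin, full barrier
#else
#error "no atomic exchange-and-add intrinsic known for this compiler"
#endif
}
```

Both intrinsics act as full barriers, which is stronger than the value-dependent rel/acq choice mentioned in the P.S., so this trades some potential speed for not writing assembly by hand.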