
Alexander Terekhov wrote:
Peter Dimov wrote: [...]
If you are interested, please take a look at the file
Looks correct, but still not quite optimal to my taste.
A) There's no need to hinder the compiler's ability to cache/reorder across increments. So you need neither __volatile__ nor a "memory" clobber in the increment case (the lock prefix is still needed to ensure MP safety of competing read-modify-write operations).
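Roughly, for GCC on x86, this is the sort of increment I mean (a sketch, not your actual file; the function name is mine):

inline void atomic_increment( int * pw )
{
    // no __volatile__, no "memory" clobber: the compiler remains free to
    // cache and reorder ordinary loads/stores around this; the lock prefix
    // alone makes the read-modify-write atomic across processors
    __asm__
    (
        "lock\n\t"
        "incl %0"
        : "+m"( *pw ) // in/out: the counter itself
        :             // no other inputs
        : "cc"        // incl clobbers the flags
    );
}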
B) Something branchless is better for unconditional increments, and for unconditional decrements of use_count. I think that "lock inc" without __volatile__ and a "memory" clobber for increments, and "lock dec" with __volatile__ and a "memory" clobber for decrements of use_count, is optimal.
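Sketched the same way (again, my name for it), the use_count decrement keeps __volatile__ and the "memory" clobber so the compiler can't move accesses to the managed object across the release point, and it stays branchless by capturing ZF with setz:

inline bool atomic_decrement( int * pw )
{
    unsigned char last;

    __asm__ __volatile__
    (
        "lock\n\t"
        "decl %0\n\t"
        "setz %1"     // ZF set <=> the count dropped to zero
        : "+m"( *pw ), "=q"( last )
        :
        : "memory", "cc"
    );

    return last != 0; // true: last reference gone, destroy the object
}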
C) In the case of decrements on weak_count, there's no need to make all clients pay the price of a rather expensive interlocked operation even if they don't use weak pointers. I'd use a "may not store zero" decrement. You'll need __volatile__ and "memory" as a compiler fence; as for the hardware, the initial load does have acquire semantics, and lock cmpxchg does have the "msync::hsb" we need here.
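In sketch form (names are mine; assuming the usual convention that weak_count carries one implicit reference on behalf of a nonzero use_count), the "may not store zero" decrement would look like this:

// one lock cmpxchg; returns the value actually observed in *pw
inline int compare_and_swap( int * pw, int cmp, int xchg )
{
    __asm__ __volatile__
    (
        "lock\n\t"
        "cmpxchgl %2, %0"
        : "+m"( *pw ), "+a"( cmp )
        : "r"( xchg )
        : "memory", "cc"  // compiler fence; the load side is acquire on x86
    );

    return cmp; // on failure cmpxchg leaves the current value in eax
}

inline bool weak_count_release( int * pw )
{
    for( int old = *pw; ; ) // plain initial load; acquire on x86 hardware
    {
        if( old == 1 ) return true; // last reference; zero is never stored

        int seen = compare_and_swap( pw, old, old - 1 );
        if( seen == old ) return false; // decremented; not the last
        old = seen; // lost a race; retry with the fresh value
    }
}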
It's probably better to get rid of the cmpxchg loop for weak_count decrements as well: just do a load, compare it with 1, and (if !ZF) "lock dec" it, all inside one __asm__ __volatile__ with a "memory" clobber. With that in place, it's also safe to drop __volatile__ and the "memory" clobber from the conditional increment in add_ref_lock() using "lock cmpxchg".
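Something like this (same caveats as above), with the loop gone. ZF is set either by the compare (we saw 1 and never stored) or by the decrement itself hitting zero in a race, so sete reports exactly the "last reference" cases:

inline bool weak_count_release( int * pw )
{
    unsigned char last;

    __asm__ __volatile__
    (
        "movl %0, %%eax\n\t"
        "cmpl $1, %%eax\n\t"
        "je 0f\n\t"           // already 1: we are the last, don't store
        "lock\n\t"
        "decl %0\n\t"         // otherwise, interlocked decrement
        "0:\n\t"
        "sete %1"
        : "+m"( *pw ), "=q"( last )
        :
        : "memory", "cc", "eax"
    );

    return last != 0;
}

And then the conditional increment in add_ref_lock() can indeed lose __volatile__ and "memory"; the lock prefix alone keeps the cmpxchg read-modify-write atomic:

// increments *pw if it was nonzero; returns the value observed
inline int atomic_conditional_increment( int * pw )
{
    int rv, tmp;

    __asm__
    (
        "movl %0, %%eax\n\t"
        "0:\n\t"
        "testl %%eax, %%eax\n\t"
        "je 1f\n\t"           // count is zero: object already expired
        "movl %%eax, %2\n\t"
        "incl %2\n\t"
        "lock\n\t"
        "cmpxchgl %2, %0\n\t"
        "jne 0b\n\t"          // raced with another update: retry
        "1:"
        : "+m"( *pw ), "=&a"( rv ), "=&r"( tmp )
        :
        : "cc"
    );

    return rv;
}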
[...]
thanks to Alexander Terekhov
Thanks to you for helping me identify the problem with lock() (the lack of inter-thread ordering with respect to that access and the destruction of the control block when the object has expired) in my original sketch.
regards, alexander.
P.S. When are you going to kick-start an incarnation for Itanic with value-dependent cmpxchg.rel-vs-cmpxchg.acq, and a mere {re-}load.acq instead of load->cmpxchg.acq (when the count value == 1) in weak_release(), with cmpxchg.acq in add_ref_lock()? ;-)