
Peter Dimov wrote:
Alexander Terekhov wrote:
and
asm long atomic_decrement_strong( register long * pw ) {
[... loop -> loop1+loop2 ...]
but it's either suboptimal (more than one sync) or incorrect (missing sync), I think. It needs a state machine.
And how is this

asm long atomic_decrement_strong( register long * pw ) {
    <load-reserved>
    <add -1>
    <branch if zero to acquire>
    {lw}sync
loop1:
    <store-conditional>
    <branch if !failed to done>
loop2:
    <load-reserved>
    <add -1>
    <branch if !zero to loop1>
acquire:
    <store-conditional>
    <branch if failed to loop2>
    isync
done:
    <...>
}

incorrect or suboptimal?
loop0:
    lwarx
    add -1
    beq acquire-without-sync
    sync
loop1:
    stwcx.
    beq+ done
loop2:
    lwarx
    add -1
    bne loop1
acquire-with-sync:
    stwcx.
    bne- loop2
    isync
    blr
acquire-without-sync:
    stwcx.
    bne- loop0
    isync
done:
    blr
I must be missing something, but it looks to me that you have way too much branching and isync-ing.

regards,
alexander.
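
For readers who do not read PowerPC assembly, the ordering that both variants above appear to aim for can be sketched in portable C++. This is only an illustration under assumptions, not the code from this thread: it assumes a C++11 std::atomic<long> counter instead of a raw long *, and it takes "strong" to mean release ordering when the new count is nonzero and acquire ordering when it reaches zero. The memory order is picked per attempt inside a compare-exchange loop, which plays the role of the lwarx/stwcx. retry.

```cpp
#include <atomic>

// Hypothetical analogue of atomic_decrement_strong; the signature is an
// assumption made for this sketch.
inline long atomic_decrement_strong(std::atomic<long>& w) {
    long old_value = w.load(std::memory_order_relaxed);
    for (;;) {
        const long new_value = old_value - 1;
        // Nonzero result: publish earlier writes (release, the sync path).
        // Zero result: observe other threads' writes (acquire, the isync path).
        const std::memory_order success_order =
            (new_value == 0) ? std::memory_order_acquire
                             : std::memory_order_release;
        // On failure old_value is reloaded, so the order is chosen again
        // for the next attempt.
        if (w.compare_exchange_weak(old_value, new_value, success_order,
                                    std::memory_order_relaxed)) {
            return new_value;
        }
    }
}
```

The function returns the new count, so a caller would destroy the shared object when the result is 0. How a compiler maps these orderings to sync, lwsync, and isync on a given PowerPC target is not specified here.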