
John Torjo <john.lists@torjo.com> wrote: <snip>
But now, by looking again at atomic_conditional_increment. I'm not a threading expert, but it seems pretty costly as it is implemented now. Could it not go multiple times (n>=2) thorughout the for?
Yes.
Current implementation:
inline long atomic_conditional_increment(long volatile & value) { for(;;) { long tmp = value; if(tmp == 0) return 0; if(InterlockedCompareExchange(&value, tmp + 1, tmp) == tmp) return tmp + 1; } }
Couldn't it become:
inline long atomic_conditional_increment(long volatile & value) { long tmp = value;
Suppose we read the count as 2 here...
if(tmp == 0) return 0;
...and that another thread decrements the count to 1 around here...
long new_val = InterlockedCompareExchange(&value, tmp + 1, tmp); if(new_val == tmp) return tmp + 1;
...so that InterlockedCompareExchange fails and returns 1, so this doesn't return; and suppose that another thread decrements the count to 0 around here...
else return InterlockedIncrement(&value);
...then this increments the count to 1, which is wrong.
}
The increment has to be conditional on the count being non-zero, and neither the Win32 API nor the x86 processor (AFAIK) can perform that atomically. So the loop is unavoidable. There is room for a slight optimisation in that the value returned by InterlockedCompareExchange can be used the next time round: inline long atomic_conditional_increment(long volatile & value) { long old = value, older; do { if (old == 0) return 0; older = old; old = InterlockedCompareExchange(&value, old, old + 1); } while (old != older); return old + 1; }