Re: [boost] [atomic] comments

31 Oct 2011

On Monday 31 October 2011 19:29:35 Andrey Semashev wrote:
...
...
considering the cost of cmpxchg8b itself, the cost of a branch -- if done
correctly [1] -- is most likely immeasurable
Probably. But I'm a perfectionist. :)
me too, but if it does not have a measurable detriment, I consider it 
perfect :)
...
...
...
Unfortunately, cmpxchg16b is not as common as cmpxchg8b, so a dynamic
check would be desirable. However, I would prefer that there were no
if's like the one above. Perhaps, a global table of pointers to the
actual function implementations would be better. Initially pointers
should point to functions that perform cpuid and initialize this table
and then call the real functions for the detected hardware. This way we
eliminate almost all overhead in the long run, including call_once.
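
For illustration, the self-initializing pointer idea described above could
look roughly like the following. This is only a sketch under stated
assumptions, not code from Boost.Atomic: the names u128, cas128, cas_resolve,
cas_native, cas_locked and cpu_has_cmpxchg16b are all hypothetical, the
"native" path is a stand-in that does not actually issue cmpxchg16b, and the
cpuid check is hard-coded so the example runs anywhere. Only one entry of
the table is shown; a real table would have one pointer per operation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical 128-bit payload; not a Boost.Atomic type.
struct u128 { std::uint64_t lo, hi; };

inline bool same(const u128& a, const u128& b) {
    return a.lo == b.lo && a.hi == b.hi;
}

// Stand-in for a real cpuid query (CPUID leaf 1, ECX bit 13 reports
// cmpxchg16b). Hard-coded here so the sketch is portable.
inline bool cpu_has_cmpxchg16b() { return false; }

std::mutex g_fallback_lock;

// Lock-based fallback for hardware without a 16-byte CAS.
bool cas_locked(u128* p, u128* expected, u128 desired) {
    std::lock_guard<std::mutex> guard(g_fallback_lock);
    if (same(*p, *expected)) { *p = desired; return true; }
    *expected = *p;
    return false;
}

// Placeholder: real code would issue "lock cmpxchg16b" here. It delegates
// to the locked version only so that the sketch stays runnable.
bool cas_native(u128* p, u128* expected, u128 desired) {
    return cas_locked(p, expected, desired);
}

typedef bool (*cas_fn)(u128*, u128*, u128);
bool cas_resolve(u128* p, u128* expected, u128 desired);

// One table entry. It starts out pointing at the resolver.
std::atomic<cas_fn> g_cas(&cas_resolve);
std::atomic<int> g_resolve_calls(0);  // only here to observe the sketch

// First call lands here: detect the hardware, patch the pointer, then
// forward the call to the real implementation.
bool cas_resolve(u128* p, u128* expected, u128 desired) {
    g_resolve_calls.fetch_add(1, std::memory_order_relaxed);
    cas_fn real = cpu_has_cmpxchg16b() ? &cas_native : &cas_locked;
    g_cas.store(real, std::memory_order_relaxed);
    return real(p, expected, desired);
}

// Public entry point: a single register-indirect call, no "if".
inline bool cas128(u128* p, u128* expected, u128 desired) {
    return g_cas.load(std::memory_order_relaxed)(p, expected, desired);
}
```

If several threads race on the first call, each of them may run the resolver
once, but they all store the same pointer, so no call_once is needed in this
sketch.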
the processor most likely has more difficulties correctly predicting the
code flow through a register-indirect branch than a static one, so I am
not really sure this is cheaper, but it is in any case worth trying out
Yes, this needs testing, however I hope that an unconditional jump would
be quite well predictable.
it's only predictable as long as it is in the BTB; as soon as it gets
flushed, you are out of luck

a branch to a static address, on the other hand, with the uncommon path
placed out-of-line at a forward address so that it hits the "predict not
taken" default assumption on a cold cache, is still essentially free
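
As a rough illustration of the static-branch variant: the sketch below keeps
the common path inline and pushes the fallback into a separate cold function
behind a branch that is hinted as unlikely. Everything here is an assumption
made for the example (the names cas, cas_slow, g_have_native_cas and the
SKETCH_* macros are hypothetical), a 64-bit std::atomic stands in for the
wide type so that it runs on any platform, and the hints are GCC/Clang
specific. Whether the compiler really emits a forward branch to the cold
code has to be checked in the generated assembly.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Compiler hints (GCC/Clang); they expand to nothing elsewhere.
#if defined(__GNUC__)
#  define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#  define SKETCH_COLD __attribute__((noinline, cold))
#else
#  define SKETCH_UNLIKELY(x) (x)
#  define SKETCH_COLD
#endif

// Assumption: real code would set this once at startup from cpuid.
bool g_have_native_cas = true;

std::mutex g_slow_lock;
int g_slow_calls = 0;  // only here to observe which path was taken

// Out-of-line fallback, kept away from the hot code.
SKETCH_COLD bool cas_slow(std::atomic<std::uint64_t>* p,
                          std::uint64_t* expected,
                          std::uint64_t desired) {
    std::lock_guard<std::mutex> guard(g_slow_lock);
    ++g_slow_calls;
    std::uint64_t cur = p->load(std::memory_order_relaxed);
    if (cur == *expected) {
        p->store(desired, std::memory_order_relaxed);
        return true;
    }
    *expected = cur;
    return false;
}

// Hot path: one branch to a fixed address that is normally not taken,
// followed by the native operation.
inline bool cas(std::atomic<std::uint64_t>* p,
                std::uint64_t* expected,
                std::uint64_t desired) {
    if (SKETCH_UNLIKELY(!g_have_native_cas))
        return cas_slow(p, expected, desired);
    return p->compare_exchange_strong(*expected, desired);
}
```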
...
...
also, this would not be a "single" function pointer but a whole bunch of
them to cover the different atomic operations (reducing everything to CAS
generates more lock/unlock cycles in the fallback path otherwise)
Sure, like I said - a table of pointers.
since boost.atomic is supposed to stay a header-only library, there are
cases where these will be instantiated multiple times -- the many different
pointers may put undue pressure on the BTB

Best regards
Helge

Helge Bahmann