On 4 Dec 2014 at 17:08, Gavin Lambert wrote:
> The only difference is that this assumes that acquiring the lock should succeed most of the time, so it skips the initial speculative relaxed load. It still avoids spinning directly on the exchange.
That form incurs a guaranteed cache line invalidation on every lock attempt, because the exchange always writes to the lock's cache line. It might be a win on a dual core CPU, but I doubt it would be a win on a heavily contended eight core CPU. As far as NUMA goes, though, it might indeed be fairer.
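For comparison, the two acquisition strategies being discussed can be sketched as below. This is a reconstruction from the prose descriptions in this thread, not code posted by either participant, and the names `exchange_first_spinlock`, `precheck_spinlock` and `run_counter` are hypothetical. The first class tries the exchange immediately and only falls back to spinning on relaxed loads if it fails (the form quoted above). The second is the usual test-and-test-and-set form, which does a relaxed read first and only attempts the exchange when the lock looks free. Neither includes backoff or a pause instruction, so treat them as sketches rather than production locks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical name. Exchange-first form: the uncontended path is a single
// exchange with no preceding load; on failure, wait with relaxed loads.
class exchange_first_spinlock {
    std::atomic<bool> locked_{false};

public:
    void lock() noexcept {
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Contended: spin on plain reads so the cache line can stay shared
            // instead of being invalidated by a write on every iteration.
            while (locked_.load(std::memory_order_relaxed)) {
            }
        }
    }
    void unlock() noexcept { locked_.store(false, std::memory_order_release); }
};

// Hypothetical name. Test-and-test-and-set form: a relaxed precheck guards
// the exchange, so a held lock is never written to by waiters.
class precheck_spinlock {
    std::atomic<bool> locked_{false};

public:
    void lock() noexcept {
        for (;;) {
            // The relaxed load is only a hint; the acquire ordering that
            // protects the critical section comes from the exchange.
            if (!locked_.load(std::memory_order_relaxed) &&
                !locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }
    void unlock() noexcept { locked_.store(false, std::memory_order_release); }
};

// Hypothetical helper: increments a shared counter under Lock from several
// threads and returns the final value (threads * iterations if Lock is correct).
template <class Lock>
long run_counter(int threads, int iterations) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

For example, `run_counter<precheck_spinlock>(4, 100000)` should return 400000 for either lock. Timing the two variants this way on a given machine is one way to check the trade-off described here; the outcome will depend on core count, contention, and how much work sits inside the critical section.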
> (Also please correct me if I'm wrong but I thought on x86 at least relaxed and acquire have similar performance anyway, so there's no benefit to doing an initial relaxed read.)
On Intel, the weakest read you can do is an acquire: every ordinary load already has acquire semantics. That makes me very wary of ever using relaxed atomics, because code that is only correct under acquire ordering will still pass every test you run on Intel. It was a big reason I invested in that ARM board. I just tested the initial relaxed read there: paired with an exchange, I see an 8.7% speed improvement over a straight exchange with no relaxed precheck, though it does depend on how much work you do inside the spinlock.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/