
"Helge Bahmann" <hcb@chaoticmind.net> wrote in message news:alpine.DEB.1.10.0912211548590.31425@m65s28.vlinux.de...
On Mon, 21 Dec 2009, Chris M. Thomasson wrote:
object = m_buffer[i].exchange(object, memory_order_release);
if (object) { atomic_thread_fence(memory_order_acquire); }
return object;
[...]
T* pop() {
    T* object = m_buffer[m_tail].exchange(NULL, memory_order_acquire);
    if (object) { m_tail = (m_tail == T_depth - 1) ? 0 : (m_tail + 1); }
    return object;
}
You generally do not need "memory_order_acquire" when dereferencing an "atomically published" pointer: "memory_order_consume" suffices to order every operation that is data-dependent on the pointer value (and any dereference obviously needs that value to compute the memory address to access).
This is faster by a fair amount on anything that is neither x86 nor Alpha.
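
[Editorial aside: a minimal sketch of the change Helge suggests, assuming a class shaped like the quoted snippet. The spsc_buffer name, the atomic<T*> array member, and the constructor are filled in here purely for illustration; only the ordering on the exchange differs from the pop() above.]

    #include <atomic>
    #include <cstddef>

    template<typename T, std::size_t T_depth>
    struct spsc_buffer {           // hypothetical wrapper, names taken from the quote
        std::atomic<T*> m_buffer[T_depth];
        std::size_t m_tail;

        spsc_buffer() : m_tail(0) {
            for (std::size_t i = 0; i < T_depth; ++i)
                m_buffer[i].store(NULL, std::memory_order_relaxed);
        }

        // Consumer-side pop: consume on the exchange orders every access that is
        // data-dependent on the returned pointer, which is all a dereference needs.
        T* pop() {
            T* object = m_buffer[m_tail].exchange(NULL, std::memory_order_consume);
            if (object) { m_tail = (m_tail == T_depth - 1) ? 0 : (m_tail + 1); }
            return object;
        }
    };

[The caveat is that consume only orders accesses reached through the returned pointer; anything the consumer reads that is not data-dependent on it still needs a stronger ordering.]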
Of course you are right. For some reason I was thinking that `memory_order_consume' would boil down to a MEMBAR #LoadLoad on SPARC. The name was confusing me. Perhaps it should be named `memory_order_depends' or something... BTW, where is `memory_order_produce'? ;^)

I don't think I can use C++0x memory ordering to achieve simple #LoadLoad and #StoreStore barriers without the #LoadStore constraint. Am I right?
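
[Editorial aside on the last question: the commonly cited mapping of the C++0x fences onto SPARC RMO membars looks roughly like the sketch below; this is illustrative, not code from the thread. Every fence carries the #LoadStore part, so none of them expresses a bare #LoadLoad or bare #StoreStore barrier on its own.]

    #include <atomic>

    // Rough, commonly cited mapping of C++0x thread fences to SPARC RMO membars.
    void fence_mapping_examples() {
        std::atomic_thread_fence(std::memory_order_acquire);
        // ~ MEMBAR #LoadLoad | #LoadStore

        std::atomic_thread_fence(std::memory_order_release);
        // ~ MEMBAR #LoadStore | #StoreStore

        std::atomic_thread_fence(std::memory_order_acq_rel);
        // ~ MEMBAR #LoadLoad | #LoadStore | #StoreStore

        std::atomic_thread_fence(std::memory_order_seq_cst);
        // ~ MEMBAR #LoadLoad | #LoadStore | #StoreStore | #StoreLoad (full barrier)
    }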