
Gottlob Frege wrote:
bool enqueue(T const & t)
{
    node * n = alloc_node(t);
    if (n == NULL)
        return false;

    for (;;)
    {
        atomic_node_ptr tail(tail_);             //1
        read_memory_barrier();                   //2
        atomic_node_ptr next(tail->next);        //3

        if (likely(tail == tail_))               //4
        {
            if (next.get_ptr() == 0)             //5
            {
                if (tail->next.cas(next, n))     //6
                {
                    tail_.cas(tail, n);
                    return true;
                }
            }
            else
                tail_.cas(tail, next.get_ptr());
        }
    }
}
1. You're missing a write barrier after the call to alloc_node(). Rationale: alloc_node() sets the "next" pointer of the newly allocated node to NULL. That is a store to shared memory. Setting the next pointer to NULL has to be observable to other threads BEFORE they observe the inserted node. If you don't insert a write barrier, it is possible for other threads to see the new element in the list before the element's next pointer has been initialized to NULL.
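For concreteness, a minimal sketch of the placement being described, assuming a write_memory_barrier() primitive as the store-side counterpart of the read_memory_barrier() already used in the loop (the name is illustrative, not taken from the actual library):

bool enqueue(T const & t)
{
    node * n = alloc_node(t);      // alloc_node() stores n->next = NULL
    if (n == NULL)
        return false;

    write_memory_barrier();        // hypothetical: orders the n->next = NULL store
                                   // before the CAS at //6 that publishes n

    for (;;)
    {
        // ... tail / next / CAS logic unchanged from above ...
    }
}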
'n' (and n.next) isn't "published" until the CAS at //6 - doesn't the CAS do a write barrier (or full barrier)?
That's exactly the point - n.next has to be observable (or "published") before the CAS inserts the element into the queue, because the effect of that CAS is immediately visible to other threads. As for whether CAS itself implies a memory barrier, I don't think it's specified. If you know otherwise - please let me know; in that case I'm wrong and you don't need a write barrier here. But just looking at implementations of CAS, it appears that some of them don't issue memory barriers. Granted, the x86 memory model is strict enough that you don't need any extra barriers there, but on PowerPC, for example, CAS will be implemented using lwarx/stwcx, and the documentation for those instructions doesn't say anything about barriers.
Tony
Andy.
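The point that a CAS does not, by itself, imply any particular barrier can be illustrated with current C++ std::atomic, where a compare-exchange provides exactly the ordering you request and no more: a release CAS publishes the earlier initialization of next, a relaxed CAS does not. This is a standalone sketch of the concept, not the queue's actual code:

#include <atomic>

struct node {
    int value;
    std::atomic<node*> next;
};

// Publish n through 'slot'. The release ordering on the successful CAS
// guarantees that the store of next = nullptr below is visible to any
// thread that loads 'slot' with acquire and then follows n->next.
bool publish(std::atomic<node*>& slot, node* n)
{
    n->next.store(nullptr, std::memory_order_relaxed);   // initialize before publishing

    node* expected = nullptr;
    // With memory_order_relaxed on success instead of release, nothing would
    // order the next-pointer initialization before the publication - exactly
    // the gap the explicit write barrier in the review comment is meant to close.
    return slot.compare_exchange_strong(expected, n,
                                        std::memory_order_release,
                                        std::memory_order_relaxed);
}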