
bool enqueue(T const & t) { node * n = alloc_node(t);
if (n == NULL) return false;
for (;;) { atomic_node_ptr tail (tail_); //1 read_memory_barrier(); //2 atomic_node_ptr next (tail->next); //3
if (likely(tail == tail_)) //4 { if (next.get_ptr() == 0) //5 { if ( tail->next.cas(next, n) )//6 { tail_.cas(tail, n); return true; } } else tail_.cas(tail, next.get_ptr()); } } }
1. You're missing write barrier after a call to alloc_node(). Rational: alloc_node sets the "next" pointer of the newly allocated node to NULL. That is a store to shared memory. Setting next pointer to NULL has to be observable to other threads BEFORE they observe the inserted node. If you don't insert a write barrier, it is possible for other threads to see the new element in the list before element's next pointer is initialized with NULL.
'n' (and n.next) isn't "published" until the CAS in line 6 - doesn't the CAS do a write barrier (or full barrier)? Tony