
On Mon, Mar 16, 2009 at 2:09 PM, Anthony Williams <anthony.ajw@gmail.com>wrote:
Tim Blechmann <tim@klingt.org> writes:
Suppose there are two threads calling deqeue() concurrently. If there is an item in the queue, both will proceed identically down to line 137, as neither thread modifies the queue data before then. Now suppose one thread (A) gets preempted, but the other (B) proceeds. Thread B will read next->data and assign it to ret. It will then update head_ with CAS (which will succeed as no other thread is modifying/has modified the data structure), and *deallocate the node* at line 141. This will destroy next->data. Now, when thread A wakes up it will try and read next->data => reading destroyed object, undefined behaviour. dealloc_node in line 141 does not free the node, but pushes it to a freelist stack ... from my understanding, this leads to an increased memory usage ... allocated nodes are not freed until the fifo is destroyed, but reused for new nodes ... so memory reclamation should be safe ... or am i missing something?
Your dealloc_node is:
void dealloc_node(node * n) { n->~node(); pool.deallocate(n); }
So the node is destroyed (by the destructor) even if the memory is not deallocated.
i see your point ... restricting the use of the fifo queue to PODs should work around this issue ...
Yes, in that you can remove the node destructor from dealloc_node, and thus you no longer have undefined behaviour.
However, the node could still be reused by another thread calling push() before our thread A wakes up. In this case, you could have an ABA problem --- thread A has a pointer to node X which originally had a next ptr of Y, but in the mean time thread B has deallocated node X. Node X may subsequently be reused with a next ptr of node Z. If it node X becomes the head node before thread A wakes up, then the CAS on line 139 will succeed, despite the fact that node X has been deallocated and reallocated since, and the CAS will store the wrong "next" value (Y rather than Z), thus corrupting the queue.
Thus, you cannot reuse node X unless you can be sure there is no thread in the position of thread A. With the code as it stands, you cannot know that, as thread A has not done any writing in dequeue().
Since you are using tagged pointers, you could increment the tag value whenever you reuse a node, which will at least ensure that the CAS in thread A fails if there is an ABA issue, since the tag part of the pointer will be different. Since the tag has a finite size, this is still a problem in theory, as the tag value might wrap around before thread A resumes, but in practice it's probably OK.
Alternatively, you could use hazard pointers, reference-counted pointers (as per the implementation I posted), reference counting on the whole queue (last one out deletes any dequeued nodes), or another memory reclamation scheme.
That's why my lock-free queue in c# was comparatively easy to implement,although
GC doesn't necessarily prevents the ABA problem in every situation, for a fifo queue with enqueue and dequeue it suffices. Anyway don't you think that confining T to be POD type a little bit of restrictive?