[boost] Fwd: Boost thread library bugs

16 Sep 2005


I don't regularly read the Boost mailing list, so please reply
directly to me or Mike.
Sean

Begin forwarded message:
...
From: "Mike Schuster" <schuster@adobe.com>
Date: September 15, 2005 3:42:26 PM PDT
Subject: Boost thread library bugs
Here is a summary of several bugs I've discovered over the past few  
months in the Boost thread library (version 1_32_0). Sean, please  
forward this email to the Boost thread developers. Thanks.
1) On the PowerPC, the sequence of memory write operations executed  
by one processor may be seen by another processor or device in a  
different order. This weak write ordering property implies that  
when modifying a shared resource, the modifying processor must  
execute a sync instruction to make these modifications visible to  
all other processors before releasing the lock. I discovered  
several situations in the Boost thread library where a sync call is  
missing.
call_once: Immediately after the client function returns, a lock
variable is set to one. Other processors may see this lock equal to  
one before all memory write operations performed by the client  
function are completed. A call to __sync() should be made  
immediately prior to setting the lock to one.
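
For readers without a PowerPC toolchain, the ordering requirement described above can be sketched in portable C++11 atomics. This is only an illustration of where the barrier has to sit, not the Boost 1_32_0 code: std::atomic_thread_fence stands in for __sync(), and the names shared_data, done_flag, client_function, run_once and wait_and_read are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration; not Boost's actual variables.
int shared_data = 0;              // state written by the "client function"
std::atomic<int> done_flag(0);    // plays the role of the lock variable

void client_function() { shared_data = 42; }

void run_once() {
    client_function();
    // Stand-in for __sync(): orders the writes above before the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    done_flag.store(1, std::memory_order_relaxed);
}

int wait_and_read() {
    // Spin until the flag is observed as one.
    while (done_flag.load(std::memory_order_relaxed) != 1) {
        std::this_thread::yield();
    }
    // Pairs with the release fence in run_once().
    std::atomic_thread_fence(std::memory_order_acquire);
    return shared_data;
}
```

Without the release fence, a reader on a weakly ordered machine could observe done_flag equal to one while shared_data still holds its old value, which is the failure described for call_once.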
synchronization class constructors (mutex, read_write_mutex,  
condition, etc): Once the class constructor returns, Boost provides  
an API where other threads are free to call the synchronization  
member functions. However, the memory write operations performed by  
the constructor may not have been completed when the member  
functions are executed by a different processor. So a call to
__sync() should be made immediately prior to returning from the constructor.
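
The same hazard can be shown in portable C++11 by handing a freshly constructed object to another thread. Note that this sketch uses a different technique from the one suggested above: it places the barrier at the hand-off point (a release store of the pointer) rather than inside the constructor. Gate, published_gate, construct_and_publish and use_when_published are hypothetical names, not Boost types or functions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a synchronization object whose
// constructor initializes internal state.
struct Gate {
    int state;
    int waiters;
    Gate() : state(1), waiters(0) {}
};

std::atomic<Gate*> published_gate(nullptr);

void construct_and_publish() {
    Gate* g = new Gate();
    // Release store: the constructor's writes become visible to any
    // thread that acquires this pointer value.
    published_gate.store(g, std::memory_order_release);
}

int use_when_published() {
    Gate* g = nullptr;
    // Acquire load pairs with the release store above.
    while ((g = published_gate.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return g->state + g->waiters;
}
```

If the store were relaxed (or a plain write), a second processor could see the pointer before the constructor's writes, which corresponds to calling a member function on a half-initialized mutex.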
Note that a similar situation occurs between member function calls.  
However the MacOS synchronization primitives used by Boost do  
perform a sync, so correct operation is guaranteed implicitly as
long as the last operation performed by a member function involves an
OS synchronization primitive call. This appears to be the situation  
in many places, but there may be places in the Boost library where  
this requirement is not met. So someone needs to review all of the  
source code for problems of this sort.
2) On the PowerPC, I have seen situations where call_once deadlocks  
in the MPRemoteCall function. I have not been able to diagnose the  
problem. Deadlocks occur when call_once is executed by non-main  
threads. I believe I have a solution to the problem which uses a  
completely different implementation similar to that of the Win32  
version and avoids all calls to MPRemoteCall. Maybe I should submit
this solution to the Boost developers for consideration.
3) I discovered a deadlock in read_write_mutex. If either of the  
alternating scheduling policies is used, the implementation will
deadlock the first reader to arrive when no writers are active. The  
deadlock occurs in the function void  
read_write_mutex_impl<Mutex>::do_read_lock. If m_state == 0 and  
m_num_readers_to_wait == 0 (this holds immediately on  
construction), then an arriving reader will hang indefinitely on  
m_waiting_readers. There are other related situations where the  
BOOST_ASSERT on loop_count fails.
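
A hang of this kind can be caught in a regression test by running the suspect operation on a helper thread and giving up after a timeout. The following is a minimal sketch using standard C++11 threads rather than Boost 1_32_0; completes_within is a hypothetical helper name. In a real test, the operation passed in would be the first read lock on a freshly constructed read_write_mutex with an alternating scheduling policy, which is not reproduced here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical helper: returns true if `op` finishes within `limit`.
// The worker is detached so that a genuinely hung `op` cannot block
// the caller; the shared promise keeps the state alive for the worker.
bool completes_within(std::function<void()> op,
                      std::chrono::milliseconds limit) {
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();
    std::thread([op, done]() {
        op();
        done->set_value();
    }).detach();
    return finished.wait_for(limit) == std::future_status::ready;
}
```

A test would then assert that the first reader's lock completes within some generous limit, turning an indefinite hang into an ordinary test failure.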
Note I am concerned that such a blatant flaw is present in the  
library. This implies that the library has not been very well  
tested. This is worrisome especially for a thread library where  
threading bugs can be extremely frustrating and hard for users of
the library to reproduce and diagnose.
-Mike Schuster

Sean Parent