Re: [boost] read_write_mutex fundamentally broken in 1.33?

The trouble is: what do we do with 1.33.1? We could either revert to the 1.32 version (if that is actually reliable; the current regression tests don't show up these problems, so we don't really know), or we could yank it out of Boost entirely. I'm leaning towards yanking it out at present, but it's a drastic step, and this presupposes that the original authors of the class can't temporarily fix things for 1.33.1 (however, they all appear to have either moved on or be too busy at present).
What do other folks think?
I don't know whether anyone relies on the read/write mutex being present in Boost. However, if someone is using it, that usage is dangerous anyhow.
The company I am working for is actually using timed rw mutexes on a current project (and they seem to be working), so I for one would like them to stay in. I suspect timed locks aren't going to deadlock?
In my opinion it is better to omit it until it is fixed.
I am also not in the mood to dig through the sources to find out how the design was intended to work. I would also vote for basing the design on theoretically sound ground (not that I am claiming the current one is not) that is also understandable.
While trying to find out the reason for the initially reported bug I found myself in a maze of hard-to-understand details. Perhaps it would even be better not to base the rw mutex on level 0 primitives only, but to write it against the native API of the respective platform?
How should we proceed from here? Should the community come up with a list of alternative implementations that could be evaluated and used as a starting point? Or do we need to go back a step further and decide what types of mutexes or primitives are actually desired?

On 10/09/05, russell@eminence.com.au <russell@eminence.com.au> wrote: <snip> How should we proceed from here? Should the community come up with a list of alternative implementations that could be evaluated and used as a starting point? Or do we need to go back a step further and decide what types of mutexes or primitives are actually desired?
A candidate solution should be around in a few days, I hope.

matt.

Matt Hurd <matt.hurd@gmail.com> writes:
On 10/09/05, russell@eminence.com.au <russell@eminence.com.au> wrote: <snip> How should we proceed from here? Should the community come up with a list of alternative implementations that could be evaluated and used as a starting point? Or do we need to go back a step further and decide what types of mutexes or primitives are actually desired?
A candidate solution should be around in a few days, I hope.
Attached is my latest offering, for win32. It still doesn't support timed locks or anything beyond basic read/write/upgradeable locks. However, it is faster than my earlier offering, since it uses interlocked operations to avoid OS locks where possible.

With each attempt to lock, the locking thread will wait until it can enter the "pending active" state. Only one thread is permitted in this state at a time. Once in this state, the locking thread will wait until the mutex state is correct for the lock to be acquired. The mutex state is managed as a state flag and a reader count.

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk
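For readers without the attachment, the "pending active" scheme amounts to something like the sketch below. This is not the attached code --- the names (rw_sketch, pending_gate, writing, reader_count) and the busy-waiting are invented for illustration --- but it shows the general shape: a state flag and a reader count managed with Win32 interlocked operations, plus a binary semaphore admitting one "pending active" thread at a time. Note that this naive version polls while it waits, which is exactly the concern raised in the follow-up.

    // Illustrative sketch only, not the attached implementation.
    #include <windows.h>

    class rw_sketch
    {
        HANDLE pending_gate;        // binary semaphore: one "pending active" thread at a time
        LONG volatile writing;      // 1 while a writer holds or is acquiring the lock
        LONG volatile reader_count; // number of active readers

    public:
        rw_sketch():
            pending_gate(CreateSemaphore(NULL, 1, 1, NULL)),
            writing(0), reader_count(0)
        {}
        ~rw_sketch() { CloseHandle(pending_gate); }

        void write_lock()
        {
            WaitForSingleObject(pending_gate, INFINITE); // become the one pending thread
            InterlockedExchange(&writing, 1);            // shut out new readers
            while(reader_count != 0)                     // wait for existing readers to drain
                Sleep(1);                                // NB: polling
        }

        void write_unlock()
        {
            InterlockedExchange(&writing, 0);
            ReleaseSemaphore(pending_gate, 1, NULL);     // admit the next pending thread
        }

        void read_lock()
        {
            for(;;)
            {
                if(writing == 0)
                {
                    InterlockedIncrement(&reader_count);
                    if(writing == 0)                     // re-check: did a writer get in first?
                        return;
                    InterlockedDecrement(&reader_count); // back off and retry
                }
                Sleep(1);                                // NB: polling again
            }
        }

        void read_unlock()
        {
            InterlockedDecrement(&reader_count);
        }
    };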

On Sep 10, 2005, at 5:53 PM, Anthony Williams wrote:
Attached is my latest offering, for win32. It still doesn't support timed locks or anything beyond basic read/write/upgradeable locks. However, it is faster than my earlier offering, since it uses interlocked operations to avoid OS locks where possible.
With each attempt to lock, the locking thread will wait until it can enter the "pending active" state. Only one thread is permitted in this state at a time. Once in this state, the locking thread will wait until the mutex state is correct for the lock to be acquired. The mutex state is managed as a state flag and a reader count.
This looks like a significant improvement over your previous offering. But I'm still concerned. Admittedly my analysis is incomplete, so please forgive me if I've got it wrong:

With both the previous release and this one it looks like those threads waiting on upgradable or exclusive access are essentially polling if another thread already enjoys such access. What I'm not seeing is a signal or condition variable upon which a thread sleeps after it enters the first gate and tries to acquire upgradable or exclusive access but discovers such access is not immediately available. I fear the polling behavior will defeat the optimizations that read/write/upgradable exist for in the first place. But I have not measured your code, so I could well be mistaken.

-Howard

Howard Hinnant <hinnant@twcny.rr.com> writes:
On Sep 10, 2005, at 5:53 PM, Anthony Williams wrote:
Attached is my latest offering, for win32. It still doesn't support timed locks or anything beyond basic read/write/upgradeable locks. However, it is faster than my earlier offering, since it uses interlocked operations to avoid OS locks where possible.
With each attempt to lock, the locking thread will wait until it can enter the "pending active" state. Only one thread is permitted in this state at a time. Once in this state, the locking thread will wait until the mutex state is correct for the lock to be acquired. The mutex state is managed as a state flag and a reader count.
This looks like a significant improvement over your previous offering. But I'm still concerned. Admittedly my analysis is incomplete, so please forgive me if I've got it wrong:
With both the previous release and this one it looks like those threads waiting on upgradable or exclusive access are essentially polling if another thread already enjoys such access. What I'm not seeing is a signal or condition variable upon which a thread sleeps after it enters the first gate and tries to acquire upgradable or exclusive access but discovers such access is not immediately available. I fear the polling behavior will defeat the optimizations that read/write/upgradable exist for in the first place. But I have not measured your code, so I could well be mistaken.
You're half right! If the threads block on trying to enter a new state, then they wait on the mutex_state_sem. This semaphore is triggered every time a lock is released, and again if a waiting thread is awakened but cannot satisfy its condition.

This can lead to a thread polling, if it is the only waiting thread, because once it has been awakened, it will keep releasing the semaphore and reacquiring it until its condition can be satisfied.

Thanks for your comments, I guess there's work to be done!

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk
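To make the failure mode concrete, the wait/notify scheme just described looks roughly like this (my own paraphrase with invented names, not code from the attachment): the semaphore is released on every unlock, and released again by any woken thread whose condition is still not satisfied, so a lone waiter keeps handing the wakeup back to itself.

    // Sketch of the scheme described above, not the attached code.
    #include <windows.h>
    #include <climits>

    HANDLE mutex_state_sem = CreateSemaphore(NULL, 0, LONG_MAX, NULL);

    // Called by a thread whose desired lock cannot be granted yet.
    // can_proceed checks the mutex state (state flag + reader count).
    template<typename Pred>
    void wait_for_state(Pred can_proceed)
    {
        while(!can_proceed())
        {
            WaitForSingleObject(mutex_state_sem, INFINITE);
            if(!can_proceed())
            {
                // Put the wakeup back for "someone else".  If this is the only
                // waiting thread, it simply wakes itself straight back up --
                // the polling behaviour described above.
                ReleaseSemaphore(mutex_state_sem, 1, NULL);
            }
        }
    }

    // Called every time a lock is released.
    void notify_state_change()
    {
        ReleaseSemaphore(mutex_state_sem, 1, NULL);
    }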

Anthony Williams <anthony_w.geo@yahoo.com> writes:
Howard Hinnant <hinnant@twcny.rr.com> writes:
On Sep 10, 2005, at 5:53 PM, Anthony Williams wrote:
Attached is my latest offering, for win32. It still doesn't support timed locks or anything beyond basic read/write/upgradeable locks. However, it is faster than my earlier offering, since it uses interlocked operations to avoid OS locks where possible.
With each attempt to lock, the locking thread will wait until it can enter the "pending active" state. Only one thread is permitted in this state at a time. Once in this state, the locking thread will wait until the mutex state is correct for the lock to be acquired. The mutex state is managed as a state flag and a reader count.
This looks like a significant improvement over your previous offering. But I'm still concerned. Admittedly my analysis is incomplete, so please forgive me if I've got it wrong:
With both the previous release and this one it looks like those threads waiting on upgradable or exclusive access are essentially polling if another thread already enjoys such access. What I'm not seeing is a signal or condition variable upon which a thread sleeps after it enters the first gate and tries to acquire upgradable or exclusive access but discovers such access is not immediately available. I fear the polling behavior will defeat the optimizations that read/write/upgradable exist for in the first place. But I have not measured your code, so I could well be mistaken.
You're half right! If the threads block on trying to enter a new state, then they wait on the mutex_state_sem. This semaphore is triggered every time a lock is released, and again if a waiting thread is awakened but cannot satisfy its condition.
This can lead to a thread polling, if it is the only waiting thread, because once it has been awakened, it will keep releasing the semaphore and reacquiring it until its condition can be satisfied.
Thanks for your comments, I guess there's work to be done!
Attached is a new implementation. It solves this problem by only releasing the state-change semaphore if there is another thread waiting to change states. It is possible that two waiting, but not satisfied, threads could therefore repeatedly wake each other up; I can't think of a better solution for now.

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk
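For comparison with the previous sketch, the refinement amounts to something like the following (again my paraphrase with invented names, not the attached code): keep a count of blocked threads and only release the semaphore when someone else is actually waiting.

    // Sketch of the refinement described above, not the attached code.
    #include <windows.h>
    #include <climits>

    LONG volatile waiting_count = 0;
    HANDLE mutex_state_sem = CreateSemaphore(NULL, 0, LONG_MAX, NULL);

    template<typename Pred>
    void wait_for_state(Pred can_proceed)
    {
        while(!can_proceed())
        {
            InterlockedIncrement(&waiting_count);
            WaitForSingleObject(mutex_state_sem, INFINITE);
            InterlockedDecrement(&waiting_count);
            // A lone waiter no longer re-signals itself, but two unsatisfied
            // waiters can still hand the wakeup back and forth, as noted above.
            if(!can_proceed() && waiting_count > 0)
                ReleaseSemaphore(mutex_state_sem, 1, NULL);
        }
    }

    void notify_state_change()
    {
        if(waiting_count > 0)               // only signal if someone is waiting
            ReleaseSemaphore(mutex_state_sem, 1, NULL);
    }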

Anthony Williams <anthony_w.geo@yahoo.com> writes:
Howard Hinnant <hinnant@twcny.rr.com> writes:
With both the previous release and this one it looks like those threads waiting on upgradable or exclusive access are essentially polling if another thread already enjoys such access. What I'm not seeing is a signal or condition variable upon which a thread sleeps after it enters the first gate and tries to acquire upgradable or exclusive access but discovers such access is not immediately available. I fear the polling behavior will defeat the optimizations that read/write/upgradable exist for in the first place. But I have not measured your code, so I could well be mistaken.
You're half right! If the threads block on trying to enter a new state, then they wait on the mutex_state_sem. This semaphore is triggered every time a lock is released, and again if a waiting thread is awakened but cannot satisfy its condition.
This can lead to a thread polling, if it is the only waiting thread, because once it has been awakened, it will keep releasing the semaphore and reacquiring it until its condition can be satisfied.
With thanks to Roland Schwarz, whose condition implementation gave me the inspiration I needed, I attach a new implementation of read-write mutexes for win32.

The idea is to have a second gate on the mutex states --- if a thread is unable to make the change it requires immediately, then it must enter the gate. This allows the thread to check again whether it can enter the new state, and if not, to increment the "waiting threads" count without fear of interruption, or of missing a wakeup. It can then exit the gate and wait for notification.

When a thread transitions into a state that allows other threads in other states (i.e. every state except the exclusive writing state, and the acquiring_XXX or releasing_XXX states), it must notify waiting threads. To do this, it must first enter the gate and check for waiting threads. If there are none, it exits the gate. If there are some, it leaves the gate locked and notifies the waiting threads (using a semaphore to release the appropriate number of threads). When a thread is woken up, it decreases the waiting-thread count. If it is the last thread to be woken, it unlocks the gate. This ensures that only one notification is in progress at once.

The problems Howard described above were real --- in long-running tests, my old implementation consumed nearly 100% CPU --- whereas the new implementation does not suffer from this problem, and consumes very little CPU time. Not only that, but the overall runtime is less, too.

Any comments?

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk
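As a reading aid, here is a rough sketch of just the gate-and-notify machinery described above. It is my paraphrase, with invented names; the attached implementation also manages the mutex state flag and reader count, which are omitted here.

    // Sketch of the second-gate scheme described above, not the attached code.
    #include <windows.h>
    #include <climits>

    class state_gate
    {
        HANDLE gate;            // binary semaphore guarding the waiting count
        HANDLE wake_sem;        // released once per waiting thread by a notifier
        LONG volatile waiting;  // threads parked on wake_sem

    public:
        state_gate():
            gate(CreateSemaphore(NULL, 1, 1, NULL)),
            wake_sem(CreateSemaphore(NULL, 0, LONG_MAX, NULL)),
            waiting(0)
        {}
        ~state_gate() { CloseHandle(wake_sem); CloseHandle(gate); }

        // Called when the desired state change cannot be made immediately.
        // can_proceed() is re-checked inside the gate, so a wakeup cannot be missed.
        template<typename Pred>
        void wait(Pred can_proceed)
        {
            for(;;)
            {
                WaitForSingleObject(gate, INFINITE);     // enter the gate
                if(can_proceed())
                {
                    ReleaseSemaphore(gate, 1, NULL);     // exit the gate
                    return;
                }
                InterlockedIncrement(&waiting);          // safe: we hold the gate
                ReleaseSemaphore(gate, 1, NULL);         // exit the gate...
                WaitForSingleObject(wake_sem, INFINITE); // ...and wait for notification
                // A notifier is holding the gate closed while it wakes us.
                if(InterlockedDecrement(&waiting) == 0)
                    ReleaseSemaphore(gate, 1, NULL);     // last woken thread re-opens the gate
                // loop round and re-check the condition
            }
        }

        // Called when a thread enters a state that allows others to proceed.
        void notify()
        {
            WaitForSingleObject(gate, INFINITE);         // enter the gate
            LONG const count = waiting;
            if(count == 0)
                ReleaseSemaphore(gate, 1, NULL);         // nobody waiting: just exit
            else
                ReleaseSemaphore(wake_sem, count, NULL); // wake them all; the gate stays
                                                         // closed until the last woken
                                                         // thread re-opens it
        }
    };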

Anthony Williams wrote: [...]
Any comments?
Read-write stuff aside for a moment, regarding lightweight_mutex and its lazy init, see:

http://groups.google.de/group/comp.programming.threads/msg/7a53a47294eacbda?...
http://groups.google.de/group/comp.programming.threads/msg/930d8b1be9069d21?...

Note that this sketch doesn't provide POSIX safety with respect to mutex destruction.

regards,
alexander.

The company I am working for is actually using timed rw mutexes on a current project (and they seem to be working), so I for one would like them to stay in. I suspect timed locks aren't going to deadlock?
Unfortunately the test case that finally broke the camel's back involves timed locks on a read-write mutex, and yes, timed locks can deadlock, especially if used in combination with non-timed locks (see http://article.gmane.org/gmane.comp.lib.boost.devel/130787 for the original message and test case). Of course this all refers to 1.33; I haven't tested 1.32.

John.
participants (7)
- Alexander Terekhov
- Anthony Williams
- Howard Hinnant
- John Maddock
- Matt Hurd
- Neal Becker
- russell@eminence.com.au