read_write_mutex fundamentally broken in 1.33?

I've been trying to solve the deadlock problem reported here: http://lists.boost.org/Archives/boost/2005/08/92307.php but I've simply run into more and more problems, and just about every operation I've tried will deadlock if you try hard enough. THIS IS NOT GOOD. Applying Roland Swartz's suggested fix everywhere that looked appropriate initially seemed to fix the issue, but once you move to more complex situations (both readers and writers) then the deadlocks simply come back again. At this point I've exhausted my understanding of the problem and the design of the lock. As an aside: it is *very* hard to implement something like this correctly: no amount of debugging will ever get you a reliable system, only a clearly understood design backed up by a solid theoretical understanding of the implementation method (formal methods anyone?). To be honest I have to question whether it's even a good idea for this to be in Boost (no slur intended towards the implementers of that class, it's the difficulty of the problem that's the issue). In case anyone wants to pursue this further I'm attaching my current test case, and the current patches I have to the implementation, however, please note that neither this test case, nor the current regression tests are anything like comprehensive, THEY WILL NOT CATCH ALL POSSIBLE PROBLEMS. Sorry about the capital letters folks, but I suspect we have a "Huston, we have a problem" moment here :-( Regards, John.

On Sep 8, 2005, at 6:45 AM, John Maddock wrote:
I've been trying to solve the deadlock problem reported here: http://lists.boost.org/Archives/boost/2005/08/92307.php but I've simply run into more and more problems, and just about every operation I've tried will deadlock if you try hard enough.
THIS IS NOT GOOD.
Applying Roland Swartz's suggested fix everywhere that looked appropriate initially seemed to fix the issue, but once you move to more complex situations (both readers and writers) then the deadlocks simply come back again. At this point I've exhausted my understanding of the problem and the design of the lock. As an aside: it is *very* hard to implement something like this correctly: no amount of debugging will ever get you a reliable system, only a clearly understood design backed up by a solid theoretical understanding of the implementation method (formal methods anyone?). To be honest I have to question whether it's even a good idea for this to be in Boost (no slur intended towards the implementers of that class, it's the difficulty of the problem that's the issue).
In case anyone wants to pursue this further I'm attaching my current test case, and the current patches I have to the implementation, however, please note that neither this test case, nor the current regression tests are anything like comprehensive, THEY WILL NOT CATCH ALL POSSIBLE PROBLEMS.
Sorry about the capital letters folks, but I suspect we have a "Huston, we have a problem" moment here :-(
I've taken a look at thread_scap.cpp, but not read_write_mutex.cpp. I don't have any suggestions on why the deadlock except to say that I see nothing wrong with your test, and have run it without any significant problems on a Freescale implementation of read/write locks. I do have a comment about the interface of the read/write mutex/lock though. I have a use case that looks like: class A { public: // ... A& operator=(const A& a); // ... private: typedef ... Mutex; typedef Mutex::... WriteLock; typedef Mutex::... ReadLock; mutable Mutex mut_; // ... }; A& A::operator=(const A& a) { if (this != &a) { // Here I want to read-lock "a" and write-lock "*this", and // of course not dead-lock. How do I do this? } return *this; } A naive (incorrect) implementation using the interface I gleaned from thread_scap.cpp would look like: class A { public: // ... A& operator=(const A& a); // ... private: typedef timed_read_write_mutex Mutex; typedef Mutex::scoped_timed_read_write_lock WriteLock; typedef Mutex::scoped_timed_read_write_lock ReadLock; mutable Mutex mut_; // ... }; A& A::operator=(const A& a) { if (this != &a) { ReadLock rlock(a.mut_, read_write_lock_state::unlocked); WriteLock wlock(mut_, read_write_lock_state::unlocked); rlock.read_lock(); wlock.write_lock(); // OOPS! DEADLOCK! } return *this; } You can't lock two locks willy-nilly like this. So ideally what I'd like to do is create a generic lock function that will lock multiple locks without danger of deadlock: template <class TryLock1, class TryLock2> void lock(TryLock1& l1, TryLock2& l2); and maybe also: template <class TryLock1, class TryLock2, class TryLock3> void lock(TryLock1& l1, TryLock2& l2, TryLock3& l3); and etc. And then call that: A& A::operator=(const A& a) { if (this != &a) { ReadLock rlock(a.mut_, read_write_lock_state::unlocked); WriteLock wlock(mut_, read_write_lock_state::unlocked); lock(rlock, wlock); // won't deadlock // Ok! } return *this; } Now I can solve my two-lock locking algorithm once and for all in generic code, and reuse it whenever, wherever. Problem: With the current interface, my "generic" lock algorithm is no longer generic. It has to know about read_lock() and write_lock(), even though these concepts have absolutely nothing to do with how you lock two locks while avoiding deadlock. Here is how I've solved this problem: class A { public: // ... A& operator=(const A& a); // ... private: typedef Metrowerks::rw_mutex Mutex; typedef Mutex::scoped_lock WriteLock; typedef Mutex::sharable_lock ReadLock; mutable Mutex mut_; // ... }; A& A::operator=(const A& a) { if (this != &a) { ReadLock rlock(a.mut_, defer_lock); WriteLock wlock(mut_, defer_lock); lock(rlock, wlock); // Ok! } return *this; } That is, there is a read-lock and there is a write-lock, and they are not the same type. These two different types have a generic locking interface: lock(), unlock(), try_lock(), etc. That is, they meet a "lock concept". Now my two-lock-deadlock-avoiding function doesn't need to know about read locks and write locks. All it has to know is the generic interface of locks (the lock concept). It will work with two read locks, two write locks, a read lock and a write lock, or any other two types of locks that meet the lock concept. See http://home.twcny.rr.com/hinnant/cpp_extensions/threads_move.html for details of this interface. The important part (for this discussion) is that there are several types of locks, but they all have this interface (or a subset of it): void lock(); bool try_lock(); bool timed_lock(const elapsed_time& elps_time); void unlock(); I'm arguably missing: bool timed_lock(const universal_time& unv_time); but that's beside the point. The point is that generic locking code is going to be important, especially when read/write locks are thrown into the mix (upgradable locks too anyone?). Without a generic interface, generic locking code is going to be overly cumbersome to write. The use case I have above is most basic (surely there will be many classes with a per-object lock that want to write a thread-safe assignment operator?), and it already begs for a generic lock algorithm. Sorry, but: Houston, we have an even bigger problem... -Howard

On Sep 8, 2005, at 6:45 AM, John Maddock wrote:
I've been trying to solve the deadlock problem reported here: http://lists.boost.org/Archives/boost/2005/08/92307.php but I've simply run into more and more problems, and just about every operation I've
Howard Hinnant wrote: tried will deadlock if you try hard enough.
THIS IS NOT GOOD.
I can add that I have run across this bug in the wild. We were using read_write_mutex in some code, and we started seeing several threads locking up waiting on either a read lock or a write lock when the application was under load. There was no reason I could see that at least one thread couldn't make progress. I changed to a normal mutex and the problem went away. -- Timothy Ritchey Chief Technical Officer Paragent, LLC 1317 West Marsh Street Muncie, IN 47306 (765) 285-4913

You can't lock two locks willy-nilly like this. So ideally what I'd like to do is create a generic lock function that will lock multiple locks without danger of deadlock:
template <class TryLock1, class TryLock2> void lock(TryLock1& l1, TryLock2& l2);
and maybe also:
template <class TryLock1, class TryLock2, class TryLock3> void lock(TryLock1& l1, TryLock2& l2, TryLock3& l3);
Yep, those would be useful additions to Boost.Thread IMO.
Problem: With the current interface, my "generic" lock algorithm is no longer generic. It has to know about read_lock() and write_lock(), even though these concepts have absolutely nothing to do with how you lock two locks while avoiding deadlock.
Ah, that's what happens from looking at my not so good test case, rather than the docs :-) try_read_write_mutex does have member typedefs try_read_lock and try_write_lock, which do what you want them to, I just happened to use a different locking primitive because it was easier to change the test code around to test different combinations of readers and writers.
Sorry, but: Houston, we have an even bigger problem...
Not this time, still it was big enough to begin with :-( John.

On Sep 9, 2005, at 7:34 AM, John Maddock wrote:
template <class TryLock1, class TryLock2> void lock(TryLock1& l1, TryLock2& l2);
and maybe also:
template <class TryLock1, class TryLock2, class TryLock3> void lock(TryLock1& l1, TryLock2& l2, TryLock3& l3);
Yep, those would be useful additions to Boost.Thread IMO.
Problem: With the current interface, my "generic" lock algorithm is no longer generic. It has to know about read_lock() and write_lock(), even though these concepts have absolutely nothing to do with how you lock two locks while avoiding deadlock.
Ah, that's what happens from looking at my not so good test case, rather than the docs :-)
try_read_write_mutex does have member typedefs try_read_lock and try_write_lock, which do what you want them to, I just happened to use a different locking primitive because it was easier to change the test code around to test different combinations of readers and writers.
Sorry, but: Houston, we have an even bigger problem...
Not this time, still it was big enough to begin with :-(
As a simple exercise, code this: template <class TryLock1, class TryLock2> void lock(TryLock1& l1, TryLock2& l2); Here is one possible implementation: template <class TryLock1, class TryLock2> void lock(TryLock1& l1, TryLock2& l2) { while (true) { l1.lock(); if (l2.try_lock()) break; l1.unlock(); l2.lock(); if (l1.try_lock()) break; l2.unlock(); } } Another possibility would be to assume an ordering functionality among the locks: template <class Lock1, class Lock2> void lock(Lock1& l1, Lock2& l2) { if (l1 < l2) { l1.lock(); l2.lock(); } else { l2.lock(); l1.lock(); } } The former assumes the locks know how to lock(), try_lock() and unlock(). The latter assumes the locks are less-than-comparable, and know about lock(). Neither of these assume try_read_lock or try_write_lock. Read locking and write locking are issues orthogonal to this generic code. If this generic 2-lock function must know how to read-lock and write-lock (and perhaps even promote from read to write lock), things get significantly messier. In the design I've laid out, the generic 2-lock algorithm can deal with two exclusive locks, two read locks, two upgradable locks, or any combination of the above (modulo the ordering functionality -- that's a different soap box ;-) ). You could even atomically lock an exclusive lock, while promoting another one from read to write using this same generic code (using a promote_lock adaptor not yet presented). And if you don't do something like the above, you are subject to deadlock if a thread tries to hold one lock while promoting another. A lock (read, scoped, whatever) is either locked or unlocked. You can try-lock it or timed-lock it. But you can't read-lock it, write-lock it or any other kind of lock. You can only lock it or unlock it. It is the mutex that holds the read/write magic, not the lock: template <class RW_Mutex> class sharable_lock { public: ... void lock() {m_.lock_sharable(); locked_ = true;} ... private: mutex_type& m_; bool locked_; }; -Howard

On Sep 8, 2005, at 5:45 AM, John Maddock wrote:
I've been trying to solve the deadlock problem reported here: http://lists.boost.org/Archives/boost/2005/08/92307.php but I've simply run into more and more problems, and just about every operation I've tried will deadlock if you try hard enough.
THIS IS NOT GOOD.
So, do we yank read_write_mutex from 1.33.1 and, err, start over? Doug

So, do we yank read_write_mutex from 1.33.1 and, err, start over?
What about avoiding policies, or making independent read-write locks, one for reader priority, other for writers priority and other for fifo priority? Or a templatized priority policy. Or just pick a default implementation like pthreads and don't guarantee anything. There are simple implementations of read-write locks available in books that we can use without thinking very much, for example, I can remember one from "Programming with posix threads" and another one from "Unix network programming vol.2", one with reader prioriy and another with writer priority. All these where made using posix conditions and mutexes, and the implementation is quite simple. ACE has its own implementation if I'm not wrong. I think that runtime policy has an overhead and complexity. But this would require changing, Boost.Threads read-write lock interface, something that I don't know we like to do. On the other hand, I don't know if reader/writer priority is very important, since I think posix does not define anything about this. Regards, Ion

On Sep 8, 2005, at 5:56 PM, Ion Gaztañaga wrote:
So, do we yank read_write_mutex from 1.33.1 and, err, start over?
What about avoiding policies, or making independent read-write locks, one for reader priority, other for writers priority and other for fifo priority? Or a templatized priority policy. Or just pick a default implementation like pthreads and don't guarantee anything.
There are simple implementations of read-write locks available in books that we can use without thinking very much, for example, I can remember one from "Programming with posix threads" and another one from "Unix network programming vol.2", one with reader prioriy and another with writer priority. All these where made using posix conditions and mutexes, and the implementation is quite simple. ACE has its own implementation if I'm not wrong.
I think that runtime policy has an overhead and complexity. But this would require changing, Boost.Threads read-write lock interface, something that I don't know we like to do.
On the other hand, I don't know if reader/writer priority is very important, since I think posix does not define anything about this.
If priority policy is the overriding issue, I recommend considering Alexander Terekhov's algorithm summarized here: http://home.twcny.rr.com/hinnant/cpp_extensions/rw_perf.html
... use an algorithm inspired by Alexander Terekhov which attempts to side step the issue of reader priority or writer priority by having all threads initially wait at the same entry gate. This leaves the priority for entry into the first gate up to the thread scheduler. Once the first gate is entered by a writer thread, it blocks until all threads currently holding read locks have released those locks. While the writer thread is blocked on the second gate, no new read locks can be acquired.
I've implemented and beat this algorithm to death with experiments. Personally I know of no better way to implement a read/write lock. -Howard

Hi Howard,
On the other hand, I don't know if reader/writer priority is very important, since I think posix does not define anything about this.
If priority policy is the overriding issue, I recommend considering Alexander Terekhov's algorithm summarized here:
http://home.twcny.rr.com/hinnant/cpp_extensions/rw_perf.html
Thanks for the info. Apart from this, I've seen in your web that you have concepts such as upgradeable locks, shareable locks, etc..., sketched in: http://home.twcny.rr.com/hinnant/cpp_extensions/threads_move.html Do you have more information about these concepts and their use? It seems quite interesting for thread proposals. Currently in boost we have the following: scoped_lock, scoped_try_lock, scoped_timed_lock Your web talks about upgradeable, shareable, so maybe we could think about this when thinking about C++ threading interface. I suppose Kevlin Henney is thinking about this issues regarding its C++ Threading interface proposal, so maybe these concepts are interesting. Alexander Terekhov's algorithm seems interesting, so we decide to go ahead with this, we should implement it on top of some boost atomic primitives. Regards, Ion

On Sep 9, 2005, at 5:32 AM, Ion Gaztañaga wrote:
http://home.twcny.rr.com/hinnant/cpp_extensions/threads_move.html
Do you have more information about these concepts and their use? It seems quite interesting for thread proposals.
The "performance testing" link off of the above page has example code: http://home.twcny.rr.com/hinnant/cpp_extensions/rw_perf.html
Currently in boost we have the following:
scoped_lock, scoped_try_lock, scoped_timed_lock
Right. And I went this route too (following boost's lead). But hindsight is 20/20. ;-) The scoped_lock, scoped_try_lock, and scoped_timed_lock concepts are all strict supersets of one another (and this is good). You can code a scoped_timed_lock, templated on the mutex type, and use it only as a scoped_lock with absolutely zero penalty. There is no extra data associated with a scoped_timed_lock over a scoped_lock. And the code associated with timed locking isn't instantiated unless you use it. Said another way, you can get rid of the template classes scoped_lock and scoped_try_lock, and rename scoped_timed_lock to scoped_lock, and then use this new "combined" scoped_lock for all three uses. You will pay no penalty in code size or speed, and you will cut your source code maintenance by a factor of 3. If you instantiate this new "combined" scoped_lock with a mutex that can not execute a timed_lock (for example), absolutely no harm is done ... unless of course the client of this instantiation actually tries to call timed_lock, in which case you would quite naturally get a compile time error. Analogy: std::vector doesn't require a default copy constructor of its contained type, unless you actually instantiate certain member functions of vector. My scoped_lock doesn't require the mutex to have a timed_lock() member unless you instantiate the timed_lock() member of the scoped_lock. So my scoped_lock is simply the boost scoped_lock, scoped_try_lock, scoped_timed_lock rolled into one class.
Your web talks about upgradeable, shareable, so maybe we could think about this when thinking about C++ threading interface.
The rw_perf.html page gives motivation for the read/write - renamed to scoped/sharable - and for adding the upgradable concept as well. The sharable and upgradable locks all meet your scoped/try/timed concepts, but do not demand these concepts of the mutex.
I suppose Kevlin Henney is thinking about this issues regarding its C++ Threading interface proposal, so maybe these concepts are interesting.
Alexander Terekhov's algorithm seems interesting, so we decide to go ahead with this, we should implement it on top of some boost atomic primitives.
<nod> See the performance link for a few notes on this (but there is no implementation posted there, only notes). -Howard

If priority policy is the overriding issue, I recommend considering Alexander Terekhov's algorithm summarized here:
Interesting, clearly using interlocked operations where possible is a real win, and would certainly be easy to implement on Win32. Unfortunately, I don't believe the algorithm you provide is compatible with the current interface, which promises to let the user choose how to schedule interleaves between readers and writers. On the other hand, it is that interface which is part of the problem: it introduces too much complexity, and too many code-paths through which locking could occur to be able to analyse the code easily. So maybe we should dump the current interface. John.

"John Maddock" <john@johnmaddock.co.uk> writes:
If priority policy is the overriding issue, I recommend considering Alexander Terekhov's algorithm summarized here:
Interesting, clearly using interlocked operations where possible is a real win, and would certainly be easy to implement on Win32.
Unfortunately, I don't believe the algorithm you provide is compatible with the current interface, which promises to let the user choose how to schedule interleaves between readers and writers. On the other hand, it is that interface which is part of the problem: it introduces too much complexity, and too many code-paths through which locking could occur to be able to analyse the code easily. So maybe we should dump the current interface.
Attached is a sample implementation of a read-write-mutex for win32. It uses my new version of call_once to initialize itself. It doesn't support different scheduling priorities, but tries to be fair to all. Nor have I implemented try_lock semantics yet, but that wouldn't be too hard. I have, however, implemented upgradeable locks. The implementation is basic, and uses the "gating" idea --- only one thread can be "active pending" at any one time. Once that thread is locked, it allows another thread to become active pending. This allows for fair access, so no threads become starved. It uses Mutexes, so is quite heavyweight. I'll see what I can do to improve the performance. testrw.cpp is based on Howard's test code at http://home.twcny.rr.com/hinnant/cpp_extensions/rw_perf.html Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

On Sep 9, 2005, at 7:57 AM, John Maddock wrote:
If priority policy is the overriding issue, I recommend considering Alexander Terekhov's algorithm summarized here:
Interesting, clearly using interlocked operations where possible is a real win, and would certainly be easy to implement on Win32.
That should be in capitals: Real Win (tm) ;-) http://home.twcny.rr.com/hinnant/cpp_extensions/rw_perf.html (which is a PPC implementation making use of atomic primitives) Note this is not lock-free. It is only lock-free in the case of no contention for the mutex. If there is contention, an OS mutex must be locked. I'm calling it lock-reduced for lack of better terminology at the moment.
Unfortunately, I don't believe the algorithm you provide is compatible with the current interface, which promises to let the user choose how to schedule interleaves between readers and writers. On the other hand, it is that interface which is part of the problem: it introduces too much complexity, and too many code-paths through which locking could occur to be able to analyse the code easily. So maybe we should dump the current interface.
Or maybe independently develop the Terekhov algorithm, and stress test the two side by side ... um if we can get the current interface working. -Howard

On 09/09/05, Douglas Gregor <doug.gregor@gmail.com> wrote:
On Sep 8, 2005, at 5:45 AM, John Maddock wrote:
I've been trying to solve the deadlock problem reported here: http://lists.boost.org/Archives/boost/2005/08/92307.php but I've simply run into more and more problems, and just about every operation I've tried will deadlock if you try hard enough.
THIS IS NOT GOOD.
So, do we yank read_write_mutex from 1.33.1 and, err, start over?
I think it should be yanked. There will be a new implementation in the rewrite. I don't really think we should be trying to re-implement r/w mutexes on platforms that provide them natively at this stage. We should have a portable mechanism that wraps the posix implementation and try and provide that on win32 as well. matt.

So, do we yank read_write_mutex from 1.33.1 and, err, start over?
Heaven only knows! Well, I guess read_write_mutex isn't much use as it is: basically I think it tries to be too clever for it's own good. Ultimately, since there is a rewrite in progress, then yes I think a rewrite based upon a proven design *and not deviating from or extending that design* would be a good thing. It's probably the only way to get a provenly correct implementation, that can be maintained if necessary by someone other than it's original author if necessary. The trouble is what do we do with 1.33.1? We could either revert to the 1.32 version (if that is actually reliable, the current regression tests don't show up these problems so we don't really know), or we could yank it out of Boost entirely. I'm leaning towards yanking it out at present, but it's a drastic step, and this presupposes that the original authors of the class can't temporarily fix things for 1.33.1 (they all look to have either gone away or be too busy at present however). What do other folks think? John.

John Maddock schrieb:
The trouble is what do we do with 1.33.1? We could either revert to the 1.32 version (if that is actually reliable, the current regression tests don't show up these problems so we don't really know), or we could yank it out of Boost entirely. I'm leaning towards yanking it out at present, but it's a drastic step, and this presupposes that the original authors of the class can't temporarily fix things for 1.33.1 (they all look to have either gone away or be too busy at present however).
What do other folks think?
I don't know either whether someone relies on the presence of rw mutex being in the boost. However if he/she is using it, the usage itself is dangerous anyhow. In my opinion it is better to omit it until it is fixed. I am also not in the mood to dig through the sources to find out how the design was intended to work. I would also vote for basing the design on a theoretically sound ground (not claiming the current is not) that is understandable too. While trying to find out the reason for the initial reported bug I found myself in a maze of hard to understand details. Perhaps it would even be better not to base the rw mutex on level 0 primitives only, but write it to the native API of the respective platform? Also I can think of a logical extension of the interface to "restricted group mutexes", where there are multiple groups of privileges. At most n presetable members of a certain group are allowed to hold the mutex at the same time. The rw mutex is just a special case in this scenario. I didn't spent much time yet trying to find out if this concept could benefit from template partial specialization too, to provide optimizations for certain common cases. But this definitely needs much more thought! BTW.: I think we should instead put more effort to the level 0. Only recently I have seen some requests on the user list to get access to the locking primitives from various other parts of the library that are in detail currently. Regards, Roland
participants (8)
-
Anthony Williams
-
Douglas Gregor
-
Howard Hinnant
-
Ion Gaztañaga
-
John Maddock
-
Matt Hurd
-
Roland Schwarz
-
Timothy Ritchey