[threads] win32 mutex and read-write mutex

I have revamped the code on the threads_rewrite branch for the mutex and read_write_mutex. The mutex code is now "swap-based", to eliminate the hand-off problems highlighted by Alexander Terekhov and Peter Dimov. It can be seen in CVS at boost/thread/win32/basic_timed_mutex.hpp on the thread_rewrite branch, or in the CVS web viewer at http://boost.cvs.sourceforge.net/boost/boost/boost/thread/win32/basic_timed_mutex.hpp?revision=1.1.2.6&view=markup&pathrev=thread_rewrite It keeps track of the "active count" of threads with an interest in the mutex, in order to try and avoid signalling the semaphore unnecessarily. The read_write_mutex code is based on ideas from the code offered by Peter Dimov and Cory Nelson. However, I have extended it to include upgradeable locks, and more closely mirror the interface proposed in the C++ Standard threading proposal N2094. At the moment it does not support try_ or timed_ variants. The code can be seen in CVS at boost/thread/win32/read_write_mutex.hpp on the thread_rewrite branch, or in the CVS web viewer at http://boost.cvs.sourceforge.net/boost/boost/boost/thread/win32/read_write_mutex.hpp?revision=1.1.2.6&view=markup&pathrev=thread_rewrite In all cases, the code tries to change state using compare-and-swap instructions. If the current state is blocking the state change (e.g. there is already a writer when we try to lock a reader), then the lock blocks on one of three win32 Event objects. The shared event is a manual reset event, so once readers are allowed access, they are all freed until a thread obtains an exclusive lock and resets the event. There is a limit on how many threads can have a shared/upgradeable lock (0x1fff), and how many can be waiting for a write lock (0xfff). Given the nature of read-write mutexes, this could easily be adjusted to allow more shared locks, and fewer waiting writers. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

On 10/5/06, Anthony Williams <anthony_w.geo@yahoo.com> wrote:
I have revamped the code on the threads_rewrite branch for the mutex and read_write_mutex.
The mutex code is now "swap-based", to eliminate the hand-off problems highlighted by Alexander Terekhov and Peter Dimov. It can be seen in CVS at boost/thread/win32/basic_timed_mutex.hpp on the thread_rewrite branch, or in the CVS web viewer at
It keeps track of the "active count" of threads with an interest in the mutex, in order to try and avoid signalling the semaphore unnecessarily.
The read_write_mutex code is based on ideas from the code offered by Peter Dimov and Cory Nelson. However, I have extended it to include upgradeable locks, and more closely mirror the interface proposed in the C++ Standard threading proposal N2094. At the moment it does not support try_ or timed_ variants. The code can be seen in CVS at boost/thread/win32/read_write_mutex.hpp on the thread_rewrite branch, or in the CVS web viewer at
In all cases, the code tries to change state using compare-and-swap instructions. If the current state is blocking the state change (e.g. there is already a writer when we try to lock a reader), then the lock blocks on one of three win32 Event objects. The shared event is a manual reset event, so once readers are allowed access, they are all freed until a thread obtains an exclusive lock and resets the event. There is a limit on how many threads can have a shared/upgradeable lock (0x1fff), and how many can be waiting for a write lock (0xfff). Given the nature of read-write mutexes, this could easily be adjusted to allow more shared locks, and fewer waiting writers.
I've just looked the reader-writer lock over, and while it generally seems "usable" to me, I've got a few nitpicks: 1) This seems like a real big general-purpose class that goes against the C++ism of not paying for what you aren't going to use. Creating three events (these are kernel objects) for each lock makes it a rather heavy instance - I feel like using of a lot of them for fine-grained locking won't be realistic. Given that the need to upgrade a lock is uncommon and usually easily worked around, and the ease a user can cause deadlock when upgrading, would you consider a separate lightweight non-upgradeable class to go alongside it? 2) It doesn't spin at all - giving an app a chance to stay away from WaitForSingleObject on multithreaded systems will be a good boost to scalability under typical use. Win32 critical sections have a default spin count of 4000 on multithreaded systems. With multi-core getting more and more common I think this is an important aspect to consider. 3) I don't like the lock(), unlock() etc member functions being public.. forcing scoped_lock is a great cheap idiom for exception safety and saving people from deadlocks. Even with scoped_lock also available - it isn't something found in other languages so I fear new developers will flock to using the simpler and more familiar public members instead.
Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Cory Nelson

Hi Cory, Thanks for your feedback. "Cory Nelson" <phrosty@gmail.com> writes:
On 10/5/06, Anthony Williams <anthony_w.geo@yahoo.com> wrote:
I have revamped the code on the threads_rewrite branch for the mutex and read_write_mutex.
The read_write_mutex code is based on ideas from the code offered by Peter Dimov and Cory Nelson. However, I have extended it to include upgradeable locks, and more closely mirror the interface proposed in the C++ Standard threading proposal N2094. At the moment it does not support try_ or timed_ variants. The code can be seen in CVS at boost/thread/win32/read_write_mutex.hpp on the thread_rewrite branch, or in the CVS web viewer at
In all cases, the code tries to change state using compare-and-swap instructions. If the current state is blocking the state change (e.g. there is already a writer when we try to lock a reader), then the lock blocks on one of three win32 Event objects. The shared event is a manual reset event, so once readers are allowed access, they are all freed until a thread obtains an exclusive lock and resets the event. There is a limit on how many threads can have a shared/upgradeable lock (0x1fff), and how many can be waiting for a write lock (0xfff). Given the nature of read-write mutexes, this could easily be adjusted to allow more shared locks, and fewer waiting writers.
I've just looked the reader-writer lock over, and while it generally seems "usable" to me, I've got a few nitpicks:
1) This seems like a real big general-purpose class that goes against the C++ism of not paying for what you aren't going to use. Creating three events (these are kernel objects) for each lock makes it a rather heavy instance - I feel like using of a lot of them for fine-grained locking won't be realistic. Given that the need to upgrade a lock is uncommon and usually easily worked around, and the ease a user can cause deadlock when upgrading, would you consider a separate lightweight non-upgradeable class to go alongside it?
I would certainly think it worth having multiple mutexes supporting various levels of locking. Howard's proposal (N2094, see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2094.html) has 4 levels --- simple mutex, mutex with shared/exclusive locks, mutex with shared/exclusive locks where the shared ones can "try" to convert to an exclusive lock, and mutex with shared/exclusive/upgradeable locks. This is an implementation of the latter, since I deemed it the hardest to get right.
2) It doesn't spin at all - giving an app a chance to stay away from WaitForSingleObject on multithreaded systems will be a good boost to scalability under typical use. Win32 critical sections have a default spin count of 4000 on multithreaded systems. With multi-core getting more and more common I think this is an important aspect to consider.
Yes, it's worth considering. I would be intrigued to see what difference it made. I have a dual-core system and a single-core system, so I could see how it performed on both with/without spinning. Any ideas for how to construct a benchmark?
3) I don't like the lock(), unlock() etc member functions being public.. forcing scoped_lock is a great cheap idiom for exception safety and saving people from deadlocks. Even with scoped_lock also available - it isn't something found in other languages so I fear new developers will flock to using the simpler and more familiar public members instead.
Howard's proposal has them as public members, so that's what I implemented. I can see the arguments both ways. Making the functions private is easy, and what boost does already. There are certainly use cases where pure scoped locks get in the way. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
<snip >
"Cory Nelson" <phrosty@gmail.com> writes:
<snip>
3) I don't like the lock(), unlock() etc member functions being public.. forcing scoped_lock is a great cheap idiom for exception safety and saving people from deadlocks. Even with scoped_lock also available - it isn't something found in other languages so I fear new developers will flock to using the simpler and more familiar public members instead.
Howard's proposal has them as public members, so that's what I implemented.
I can see the arguments both ways. Making the functions private is easy, and what boost does already.
There are certainly use cases where pure scoped locks get in the way.
Hi, The idea is to make it somewhat difficult to use the functions directly but allow them when absolutely needed, yes? You could make them protected in which case they could be accessed if necessary simply with derivation + using. i.e. class manual_read_write_mutex : public read_write_mutex { public: using read_write_mutex::lock; using read_write_mutex::unlock; }; or is this too ugly? - Mike

Anthony Williams wrote:
"Cory Nelson" <phrosty@gmail.com> writes:
2) It doesn't spin at all - giving an app a chance to stay away from WaitForSingleObject on multithreaded systems will be a good boost to scalability under typical use. Win32 critical sections have a default spin count of 4000 on multithreaded systems. With multi-core getting more and more common I think this is an important aspect to consider.
Yes, it's worth considering. I would be intrigued to see what difference it made. I have a dual-core system and a single-core system, so I could see how it performed on both with/without spinning. Any ideas for how to construct a benchmark?
I would advise against spinning. Spinning is a bit of a gamble; it may or may not be a win depending on several factors (average critical section length, spin count, whether other threads in ready state can make better use of the CPU time, context switch performance) and the optimal spin count is very application-dependent. It isn't a problem if the user gambles with his own money, so to speak, i.e. spins manually with try_lock. However it can be a problem if the implementation gambles with the user's money and loses, since there is no way to make it not spin. IOW, you can easily turn a non-spinning mutex into a spinning mutex, but not vice versa. It's true that spinning can improve the performance of the "average application" but you have to have data on that average application... and it still might be a loss for a specific application. In addition, spinning is more suitable for ordinary mutexes, since their critical sections are expected to be short, and hence the probability that a thread will spin while the thread holding the mutex is scheduled on the same core is low (a situation where spinning is a guaranteed loss.) A read-write mutex, on the other hand, can easily have a longer writer critical section that can span multiple quantums, and hence, it can be better for readers not to spin in order to get out of the writer's way as quickly as possible... since a sluggish writer can block many readers in a worst case scenario. It would be hard to determine the optimal spin count for the "average application" by using benchmarks... you don't know how to represent the average app in a benchmark. It could be possible if the optimal spin count stays relatively constant during varying scenarios, I guess...

"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
"Cory Nelson" <phrosty@gmail.com> writes:
2) It doesn't spin at all - giving an app a chance to stay away from WaitForSingleObject on multithreaded systems will be a good boost to scalability under typical use. Win32 critical sections have a default spin count of 4000 on multithreaded systems. With multi-core getting more and more common I think this is an important aspect to consider.
Yes, it's worth considering. I would be intrigued to see what difference it made. I have a dual-core system and a single-core system, so I could see how it performed on both with/without spinning. Any ideas for how to construct a benchmark?
I would advise against spinning. Spinning is a bit of a gamble; it may or may not be a win depending on several factors (average critical section length, spin count, whether other threads in ready state can make better use of the CPU time, context switch performance) and the optimal spin count is very application-dependent.
It isn't a problem if the user gambles with his own money, so to speak, i.e. spins manually with try_lock. However it can be a problem if the implementation gambles with the user's money and loses, since there is no way to make it not spin. IOW, you can easily turn a non-spinning mutex into a spinning mutex, but not vice versa.
How about a try_lock(spin_count) call? or even a lock(spin_count)? Would that be superfluous? Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

On Oct 8, 2006, at 5:29 AM, Anthony Williams wrote:
3) I don't like the lock(), unlock() etc member functions being public.. forcing scoped_lock is a great cheap idiom for exception safety and saving people from deadlocks. Even with scoped_lock also available - it isn't something found in other languages so I fear new developers will flock to using the simpler and more familiar public members instead.
Howard's proposal has them as public members, so that's what I implemented.
I can see the arguments both ways. Making the functions private is easy, and what boost does already.
There are certainly use cases where pure scoped locks get in the way.
The motivation for making them public in N2094 is that to support those use cases where scoped locking is not the desired behavior. Simply saying that if you want to lock/unlock a mutex in a non-scope pattern then install it in a lock, essentially confuses what a lock is. condition is an example application that needs to lock/unlock a mutex in a non-scoped pattern. boost solves this problem by making condition a friend of the mutex (if I recall correctly). Can it really be true that condition is the only application in need of this service? Don't get me wrong, scoped_lock is great. It is not only the safest way to use a mutex, it is also the easiest. And for most use cases it is exactly what is needed. It is just that scoped_lock isn't the only way to use a mutex. Here are some of the features enabled in N2094 by making mutex::lock/ unlock public: * Clients can write their own mutex types which conform to this concept. * condition can be made to work with any lock *or mutex* which conforms (including client written mutexes). I.e. condition<Mutex>. * Code which is generic in its mutex can be written. Examples include mutex adaptors that turn one type of mutex into another (recursive_mutex_adaptor<Mutex>). Here's another one that makes the "read lock" interface of a read/write mutex look like lock/unlock: template <class Mutex> class sharable_adaptor { private: Mutex& mut_; public: explicit sharable_adaptor(Mutex& mut) : mut_(mut) {} void lock() {mut_.lock_sharable();} bool try_lock() {return mut_.try_lock_sharable();} void unlock() {mut_.unlock_sharable();} }; Now you can use condition variables with a read-locked mutex. --- If safety is easy, it will be used most of the time. If safety is mandated, programmers will be forced into expensive or excessively dangerous workarounds for unanticipated use cases. -Howard
participants (5)
-
Anthony Williams
-
Cory Nelson
-
Howard Hinnant
-
Michael Marcin
-
Peter Dimov