[threads] read_write_mutex performance

What are the design goals of read_write_mutex? (Lock proliferation aside for
a moment.) My understanding has always been that read/write locks are an
optimization; that is, a read/write lock-based algorithm should deliver
better performance than the same algorithm with "ordinary" locks. Right?
Wrong. The program at the end of this message demonstrates that in this
specific (contrived?) case ordinary mutexes outperform a read/write mutex by
a factor of 2.5 or more in almost every scenario, even when no writers are
active! In some scenarios (16 readers, 4 writers, 10,000,000 iterations) the
ordinary mutex case completes in under 8 seconds on my machine, but the
read/write case exceeded my patience threshold. Am I missing something?

--
Peter Dimov
http://www.pdimov.com

#include <boost/thread.hpp>
#include <boost/thread/read_write_mutex.hpp>
#include <iostream>
#include <vector>
#include <time.h>

//#define RWLOCK

int const n = 1000000;
int const n_readers = 16;
int const n_writers = 0;

std::vector<int> v;

#ifdef RWLOCK
boost::read_write_mutex mtx( boost::read_write_scheduling_policy::alternating_many_reads );
#else
boost::mutex mtx;
#endif

void reader_thread()
{
    int m = 0;

    for( int i = 0; i < n; ++i )
    {
#ifdef RWLOCK
        boost::read_write_mutex::scoped_read_lock lock( mtx );
#else
        boost::mutex::scoped_lock lock( mtx );
#endif
        m += v[ i ];
    }

    std::cout << m << std::endl;
}

void writer_thread()
{
    for( int i = 0; i < n; ++i )
    {
#ifdef RWLOCK
        boost::read_write_mutex::scoped_write_lock lock( mtx );
#else
        boost::mutex::scoped_lock lock( mtx );
#endif
        v[ i ] += i;
    }
}

int main()
{
    v.resize( n );

    boost::thread_group group;

    clock_t t = clock();

    for( int i = 0; i < n_writers; ++i )
    {
        group.add_thread( new boost::thread( writer_thread ) );
    }

    for( int i = 0; i < n_readers; ++i )
    {
        group.add_thread( new boost::thread( reader_thread ) );
    }

    group.join_all();

    t = clock() - t;

    std::cout << static_cast<double>( t ) / CLOCKS_PER_SEC << '\n';
}

From what I've seen in the code, read_write_mutex has a lot of options
regarding reader/writer priorities, and it uses condition variables and
mutexes internally to offer that functionality. Its overhead is bigger than
a plain mutex's, but I think the critical section you have chosen is very
small. Where read_write_mutex shines is when you have to lock expensive
operations (lookups in databases, for example) with lots of readers. In
that case a mutex would serialize everything, but read_write_mutex would
allow concurrent readers.

Ion

----- Original Message -----
From: "Peter Dimov" <pdimov@mmltd.net>
To: "boost-devel" <boost@lists.boost.org>
Sent: Sunday, April 24, 2005 3:07 PM
Subject: [boost] [threads] read_write_mutex performance

Ion Gaztañaga wrote:
Yes. However, my suspicion is that read/write locks only speed up code that
has been written in an inefficient (MT-unfriendly) way. For high
performance, critical regions need to be as small and quick as possible,
and read/write locks appear to target the case where the critical regions
are relatively large and slow.

On Apr 24, 2005, at 3:34 PM, Peter Dimov wrote:
I suspect that there will always be cases where tiny critical regions are
just not practical, where writers are rare compared to readers, and thus
where read/write locks are a good performance win, especially on a
multi-processor architecture.

That being said, it occurred to me as I ran your test on the
Metrowerks::threads lib (and got similar results) that the first thing we
(and boost) do on read_lock is lock a mutex so that the read/write mutex
state is protected while updating it. I haven't even begun to figure it
out, but I strongly suspect that a lock-free algorithm for read_lock could
be developed. This could reduce the need for the read/write mutex to lock
an internal mutex, thus expanding (perhaps significantly) the contexts
where read/write locks make sense.

-Howard

Peter Dimov wrote:
It doesn't help that the Boost read/write mutex, as I've mentioned before,
is pretty inefficient; I'm not very happy with it. There are at least three
reasons for the inefficiency:

1) Supporting four scheduling policies in one code base makes the code too
complex;
2) Trying to adhere to the scheduling policies too rigidly also adds code
complexity and inefficiency;
3) The complexity makes it harder to optimize, and not much time has been
spent optimizing it in the first place.

I mentioned elsewhere that I have some thoughts about ways to improve
Boost.Threads; some of these ideas deal with addressing these three points
and getting a much better read/write mutex in place.

Mike

On 4/24/05, Peter Dimov <pdimov@mmltd.net> wrote:
What are the design goals of read_write_mutex? (Lock proliferation aside for a moment.)
Attached is a copy of the test program, modified to use the ACE library and
to use n_writers = 1. Performance of the two cases is much closer, though
the read/write case still suffers compared to the simple mutex case, and
the gap is really platform-dependent. Here are my results (where rwlock-ace
is the simple case and rwlock-ace-rw is the read/write case):

Linux/g++:

./rwlock-ace        3.90s user 14.54s system 326% cpu 5.648 total
./rwlock-ace-rw     6.94s user 26.57s system 226% cpu 14.776 total

Solaris/Sun C++:

./rwlock-ace-sun     7.64s user 50.66s system 691% cpu 8.427 total
./rwlock-ace-sun-rw  34.19s user 250.70s system 465% cpu 1:01.18 total

I'd say the design of the Boost.Threads read/write mutex is sub-optimal
compared to the one in ACE.

--
Caleb Epstein
caleb dot epstein at gmail dot com
participants (5)
-
Caleb Epstein
-
Howard Hinnant
-
Ion Gaztañaga
-
Michael Glassford
-
Peter Dimov