
On Oct 26, 2006, at 12:39 PM, Peter Dimov wrote:
Howard Hinnant wrote:
Here is my implementation of Terekhov's *fair* read/write algorithm, with upgradable thrown in. This algorithm is just as you say: The OS alone decides who gets to run next.
http://home.twcny.rr.com/hinnant/cpp_extensions/upgrade_mutex.html
There are two interesting decisions in a read/write mutex. The first is what happens when a writer is blocked in wrlock() waiting for readers and another reader tries to rdlock(). POSIX leaves this implementation defined in the general case; however, if priority scheduling is supported, the reader "shall" obtain a lock if and only if it's of a higher priority compared to the waiting writer (or all writers, if there's more than one).
If we assume equal priority for simplicity, this means that rdlock "shall" block. This is detectable by a conformance test, by the way, since tryrdlock fails if and only if rdlock blocks.
This is the "gate 1" part, and all of the algorithms I've been experimenting with also have this property. If the 'writer pending' bit is set, no readers are allowed to obtain a lock. However if another, higher-priority writer arrives after the pending bit is already set, I still grant write access to the first writer, which is probably a violation of POSIX [TPS]. :-)
The other interesting decision is what happens on unlock (two subparts, last reader unlocks, or writer unlocks.) Your implementation seems to use the "thundering herd" approach of waking all. This is fair, but may be wasteful if you wake up several writers, or a writer and several readers and the writer manages to pass through gate 1 (everyone else must block again). (This redundant wakeup may not be visible in a benchmark, but it can steal CPU from other busy non-contending threads in a real application.)
(I see a significant amount of CAS failures even when waking up only readers; again, this doesn't affect the timing of the specific benchmark, and I'm not sure that it's actually a problem.)
Another approach is to wake up one writer in rdunlock and all readers in wrunlock (alternating between readers and writers). I also have a third option in mind that I haven't tested yet.
Very nice summary and astute observations. I must admit to really liking the "thundering herd" approach for the interesting unlock cases you site. This is where you're really turning things over to the OS and saying "you decide". And I just flat haven't experimented with priority and consider myself currently unqualified to comment in that area. Which I guess is another reason for me to prefer to let the OS decide such matters (it presumably knows and picks threads with higher priority to wake). Partial rationale: Last reader unlocks: In this case if there is a writer waiting (at gate 2), then (imho) there's no decision but to give the writer the ago, and this is assuming equal priority as you stated. If there is not a writer waiting at gate 2, and this is the last reader, then my implementation does absolutely nothing. A single reader inside of gate 1 blocks no one else from passing through gate 1. There is no thundering herd for last reader unlock. (see unlock_sharable()). Writer unlocks: By definition no one is waiting at gate 2. There may be waiters, both read and write (and upgradable if you're using that) at gate 1. If you choose at this point, a reader or a writer, you are, by definition, giving one of the two priority, at least for this next round. One might have the write unlock alternate its choice, but I'm not seeing much performance gain here. You either wake up 1 writer, or you wake up many readers (on the reader path it might actually be significantly more expensive to make num_reader calls to notify_one() than just do a notify_all()). Let's look at a typical (anticipated) use case under high contention (we're not even discussing or worried about low contention use cases for this discussion): Few (nw) writers blocked, many (nr) readers blocked; writer unlocks, and issues notify_all(). Chances are good that one of the readers will get past gate 1 before one of the writers gets past gate 1, just because there are so many of them compared to the number of writers (strength in numbers). Chance first thread past gate 1 is a reader: nr/(nr+nw) The above formula rapidly approaches 100% as nw/nr approaches 0. nw/nr Chances of reader being first 0 100% // no writers .1 91% .2 83% .3 77% .4 71% .5 67% ... 1.0 50% // same number of writers as readers Indeed it is likely that several, or even many readers will get past gate 1 before one of the few writers is chosen to try. When one of the few writers is finally chosen, it too will get past gate 1 and block at gate 2. At this point you've got many readers inside of gate 1 getting work done. You have only a few writers (nw-1) that woke up for nothing and now have to go back to sleep. And (on average) you'll have relatively few readers that woke up for nothing and must sleep again (just how many, statistically speaking, drops with the nw/nr ratio -- probability formula left as an exercise to the reader :-)). Of course there will be cases where one of the few writers makes it past gate 1 at or near the beginning of the thundering herd, in which case many threads will be woken up for nothing. However, statistically speaking, and assuming all equal probability for any thread getting woken up, this scenario should happen infrequently, and thus not be a performance problem. If you have an atypical scenario: Many writers, few readers, then I agree that the "thundering herd" approach may exhibit performance problems, as it is then likely that a writer is first, and every other thread is woken up for nothing. And in this scenario, I suggest that a read/write mutex is probably not a good design choice for the application. There's a big gray area inbetween where nr approximately equals nw. It would be interesting to see a better analysis of that scenario, and discussion of the applicability of read/write mutexes there. Another thing I like about this approach is that it is relatively simple to add the upgradable concept to it. And there is no extra cost for supporting upgradable. One gets it practically for free. -Howard