
On Feb 8, 2008, at 2:42 PM, Anthony Williams wrote:
Howard Hinnant <hinnant <at> twcny.rr.com> writes:
template <class L1, class L2> void lock(L1& l1, L2& l2) { while (true) { { unique_lock<L1> __u1(l1); if (l2.try_lock()) { __u1.release(); break; } } std::this_thread::yield();
snipped rest of multi-lock code.
3. The yields are absolutely necessary to get the high performance mentioned in the previous note.
I find this comment intriguing. Is this always the case, or does it depend on the specifics of how a mutex is implemented on a given platform?
That is a question I've been asking and myself for a long time, unfortunately without a satisfactory answer. Here's my experience, and a few guesses... On 2 core Macs and 2 core Windows, I've measured improved speeds with the yields, using a "dining philosophers" style simulation. I haven't tested on Linux. Here are results (dating around 3 years ago): In order: 63 seconds. try/back off/without yield: 47 seconds. try/back off/with yield: 34 seconds. These numbers come from 5 threads each wanting two mutexes each (classic dining philosopher setup). The same runs on single core machines looked like: In order: 293 seconds. try/back off/without yield: not tested. try/back off/with yield: 278 seconds. The single core results were measured on different machines than the dual core results and so aren't comparable to the dual core results. I set up an additional "dining philosophers" test where each philosopher set at the corner of a cube (8) and attempted to attain a mutex associated with the edge of the cube adjacent to the corner (12 mutexes). That is, each diner required "3 forks" to eat. The results were: In order: 226 seconds. try/back off/without yield: not tested. try/back off/with yield: 77 seconds. Unfortunately I didn't run the last two tests without the yields. I believe the try/back off algorithms are faster than the in order algorithm because in the latter a thread sits on a mutex while blocked for another. Thus a "fork" that somebody else could be using is idled. But in the try/back off algorithms the diners are "more polite". If they can not eat, they offer their fork up so that someone else may be able to eat. Now what about the yields... Blocking on a mutex should be very similar to a yield. But the first test is showing a significant speed advantage to the yields. The above isn't noise, the tests were run many times and the averages are reported. On running the tests on a Mac I viewed the CPU activity in the "Activity Monitor" application. When running without the yields, the activity monitor looked like: (I hope you can see the graphic). The red in the graphic indicates "system time" while the green indicates "user time". In case you can't see the graphic, it looks like around 20% to 25% of the time is spent in "system", with the rest spent in "user". When running with the yields, the monitor looks like: Now the "system time" is almost completely gone (maybe 7% or 8% system time). So more "user time" translates to faster application (not so much "system time"). But this still doesn't answer your precise question: Is this behavior really portable? I'm afraid I don't really know for sure. It has just been my experience on Mac and Windows. The Windows tests were done 3 years ago, and I asked someone to do them for me. I didn't have the machine available to investigate myself. The Mac tests were done at the same time, but I've replicated the results more recently. I haven't replicated the Windows results more recently. As long as I'm posting graphics (I hope nobody minds), here's another interesting snapshot of the 8 diner/ 12 fork / 3 forks at a time case. Both snapshots are taken as the test is ending: Lock in order: try/back off/with yield: The graphs show that the lock-in-order algorithm isn't making full use of the cpu's in the final seconds of the test. The reason for this is that some diners "hogged" the system and got thru eating while other diners sat around not getting their work done. In the end there weren't enough diners left to keep both cpu's busy (I don't recall if we are looking at 1 or 2 diners at the end of this test). try/back off/with yield graphic shows the same effect as described above, but over a *much* shorter time period. Both cpu's are fully busy in "user time" for nearly the entire test. I hope this gives you some insights, despite the lack of a clear answer to your question. Perhaps you or others can present a good explanation for this behavior regarding the yields. :-) -Howard