
Anthony Williams wrote:
"Cory Nelson" <phrosty@gmail.com> writes:
2) It doesn't spin at all - giving an app a chance to stay away from WaitForSingleObject on multithreaded systems will be a good boost to scalability under typical use. Win32 critical sections have a default spin count of 4000 on multithreaded systems. With multi-core getting more and more common I think this is an important aspect to consider.
Yes, it's worth considering. I would be intrigued to see what difference it made. I have a dual-core system and a single-core system, so I could see how it performed on both with/without spinning. Any ideas for how to construct a benchmark?
I would advise against spinning. Spinning is a bit of a gamble; it may or may not be a win depending on several factors (average critical section length, spin count, whether other threads in the ready state can make better use of the CPU time, context switch performance), and the optimal spin count is very application-dependent.

It isn't a problem if the user gambles with his own money, so to speak, i.e. spins manually with try_lock. However, it can be a problem if the implementation gambles with the user's money and loses, since there is no way to make it not spin. IOW, you can easily turn a non-spinning mutex into a spinning mutex, but not vice versa. It's true that spinning can improve the performance of the "average application", but you have to have data on that average application... and it still might be a loss for a specific application.

In addition, spinning is more suitable for ordinary mutexes, since their critical sections are expected to be short, and hence the probability that a thread will spin while the thread holding the mutex is scheduled on the same core is low (a situation where spinning is a guaranteed loss). A read-write mutex, on the other hand, can easily have a longer writer critical section that spans multiple quanta, and hence it can be better for readers not to spin, in order to get out of the writer's way as quickly as possible... since a sluggish writer can block many readers in a worst-case scenario.

It would be hard to determine the optimal spin count for the "average application" by using benchmarks... you don't know how to represent the average app in a benchmark. It could be possible if the optimal spin count stays relatively constant across varying scenarios, I guess...
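To illustrate what "spinning manually with try_lock" could look like on the user's side, here is a rough sketch. It is not taken from any proposed implementation: spin_then_lock and hammer are hypothetical names made up for this example, and std::mutex merely stands in for whatever non-spinning mutex is being discussed. The spin count is a parameter precisely because the right value is application-dependent.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of any library): try to acquire `m` up to
// `spin_count` times with try_lock() before falling back to a blocking
// lock(). With spin_count == 0 this is just a plain lock().
template <class Mutex>
void spin_then_lock(Mutex& m, unsigned spin_count)
{
    for (unsigned i = 0; i < spin_count; ++i) {
        if (m.try_lock())
            return;          // acquired while spinning, no blocking wait needed
    }
    m.lock();                // gave up spinning; block until the mutex is free
}

// Hypothetical driver: `threads` threads each increment a shared counter
// `iters` times under the mutex, using the given spin count.
long hammer(unsigned threads, unsigned iters, unsigned spin_count)
{
    std::mutex m;
    long counter = 0;
    std::vector<std::thread> pool;

    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (unsigned i = 0; i < iters; ++i) {
                spin_then_lock(m, spin_count);
                // The mutex is already held; adopt it so it is released
                // when `guard` goes out of scope.
                std::lock_guard<std::mutex> guard(m, std::adopt_lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool)
        th.join();
    return counter;
}
```

An application that wants spinning calls spin_then_lock(m, n) with a count it has measured for its own workload; one that doesn't simply calls m.lock(). The reverse is not available if the spinning is built into lock() itself.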