
Mattias Flodin wrote:
Quoting "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com>:
Well, even critical sections, Windows's fastest mutex primitive, are much slower in the uncontended case than a spinlock. A two-stage method is needed to match the performance of the present spinlock: a lightweight atomic operation, followed by a heavyweight mutex if the lock is contended. This is why I mentioned that 8 bytes (one word for the critical section, one word for the atomic operation) would be necessary.
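For readers who want to see the two-stage idea spelled out, here is a minimal portable sketch. It is not the Boost lwm code and not the Win32 implementation: the class name two_stage_lock and the helper contended_increments are hypothetical, and standard C++11 primitives stand in for the Windows ones. The uncontended path costs one atomic operation; the mutex and condition variable are touched only when a second thread is already holding or waiting for the lock.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration of a two-stage lock (assumed names throughout).
class two_stage_lock {
public:
    void lock() {
        // Stage 1: a single atomic increment. A previous value of 0 means
        // the lock was free and we now own it without further work.
        if (count_.fetch_add(1) > 0) {
            // Stage 2: contended, so block on the heavyweight primitive
            // until an unlocking thread hands us a wakeup.
            std::unique_lock<std::mutex> guard(m_);
            cv_.wait(guard, [this] { return wakeups_ > 0; });
            --wakeups_;
        }
    }

    void unlock() {
        // Stage 1: a single atomic decrement.
        int previous = count_.fetch_sub(1);
        assert(previous > 0);  // unlock() without a matching lock() is a bug
        if (previous > 1) {
            // Stage 2: someone is waiting (or about to), so post one wakeup.
            std::lock_guard<std::mutex> guard(m_);
            ++wakeups_;
            cv_.notify_one();
        }
    }

private:
    std::atomic<int> count_{0};  // threads holding or waiting for the lock
    std::mutex m_;               // protects wakeups_
    std::condition_variable cv_;
    int wakeups_ = 0;            // pending handoffs to waiting threads
};

// Hypothetical helper: several threads bump a plain (non-atomic) counter
// that is protected only by the lock above.
long contended_increments(int threads, int iterations) {
    two_stage_lock lock;
    long total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<two_stage_lock> guard(lock);
                ++total;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

If the lock is doing its job, contended_increments(4, 20000) returns 80000. The extra word for the atomic counter is the space cost mentioned above.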
I'm quite surprised by this claim. What you describe is precisely how WIN32 critical sections work. If your measurements show them to be slower, there must be some other reason for it. WIN32 also provides InitializeCriticalSectionAndSpinCount, which will cause a busy-wait for a few cycles before resorting to waiting on the kernel lock.
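The spin-count behaviour can be approximated in portable C++ as well. The sketch below is only an analogy for what InitializeCriticalSectionAndSpinCount configures, not the Win32 code itself: spin_then_block_lock and spin_demo are made-up names, and std::mutex stands in for the kernel lock. It retries try_lock() up to a fixed number of times and only then falls back to a blocking wait.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration of "spin a little, then block" (assumed names).
class spin_then_block_lock {
public:
    explicit spin_then_block_lock(unsigned spin_count)
        : spin_count_(spin_count) {}

    void lock() {
        // Busy-wait phase: retry a bounded number of times, hoping the
        // current owner releases the lock very soon.
        for (unsigned i = 0; i < spin_count_; ++i) {
            if (m_.try_lock()) return;
        }
        // Blocking phase: give up spinning and wait on the mutex.
        m_.lock();
    }

    void unlock() { m_.unlock(); }

private:
    std::mutex m_;
    unsigned spin_count_;
};

// Hypothetical helper: threads increment a plain counter under the lock.
long spin_demo(int threads, int iterations, unsigned spin_count) {
    spin_then_block_lock lock(spin_count);
    long total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<spin_then_block_lock> guard(lock);
                ++total;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

A spin count of 0 degenerates to a plain blocking mutex. Spinning tends to pay off only on multiprocessor machines and only when the lock is held for very short periods.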
In fact, you are quite right. I just tested, and performance for the two methods was identical. I had a piece of apparently stale knowledge in my brain telling me otherwise, and I don't remember why I thought that.

Well, I see that there is a critical section version of the lwm. Is there some reason the spinlock is being used by default instead? Perhaps the solution here should just be to define BOOST_LWM_USE_CRITICAL_SECTION and be done with it.

Aaron W. LaFramboise