
Peter Dimov wrote:
> Phil Endecott:
>> Peter's comment about evil spin-waits in lock() is interesting as I was considering implementing something like that. I wonder about this instead:
>>
>> void mutex::lock() {
>>   if (atomic-attempt-to-lock) {
>>     // uncontended
>>     return;
>>   }
>>
>>   // first wait, try yielding since it's fast:
>>   sched_yield();
>>
>>   if (atomic-attempt-to-lock) {
>>     // got lock on 2nd attempt
>>     return;
>>   }
>>
>>   // We failed again.  Maybe there are lots of threads waiting; in that
>>   // case it will be more efficient to futex_wait() since the kernel will
>>   // then only wake us when our lock has been woken.
>>   futex_wait(....);
>> }
>
> This isn't very good, either. Consider a scenario where your time slice is 10 ms, but the mutex is unlocked 3 ms after you attempt a lock.
So some other thread gets to do some (useful) work for 7 ms. But I take your point.
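For reference, here is roughly how that trylock / sched_yield / futex_wait sequence could be fleshed out.  This is an untested sketch rather than the code I actually use: it assumes Linux futexes reached via syscall(SYS_futex), and it adds the usual 0/1/2 (unlocked / locked / locked-with-waiters) lock-word bookkeeping from Ulrich Drepper's "Futexes Are Tricky" paper, which the pseudo-code above glosses over.

  #include <atomic>
  #include <sched.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/futex.h>

  class futex_mutex {
      // 0 = unlocked, 1 = locked, 2 = locked and there may be waiters.
      std::atomic<int> state_{0};

      static long futex(std::atomic<int>* addr, int op, int val) {
          // Relies on std::atomic<int> being address-compatible with int,
          // which holds on Linux/glibc.
          return ::syscall(SYS_futex, reinterpret_cast<int*>(addr),
                           op, val, nullptr, nullptr, 0);
      }

  public:
      void lock() {
          int expected = 0;
          if (state_.compare_exchange_strong(expected, 1))
              return;                           // uncontended

          ::sched_yield();                      // first wait: just yield
          expected = 0;
          if (state_.compare_exchange_strong(expected, 1))
              return;                           // got the lock on the 2nd attempt

          // Still contended: mark the lock word as "waiters present" and
          // sleep in the kernel until unlock() wakes us.
          while (state_.exchange(2) != 0)
              futex(&state_, FUTEX_WAIT, 2);
      }

      void unlock() {
          if (state_.exchange(0) == 2)          // someone may be waiting
              futex(&state_, FUTEX_WAKE, 1);
      }
  };

The cost of this scheme is the occasional spurious FUTEX_WAKE when the lock word says 2 but nobody is actually asleep; Drepper's paper discusses the trade-offs.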
> In my opinion, mutex::lock should never do fancy stuff behind the user's back in an attempt to achieve "better performance". If a trylock/sched_yield/lock pattern is better for a specific workload, it can be implemented in user space. If it isn't, there is no way to undo the damage.
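That pattern is indeed straightforward to express in user space on top of any mutex that exposes a try_lock(); a hypothetical helper, purely for illustration and not something from my code:

  #include <sched.h>

  // Try once, yield and try once more, then fall back to a blocking lock.
  // Works with any mutex type that provides lock() and try_lock().
  template <typename Mutex>
  void try_yield_lock(Mutex& m) {
      if (m.try_lock())
          return;
      ::sched_yield();
      if (m.try_lock())
          return;
      m.lock();
  }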
Yes, I agree emphatically with Peter here. I was pleased, for example, to see that Howard's implementation of the multiple locking algorithm doesn't need access to any "internals" to work efficiently.

The approach that I've been experimenting with is to supply a template parameter to the mutex class that determines its behaviour, i.e.

template <typename WaitWake>
class mutex {
public:
  void lock();    // tries to lock with an atomic operation, and
                  // calls WaitWake.wait() if it fails.
  void unlock();  // calls WaitWake.wake().
  // etc. - an N2447-like interface
};

The "WaitWake" concept is something with a futex-like interface:

class Futex {
  int& var;
public:
  Futex(int& var_);
  void wait(int expected);  // returns immediately if var != expected,
                            // else blocks until wake() is called
  void wake();
  // etc.
};

I originally factored this out so that I could substitute a yield loop or spinlock using these degenerate classes:

class Yield {
public:
  Yield(int& var_) {}
  void wait(int expected) { ::sched_yield(); }
  void wake() {}
};

class Spin {
public:
  Spin(int& var_) {}
  void wait(int expected) {}
  void wake() {}
};

Since then I've written a condition-variable class that also takes a "WaitWake" template parameter.

But there are a couple of things that I can't do with this design:

- The yield-loop and spin-loop are compatible, in the sense that different locks on the same mutex could use different implementations (based on the expectation of blocking in each case) and it would still work. The futex could be made compatible. So perhaps the WaitWake should be a parameter of the lock, not of the mutex.

- But whether a lock is better off spinning or yielding is actually a function of what the _other_ thread that currently holds the lock is doing, not what this thread is going to do. So maybe we want a scheme like this (pseudo-code):

mutex m;
...
{
  lock l(m, 1e-6);
  var++;                            // lock is only held for a microsecond or so
}
...
{
  lock l(m, 10);
  var = result_of_http_request();   // lock is held for seconds
}

class lock {
  mutex& m;
public:
  lock(mutex& m_, float likely_hold_time): m(m_) {
    if (m.likely_hold_time < 1e-3) {
      m.spin_lock();
    } else {
      m.yield_lock();
    }
    m.likely_hold_time = likely_hold_time;
  }
  ~lock() { m.unlock(); }
  .....
};

My thought process gets about this far and then I think "this is too complicated". Simpler is better.

- My current scheme also doesn't allow for the sort of spin-then-wait or yield-then-wait design at the top of this message. There are lots of variations on this, like "if (m.lock_holder_cpu != this_cpu) { spin for a bit }".

In practice, my real code doesn't need most of these features; I have maybe a dozen threads, not thousands, and lock contention seems to be very rare. I got a worthwhile speedup by bypassing glibc's pthread_mutex and pthread_cond implementation and going straight to futexes (for the fast uncontended case), and I try to do as much work as possible between context switches. If we had a set of concurrency benchmarks available then I'd investigate a bit further.

Regards,

Phil.