[thread] Question about performance of recursive_mutex wrt multiple locks

I'm writing an application that runs on both Win32 (VS 2005) and Linux (gcc). I was curious what the performance penalty is for a scenario like this:

    Foo1()
    {
        recursive_mutex::scoped_lock lock(mutex); // acquire initial mutex lock
        Foo2();
        ...
    }

    Foo2()
    {
        recursive_mutex::scoped_lock lock(mutex); // increment lock count
        ...
    }

Relatively speaking, how much time does it cost to

1) lock the mutex initially and increment the lock count, vs.
2) determine that the mutex is already locked by the thread and just increment the lock count?

I'm also curious how much faster mutex is than recursive_mutex on both platforms. If anybody has done performance profiling, or just knows in general what the different mutex scenarios cost, I'd greatly appreciate any info. :)

Thanks very much,
Scott

Scott wrote:
> I'm writing an application that runs on both Win32 (vs 2005) and Linux (gcc).

I can say something about Linux.

> Relatively speaking, how much time does it cost to 1) lock the mutex initially and increment the lock count vs. 2) determine that a mutex is already locked by the thread and just increment the lock count?
> I'm also curious how much faster mutex is than recursive_mutex on both platforms. If anybody has done performance profiling or just knows in general the different mutex scenario performance costs, I'd greatly appreciate any info. :)
It's very easy to measure this yourself by timing something like

    mutex m;
    for (int i=0; i<1000000000; ++i) {
      mutex::scoped_lock l(m);
    }

and comparing with

    recursive_mutex rm;
    recursive_mutex::scoped_lock l1(rm);   // hold the lock for the whole loop
    for (int i=0; i<1000000000; ++i) {
      recursive_mutex::scoped_lock l2(rm); // re-lock: only bumps the count
    }

My understanding is that Boost is a fairly thin layer on top of pthreads, and you can see the source for the pthreads implementation here:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/nptl/pthread_mutex_lock.c?rev=1.20&content-type=text/x-cvsweb-markup&cvsroot=glibc

It looks like it does what you would expect:

    switch (kind-of-mutex) {
      case recursive:
        if (owner == me) { count++; return; }
        lock();
        break;
      case normal:
        lock();
        break;
    }

When uncontended, the implementation of lock() should be a single atomic instruction. My guess would be that locking an already-locked recursive mutex is a fraction faster than locking an unlocked non-recursive mutex, by maybe a nanosecond or so. If you measure it, let me know if I was right.

If you really care about these sorts of differences you might want to consider avoiding the pthreads layer altogether; see my recent posts about using the futex() system call directly.

Regards,
Phil.
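The measurement suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses C++11's std::mutex, std::recursive_mutex and std::lock_guard in place of the Boost types discussed in the thread, and the helper names (ns_per_lock, time_plain_mutex, time_recursive_first_lock, time_recursive_relock) are made up for this example. It only covers the uncontended, single-threaded case.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical helper: average cost, in nanoseconds, of one uncontended
// lock/unlock pair on `m`, measured over `iterations` rounds.
template <typename Mutex>
double ns_per_lock(Mutex& m, std::uint64_t iterations) {
    assert(iterations > 0);  // avoid dividing by zero below
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        std::lock_guard<Mutex> guard(m);  // locks here, unlocks at end of scope
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Case 1: plain mutex, unlocked before every iteration.
double time_plain_mutex(std::uint64_t iterations) {
    std::mutex m;
    return ns_per_lock(m, iterations);
}

// Case 2: recursive mutex, unlocked before every iteration
// (each iteration is an "initial" lock).
double time_recursive_first_lock(std::uint64_t iterations) {
    std::recursive_mutex rm;
    return ns_per_lock(rm, iterations);
}

// Case 3: recursive mutex already held by this thread, so each
// iteration only re-locks it (the lock count goes up and back down).
double time_recursive_relock(std::uint64_t iterations) {
    std::recursive_mutex rm;
    std::lock_guard<std::recursive_mutex> outer(rm);
    return ns_per_lock(rm, iterations);
}
```

Calling each function from main() with a large iteration count and printing the three results gives per-lock figures to compare. The numbers will depend on the platform, the compiler flags and the library version, so treat them as relative indications only; the loop overhead and the unlock are included in each figure.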
participants (2)
- Phil Endecott
- Scott