
A few days late, but... Peter Dimov wrote:
Yuval Ronen wrote:
One overhead I can think of is the need to call CloseHandle after the WaitForSingleObject, which means another OS call (kernel?). This doesn't exist on pthreads where calling pthread_join is enough.
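For reference, the two join sequences being compared look roughly like the sketch below. The names `thread_handle`, `worker`, `start`, `join`, and `run_and_join` are made up for illustration, and error handling is kept to a minimum. The point is only that the Win32 branch needs two calls to finish with a thread, where the pthreads branch needs one.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>

typedef HANDLE thread_handle;

// Hypothetical worker: stores a value the joiner can check.
static DWORD WINAPI worker(LPVOID arg) {
    *static_cast<int*>(arg) = 42;
    return 0;
}

static bool start(thread_handle& h, int* out) {
    h = CreateThread(NULL, 0, worker, out, 0, NULL);
    return h != NULL;
}

static void join(thread_handle h) {
    // First call: block until the thread has finished.
    DWORD rc = WaitForSingleObject(h, INFINITE);
    assert(rc == WAIT_OBJECT_0);
    (void)rc;
    // Second call: release the handle to the thread object.
    CloseHandle(h);
}
#else
#include <pthread.h>

typedef pthread_t thread_handle;

// Hypothetical worker: stores a value the joiner can check.
static void* worker(void* arg) {
    *static_cast<int*>(arg) = 42;
    return NULL;
}

static bool start(thread_handle& h, int* out) {
    return pthread_create(&h, NULL, worker, out) == 0;
}

static void join(thread_handle h) {
    // Single call: waits for the thread and releases its resources.
    int rc = pthread_join(h, NULL);
    assert(rc == 0);
    (void)rc;
}
#endif

// Starts one worker, joins it, and returns the value it wrote (-1 on failure).
int run_and_join() {
    int result = 0;
    thread_handle h;
    if (!start(h, &result)) return -1;
    join(h);
    return result;
}
```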
The overhead in this case is not measured in cycles or kernel transitions (remember that you called a blocking function because you have nothing better to do with the CPU).
The kernel transitions take CPU cycles, which means these cycles won't be available to any thread, not just the one that joined. AFAIU, all threads in all processes would suffer from this overhead, because CPU cycles were wasted. On the other hand, one might say that if the total cost of creating + destroying a thread is much more than a single kernel transition (is it?), then adding one such transition is negligible.
It's measured in kernel objects; these come from the nonpaged pool and are a (relatively) limited resource. Since (today) a cv under Windows is usually ~3 kernel objects, a mutex one more, adding a mutex+cv to every thread increases its kernel footprint five times.
But are all kernel objects created equal? If a thread object is much "heavier" than a mutex or c/v object (which means, for instance, that there can be far fewer of them), then the cost is much less than five times. As I said, I have no idea if that's really the case. Just raising questions... (Oh, and BTW, this cost doesn't need to be added to every thread, only to those that need multi joins)
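To make the "mutex+cv only for threads that need multi joins" idea concrete, here is a rough sketch. It uses the C++11 standard library (`std::mutex`, `std::condition_variable`) rather than the Win32 or Boost primitives discussed in this thread, and `join_latch` and `multi_join_demo` are hypothetical names, not anything from an existing library. Only a thread that opts in carries the extra mutex and condition variable; a plain thread would not.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: a completion flag that any number of threads can wait on.
class join_latch {
public:
    // Called once by the worker when it is finished.
    void signal_done() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
    }

    // May be called by any number of joiners; returns once the worker is done.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return done_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

// Runs one worker and `joiners` waiting threads; returns how many joiners
// observed the worker's result after being released.
int multi_join_demo(int joiners) {
    assert(joiners >= 0);
    join_latch latch;
    int value = 0;
    int seen = 0;
    std::mutex seen_mutex;

    std::thread worker([&] {
        value = 7;            // written before the latch is signalled
        latch.signal_done();
    });

    std::vector<std::thread> waiters;
    for (int i = 0; i < joiners; ++i) {
        waiters.emplace_back([&] {
            latch.wait();     // the "multi join": many threads wait on one worker
            std::lock_guard<std::mutex> lock(seen_mutex);
            if (value == 7) ++seen;
        });
    }

    for (auto& t : waiters) t.join();
    worker.join();            // the ordinary single join still reclaims the thread
    return seen;
}
```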