
Anthony Williams wrote:
it looks like you've opted for a check/sleep/check/sleep loop for threads that are waiting for another thread to finish running the routine. This is a bad idea. Blocking of this nature should be done by waiting on an OS primitive rather than with a wait loop.
Why is it that bad? This approach is safer, since there is no opportunity for an error when constructing the threading primitive, it doesn't use system resources like kernel objects, and it avoids the fundamental problems of creating and destroying those primitives at run time. And it will only be run once after all, so performance is not an issue.
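
[For reference, the kind of check/sleep loop under discussion looks roughly like this. It is only a minimal sketch in C++11 terms, not Boost's actual code; the names g_state and call_once_polling are made up for illustration.]

    #include <atomic>
    #include <chrono>
    #include <thread>

    std::atomic<int> g_state(0);   // 0 = not started, 1 = running, 2 = done

    template <typename F>
    void call_once_polling(F f)
    {
        int expected = 0;
        if (g_state.compare_exchange_strong(expected, 1)) {
            f();                   // we won the race: run the once routine
            g_state.store(2);      // publish completion
        } else {
            // Another thread is running f(); poll until it reports completion.
            while (g_state.load() != 2)
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }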
I think that performance *is* an issue, even though this will only be run once per thread.
A check/sleep polling loop is a bad idea, as it consumes CPU time that could be spent actually running the once routine (or another thread that doesn't need to wait). By waiting on an OS primitive, the OS can take the thread off the scheduler's run queue until the primitive is ready to be acquired.
Not only that, but a check/sleep loop forces a latency of at least the specified sleep time on the waiting thread. If the initialization being waited for only takes a few microseconds (or less --- if it's just a simple initialization it might take only a few nanoseconds), then waiting a whole millisecond is an unnecessary delay.
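
[To make that concrete, here is a hedged sketch of the same wait expressed with an OS primitive, a mutex plus condition variable in C++11 terms; again this is illustrative, not Boost's code, and call_once_blocking and the globals are invented names. The waiting thread is descheduled by the OS and woken as soon as the initializer finishes, rather than after the next sleep tick.]

    #include <condition_variable>
    #include <mutex>

    std::mutex g_m;
    std::condition_variable g_cv;
    bool g_started = false;
    bool g_done = false;

    template <typename F>
    void call_once_blocking(F f)
    {
        std::unique_lock<std::mutex> lk(g_m);
        if (!g_started) {
            g_started = true;
            lk.unlock();
            f();                                  // run the once routine outside the lock
            lk.lock();
            g_done = true;
            g_cv.notify_all();                    // wake every blocked waiter at once
        } else {
            g_cv.wait(lk, []{ return g_done; });  // OS blocks this thread; no polling
        }
    }

[notify_all wakes all waiters together, so the latency a waiter sees is bounded by the wakeup cost rather than by a fixed sleep interval.]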
POSIX provides pthread_once. We should use it.
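
[For completeness, pthread_once usage is about as small as it gets; this is just a sketch, with init_routine and ensure_initialised as placeholder names.]

    #include <pthread.h>

    static pthread_once_t once_control = PTHREAD_ONCE_INIT;

    static void init_routine()
    {
        // one-time initialisation goes here
    }

    void ensure_initialised()
    {
        // Safe to call from any number of threads; a thread that loses the
        // race blocks in the OS until init_routine has completed, and all
        // later calls return immediately.
        pthread_once(&once_control, init_routine);
    }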
Do have a look at the analysis that I did for my ARM atomic shared_ptr code: http://thread.gmane.org/gmane.comp.lib.boost.devel/164564/focus=164893

If the probability of contention is very low, then on average adding even one instruction to the non-contended case, or occupying more icache space with yield() calls, may slow the program down more than yielding on contention would speed it up. The probability of contention depends crucially on the duration of the critical section, and I imagine that this could vary enormously for "once" functions, i.e. anything from a couple of instructions to seconds. So it might be worthwhile having different types of "once" for these different cases - and the same could also be said of mutexes.

Take care with the pthreads option. I spent a while trying to understand what the Linux pthreads implementation (in glibc) does (for ARM), and it eventually boils down to much the same as I had written. However it's almost an order of magnitude slower, and I believe that's because it involves a couple of function calls while mine is inline. Since pthreads is a C API, I think that the function call overhead is inevitable. So I have put investigating replacing the pthreads mutexes used by boost.threads with asm on my to-do list (though it may never reach the top).

Having said all that, does anyone really worry much about "once" performance? It's not like shared_ptr, where code that uses it may be doing atomic reference count changes fairly continuously.

Regards, Phil.
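
[The shape Phil describes, a single inline atomic check on the non-contended path with the blocking machinery only on the slow path, could be sketched roughly like this in C++11 terms; this is not his ARM asm or the glibc code, and fast_once and g_initialised are illustrative names.]

    #include <atomic>
    #include <mutex>

    std::atomic<bool> g_initialised(false);
    std::mutex g_slow_mutex;

    template <typename F>
    inline void fast_once(F f)
    {
        // Hot path: a single inline atomic load, no out-of-line call once done.
        if (g_initialised.load(std::memory_order_acquire))
            return;

        // Slow path: the first call (or a rare race) falls back to a blocking primitive.
        std::lock_guard<std::mutex> lk(g_slow_mutex);
        if (!g_initialised.load(std::memory_order_relaxed)) {
            f();
            g_initialised.store(true, std::memory_order_release);
        }
    }

[The already-initialised path is a load plus a branch, which is the property being contrasted above with the out-of-line pthreads function calls.]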