
flodin@cs.umu.se wrote:
It sounds reasonable that the kind of lock described will perform badly in high-contention situations. But what evidence do you have that there is any probability of high contention, especially when, as you say, the locks are short-lived? These locks aren't taken every time the smart pointer is accessed; they're only taken when a copy is made. So to get the high contention you speak of, you would have to have two threads repeatedly making copies of the "same" smart pointer. From where I stand, this seems sufficiently rare that the extra performance gain in the common cases is worth it.
[These comments are specific to the win32 spinlock discussed.]

I have two specific concerns.

1) Personally, I value predictable performance. I would prefer a slight overall performance degradation (note, however, that I have not examined the actual numbers) to the possibility of an occasional very long wait. (Specific example: if a graphically intensive program is aiming for 30 frames per second, that is only about 33ms per frame. It would be very unfortunate if, even occasionally, a single smart pointer copy accounted for a third of the time it took to render a frame.)

2) In "thundering herd" situations, contention is unavoidable, and performance may be paramount. The possibility of a single thread halting many other threads for 10ms on a machine with many processors is disturbing. (Specific example: where one thread is producing work for many consumer threads on a multiprocessor machine, and obtaining the work involves copying a smart pointer, some of those consumers may be forced to sleep every single time, which could amount to a substantial percentage of total job wall-clock time.)

In any case, I feel fairly strongly that sleeping is not the right thing to do. Despite the weakness of an appeal to authority, I will note that I have seen Butenhof make the same assertion on Usenet, surely more eloquently than I could.

I previously neglected to suggest alternatives.

1) A generally sound 8-byte mutex could be used. I am not sure how bad this would be in terms of storage efficiency; it may be possible to implement it using less storage, but I have no specific information on this.

2) I've seen a shared_ptr count implementation that used InterlockedIncrement instead of the lwm code. I have not examined it in detail; however, an approach like this seems better in all respects than trying to manage a lock when the only need is to maintain a count (which is what many mutex primitives are doing internally anyway). Is there some reason it cannot be used?

What do you think?

Aaron W. LaFramboise
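
P.S. For concreteness, here is roughly the kind of spin-then-sleep lock I am objecting to. This is my own minimal sketch of the pattern; the name spin_sleep_lock and the exact spin policy are mine, not the actual lwm code:

    #include <windows.h>

    // Sketch of a spin-then-sleep lock (hypothetical; not the lwm code).
    class spin_sleep_lock
    {
        volatile LONG locked_; // 0 = free, 1 = held

    public:
        spin_sleep_lock(): locked_(0)
        {
        }

        void lock()
        {
            // Atomically set locked_ to 1; if it was already 1, someone
            // else holds the lock, so give up the rest of the timeslice.
            // Once a thread loses this race, the scheduler will not run
            // it again for at least a timer tick, so a nominally short
            // critical section can cost on the order of 10ms.
            while (InterlockedExchange(&locked_, 1) != 0)
                Sleep(1);
        }

        void unlock()
        {
            InterlockedExchange(&locked_, 0);
        }
    };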
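And here is a minimal sketch of the InterlockedIncrement-style counting I have in mind for alternative 2. Again hypothetical; the names atomic_count, increment, and decrement are mine, not those of the implementation I saw:

    #include <windows.h>

    // Hypothetical lock-free reference count built on the win32
    // interlocked primitives.
    class atomic_count
    {
        volatile LONG value_;

    public:
        explicit atomic_count(LONG v): value_(v)
        {
        }

        void increment()
        {
            // A single bus-locked instruction; no thread ever blocks.
            InterlockedIncrement(&value_);
        }

        // Returns the new count so the caller can destroy the pointee
        // when it reaches zero.
        LONG decrement()
        {
            return InterlockedDecrement(&value_);
        }
    };

With something like this, a smart pointer copy costs one interlocked increment with no possibility of sleeping, and contention degrades gracefully instead of descheduling threads.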