
Mattias Flodin wrote:
Quoting "Aaron W. LaFramboise" <aaronrabiddog51@aaronwl.com>:
However, despite your examples being slightly far-fetched (Sleep(1) would wait for ~1ms, not 10ms, and a consumer/producer system would most likely have a synchronized queue, which would avoid any contention around the smart pointers), they are a good enough hint to convince me that there are real-world applications that would suffer from the problem. As you say, stable performance is more important for a generic implementation.
Sorry. I was mistakenly thinking that it was not possible to sleep for less than the system clock granularity (10ms), but now that I have tested it, that appears not to be true. Maybe it was only true on older systems. I will investigate further how bad the degenerate performance might actually be with smart_ptr.
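For what it's worth, a minimal sketch along the lines of the test I mean; the exact figures will of course vary with the system timer resolution:

// Measure how long Sleep(1) actually blocks, using the
// high-resolution performance counter.
#include <windows.h>
#include <cstdio>

int main()
{
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);

    for (int i = 0; i < 10; ++i)
    {
        QueryPerformanceCounter(&start);
        Sleep(1);                       // request a 1 ms sleep
        QueryPerformanceCounter(&end);

        double ms = 1000.0 * (end.QuadPart - start.QuadPart) / freq.QuadPart;
        std::printf("Sleep(1) took %.3f ms\n", ms);
    }
    return 0;
}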
2) I've seen a shared_ptr count implementation that used InterlockedIncrement instead of the lwm code. I have not examined this in detail; however, it seems that an approach like this is better in all respects than trying to manage locks if the only need is to maintain a count (which is what many mutex primitives are doing anyway). Is there some reason this cannot be used?
I was a bit surprised by the use of a lock as well, and given that InterlockedIncrement is an obvious solution for this kind of thing, I assumed there were non-obvious reasons why it couldn't be used. My guess is exception safety, but I would like to hear from the original authors (or anybody else in the know) about this. Perhaps explaining the rationale in the documentation would be in order.
Well, here is the implementation that I have seen: http://www.pdimov.com/cpp/shared_count_x86_exp2.hpp It seems like it would be faster than what is presently being used, in all cases. I do not know why it is not used. Hopefully Peter Dimov will comment.
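To give the flavour of that approach (a simplified sketch of the idea only, not Peter's actual code): the use count lives in a plain LONG and is manipulated exclusively with the Interlocked* family, so no lock is ever taken just to maintain the count.

// Sketch of an interlocked use count; names are illustrative, not Boost's.
#include <windows.h>

class atomic_count
{
public:
    explicit atomic_count(LONG v = 1) : value_(v) {}

    void add_ref()
    {
        InterlockedIncrement(&value_);          // atomic ++value_
    }

    // Returns the new count; the caller destroys the object when it reaches 0.
    LONG release()
    {
        return InterlockedDecrement(&value_);   // atomic --value_
    }

private:
    volatile LONG value_;
};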
Alternative 2 would be a superior solution if it's feasible. Alternative 1 is not bad performance-wise if implemented using CRITICAL_SECTION. My only worry is about resource usage, since mutexes are kernel objects. I can imagine that situations where hundreds of thousands of smart pointers are in use may end up having an impact on overall performance. In some cases kernel memory usage is restricted; I'm not sure whether mutexes belong to that category. The number of outstanding overlapped file operations is an example that does (on the order of 1000 outstanding operations from my measurements, on a machine with 512 MB of RAM).
Well, even critical sections, Windows's fastest mutex primitive, are much slower in the uncontended case than a spinlock. A two-stage method is needed to match the performance of the present spinlock: a lightweight atomic operation first, followed by a heavyweight mutex only if the lock is contended. This is why I mentioned that 8 bytes (one word for the critical section, one word for the atomic operation) would be necessary.
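Roughly the shape of what I mean, as a sketch rather than a finished lock (here the heavyweight object is a semaphore created up front; a real implementation might create it lazily to save kernel objects, which needs its own synchronization):

// A "benaphore"-style two-stage lock: the uncontended path is a single
// interlocked operation on one word; the kernel object is only entered
// when the lock is actually contended.
#include <windows.h>

class two_stage_mutex
{
public:
    two_stage_mutex()
        : count_(0)
        , sem_(CreateSemaphore(0, 0, MAXLONG, 0)) // used only under contention
    {
    }

    ~two_stage_mutex() { CloseHandle(sem_); }

    void lock()
    {
        // Fast path: the count goes 0 -> 1 and we own the lock, no kernel call.
        if (InterlockedIncrement(&count_) > 1)
        {
            // Contended path: somebody else holds the lock, block on the semaphore.
            WaitForSingleObject(sem_, INFINITE);
        }
    }

    void unlock()
    {
        // If other threads queued up while we held the lock, wake exactly one.
        if (InterlockedDecrement(&count_) > 0)
            ReleaseSemaphore(sem_, 1, 0);
    }

private:
    volatile LONG count_; // the lightweight "one word" part
    HANDLE sem_;          // the heavyweight part, touched only under contention
};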
I believe the majority of threaded applications do not need to share smart pointers between threads to any great extent. Unfortunately, the choice to avoid a policy-based design implies that optional thread safety might add something like three extra smart pointer classes to the six existing ones.
Ideally, the smart pointer classes would work for more than the majority; they would work for everyone. I also agree that it is unfortunate that users who do not need threads might have to pay for threads anyway, or for a more complicated smart pointer library.

Aaron W. LaFramboise