
Howard Hinnant wrote:
On Aug 22, 2007, at 2:59 PM, Zach Laine wrote:
Could the folks who object to the current design spell it out for me a bit more explicitly -- what in the design is dangerous/inconvenient enough to throw away one or more of the 4 goals?
I would like to see an answer to Zach's question too. I do not know what the major objection is with the "current proposal". I only know that people are suggesting alternatives.
I deliberately only submitted an alternative for consideration instead of poking holes in your design or argumentation. But if you insist...

I don't believe that my suggested alternative throws away any of the four goals, hypothetical malicious vendor D notwithstanding. It is still possible for the vendor to meet the goals. In addition, it adds Goal 5: the ability to control the level of checking globally; and Goal 6: ... without source changes or even recompilation. This can help if hypothetical user D, lured by the "no overhead, less L1 cache misses!" slogan, uses unchecked<> as a matter of habit. "Checking is for other people."

On reflection though, I'll change the constructor from

    explicit condition( Mutex * pm = 0 );

to

    explicit condition( Mutex * pm );

as it's too easy to accidentally disable checking by not including the condition in the member init list.

Going back to:

    class shared_mutex
    {
        typedef mutex mutex_t;
        typedef condition< unchecked<mutex_t> > cond_t;

        mutex_t mut_;
        cond_t gate1_;
        cond_t gate2_;
        unsigned state_;
        ...

and the L1 cache miss argument for:

    class A
    {
        shared_mutex mx_;
        ...
    };

    vector<A> v;

1. The size of shared_mutex, according to your numbers, is 104. If we assume 16 bytes of state in A, this makes sizeof(A) 120. The addition of two pointers makes it 128. This is a 7% increase, but it also happens to round the size of A up to 128, which means an A never straddles a cache line, so the "more bloated" version will actually be faster. Note that I picked the number 16 before realizing that. :-) If A had 24 bytes of state, the two pointers would of course be detrimental.

2. I'm having a hard time imagining a program where the L1 cache misses due to the increased size of A would matter. An object of type A allocates a not insignificant amount of kernel resources, so it would be hard to keep enough A's in memory for the +7% L1 misses to show up.

3. What is sizeof(pthread_rwlock_t) on your platform? Is it not something like 48-56? That's roughly half the size of the above shared_mutex, so users who are conscious of L1 cache misses will not use shared_mutex anyway.

4. The vector v would need to be protected by its own rwlock as well, since you can't reallocate it while someone is accessing the A's, which creates a central bottleneck. vector< shared_ptr<A> > does not suffer from this problem, as you can reallocate it while someone is holding a reference to an element in the form of a shared_ptr.

5. If optimizing the L1 cache friendliness of shared_mutex is an important goal, I would consider moving the conditions to the heap, as they aren't accessed on the fast path:

    class shared_mutex
    {
        mutex mx_;
        unsigned state_;
        void * rest_;
    };

I know that your actual shared_mutex will use atomics on state_, so you might even be able to make

    class shared_mutex
    {
        unsigned state_; // , state2_?
        void * rest_;
    };

work. Uncontended access now doesn't touch *rest_, and L1 cache misses are a fraction of what they used to be. Contended access does cause more cache misses, but this is overshadowed by the cost of the contention.
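
For illustration, here is a rough, self-contained sketch of the size comparison in point 5. It uses raw pthread types as stand-ins for the proposal's mutex and condition (so the numbers it prints are platform-dependent and are not the 104 quoted above), and it only compares layouts; it doesn't implement any locking:

    #include <pthread.h>
    #include <stdio.h>

    // Layout with the condition variables stored inline,
    // as in the shared_mutex sketch above.
    struct shared_mutex_inline
    {
        pthread_mutex_t mut_;
        pthread_cond_t gate1_;
        pthread_cond_t gate2_;
        unsigned state_;
    };

    // Layout with the slow-path machinery moved behind a single
    // pointer; the uncontended fast path only touches state_.
    struct shared_mutex_compact
    {
        unsigned state_;
        void * rest_; // lazily allocated mutex + conditions
    };

    int main()
    {
        printf( "inline layout:  %lu bytes\n", (unsigned long)sizeof( shared_mutex_inline ) );
        printf( "compact layout: %lu bytes\n", (unsigned long)sizeof( shared_mutex_compact ) );
        return 0;
    }

The compact version is just a word of state plus a pointer, which is the point: the condition machinery is only pulled into cache when there is actual contention.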