
On 4/29/05, Alexander Terekhov <terekhov@web.de> wrote:
Matt Hurd wrote: [...]
That said, for naive users, do you have an opinion on the memory semantics that locking and unlocking a mutex should provide to deliver "the least surprise" to a user? x86 and SPARC deliver a full fence for [...]
Solaris (including the x86 incarnation, AFAIK) has locks that don't need an interlocked read-{modify-}write in unlock() (similar to spinlocks). On x86, such an unlock() has pure release semantics, not fully-fenced silliness.
[...]
Your thoughts on memory vis requirements for locking and unlocking mutexes for the least surprise?
Uhmm, how about <http://tinyurl.com/77hvz>?
regards, alexander.
As a short-term solution (several years??), until C++0x saves us, I think a store fence or release-semantics requirement for an unlock() on a mutex will provide the least surprise in the majority of cases. Being able to override this behaviour by policy with something more sophisticated, such as behaviour built from your msync primitives, provides finer control for more sophisticated users.

Given default release behaviour on unlock(), is there any need to require the overhead of an acquire for a lock() in the typical use of a mutex? I'm tempted to say "why not", as it seems to be the norm on most architectures anyway, but it doesn't seem necessary for the typical case if the unlock is the equivalent of a store fence/release... For the special cases, naive users can drop down to a boost::fence::xxx, and sophisticated users can try an msync op...

This would get us to the place where chunky memory-visibility ops are available in Boost and lock()/unlock() have defined visibility semantics (or the appropriate lack thereof ;-)). Adding traits for atomicity of reads and writes of particular contiguous sizes, and providing basic atomic ops (with generic mutex emulations for missing ops and extended sizes, plus a primitive cycle-cost model for the arch), would provide the necessary tools for a user of my limited desires. Further sophistication would require something more akin to Terekhov's msync and friends.

It doesn't solve compiler-oriented issues, such as optimizers relocating code across barriers, but that is outside the realm of what a library can do and will have to stay in its current realm of optimization / test / debug / fool-the-compiler trickery.

Does that seem reasonable without courting too much disaster? Perhaps it is a waste of time, as the user group may be too small. Users like me, moderate concurrency users who want some memory-visibility guarantees and control but are not at the level of sophistication of a Dimov or a Terekhov, might just be too few in number to matter.
I dunno. Matt.