
On 4/27/05, Felipe Magno de Almeida <felipe.m.almeida@gmail.com> wrote:
I don't know if this thread is the best place to ask this question, but... where do I find the proposals about threads in C++? Libs, core language modifications, memory model, etc.? I've also seen that there's some group discussing these things, but I only had access to it through a page... maybe there's some way for others to receive those mails, even without permission to reply? I'm really interested in it, but I fear things like atomicity guarantees for volatile and static variables, and some other too-strong guarantees...
The latest thinking seems to be not to change anything that would incur overhead, as per the latest mailing: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1777.pdf This is in keeping with the seemingly fundamental C++ principle of not paying for what you don't use.

<quote>
The consensus within our group (grudgingly for some of us) is that it would be preferable to leave the semantics of data races undefined, mostly because it is much more consistent with the current C++ specification, practice, and especially implementations. In the absence of objections from the committee, we plan to follow that path.
</quote>

Though I find much of the paper of interest, a useful portable implementation of it is such a long way away that boost covers much of this territory via libs. For some architectures such a library approach may not be possible. For example, looking at the JSR-133 docs, I'm not sure you can do a load_store barrier by library on ia64. This seems to be the only popular architecture where a library approach may not be possible. In the meantime, things such as statics in functions should perhaps just be declared concurrency-unsafe and avoided.

I think boost should do a few things:

1) memory model

Acknowledge there is none; therefore we must assume a data race. Have a practical, portable memory barrier available: a primitive set of load_load, load_store, store_store and store_load barrier functions, modelled after the JSR-133 cookbook guidelines: http://gee.cs.oswego.edu/dl/jmm/cookbook.html This should also introduce platform labels for architectures in boost, in addition to the current compiler/OS #defines. I think we need to assume that lock/unlock synchronisation of a non-null mutex is a full barrier. Is that correct? Perhaps having such primitives queryable by trait is enough...
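As a rough illustration of the shape such a primitive set might take, here is a sketch using C++11 fences (which of course postdate this post); the JSR-133-cookbook-style names are hypothetical, not an existing boost API, and the publish/consume pair shows the intended usage:

```cpp
#include <atomic>

// Hypothetical JSR-133-cookbook-style barrier names, sketched on top of
// C++11 fences. The acquire/release mapping is an assumption: load_load
// and load_store fold into acquire, store_store into release, and the
// expensive store_load case needs full sequential consistency.
inline void load_load_barrier()   { std::atomic_thread_fence(std::memory_order_acquire); }
inline void load_store_barrier()  { std::atomic_thread_fence(std::memory_order_acquire); }
inline void store_store_barrier() { std::atomic_thread_fence(std::memory_order_release); }
inline void store_load_barrier()  { std::atomic_thread_fence(std::memory_order_seq_cst); }

// Intended usage: order a payload write before its "ready" flag.
int payload = 0;
std::atomic<bool> ready{false};

void publish(int v) {
    payload = v;
    store_store_barrier();                        // payload visible before flag
    ready.store(true, std::memory_order_relaxed);
}

int consume() {
    while (!ready.load(std::memory_order_relaxed)) { /* spin */ }
    load_load_barrier();                          // flag read before payload read
    return payload;
}
```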
2) atomic reads and writes

There should be a type trait that reports whether a type is sufficiently small for atomic reading or writing to memory on the architecture being compiled for. Such a trait may be used by generic or macro methods for introducing or, perhaps more importantly, avoiding synchronisation. I use this technique successfully in some of my code. It is especially useful for synch-safe property-style interfaces.

3) assuring real memory

Not just a promotion to register. This is required so that when a memory barrier is invoked it acts on the parts of the code we expect it to, and so that the variable we are interested in sharing is actually shareable and not aggressively optimised into a CPU-specific feature. Is volatile a practical way to do this (perhaps in co-operation with compiler optimisation settings)? Can we assume "volatile" assures this and that the variable will not be a register-only optimisation? Is there a better way? I think this is the only guarantee we need from volatile. Atomicity of reads and writes might be nice for volatile, but that is out of scope for a library.

4) further synch primitives

A synch lib that provides the usual suspects of atomic ops, such as inc, dec, add, sub and cas. These are normally defined for an architecture on a certain width of bit field, so a trait mechanism should indicate native support for a type. Perhaps also a trait mechanism for what kind of memory barrier guarantee an operation provides, and generic implementations that, at worst, use a full mutex for wider types.

5) better generic synchronisation api for mutex operations

We should be able to use null_synch, simple_synch and shared_synch (or rw_synch) primitives in a policy-like manner, so that we can write concurrency-aware constructs to a shared/exclusive model and have them work with no concurrency, exclusive concurrency, or shared/exclusive concurrency by policy.
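The trait in (2) might look something like the following sketch; the name is_atomic_sized is hypothetical, and the word-size heuristic is an assumption (modern C++ can instead ask the compiler directly via std::atomic<T>::is_always_lock_free):

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical trait: true when T is plausibly small and trivial enough
// that a plain aligned load/store is a single machine access. The check
// against the pointer width is an approximation of "native word size".
template <typename T>
struct is_atomic_sized
    : std::integral_constant<bool,
          std::is_trivially_copyable<T>::value &&
          sizeof(T) <= sizeof(void*) &&
          (sizeof(T) & (sizeof(T) - 1)) == 0>   // power-of-two size
{};
```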
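For the primitives in (4), a sketch of the intended surface, with hypothetical free-function names (today std::atomic supplies exactly these operations natively, which makes the point that a trait for native support is what a library would have had to provide):

```cpp
#include <atomic>

// Hypothetical wrappers over the usual suspects: fetch-and-add and CAS.
// A real library would select a native instruction or a mutex fallback
// per type width; here std::atomic does that selection for us.
template <typename T>
T fetch_add_sync(std::atomic<T>& a, T delta) {
    return a.fetch_add(delta);                     // returns the prior value
}

template <typename T>
bool cas_sync(std::atomic<T>& a, T expected, T desired) {
    return a.compare_exchange_strong(expected, desired);
}
```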
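The policy idea in (5) can be sketched as follows; the null_synch and simple_synch names follow the post but the exact interfaces are assumptions, built on std::mutex rather than boost:

```cpp
#include <mutex>

// Locking as a policy: null_synch compiles away entirely, simple_synch
// wraps a real mutex. A shared_synch policy would slot in the same way.
struct null_synch {
    struct mutex_type {};
    struct scoped_lock {
        template <typename M> explicit scoped_lock(M&) {}  // no-op lock
    };
};

struct simple_synch {
    using mutex_type  = std::mutex;
    using scoped_lock = std::lock_guard<std::mutex>;
};

// A concurrency-aware construct written once, specialised by policy.
template <typename Synch>
class counter {
    typename Synch::mutex_type guard_;
    long value_ = 0;
public:
    long increment() {
        typename Synch::scoped_lock lock(guard_);
        return ++value_;
    }
};
```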
ACE has had something similar for over ten years. I use a simple type translation layer over boost::thread to achieve a similar outcome.

6) threading api

Then we can worry more about an appropriate threading api and the religion of cancelling threads, OS-specific operations, propagation of exceptions, and the like.

7) cost model

It would be nice to include traits with approximate relative costs of some operations for each architecture, as these can differ vastly. Many ops on many architectures may be zero cost; for example, the stronger memory model of x86 gives you a NOP, zero cost, for load_store and store_store barriers. This will become increasingly important in developing so-called lock-free methods. I call them "so-called" because they usually require memory model guarantees through barriers or atomic ops, such as CAS, and these can be very expensive on some architectures. Lock-free algorithms are often more complicated, and if the non-locking concurrency costs are sufficiently high the benefits can quickly dissipate. For example, locking operations on an Intel P4 are expensive; on an Athlon or ia64 they are cheaper by more than a factor of two. This, and varying barrier and synch primitive costs, changes the appropriateness of an algorithm for an architecture, but I can imagine fairly simple compile-time formulae solving 80% of the class/algorithm selection problem transparently to a user.

$0.02,

matt
matthurd@acm.org