New subject: [statechart] Asynchronous Machines

22 Feb 2006

      ...
Date: Tue, 21 Feb 2006 19:03:11 -0600
From: David Greene <greened@obbligato.org>
Subject: Re: [Boost-users] [statechart] Asynchronous Machines
To: boost-users@lists.boost.org
Message-ID: <43FBB84F.70305@obbligato.org>
Content-Type: text/plain; charset=us-ascii
...
Nowadays both CPUs tend to have caches. Whether or not the cache
contents is guaranteed to be written back to the main memory when thread
A returns depends on the architecture of your hardware. IIRC, on X86
architectures there is such a guarantee. On other architectures you
might need to use mutexes or a similar concept to guarantee that thread
B sees the updates of thread A.
Mutexes don't effect cache coherence.  Likely there will have to be
calls to special intrinsics depending on the architecture.
Short Answer:  Yes they do.

Very long answer:  More correctly the problem isn't really 'cache coherence'
in the traditional meaning of cache coherency  (which is that the cache, for
your cpu, is consistent with main memory, etc), it is the order of memory
reads and writes, and Mutex's are guaranteed to do whatever is necessary to
make sure all queued reads are read before you get the mutex lock (ie they
force a memory 'acquire' barrier) and they make sure all writes are written
before the mutex is released (a 'release' memory barrier).

The important example is this:

Thread 1:
bool object_is_contstructed = false;
Object object(17);  // construct it
object_is_constructed = true;

Thread 2:
if (object_is_constructed)
{
   use_object(object);
}
else
   dont_use_object();  // really we would probably wait, but this is an
example

Problem 1 (Thread 1)
object_is_constructed may be set to true (in shared memory) BEFORE all the
constructed bytes of object are seen in shared memory.
This has NOTHING to do with compiler optimizations or 'volatile' or the OS
or other stuff.  It is the hardware.  (And really, whether you think of this
as cache coherency or read/write order or whatever doesn't matter much, as
long as the rules and concepts make sense.)

Problem 2:
EVEN IF object_is_constructed is written AFTER object is constructed, Thread
2 might read object BEFORE reading object_is_constructed!  (Similar stuff.
Basically, think about how the processor wants to optimize the order of
memory reads/writes so as to not jump around memory too much - similar to
how disk drivers try to optimize read/writes to avoid seeks.)

So what to do?  In a sense, flush the reads/writes similar to what you would
need to do for harddrives.  Any and all thread primitives will do this for
you implicitly.  POSIX threads (pthreads) actually try to document this, but
you can assume it for any thread library otherwise the library is
essentially useless.

So use locks.
Look are writing first:
   lock mutex,
   write object,
   write object_is_constructed
   release a mutex.

note that in whatever order object and is_constructed are actually written
to main memory, they both are written before the release.  ***the order of
those writes CANNNOT come after the order of the writes inside the mutex***
because of the 'release' memory barrier inside the mutex release.

Now on reading,
   lock mutex
   read is_constructed
   read object
   release mutex

So the read of object might still come before the read of is_constructed,
but you CANNOT read either of them before the read of the mutex.  And the
mutex was written in order, so everything stays in order.

To take it a step further, if you had barrier control, you COULD do away
with the mutex (if you wanted to spin lock, or just skip using 'object' as
in the example) as so:

Thread 1:
  write object
  release barrier
  write is_constructed

the barrier prevents the object write to move below the barrier.

Thread 2:
   read is_constructed  (into temp)
   acquire barrier
   if (temp)
       read object

the acquire barrier prevents the read of 'objecrt' to move above the
acquire.

Note that, for example, Win32 now has explicit barrier versions of its
Interlocked functions ie InterlockedIncrementAcquire, etc.  (The old ones,
it can be assumed, did 'full' barriers.)  Mac OSX has similar functions.

x86 never re-orders writes, but it can re-order reads. The IA64 *spec* says
it can reorder both, although the processor doesn't seem to.  Conventional
wisdom is that the spec is explicitly 'looser' so that they can loosen the
implementation in the future.

Oh, also, the are other refinements on the barriers - think of the
combinations - are we only concerned about the order of reads w.r.t. other
reads, writes with other writes?  What about the order of reads vs writes,
etc?

Thus back to the short answer:  mutexes et al, magically do what is
necessary to make it work.

Tony.

P.S. hope I got all that straight.  It is easy to mix up the names of the
barriers relative to their direction and reads vs writes, etc.  I tend to
just remember the general principles.

Re: [Boost-users] [statechart] Asynchronous Machines

Gottlob Frege

David Greene

tags

participants (2)