Date: Tue, 21 Feb 2006 19:03:11 -0600
From: David Greene <greened@obbligato.org>
Subject: Re: [Boost-users] [statechart] Asynchronous Machines
To: boost-users@lists.boost.org
Message-ID: <43FBB84F.70305@obbligato.org>
Content-Type: text/plain; charset=us-ascii
> Nowadays both CPUs tend to have caches. Whether or not the cache
> contents is guaranteed to be written back to the main memory when thread
> A returns depends on the architecture of your hardware. IIRC, on X86
> architectures there is such a guarantee. On other architectures you
> might need to use mutexes or a similar concept to guarantee that thread
> B sees the updates of thread A.
Mutexes don't affect cache coherence. Likely there will have to be
calls to special intrinsics depending on the architecture.
Short Answer: Yes they do.
Very long answer: More precisely, the problem isn't really 'cache
coherence' in the traditional meaning of cache coherency (which
is that the cache, for your CPU, is consistent with main memory, etc.);
it is the order of memory reads and writes. Mutexes are guaranteed
to do whatever is necessary to make sure all queued reads are completed
before you get the mutex lock (i.e. they force a memory 'acquire'
barrier), and to make sure all writes are written before the mutex is
released (a 'release' memory barrier).
The important example is this:
Thread 1:

    bool object_is_constructed = false;
    Object object(17); // construct it
    object_is_constructed = true;

Thread 2:

    if (object_is_constructed)
    {
        use_object(object);
    }
    else
        dont_use_object(); // really we would probably wait, but this is an example
Problem 1 (Thread 1):
object_is_constructed may be set to true (in shared memory) BEFORE all
the constructed bytes of object are seen in shared memory.
This has NOTHING to do with compiler optimizations or 'volatile' or the
OS or other stuff. It is the hardware. (And really, whether
you think of this as cache coherency or read/write order or whatever
doesn't matter much, as long as the rules and concepts make sense.)
Problem 2:
EVEN IF object_is_constructed is written AFTER object is constructed,
Thread 2 might read object BEFORE reading object_is_constructed!
(Similar reasoning. Basically, think about how the processor wants to
optimize the order of memory reads/writes so as not to jump around
memory too much - similar to how disk drivers try to reorder
reads/writes to avoid seeks.)
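For concreteness, here is roughly what that broken example looks like as a
complete program. (The Object type, the stub functions and the pointer used
for sharing are made up for illustration; the point is that nothing here
orders the writes or the reads.)

#include <thread>

struct Object {
    int value;
    explicit Object(int v) : value(v) {}
};

// Stand-ins for the functions in the example above.
void use_object(const Object&) {}
void dont_use_object() {}

Object* object = 0;                  // shared between the threads
bool object_is_constructed = false;  // shared, and NOT synchronized

void thread1() {
    object = new Object(17);         // construct it
    object_is_constructed = true;    // this write may become visible first
}

void thread2() {
    if (object_is_constructed)       // flag may be seen before object's bytes
        use_object(*object);
    else
        dont_use_object();
}

int main() {
    std::thread a(thread1);
    std::thread b(thread2);
    a.join();
    b.join();
}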
So what to do? In a sense, flush the reads/writes, much as
you would need to do for hard drives. Any and all threading
primitives will do this for you implicitly. POSIX threads
(pthreads) actually try to document this, but you can assume it for any
thread library; otherwise the library is essentially useless.
So use locks.
Look at writing first:

    lock mutex
    write object
    write object_is_constructed
    release mutex
Note that in whatever order object and is_constructed are actually
written to main memory, they are both written before the release.
***Those writes CANNOT be reordered to after the mutex release***
because of the 'release' memory barrier inside the mutex release.
Now on reading:

    lock mutex
    read is_constructed
    read object
    release mutex
So the read of object might still come before the read of
is_constructed, but you CANNOT read either of them before the read that
acquires the mutex. And the mutex itself was written (released) only
after those writes, so everything stays in order.
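Written out as a sketch (std::mutex is just a stand-in here; boost::mutex or
pthread_mutex_t give you the same lock-acquire / unlock-release guarantees):

#include <mutex>
#include <thread>

struct Object {
    int value;
    explicit Object(int v) : value(v) {}
};

void use_object(const Object&) {}         // stub, as before

std::mutex m;
Object* object = 0;
bool object_is_constructed = false;

void writer() {
    std::lock_guard<std::mutex> lock(m);  // lock mutex
    object = new Object(17);              // write object
    object_is_constructed = true;         // write object_is_constructed
}                                         // release mutex: both writes complete first

void reader() {
    std::lock_guard<std::mutex> lock(m);  // lock mutex: no reads hoisted above this
    if (object_is_constructed)            // read is_constructed
        use_object(*object);              // read object - ordered by the mutex
}                                         // release mutex

int main() {
    std::thread a(writer);
    std::thread b(reader);
    a.join();
    b.join();
}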
To take it a step further, if you had barrier control, you COULD do
away with the mutex (if you wanted to spin-lock, or just skip using
'object' as in the example), like so:

Thread 1:

    write object
    release barrier
    write is_constructed

The barrier prevents the object write from moving below the barrier.

Thread 2:

    read is_constructed (into temp)
    acquire barrier
    if (temp)
        read object

The acquire barrier prevents the read of 'object' from moving above the acquire.
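In today's C++ that barrier-only version can be spelled with std::atomic and
explicit memory orders (again just a sketch; std::atomic postdates this post,
so read it as a translation of the raw barriers above):

#include <atomic>
#include <thread>

struct Object {
    int value;
    explicit Object(int v) : value(v) {}
};

void use_object(const Object&) {}    // stub, as before

Object* object = 0;
std::atomic<bool> object_is_constructed(false);

void thread1() {
    object = new Object(17);                                       // write object
    object_is_constructed.store(true, std::memory_order_release);  // release, then write flag
}

void thread2() {
    if (object_is_constructed.load(std::memory_order_acquire)) {   // read flag, acquire
        use_object(*object);  // the object read cannot move above the acquire
    }
    // else: don't use the object (or spin and retry)
}

int main() {
    std::thread a(thread1);
    std::thread b(thread2);
    a.join();
    b.join();
}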
Note that, for example, Win32 now has explicit barrier versions of its
Interlocked functions, e.g. InterlockedIncrementAcquire, etc. (The
old ones, it can be assumed, did 'full' barriers.) Mac OS X has
similar functions.
x86 never re-orders writes, but it can re-order reads. The IA64 *spec*
says it can reorder both, although the processor doesn't seem to.
Conventional wisdom is that the spec is explicitly 'looser' so that
they can loosen the implementation in the future.
Oh, also, there are other refinements on the barriers - think of the
combinations - are we only concerned about the order of reads w.r.t.
other reads, and writes w.r.t. other writes? What about the order of
reads vs writes, etc.?
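For what it's worth, those combinations eventually got standard names in
C++'s <atomic>; a rough map (not something that existed when this was written):

// std::memory_order_relaxed - atomicity only, no ordering constraint
// std::memory_order_acquire - later reads/writes can't move above this load
// std::memory_order_release - earlier reads/writes can't move below this store
// std::memory_order_acq_rel - both at once, for read-modify-write operations
// std::memory_order_seq_cst - a single total order (the 'full' barrier flavour)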
Thus back to the short answer: mutexes et al. magically do what is necessary to make it work.
Tony.
P.S. Hope I got all that straight. It is easy to mix up the names
of the barriers relative to their direction and reads vs writes,
etc. I tend to just remember the general principles.