Re: [Boost-users] [statechart] Asynchronous Machines
Date: Tue, 21 Feb 2006 19:03:11 -0600
From: David Greene
Subject: Re: [Boost-users] [statechart] Asynchronous Machines
To: boost-users@lists.boost.org
Message-ID: <43FBB84F.70305@obbligato.org>
Content-Type: text/plain; charset=us-ascii

> Nowadays both CPUs tend to have caches. Whether or not the cache
> contents are guaranteed to be written back to main memory when thread A
> returns depends on the architecture of your hardware. IIRC, on x86
> architectures there is such a guarantee. On other architectures you
> might need to use mutexes or a similar concept to guarantee that
> thread B sees the updates of thread A.
Mutexes don't affect cache coherence. More likely, there will have to be
calls to special intrinsics, depending on the architecture.
Short answer: Yes, they do.

Very long answer: More correctly, the problem isn't really 'cache
coherence' in the traditional meaning of cache coherency (that the
cache, for your CPU, is consistent with main memory, etc.); it is the
order of memory reads and writes. Mutexes are guaranteed to do whatever
is necessary to make sure all queued reads are read before you get the
mutex lock (i.e. they force a memory 'acquire' barrier), and they make
sure all writes are written before the mutex is released (a 'release'
memory barrier).

The important example is this:

Thread 1:

    bool object_is_constructed = false;
    Object object(17); // construct it
    object_is_constructed = true;

Thread 2:

    if (object_is_constructed)
        use_object(object);
    else
        dont_use_object(); // really we would probably wait, but this is an example

Problem 1 (Thread 1): object_is_constructed may be set to true (in
shared memory) BEFORE all the constructed bytes of object are seen in
shared memory. This has NOTHING to do with compiler optimizations or
'volatile' or the OS or other stuff. It is the hardware. (And really,
whether you think of this as cache coherency or read/write order or
whatever doesn't matter much, as long as the rules and concepts make
sense.)

Problem 2 (Thread 2): EVEN IF object_is_constructed is written AFTER
object is constructed, Thread 2 might read object BEFORE reading
object_is_constructed! (Similar stuff. Basically, think about how the
processor wants to optimize the order of memory reads/writes so as not
to jump around memory too much - similar to how disk drivers try to
reorder reads/writes to avoid seeks.)

So what to do? In a sense, flush the reads/writes, similar to what you
would need to do for hard drives. Any and all thread primitives will do
this for you implicitly. POSIX threads (pthreads) actually try to
document this, but you can assume it for any thread library; otherwise
the library is essentially useless.

So use locks. Looking at writing first:

    lock mutex
    write object
    write object_is_constructed
    release mutex

Note that in whatever order object and object_is_constructed are
actually written to main memory, they both are written before the
release. ***Those writes CANNOT be reordered past the write that
releases the mutex*** because of the 'release' memory barrier inside
the mutex release.

Now on reading:

    lock mutex
    read object_is_constructed
    read object
    release mutex

So the read of object might still come before the read of
object_is_constructed, but you CANNOT read either of them before the
read of the mutex in the lock. And the mutex was written in order, so
everything stays in order.

To take it a step further, if you had barrier control, you COULD do
away with the mutex (if you wanted to spin-lock, or just skip using
'object' as in the example), as so:

Thread 1:

    write object
    release barrier
    write object_is_constructed

The barrier prevents the write of object from moving below the barrier.

Thread 2:

    read object_is_constructed (into temp)
    acquire barrier
    if (temp) read object

The acquire barrier prevents the read of 'object' from moving above the
acquire.

Note that, for example, Win32 now has explicit barrier versions of its
Interlocked functions, i.e. InterlockedIncrementAcquire, etc. (The old
ones, it can be assumed, did 'full' barriers.) Mac OS X has similar
functions.

x86 never reorders writes, but it can reorder reads. The IA64 *spec*
says it can reorder both, although the processor doesn't seem to.
Conventional wisdom is that the spec is deliberately 'looser' so that
they can loosen the implementation in the future.
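For concreteness, here is roughly what the locked version looks like
with pthreads. This is just a sketch: Object, use_object, and
dont_use_object are the made-up names from my example, not anything
real.

    #include <pthread.h>

    struct Object { int value; explicit Object(int v) : value(v) {} };

    pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    Object* object = 0;                  // shared
    bool object_is_constructed = false;  // shared

    void use_object(Object& o) { (void)o; /* ... */ }
    void dont_use_object() { /* really we would probably wait */ }

    void* thread1(void*)
    {
        Object* p = new Object(17);      // construct it
        pthread_mutex_lock(&mtx);
        object = p;                      // write object
        object_is_constructed = true;    // write the flag
        pthread_mutex_unlock(&mtx);      // 'release': both writes complete first
        return 0;
    }

    void* thread2(void*)
    {
        pthread_mutex_lock(&mtx);        // 'acquire': no reads hoisted above this
        bool constructed = object_is_constructed;
        Object* p = object;
        pthread_mutex_unlock(&mtx);
        if (constructed)
            use_object(*p);
        else
            dont_use_object();
        return 0;
    }

(thread1 and thread2 would be handed to pthread_create in the usual
way.)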
Oh, also, there are other refinements on the barriers - think of the
combinations: are we only concerned about the order of reads w.r.t.
other reads, writes w.r.t. other writes? What about the order of reads
vs writes, etc.? Thus back to the short answer: mutexes et al.
magically do what is necessary to make it work.

Tony

P.S. Hope I got all that straight. It is easy to mix up the names of
the barriers relative to their direction and reads vs writes, etc. I
tend to just remember the general principles.
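P.P.S. To make the barrier-only version concrete, on Win32 it might
look something like this. MemoryBarrier() is a full barrier, which is
stronger than the acquire/release pair the example strictly needs, but
a full barrier is the safe over-approximation when finer-grained ones
aren't available. (Object and use_object are still the made-up names
from before.)

    #include <windows.h>

    struct Object { int value; explicit Object(int v) : value(v) {} };
    void use_object(Object&);            // made-up, as before

    Object* object = 0;                  // shared
    bool object_is_constructed = false;  // shared

    void publisher()   // Thread 1
    {
        object = new Object(17);       // write object
        MemoryBarrier();               // full barrier; subsumes the 'release' barrier
        object_is_constructed = true;  // write the flag
    }

    void consumer()    // Thread 2
    {
        bool temp = object_is_constructed;  // read the flag (into temp)
        MemoryBarrier();               // full barrier; subsumes the 'acquire' barrier
        if (temp)
            use_object(*object);       // this read cannot move above the barrier
    }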
Gottlob Frege wrote:
> Very long answer: More correctly, the problem isn't really 'cache
> coherence' in the traditional meaning of cache coherency (that the
> cache, for your CPU, is consistent with main memory, etc.); it is the
> order of memory reads and writes. Mutexes are guaranteed to do
> whatever is necessary to make sure all queued reads are read before
> you get the mutex lock (i.e. they force a memory 'acquire' barrier),
> and they make sure all writes are written before the mutex is
> released (a 'release' memory barrier).
I understand what you're saying, and I agree that that's how hardware
and software are implemented in the vast majority of cases today.
However, the concepts of serializing access and maintaining memory
consistency and coherence are orthogonal. There have been architectures
(in academia, mostly) that require explicit software cache control, for
example; one would have to include a cache flush in your examples (see
the sketch below). The theory is that by separating concerns, the
programmer (or compiler) has more freedom to loosen up implementations
based on weaker requirements of the application, thereby gaining
performance.

We're starting to see this much more in HPC systems, for example, where
a multitude of synchronization primitives are available with varying
semantics that imply performance tradeoffs. Some machines cache remote
memory (often under software control), others don't.

So I agree with you in the case of the typical machine architecture,
but it won't necessarily hold in the future.

-Dave
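P.S. By "include a cache flush" I mean something like the following.
flush_line(), invalidate_line(), write_barrier(), and read_barrier()
are hypothetical intrinsics, stand-ins for whatever cache-control
operations such an architecture actually provides; none of them exist
on mainstream hardware.

    // Hypothetical software-cache-control version of Tony's example.

    void publisher()   // Thread 1
    {
        object = new Object(17);
        flush_line(object);                  // push the object's bytes to shared memory
        write_barrier();                     // keep the flag write below the data write
        object_is_constructed = true;
        flush_line(&object_is_constructed);  // push the flag out as well
    }

    void consumer()    // Thread 2
    {
        invalidate_line(&object_is_constructed);  // force a fresh read of the flag
        if (object_is_constructed) {
            read_barrier();                  // keep the object read below the flag read
            invalidate_line(object);         // force a fresh read of the object
            use_object(*object);
        }
    }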
participants (2)
- David Greene
- Gottlob Frege