Re: [Boost-users] [statechart] Asynchronous Machines
Date: Wed, 22 Feb 2006 10:31:06 -0600
From: David Greene
Subject: Re: [Boost-users] [statechart] Asynchronous Machines
To: boost-users@lists.boost.org
Message-ID: <43FC91CA.8060801@obbligato.org>
Content-Type: text/plain; charset=us-ascii

Gottlob Frege wrote:
Very long answer: More correctly, the problem isn't really 'cache coherence' in the traditional meaning of cache coherency (which is that the cache for your CPU is consistent with main memory, etc.); it is the ordering of memory reads and writes. Mutexes are guaranteed to do whatever is necessary here: taking the lock forces a memory 'acquire' barrier, so no reads from inside the critical section can happen before you get the mutex lock, and releasing the lock forces a 'release' barrier, so all writes from inside the critical section are completed before the mutex is released.
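For illustration, here is a minimal sketch of that acquire/release pairing in C++11 terms (std::atomic postdates this thread; all the names below are invented for the example):

    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;                 // ordinary, non-atomic shared data
    std::atomic<bool> ready(false);  // stands in for the mutex's lock word

    void producer() {
        payload = 42;  // writes before the release store...
        ready.store(true, std::memory_order_release);  // ...are visible once
                                                       // the flag is seen
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire))  // acquire: reads after
            ;                                           // this cannot move
                                                        // before it
        assert(payload == 42);  // guaranteed by the release/acquire pairing
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }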
I understand what you're saying, and I agree that this is how hardware and software are implemented in the vast majority of cases today.
However, the concepts of serializing access and maintaining memory consistency and coherence are orthogonal. There have been architectures (mostly in academia) that require explicit software cache control, for example; on such a machine, one would have to add an explicit cache flush to your examples. The theory is that by separating the concerns, the programmer (or compiler) has more freedom to loosen up the implementation based on the weaker requirements of a particular application, thereby gaining performance.
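As a hedged sketch of what that would look like (the cache primitives below are invented; a real machine of this kind would expose them as instructions or intrinsics):

    #include <cstddef>
    #include <mutex>

    // Hypothetical software-cache-control primitives, stubbed as no-ops here.
    void cache_invalidate(void*, std::size_t) { /* drop stale local copies */ }
    void cache_writeback(void*, std::size_t)  { /* push dirty lines out    */ }

    std::mutex m;
    int shared_data = 0;

    void update() {
        m.lock();                                            // serialize access
        cache_invalidate(&shared_data, sizeof shared_data);  // see other
                                                             // processors' writes
        ++shared_data;
        cache_writeback(&shared_data, sizeof shared_data);   // publish ours
        m.unlock();
    }

On such a machine the lock alone serializes the threads but does nothing for visibility; the explicit flushes carry that responsibility.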
We're starting to see this much more in HPC systems, for example, where a multitude of synchronization primitives is available, with varying semantics that imply performance tradeoffs. Some machines cache remote memory (often under software control); others don't.
So I agree with you in the case of the typical machine architecture, but it won't necessarily hold in the future.

-Dave

Gottlob Frege replied:

Obviously you understand the problems, then. But I don't understand what you don't agree with: my explanation (which is obviously missing some details), or whether mutexes will work in the future? I'm saying that mutexes will always do whatever is necessary. For example, the pthreads spec tries very hard to describe itself in such a way that it will 'just work' regardless of the underlying problems. And any other mutex/thread library will also work around the problems (flushing caches, doing whatever is necessary) or else not be worth using.

P.S. Do you have any links to some of the more esoteric synchronization primitives? I've been looking for variations on the typical acquire/release barriers, as well as systems that don't have CAS, etc. All in the hopes of making at least a start on a useful atomics library.

Thanks, Tony.
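As an aside, the sort of building block such an atomics library starts from is a CAS loop; here is a minimal sketch in C++11 terms (again postdating this thread):

    #include <atomic>

    // fetch_and_add built from compare-and-swap: the classic fallback on
    // hardware that has CAS but no native atomic add.
    int fetch_and_add(std::atomic<int>& a, int delta) {
        int expected = a.load(std::memory_order_relaxed);
        // Retry until no other thread raced in between our load and the CAS.
        while (!a.compare_exchange_weak(expected, expected + delta,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed)) {
            // compare_exchange_weak refreshes `expected` on failure; just loop.
        }
        return expected;  // the value observed just before our add
    }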
At 2:08 AM -0500 2/23/06, Gottlob Frege wrote:
All in the hopes of making at least a start on a useful atomics library.
Are you familiar with the atomic operations library at http://www.hpl.hp.com/research/linux/atomic_ops ?
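For a flavor of that library's C API (signatures approximate; check the library's headers rather than trusting this sketch):

    #include <atomic_ops.h>

    volatile AO_t counter = 0;

    void increment(void) {
        AO_fetch_and_add1(&counter);        /* atomic increment */
    }

    AO_t read_with_acquire(void) {
        return AO_load_acquire(&counter);   /* load with acquire barrier */
    }

    void publish(AO_t v) {
        AO_store_release(&counter, v);      /* store with release barrier */
    }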
Gottlob Frege wrote:
I'm saying that mutexes will always do whatever is necessary. For example, the pthreads spec tries very hard to describe itself in such a way that it will 'just work' regardless of the underlying problems. And any other mutex/thread library will also work around the problems (flushing caches, doing whatever is necessary) or else not be worth using.
I agree that current mutex implementations will do this. I'm simply pointing out that the mutex concept (serializing access to a region of code) is orthogonal to making the code correct in a memory-consistency sense; the two concepts are just used in a synergistic way. An example of when separating them can be useful: on a NUMA architecture, I might like to serialize the threads on my local node (which provides hardware coherence) to make sure updates are ordered, but not pay the cost of an expensive global sync operation that, say, flushes the caches of remote nodes, because I, as the programmer, know that the data in question is only ever accessed locally.
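A hypothetical sketch of that idea (current_node() and the topology constant are invented; a real system would ask the OS or runtime which node a thread is on):

    #include <mutex>

    const int NUM_NODES = 4;               // assumed machine topology
    std::mutex node_lock[NUM_NODES];       // one lock per NUMA node
    int node_local_count[NUM_NODES] = {};  // each slot touched only by its node

    int current_node() { return 0; }       // stub for illustration

    void local_update() {
        int n = current_node();
        std::lock_guard<std::mutex> g(node_lock[n]);  // orders local threads;
        ++node_local_count[n];                        // no remote cache flush
    }                                                 // is ever paid for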
P.S. Do you have any links to some of the more esoteric synchronization primitives? I've been looking for variations on the typical acquire/release barriers, as well as systems that don't have CAS, etc. All in the hopes of making at least a start on a useful atomics library.
I'm thinking mostly of some of the things multithreading researchers have been doing with software cache coherence, and of the instructions that vector machines like the Cray X1 line have for ordering references between the scalar and vector processors, for both local and remote memory. There are lots of variations that tune the operation to just what the programmer needs, and nothing more.

-Dave
participants (3)
- David Greene
- Gottlob Frege
- Kim Barrett