
I'm interested in getting something like "needs_lock" below into the normal traits, perhaps as part of boost thread. It informs you whether or not an operation on such a contiguous block needs a lock to be atomic.

For example, on ia32, 32-bit aligned ops are atomic; on ia64 it is 64-bit. Doubles on ia32 need locking to be atomic; on ia64 they don't. sizeof(void*) works generically for these two platforms, as in the code below. A default, safety-first implementation might always return true for needs_lock, or perhaps true for all sizeofs greater than a byte.

"#ifdef" hell for platforms can lead to readability problems and big config.hpp equivalents. An alternative is to redirect headers to generic headers where possible, or platform-specific headers where necessary, but this may lead to a bit of a maintenance nightmare due to supporting similar code for different platforms, even though the code is typically much more readable. Another, often intertwined, approach is to include a platform-independent API/function layer that is then used by the library, e.g. ACE's os.h.

Any suggestions on best practice for this?

Regards,

Matt Hurd
matthurd@acm.org
www.hurd.com.au

/*_____________________________________________________________________

    created:    2004-6-7   16:20
    filename:   needs_lock.hpp
    author:     Matt Hurd
_______________________________________________________________________*/

#ifndef NEEDS_LOCK_HPP_200467
#define NEEDS_LOCK_HPP_200467

#include <boost/type_traits.hpp>
#include "boost/type_traits/is_integral.hpp"
#include "boost/type_traits/is_float.hpp"
#include "boost/type_traits/detail/ice_or.hpp"
#include "boost/config.hpp"

// should be the last #include
#include "boost/type_traits/detail/bool_trait_def.hpp"

namespace boost {

namespace detail {

template< typename T >
struct needs_lock_impl
{
    BOOST_STATIC_CONSTANT(bool, value = (sizeof(T) > sizeof(void *)) );
};

} // namespace detail

BOOST_TT_AUX_BOOL_TRAIT_DEF1(needs_lock,T,::boost::detail::needs_lock_impl<T>::value)

} // namespace boost

#include "boost/type_traits/detail/bool_trait_undef.hpp"

#endif
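[A quick compile-time exercise of the proposed trait might look like the following; this is an editorial sketch, not part of the original post, and it assumes the needs_lock.hpp header just shown is on the include path.]

// Sketch: sanity-checking the trait as defined above.
#include "needs_lock.hpp"
#include <boost/static_assert.hpp>

struct wide_type { char data[64]; };   // wider than void* on both ia32 and ia64

// char fits in a machine word, so the trait should never ask for a lock
BOOST_STATIC_ASSERT( !boost::needs_lock<char>::value );

// a 64-byte aggregate is bigger than sizeof(void*), so the trait says "lock"
BOOST_STATIC_ASSERT( boost::needs_lock<wide_type>::value );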

I'm interested in getting something like "needs_lock" below into the normal traits, perhaps as part of boost thread. It informs you whether or not an operation on such a contiguous block needs a lock to be atomic.
For example, on ia32, 32-bit aligned ops are atomic; on ia64 it is 64-bit. Doubles on ia32 need locking to be atomic; on ia64 they don't. sizeof(void*) works generically for these two platforms, as in the code below. A default, safety-first implementation might always return true for needs_lock, or perhaps true for all sizeofs greater than a byte.
Is it the case that we could actually rely on this? I thought, for example, that on IA32 operations were only atomic and thread-safe when the assembly is prefixed by LOCK? John.

John Maddock wrote:
I'm interested in getting something like "needs_lock" below into the normal traits, perhaps as part of boost thread. It informs you whether or not an operation on such a contiguous block needs a lock to be atomic.
For example, on ia32, 32-bit aligned ops are atomic; on ia64 it is 64-bit. Doubles on ia32 need locking to be atomic; on ia64 they don't. sizeof(void*) works generically for these two platforms, as in the code below. A default, safety-first implementation might always return true for needs_lock, or perhaps true for all sizeofs greater than a byte.
Is it the case that we could actually rely on this? I thought, for example, that on IA32 operations were only atomic and thread-safe when the assembly is prefixed by LOCK?
AFAIK reads and ordinary writes are atomic without LOCK. Read-modify-write operations need LOCK.
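[An editorial illustration of the distinction being drawn here, not part of the original reply; the read-modify-write case uses Win32's InterlockedIncrement purely as a convenient way to get a LOCK-prefixed instruction.]

#include <windows.h>

volatile LONG ready   = 0;
volatile LONG counter = 0;

void publish()
{
    ready = 1;                       // single aligned MOV: atomic on ia32 without LOCK
}

void bump()
{
    InterlockedIncrement(&counter);  // read-modify-write: emits a LOCK-prefixed instruction
}

void bump_racy()
{
    ++counter;                       // load, add, store: not atomic across processors
}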

Matt Hurd wrote:
I'm interested in getting something like "needs_lock" below into the normal traits, perhaps as part of boost thread. It informs you whether or not an operation on such a contiguous block needs a lock to be atomic.
For example, on ia32, 32-bit aligned ops are atomic; on ia64 it is 64-bit. Doubles on ia32 need locking to be atomic; on ia64 they don't.
What do you mean by "atomic"? Do you expect the change to the value to be visible to other threads?

Peter Dimov <pdimov@mmltd.net> wrote:
Matt Hurd wrote: I'm interested in getting something like "needs_lock" below into the normal traits, perhaps as part of boost thread. It informs you whether or not an operation on such a contiguous block needs a lock to be atomic.
For example, on ia32, 32-bit aligned ops are atomic; on ia64 it is 64-bit. Doubles on ia32 need locking to be atomic; on ia64 they don't.
What do you mean by "atomic"? Do you expect the change to the value to be visible to other threads?
Just memory consistent / atomic. That is, if you write to memory or read from memory, all the bits are guaranteed to make it as an indivisible unit. It doesn't say anything about the timing of the visibility or ordering with respect to other memory transactions. matt.

Matt Hurd wrote:
Peter Dimov <pdimov@mmltd.net> wrote:
Matt Hurd wrote: I'm interested in getting something like "needs_lock" below into the normal traits, perhaps as part of boost thread. It informs you whether or not an operation on such a contiguous block needs a lock to be atomic.
For example, on ia32, 32-bit aligned ops are atomic; on ia64 it is 64-bit. Doubles on ia32 need locking to be atomic; on ia64 they don't.
What do you mean by "atomic"? Do you expect the change to the value to be visible to other threads?
Just memory consistent / atomic. That is, if you write to memory or read from memory, all the bits are guaranteed to make it as an indivisible unit. It doesn't say anything about the timing of the visibility or ordering with respect to other memory transactions.
How would you use such a thing? :-)

On Mon, 20 Sep 2004 14:25:46 +0300, Peter Dimov <pdimov@mmltd.net> wrote:
Matt Hurd wrote:
Peter Dimov <pdimov@mmltd.net> wrote:
Matt Hurd wrote: I'm interested in getting something like "needs_lock" below into the normal traits, perhaps as part of boost thread. It informs you whether or not an operation on such a contiguous block needs a lock to be atomic.
For example, on ia32, 32-bit aligned ops are atomic; on ia64 it is 64-bit. Doubles on ia32 need locking to be atomic; on ia64 they don't.
What do you mean by "atomic"? Do you expect the change to the value to be visible to other threads?
Just memory consistent / atomic. That is, if you write to memory or read from memory, all the bits are guaranteed to make it as an indivisible unit. It doesn't say anything about the timing of the visibility or ordering with respect to other memory transactions.
How would you use such a thing? :-)
My main current use is in a macro I use that generates getters and setters, with and without locking, for simplifying the construction of concurrency-aware classes. So if boost::needs_lock<T> is true for the type of the attribute it will lock; otherwise it goes "phew, if I'm running on a Pentium 4 I probably just saved a couple of hundred cycles" and doesn't lock. matt.
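[Spelled out without the macro, the idea looks roughly like this. This is an editorial paraphrase of the description above, not Matt's actual code; it assumes the proposed needs_lock trait from the first post and Boost.Thread's boost::mutex.]

#include "needs_lock.hpp"            // the proposed trait, not an existing Boost header
#include <boost/thread/mutex.hpp>

template<class T>
class attribute
{
public:
    void set(T const& v)
    {
        if (boost::needs_lock<T>::value) {        // compile-time constant; dead branch optimised away
            boost::mutex::scoped_lock lk(guard_);
            value_ = v;
        } else {
            value_ = v;                           // aligned store assumed atomic on this platform
        }
    }

    T get() const
    {
        if (boost::needs_lock<T>::value) {
            boost::mutex::scoped_lock lk(guard_);
            return value_;
        }
        return value_;
    }

private:
    mutable boost::mutex guard_;
    T value_;
};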

Matt Hurd wrote:
On Mon, 20 Sep 2004 14:25:46 +0300, Peter Dimov <pdimov@mmltd.net> wrote:
Matt Hurd wrote:
Peter Dimov <pdimov@mmltd.net> wrote:
Matt Hurd wrote: I'm interested in getting something like "needs_lock" below into the normal traits, perhaps as part of boost thread. It informs you whether or not an operation on such a contiguous block needs a lock to be atomic.
For example, on ia32, 32-bit aligned ops are atomic; on ia64 it is 64-bit. Doubles on ia32 need locking to be atomic; on ia64 they don't.
What do you mean by "atomic"? Do you expect the change to the value to be visible to other threads?
Just memory consistent / atomic. That is, if you write to memory or read from memory, all the bits are guaranteed to make it as an indivisible unit. It doesn't say anything about the timing of the visibility or ordering with respect to other memory transactions.
How would you use such a thing? :-)
My main current use is in a macro I use that generates getters and setters, with and without locking, for simplifying the construction of concurrency-aware classes.
But why is a concurrent-aware setter useful if its effects aren't visible to the other threads?

On Mon, 20 Sep 2004 15:17:43 +0300, Peter Dimov <pdimov@mmltd.net> wrote:
But why is a concurrent-aware setter useful if its effects aren't visible to the other threads?
In what way do you mean they are not visible?

Such aligned memory operations are guaranteed to be atomic on ia32 at a system-wide level, AFAIK.

Here is what Intel has to say:
______________________

7.1. LOCKED ATOMIC OPERATIONS

The 32-bit IA-32 processors support locked atomic operations on locations in system memory. These operations are typically used to manage shared data structures (such as semaphores, segment descriptors, system segments, or page tables) in which two or more processors may try simultaneously to modify the same field or flag. The processor uses three interdependent mechanisms for carrying out locked atomic operations:

• guaranteed atomic operations
  ^^^^^^^^^^^^^^^^^^^
• bus locking, using the LOCK# signal and the LOCK instruction prefix
• cache coherency protocols that insure that atomic operations can be carried out on cached data structures (cache lock); this mechanism is present in the Pentium 4, Intel Xeon, and P6 family processors

These mechanisms are interdependent in the following ways. Certain basic memory transactions (such as reading or writing a byte in system memory) are always guaranteed to be handled atomically. That is, once started, the processor guarantees that the operation will be completed before another processor or bus agent is allowed access to the memory location. The processor also supports bus locking for performing selected memory operations (such as a read-modify-write operation in a shared area of memory) that typically need to be handled atomically, but are not automatically handled this way. Because frequently used memory locations are often cached in a processor's L1 or L2 caches, atomic operations can often be carried out inside a processor's caches without asserting the bus lock. Here the processor's cache coherency protocols insure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on cached memory locations.

Note that the mechanisms for handling locked atomic operations have evolved as the complexity of IA-32 processors has evolved. As such, more recent IA-32 processors (such as the Pentium 4, Intel Xeon, and P6 family processors) provide a more refined locking mechanism than earlier IA-32 processors. These are described in the following sections.

Matt Hurd wrote:
On Mon, 20 Sep 2004 15:17:43 +0300, Peter Dimov <pdimov@mmltd.net> wrote:
But why is a concurrent-aware setter useful if its effects aren't visible to the other threads?
In what way do you mean they are not visible?
Such aligned memory operations are guaranteed to be atomic on ia32 at a system-wide level, AFAIK.
On IA32, sure, but not on other architectures.

On Mon, 20 Sep 2004 15:48:59 +0300, Peter Dimov <pdimov@mmltd.net> wrote:
Matt Hurd wrote:
On Mon, 20 Sep 2004 15:17:43 +0300, Peter Dimov <pdimov@mmltd.net> wrote:
But why is a concurrent-aware setter useful if its effects aren't visible to the other threads?
In what way do you mean they are not visible?
Such aligned memory operations are guaranteed to be atomic on ia32 at a system-wide level, AFAIK.
On IA32, sure, but not on other architectures.
Yup. Which gets back to the subject line... What might be the best way to structure such things in boost for platform-specific features?

For example, Apache used to have ifdef hell and changed to separate files for platforms. This has pros and cons. Loki's current code base uses a reference implementation with platform-specific overrides via headers. Boost has a bit of a mix of both approaches; I was just wondering about the best way to tackle it. Perhaps, like most things, there isn't a best, and it's just a judgement call on a case-by-case basis.

In that case, if boost were to have a thing such as boost::needs_lock<T>, what would be the best way to structure the code for the platform specifics? This seems to be beyond the normal compiler-specific identification from config.hpp, or is it?

Regards,

Matt Hurd
matthurd@acm.org
www.hurd.com.au
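[One way the Loki-style "reference implementation plus platform overrides" could be laid out for this particular trait; an editorial sketch only. The file names and directory are hypothetical, not an agreed Boost convention — only the architecture macros checked are real compiler predefines.]

// needs_lock.hpp  --  thin dispatch header (hypothetical layout)
#ifndef NEEDS_LOCK_DISPATCH_HPP
#define NEEDS_LOCK_DISPATCH_HPP

#if defined(__i386__) || defined(_M_IX86)
#  include "platform/needs_lock_ia32.hpp"     // hypothetical file: true for sizeof(T) > 4
#elif defined(__ia64__) || defined(_M_IA64)
#  include "platform/needs_lock_ia64.hpp"     // hypothetical file: true for sizeof(T) > 8
#else
   // safety-first generic fallback: everything bigger than a byte needs a lock
#  include "platform/needs_lock_generic.hpp"  // hypothetical file
#endif

#endif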

Matt Hurd wrote: [...]
But why is a concurrent-aware setter useful if its effects aren't visible to the other threads?
In what way do you mean they are not visible?
Such aligned memory operations are guaranteed to be atomic on ia32 at a system-wide level, AFAIK.
On IA32, sure, but not on other architectures.
Yup. Which gets back to the subject line...
On IA32,

- stores have release semantics (sink-load/store mbar for preceding loads/stores in the program order);

- loads have acquire semantics (hoist-load/store mbar for subsequent loads/stores in the program order);

- lock instructions have compound release and acquire semantics (fully fenced).

See Plan9 story for an illustration of lockless stuff that needs store-load fence (compound sink-store and hoist-load mbar) on IA32.

http://groups.google.com/groups?selm=414E9E40.A66D4F48%40web.de
(Subject: std::msync)

regards,
alexander.
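[To make the Plan9 point concrete, here is the classic Dekker/Peterson-style fragment — an editorial illustration, not Alexander's code — where the IA32 guarantees above are still not enough: a store followed by a load of a different location may be observed out of order, so both threads can read 0 and enter together unless a store-load fence (a LOCKed instruction or MFENCE) sits between the store and the load.]

volatile int want0 = 0, want1 = 0;

void thread0_enter()
{
    want0 = 1;              // store: release semantics only
    // store-load fence needed here on ia32
    if (want1 == 0) {
        // ... enter critical section ...
    }
}

void thread1_enter()
{
    want1 = 1;              // store: release semantics only
    // store-load fence needed here on ia32
    if (want0 == 0) {
        // ... enter critical section ...
    }
}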

Alexander Terekhov <terekhov@web.de> wrote:
Matt Hurd wrote: [...]
But why is a concurrent-aware setter useful if its effects aren't visible to the other threads?
In what way do you mean they are not visible?
Such aligned memory operations are guaranteed to be atomic on ia32 at a system-wide level, AFAIK.
On IA32, sure, but not on other architectures.
Yup. Which gets back to the subject line...
On IA32,
- stores have release semantics (sink-load/store mbar for preceding loads/stores in the program order);
- loads have acquire semantics (hoist-load/store mbar for subsequent loads/stores in the program order);
- lock instructions have compound release and acquire semantics (fully fenced).
See Plan9 story for an illustration of lockless stuff that needs store-load fence (compound sink-store and hoist-load mbar) on IA32.
http://groups.google.com/groups?selm=414E9E40.A66D4F48%40web.de (Subject: std::msync)
regards, alexander.
Always insightful, thanks Alexander.

However, I'm not talking about guarantees of ordering or timing, just "eventual" visibility. Avoiding a lock/fence where possible is a good thing if appropriate, though as you show the subtleties are not to be underestimated.

The type trait needs_lock<T> would provide you with a cue to memory transaction atomicity. Timely availability and ordering, as you show, are different beasts. Perhaps a better name would be boost::memory_atomic<T>, as needs_lock is almost as misleading as stl::set?

By the way, when you say ia32, which architectures are you referring to, as they have varied quite a bit in their capabilities from the 386 to the P4 & Xeon?

Do you have a view on the original question about how to structure code regarding architecture-specific specializations?

Regards,

Matt Hurd
matthurd@acm.org
www.hurd.com.au

Matt Hurd wrote: [...]
By the way, when you say ia32, which architectures are you referring to, as they have varied quite a bit in their capabilities from the 386 to the P4 & Xeon?
I'm referring to "All IA-32". See Intel Itanium Architecture Software Developer's Manual, 6.3.4 Memory Ordering Interactions (apart from hoist-and-sink "pirate talk", so to say.).
Do you have a view on the original question about how to structure code regarding architecture-specific specializations?
First off, atomicity without memory ordering and visibility protocol is pretty useless. atomic<> with msync is the way to go. But of course you can use its load(msync::none_t) and even store(T, msync::none_t) where/when it's appropriate and totally safe. I mean for example... apropos std::string ;-)

http://groups.google.com/groups?selm=3E4B6227.87DCBA45%40web.de
http://groups.google.com/groups?selm=3F741BDC.60E4D173%40web.de

regards,
alexander.

On Tue, 21 Sep 2004 17:37:03 +0200, Alexander Terekhov <terekhov@web.de> wrote:
Matt Hurd wrote: [...]
First off, atomicity without memory ordering and visibility protocol is pretty useless.
For many, perhaps most, tasks yes. However, think of a database without locking... it can still be useful with optimistic concurrency for some styles of application. Atomic memory transactions are just a guarantee that the "bits" are consistent. Adding further guarantees costs performance for each ordering or visibility guarantee you add. So for minimizing resource usage and maximising performance, where optimistic concurrency is a good enough quality of service, boost::memory_atomic might be enough, but perhaps the potential for misuse is just too great. Maybe boost::really_dangerous_memory_atomicity_guarantee<T> is a more appropriate name ;-)

I can imagine a taxonomy of quality-of-service guarantees, all with different performance tradeoffs, that are applicable to a C++ program writer for concurrency, which eminent people over on std.c++ seem to be tackling in some form. Consistency, atomicity, and different strengths of visibility and ordering seem to be the main issues, but then again I didn't know till a week or two ago that some platforms had as many as 15 different explicit operations for memory fencing, which makes me feel like a rather old babe in the woods.

This is beyond the point I was originally pursuing of how best to organise code for platform-specific architectures. Whatever works and survives a review I guess is the practical answer...

Regards,

Matt Hurd
matthurd@acm.org
www.hurd.com.au

Matt Hurd <matt.hurd@gmail.com> writes:
On Tue, 21 Sep 2004 17:37:03 +0200, Alexander Terekhov <terekhov@web.de> wrote:
Matt Hurd wrote: [...]
First off, atomicity without memory ordering and visibility protocol is pretty useless.
For many, perhaps most, tasks yes. However, think of a database without locking... it can still be useful with optimistic concurrency for some styles of application. Atomic memory transactions are just a guarantee that the "bits" are consistent.
They typically don't exist for "records" of more than one word, if I'm not mistaken. How useful is that for a database? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Wed, 22 Sep 2004 06:41:07 -0400, David Abrahams <dave@boost-consulting.com> wrote:
Matt Hurd <matt.hurd@gmail.com> writes:
On Tue, 21 Sep 2004 17:37:03 +0200, Alexander Terekhov <terekhov@web.de> wrote:
Matt Hurd wrote: [...]
First off, atomicity without memory ordering and visibility protocol is pretty useless.
For many, perhaps most, tasks yes. However, think of a database without locking... it can still be useful with optimistic concurrency for some styles of application. Atomic memory transactions are just a guarantee that the "bits" are consistent.
They typically don't exist for "records" of more than one word, if I'm not mistaken. How useful is that for a database?
Sorry Dave, I'm not sure I understand your question with respect to the metaphor I was using.

I just saw the C++ memory paper a few hours ago:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1680.pdf

I see what I am using as an example is also addressed here. It also splits the memory model into the three aspects I suggested: atomicity, visibility and ordering.

My non-suggestion (it was just an example of platform-specific necessity which hijacked the thread ;-) ), boost::needs_lock<T>, would perhaps be better named something that better reflects the potential for memory consistency if alignment is appropriate. boost::maybe_memory_atomic<T> perhaps? This addresses, incompletely, the atomicity aspect of the memory model.

The other main part of determining atomicity is alignment. That is, being an rvalue with correct alignment, which may not be determined from the type. This might be assumed in code, or covered by an additional helper such as boost::is_memory_atomic_aligned<T>(t) plus, perhaps, a complete package with boost::is_memory_atomic<T>(t). Unfortunately only maybe_memory_atomic would be compile-time, but this would be OK as you'd usually be assuming appropriate alignment anyway.

A default implementation for boost::maybe_memory_atomic<T> would return false, though perhaps a sizeof of 1 would be a reasonable assumption until the usual PDP-11 caveats get dragged up ;-)
For ia32, Pentium and above, it would return true for sizeof(T) <= 8.
For ia32, 486, it would return true for sizeof(T) <= 4.
Not sure what the 386 version should return.

An immediate problem may be that an 8-byte type on P6 might resolve to two 4-byte types and thus two independent reads and writes. Thus perhaps being fundamental or POD, plus an appropriate sizeof, might be necessary to consider as part of the specialization framework for a solution.

In terms of visibility, the approach of redefining what "volatile" means seems useful, as discussed in the paper. Meanwhile a simple fence / flushing-like structure might be enough for many things to get by as a library solution. Perhaps a scoped fence-like thingy that has acquire / release / flushing semantics...

    typedef synch::multi_thread threading_policy;

    A;
    {
        boost::memory_guarantee<threading_policy> mg;
        B;
    }
    C;

which guarantees ordering and visibility of all RW operations to be A -> B -> C. Or should this just be limited to A->B, or, alternatively, B->C?

This may not be possible on some architectures, and thus it should assert so at least you know it is unsafe. On a strictly ordered architecture with simple consistency, e.g. only ever a single processor with sequential semantics guaranteed by the hardware and compiler, it would be a NOP and thus optimized away. Any suggestions for an efficient implementation of the constructor and destructor for Pentium / P6 et al.?

Similarly, perhaps a memory_visibility scoped var could guarantee the visibility of B ops. Perhaps a memory_ordering scoped var could guarantee appropriate ordering, say A->B->C or A->B or B->C, whichever would be commonly inexpensive across typical platforms, without having to guarantee visibility. I'm not sure ordering guarantees are too useful without visibility, but combined with volatile for something like hardware register access, perhaps they are.

Note that using such a memory_guarantee still requires appropriate memory consistency; this isn't included as part of the memory_guarantee.

$0.025

Regards,

Matt Hurd
matthurd@acm.org
www.hurd.com.au
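[A minimal compilable sketch of that scoped memory_guarantee idea — an editorial reading of the post above, not an existing Boost class. The policy names follow the synch::multi_thread example, and the fence bodies are deliberately left as placeholders.]

namespace synch {

struct single_thread
{
    static void acquire() {}   // single processor, sequential semantics: nothing to do
    static void release() {}
};

struct multi_thread
{
    // A real implementation would issue platform fences here, e.g. an MFENCE
    // or a LOCK-prefixed instruction on ia32; left empty in this sketch.
    static void acquire() {}
    static void release() {}
};

} // namespace synch

namespace boost {

template< typename ThreadingPolicy >
class memory_guarantee
{
public:
    memory_guarantee()  { ThreadingPolicy::acquire(); }   // orders A before B
    ~memory_guarantee() { ThreadingPolicy::release(); }   // orders B before C
private:
    memory_guarantee( memory_guarantee const& );              // non-copyable
    memory_guarantee& operator=( memory_guarantee const& );
};

} // namespace boost

// Usage, following the A / B / C sketch above:
//   A;
//   {
//       boost::memory_guarantee<synch::multi_thread> mg;
//       B;    // ordered after A and before C
//   }
//   C;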

Peter Dimov <pdimov@mmltd.net> wrote:
How would you use such a thing? :-)
Here's a concrete example below. Though, as has been pointed out previously, boost::call_traits is probably not the most appropriate thing to use here. It assumes some other concepts not encapsulated here.

matt.

/*_____________________________________________________________________

    created:    2004-6-8   12:06
    filename:   helpers.hpp
    author:     Matt Hurd
_______________________________________________________________________*/

#ifndef HELPERS_HPP_200468
#define HELPERS_HPP_200468

#define SYNCH_SIMPLE_ATTRIBUTE( VAR_NAME, VAR_TYPE )                    \
                                                                        \
public:                                                                 \
    template< bool >                                                    \
    void                                                                \
    VAR_NAME                                                            \
    (                                                                   \
        boost::call_traits< VAR_TYPE >::param_type new_ ## VAR_NAME     \
    )                                                                   \
    {                                                                   \
        if ( boost::needs_lock<VAR_TYPE>::value ) {                     \
            lock lk( guard_ );                                          \
            VAR_NAME<false>( new_ ## VAR_NAME );                        \
        } else {                                                        \
            VAR_NAME<false>( new_ ## VAR_NAME );                        \
        }                                                               \
    }                                                                   \
                                                                        \
    void                                                                \
    VAR_NAME                                                            \
    (                                                                   \
        boost::call_traits< VAR_TYPE >::param_type new_ ## VAR_NAME     \
    )                                                                   \
    {                                                                   \
        VAR_NAME< true >( new_ ## VAR_NAME );                           \
    }                                                                   \
                                                                        \
    template< >                                                         \
    void                                                                \
    VAR_NAME<false>                                                     \
    (                                                                   \
        boost::call_traits<VAR_TYPE>::param_type new_ ## VAR_NAME       \
    )                                                                   \
    {                                                                   \
        VAR_NAME ## _ = new_ ## VAR_NAME;                               \
    }                                                                   \
                                                                        \
    template< bool >                                                    \
    VAR_TYPE                                                            \
    VAR_NAME() const                                                    \
    {                                                                   \
        if ( boost::needs_lock< VAR_TYPE >::value ) {                   \
            lock lk( guard_, synch::lock_status::shared );              \
            return VAR_NAME<false>();                                   \
        } else {                                                        \
            return VAR_NAME<false>();                                   \
        }                                                               \
    }                                                                   \
                                                                        \
    VAR_TYPE                                                            \
    VAR_NAME() const                                                    \
    {                                                                   \
        return VAR_NAME<true>();                                        \
    }                                                                   \
                                                                        \
    template< >                                                         \
    VAR_TYPE                                                            \
    VAR_NAME<false>() const                                             \
    {                                                                   \
        return VAR_NAME ## _;                                           \
    }                                                                   \
                                                                        \
    boost::call_traits<VAR_TYPE>::const_reference                       \
    VAR_NAME ## _ref() const                                            \
    {                                                                   \
        return VAR_NAME ## _;                                           \
    }                                                                   \
                                                                        \
private:                                                                \
    VAR_TYPE VAR_NAME ## _;

#endif
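[For what it's worth, using the macro in a class might look like the following. This is an editorial sketch: the synch:: stand-ins below are invented placeholders for whatever `lock` type and `guard_` member Matt's synchronisation framework provides, which is not shown in the thread. Note also that the macro as written relies on in-class member-template specialisation, which only some compilers of the time accept.]

// Minimal stand-ins, purely for illustration.
namespace synch {
    struct lock_status { enum type { exclusive, shared }; };
    struct shareable_mutex { /* ... */ };
    struct shareable_lock
    {
        explicit shareable_lock( shareable_mutex& ) {}              // exclusive lock
        shareable_lock( shareable_mutex&, lock_status::type ) {}    // shared lock
        // a real lock would acquire in the constructor and release in the destructor
    };
}

class quote
{
    typedef synch::shareable_lock lock;      // the macro expects a `lock` type ...

public:
    SYNCH_SIMPLE_ATTRIBUTE( bid_price, double )
    SYNCH_SIMPLE_ATTRIBUTE( volume,    long )

private:
    mutable synch::shareable_mutex guard_;   // ... and a `guard_` member
};

// quote q;
// q.bid_price( 42.5 );          // locks only if boost::needs_lock<double>::value
// double b = q.bid_price();     // shared lock taken only when needed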
participants (5)

- Alexander Terekhov
- David Abrahams
- John Maddock
- Matt Hurd
- Peter Dimov