[interprocess] Atomic ops support for Alpha processors.

Hello,
attached patch introduces support for the atomic ops needed by interprocess on Tru64/CXX/Alpha. It currently misses support for atomic add and sub, but those are not used right now.
There are a few issues I would like to raise regarding the atomic operations:
1) Currently it is not specified whether an atomic operation implies a memory barrier or not. This should be explicitly stated.
2) atomic_sub32 does not return the old value, but atomic_add32 does. This seems inconsistent to me.
3) Has the use of the low level atomic ops API as proposed in WG21/N2047 (see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2047.html) been considered? If yes, why has it been rejected? If no, would it make sense to use this API?
4) Is there a need for an atomic ops boost library? I seem to remember that other libraries need atomic ops as well.
Regards, Markus

Index: atomic.hpp
===================================================================
--- atomic.hpp (revision 40078)
+++ atomic.hpp (working copy)
@@ -436,6 +436,82 @@
 } //namespace interprocess{
 } //namespace boost{
+#elif defined(__osf__)
+
+#include <machine/builtins.h>
+#include <c_asm.h>
+
+namespace boost{
+namespace interprocess{
+namespace detail{
+
+//! Atomically increment an apr_uint32_t by 1
+//! "mem": pointer to the object
+//! Returns the old value pointed to by mem
+inline boost::uint32_t atomic_inc32(volatile boost::uint32_t *mem)
+{  return __ATOMIC_INCREMENT_LONG(mem); }
+
+//! Atomically decrement an boost::uint32_t by 1
+//! "mem": pointer to the atomic value
+//! Returns false if the value becomes zero on decrement, otherwise true
+inline bool atomic_dec32(volatile boost::uint32_t *mem)
+{  return __ATOMIC_DECREMENT_LONG(mem); }
+
+// Rational for the implementation of the atomic read and write functions.
+//
+// 1. The Alpha Architecture Handbook requires that access to a byte,
+//    an aligned word, an aligned longword, or an aligned quadword is
+//    atomic. (See 'Alpha Architecture Handbook', version 4, chapter 5.2.2.)
+//
+// 2. The CXX User's Guide states that volatile quantities are accessed
+//    with single assembler instructions, and that a compilation error
+//    occurs when declaring a quantity as volatile which is not properly
+//    aligned.
+
+//! Atomically read an boost::uint32_t from memory
+inline boost::uint32_t atomic_read32(volatile boost::uint32_t *mem)
+{  return *mem; }
+
+//! Atomically set an boost::uint32_t in memory
+//! "mem": pointer to the object
+//! "param": val value that the object will assume
+inline void atomic_write32(volatile boost::uint32_t *mem, boost::uint32_t val)
+{  *mem = val; }
+
+//! Compare an boost::uint32_t's value with "cmp".
+//! If they are the same swap the value with "with"
+//! "mem": pointer to the value
+//! "with" what to swap it with
+//! "cmp": the value to compare it to
+//! Returns the old value of *mem
+inline boost::uint32_t atomic_cas32
+   (volatile boost::uint32_t *mem, boost::uint32_t with, boost::uint32_t cmp)
+{
+   // Notes:
+   //
+   // 1. Branch prediction prefers branches, as we assume that the lock
+   //    is not stolen usually, we branch forward conditionally on success
+   //    of the store, and not conditionally backwards on failure.
+   //
+   // 2. The memory lock is invalidated when a branch is taken between
+   //    load and store. Therefore we can only branch if we don't need a
+   //    store.
+
+   return asm("10: ldl_l %v0,(%a0) ;"  // load prev value from mem and lock mem
+              " cmpeq %v0,%a2,%t0 ;"   // compare with given value
+              " beq %t0,20f ;"         // if not equal, we're done
+              " mov %a1,%t0 ;"         // load new value into scratch register
+              " stl_c %t0,(%a0) ;"     // store new value to locked mem (overwriting scratch)
+              " bne %t0,20f ;"         // store succeeded, we're done
+              " br 10b ;"              // lock has been stolen, retry
+              "20: ",
+              mem, with, cmp);
+}
+
+} //namespace detail{
+} //namespace interprocess{
+} //namespace boost{
+
 #else
 #error No atomic operations implemented for this platform, sorry!

Markus Schöpflin escribió:
Hello,
attached patch introduces support for the atomic ops needed by interprocess on Tru64/CXX/Alpha. It currently misses support for atomic add and sub, but those are not used right now.
Applied in SVN. If I've introduced any error (I've applied the patch by hand), please feel free to commit a fix.
There are a few issues I would like to raise regarding the atomic operations:
1) Currently it is not specified whether an atomic operation implies a memory barrier or not. This should be explicitly stated.
They should imply a barrier. Since this was an internal header, I haven't documented anything. Now that people are contributing, at least, I should add a comment to the header.
2) atomic_sub32 does not return the old value, but atomic_add32 does. This seems inconsistent to me.
I agree. Nevertheless, I think atomic_add32 is enough since atomic_sub32 can be implemented with atomic_add32(boost::uint32_t(-val)). I'm planning on removing some unused operations, so maybe it is time to remove atomic_add32/atomic_sub32.
3) Has the use of the low level atomic ops API as proposed in WG21/N2047 (see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2047.html) been considered? If yes, why has it been rejected? If no, would it make sense to use this API?
If you are talking about using N2047 in Interprocess, I always thought that boost/interprocess/detail/atomic.hpp was a temporary solution until an atomic operations library was developed. I just needed some basic operations and full barriers were good enough for me. Developing an atomic operations library is rocket science for me.
4) Is there a need for an atomic ops boost library? I seem to remember that other libraries need atomic ops as well.
Definitely yes. I'm still surprised that there are no atomic operations on Boost apart from shared_ptr internals. I think we have people with deep knowledge on atomic operations, so let's put some pressure on the mailing list...
Regards, Markus
Regards, Ion

Ion Gaztañaga wrote:
Markus Schöpflin escribió:
Hello,
attached patch introduces support for the atomic ops needed by interprocess on Tru64/CXX/Alpha. It currently misses support for atomic add and sub, but those are not used right now.
Applied in SVN. If I've introduced any error (I've applied the patch by hand), please feel free to commit a fix.
There are a few issues I would like to raise regarding the atomic operations:
1) Currently it is not specified whether an atomic operation implies a memory barrier or not. This should be explicitly stated.
They should imply a barrier. Since this was an internal header, I haven't documented anything. Now that people are contributing, at least, I should add a comment to the header.
OK, I need to add memory barriers to the code then.
2) atomic_sub32 does not return the old value, but atomic_add32 does. This seems inconsistent to me.
I agree. Nevertheless, I think atomic_add32 is enough since atomic_sub32 can be implemented with atomic_add32(boost::uint32_t(-val)). I'm planning on removing some unused operations, so maybe it is time to remove atomic_add32/atomic_sub32.
The interface for atomic_add32 takes unsigned integers, so you can't just call atomic_add32(-val). But as both operations are unused, it's probably best to remove them.
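(A small, purely illustrative aside on the unsigned-argument point: unsigned 32-bit arithmetic wraps modulo 2^32, so adding the two's complement of val computes the same result as subtracting val. The helper below is hypothetical and non-atomic; it only demonstrates the arithmetic, not any actual interprocess code.)

#include <boost/cstdint.hpp>
#include <cassert>

// Hypothetical, non-atomic helper: shows that "add the two's complement"
// yields the same result an atomic subtraction would.
inline boost::uint32_t sub_via_add(boost::uint32_t old_value, boost::uint32_t val)
{
   return old_value + (boost::uint32_t(0) - val);   // wraps modulo 2^32
}

int main()
{
   assert(sub_via_add(10u, 3u) == 7u);            // 10 - 3
   assert(sub_via_add(0u, 1u) == 0xFFFFFFFFu);    // wraps, like unsigned subtraction
   return 0;
}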
3) Has the use of the low level atomic ops API as proposed in WG21/N2047 (see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2047.html) been considered? If yes, why has it been rejected? If no, would it make sense to use this API?
If you are talking about using N2047 in Interprocess, I always thought that boost/interprocess/detail/atomic.hpp was a temporary solution until an atomic operations library was developed. I just needed some basic operations and full barriers were good enough for me. Developing an atomic operations library is rocket science for me.
Understood.
4) Is there a need for an atomic ops boost library? I seem to remember that other libraries need atomic ops as well.
Definitely yes. I'm still surprised that there are no atomic operations on Boost apart from shared_ptr internals. I think we have people with deep knowledge on atomic operations, so let's put some pressure on the mailing list...
Probably not the best time right now, being close to the next release, but otherwise I agree with you. Regards, Markus

Markus Schöpflin:
Ion Gaztañaga wrote:
Markus Schöpflin escribió:
1) Currently it is not specified whether an atomic operation implies a memory barrier or not. This should be explicitly stated.
They should imply a barrier. Since this was an internal header, I haven't documented anything. Now that people are contributing, at least, I should add a comment to the header.
OK, I need to add memory barriers to the code then.
You need to know what kind of memory synchronization is implied. Acquire for atomic load, release for atomic store and acquire+release is a reasonably safe bet in situations such as this one where the author isn't quite sure. :-) Last I looked at the various implementations, most of the routines did not provide these guarantees, though. Your use of volatile is also a bit suspect. Volatile operations may be atomic without implying an acquire (release) constraint, both for the hardware and for the compiler. On Alpha, you'll probably need to add a memory barrier after the loads, a memory barrier before the stores, and one before and one after the read/modify/write operations. This would require that the compiler is smart enough to recognize the barrier and not move code across.
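(A minimal sketch of the barrier placement Peter describes, assuming the Tru64/CXX __MB() memory-barrier builtin from <machine/builtins.h>; the *_acq/*_rel names are made up for illustration, and this is not the code that was later checked in.)

#include <machine/builtins.h>   // assumed to declare __MB() on Tru64/CXX
#include <boost/cstdint.hpp>

// Load with acquire semantics: the barrier after the load keeps later
// memory accesses from being reordered before it.
inline boost::uint32_t atomic_read32_acq(volatile boost::uint32_t *mem)
{
   boost::uint32_t val = *mem;
   __MB();
   return val;
}

// Store with release semantics: the barrier before the store makes earlier
// writes visible before the stored value itself becomes visible.
inline void atomic_write32_rel(volatile boost::uint32_t *mem, boost::uint32_t val)
{
   __MB();
   *mem = val;
}

// A read/modify/write with acquire+release semantics would bracket the
// ldl_l/stl_c sequence with one barrier on each side.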

Peter Dimov wrote:
Markus Schöpflin:
Ion Gaztañaga wrote:
Markus Schöpflin escribió:
1) Currently it is not specified whether an atomic operation implies a memory barrier or not. This should be explicitly stated.
They should imply a barrier. Since this was an internal header, I haven't documented anything. Now that people are contributing, at least, I should add a comment to the header.
OK, I need to add memory barriers to the code then.
You need to know what kind of memory synchronization is implied. Acquire for atomic load, release for atomic store and acquire+release is a reasonably safe bet in situations such as this one where the author isn't quite sure. :-)
Thanks for your input, Peter. That's what I was going to do, hoping it is the right thing.
Last I looked at the various implementations, most of the routines did not provide these guarantees, though.
That's why I (incorrectly) thought that maybe no memory barriers were implied.
Your use of volatile is also a bit suspect. Volatile operations may be atomic without implying an acquire (release) constraint, both for the hardware and for the compiler.
Yes, I am aware that these don't imply a memory barrier. To the best of my knowledge, you have to explicitly state each and every memory barrier on Alpha, you will mostly not get an implicit one, especially not when using the built-in atomic ops.
On Alpha, you'll probably need to add a memory barrier after the loads, a memory barrier before the stores, and one before and one after the read/modify/write operations. This would require that the compiler is smart enough to recognize the barrier and not move code across.
I think this is the case here. Thank you very much, Markus

Peter Dimov escribió:
Last I looked at the various implementations, most of the routines did not provide these guarantees, though.
Willing to help on these implementations? We also need to write our own versions to avoid license issues with the original Apache implementation (basically we need our own implementation of the gcc Intel & PowerPC asm versions). For Interprocess uses, a full barrier is enough (and safe). Regards, Ion

Ion Gaztañaga wrote:
Willing to help on these implementations? We also need to write our own versions to avoid license issues with the original Apache implementation (basically we need our implementation of gcc Intel & PowerPC asm versions).
My (humble) suggestion would be to start with the compiler builtins, rather than by writing assembler. My understanding is that gcc >= 4.1 on virtually all platforms and the Intel compiler have compatible sets of builtins, and the Microsoft compiler has something equivalent. I don't know about the other Boost-supported compilers - can anyone fill in the gaps? The biggest hole that this would leave is gcc < 4.1; what's the policy about ongoing support for older compilers? Phil.
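(To make the builtin route concrete, here is a hedged sketch of what gcc >= 4.1 wrappers could look like; the version-check macro and the mapping onto this thread's function names are illustrative, but the __sync builtins themselves are documented gcc intrinsics with full-barrier semantics, which matches the "full barrier is enough" requirement stated earlier.)

#include <boost/cstdint.hpp>

#if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__ >= 401)

inline boost::uint32_t atomic_cas32
   (volatile boost::uint32_t *mem, boost::uint32_t with, boost::uint32_t cmp)
{  return __sync_val_compare_and_swap(mem, cmp, with);  }    // returns the old value

inline boost::uint32_t atomic_inc32(volatile boost::uint32_t *mem)
{  return __sync_fetch_and_add(mem, 1u);  }                  // returns the old value

inline bool atomic_dec32(volatile boost::uint32_t *mem)
{  return __sync_sub_and_fetch(mem, 1u) != 0;  }             // false once it reaches zero

#endif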

Phil Endecott escribió:
My (humble) suggestion would be to start with the compiler builtins, rather than by writing assembler. My understanding is that gcc >= 4.1 on virtually all platforms and the Intel compiler have compatible sets of builtins, and the Microsoft compiler has something equivalent. I don't know about the other Boost-supported compilers - can anyone fill in the gaps? The biggest hole that this would leave is gcc < 4.1; what's the policy about ongoing support for older compilers?
In the working version (still in my machine) gcc builtins will be used for gcc > 4.1, but I need asm implementations for gcc < 4.1 (Intel + PowerPC).
Regards, Ion

Ion Gaztañaga wrote:
In the working version (still in my machine) gcc builtins will be used for gcc > 4.1, but I need asm implementations for gcc < 4.1 (Intel + PowerPC).
Hi Ion, Two (crazy!) options: - Compile with a new gcc, and ship the resulting .s files. Disadvantages: I doubt that the Boost build tools (etc.) are set up for this, and it won't let the atomic operations be inline. - Compile with a new gcc, and copy-and-paste the assembler from the resulting .s file into asm statements. Regards, Phil.

Phil Endecott escribió:
Ion Gaztañaga wrote:
In the working version (still in my machine) gcc builtins will be used for gcc > 4.1, but I need asm implementations for gcc < 4.1 (Intel + PowerPC).
Hi Ion,
Two (crazy!) options:
- Compile with a new gcc, and ship the resulting .s files. Disadvantages: I doubt that the Boost build tools (etc.) are set up for this, and it won't let the atomic operations be inline.
- Compile with a new gcc, and copy-and-paste the assembler from the resulting .s file into asm statements.
Good idea. The only problem is that I have no access to a PowerPC machine (installing a cross-compiler is an option, but last time I tried to build one in cygwin I had no success). I think this could be done better by someone who knows how to write asm (and that's not my case ;-)). If there are no volunteers, I'll try to do it myself. Regards, Ion
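(For the Intel side, the usual pre-4.1 gcc approach is a one-instruction lock cmpxchg in inline asm. The sketch below is written from the generic instruction semantics rather than copied from any existing library, so treat it as a starting point, not a reviewed implementation.)

#include <boost/cstdint.hpp>

// x86/x86-64 compare-and-swap: if *mem == cmp, store 'with'; always returns
// the previous value of *mem. The lock prefix makes the operation atomic
// and acts as a full memory barrier on x86.
inline boost::uint32_t atomic_cas32
   (volatile boost::uint32_t *mem, boost::uint32_t with, boost::uint32_t cmp)
{
   boost::uint32_t prev;
   __asm__ __volatile__("lock; cmpxchgl %2, %1"
                        : "=a" (prev), "+m" (*mem)
                        : "r" (with), "0" (cmp)
                        : "memory", "cc");
   return prev;
}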

Ion Gaztañaga:
Peter Dimov escribió:
Last I looked at the various implementations, most of the routines did not provide these guarantees, though.
Willing to help on these implementations? We also need to write our own versions to avoid license issues with the original Apache implementation (basically we need our implementation of gcc Intel & PowerPC asm versions).
There are various sp_counted_base implementations in boost/detail that can be used as a starting point. But if you want help for the implementation, you need to specify an interface first. I'm not particularly fond of http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2393.html preferring something along the lines of http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2195.html but no matter what (if any) style you choose to follow, you need to be explicit about the constraints. At the very least, the functions should contain the constraint as a suffix, i.e. atomic_add_acqrel instead of atomic_add.
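(To make the naming suggestion concrete, a constraint-suffixed interface might read roughly like the declarations below; the names are hypothetical, sketched in the spirit of N2195 rather than taken from it or from any Boost header.)

#include <boost/cstdint.hpp>

// Hypothetical declarations only; the suffix spells out the ordering constraint.
boost::uint32_t atomic_load_acq32  (const volatile boost::uint32_t *mem);
void            atomic_store_rel32 (volatile boost::uint32_t *mem, boost::uint32_t val);
boost::uint32_t atomic_add_acqrel32(volatile boost::uint32_t *mem, boost::uint32_t val);
boost::uint32_t atomic_cas_acqrel32(volatile boost::uint32_t *mem,
                                    boost::uint32_t with, boost::uint32_t cmp);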

Peter Dimov escribió:
Ion Gaztañaga:
Peter Dimov escribió:
Last I looked at the various implementations, most of the routines did not provide these guarantees, though.
Willing to help on these implementations? We also need to write our own versions to avoid license issues with the original Apache implementation (basically we need our implementation of gcc Intel & PowerPC asm versions).
There are various sp_counted_base implementations in boost/detail that can be used as a starting point. But if you want help for the implementation, you need to specify an interface first. I'm not particularly fond of
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2393.html
preferring something along the lines of
http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2195.html
I basically need to write (some) functions described here: http://svn.boost.org/trac/boost/browser/trunk/boost/interprocess/detail/atom... Not all functions are necessary (some are not used, so I'm going to remove them). Full barrier semantics would be a safe bet for a non-expert like me, although I'm not sure if they should be applied to atomic_read or atomic_write:

//! Atomically increment an apr_uint32_t by 1
//! "mem": pointer to the object
//! Returns the old value pointed to by mem
inline boost::uint32_t atomic_inc32(volatile boost::uint32_t *mem);

//! Atomically decrement an boost::uint32_t by 1
//! "mem": pointer to the atomic value
//! Returns false if the value becomes zero on decrement, otherwise true
inline bool atomic_dec32(volatile boost::uint32_t *mem);

//! Atomically read an boost::uint32_t from memory
inline boost::uint32_t atomic_read32(volatile boost::uint32_t *mem);

//! Atomically set an boost::uint32_t in memory
//! "mem": pointer to the object
//! "param": val value that the object will assume
inline void atomic_write32(volatile boost::uint32_t *mem, boost::uint32_t val);

//! Compare an boost::uint32_t's value with "cmp".
//! If they are the same swap the value with "with"
//! "mem": pointer to the value
//! "with": what to swap it with
//! "cmp": the value to compare it to
//! Returns the old value of *mem
inline boost::uint32_t atomic_cas32
   (volatile boost::uint32_t *mem, boost::uint32_t with, boost::uint32_t cmp);
but no matter what (if any) style you choose to follow, you need to be explicit about the constraints. At the very least, the functions should contain the constraint as a suffix, i.e. atomic_add_acqrel instead of atomic_add.
If you prefer to change the names, there is no problem. This is an internal header. Maybe it's better to place them somewhere in boost/detail if someone wants to reuse them, but I think that would require a review. Regards, Ion
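(For context, the kind of use Interprocess makes of such primitives is along the lines of a simple spin lock. The sketch below is illustrative only, not the actual Interprocess code; it assumes the atomic_cas32/atomic_write32 primitives declared above, with at least acquire semantics on the CAS and release semantics on the store.)

#include <boost/cstdint.hpp>

// Primitives as declared above; their definitions live in the
// platform-specific sections of atomic.hpp.
boost::uint32_t atomic_cas32(volatile boost::uint32_t *mem,
                             boost::uint32_t with, boost::uint32_t cmp);
void atomic_write32(volatile boost::uint32_t *mem, boost::uint32_t val);

// Illustrative spin lock: 0 = unlocked, 1 = locked.
struct spin_mutex
{
   volatile boost::uint32_t word;

   void lock()
   {
      // atomic_cas32 returns the previous value, so 0 means we took the lock.
      while (atomic_cas32(&word, 1u, 0u) != 0u)
      {  /* spin; a real implementation would yield or back off */  }
   }

   void unlock()
   {  atomic_write32(&word, 0u);  }
};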

Peter Dimov schrieb:
There are various sp_counted_base implementations in boost/detail that can be used as a starting point. But if you want help for the implementation, you need to specify an interface first. I'm not particularly fond of
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2393.html
preferring something along the lines of
http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2195.html
but no matter what (if any) style you choose to follow, you need to be explicit about the constraints. At the very least, the functions should contain the constraint as a suffix, i.e. atomic_add_acqrel instead of atomic_add.
Peter, I have read your paper referenced above and a few of the other papers dealing with atomic ops, but I failed to find a clear definition of what 'acquire' and 'release' semantics are supposed to mean. Could you point me in the right direction, please? Also, in your paper you're referring to [intro.concur], where could I find this? TIA, Markus

Markus Schöpflin:
Peter,
I have read your paper referenced above and a few of the other papers dealing with atomic ops, but I failed to find a clear definition of what 'acquire' and 'release' semantics are supposed to mean. Could you point me into the right direction, please? Also, in your paper you're referring to [intro.concur], where could I find this?
I think that the relevant paper at the moment is http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2334.htm Basically, if a thread A load-acquires a memory location that has been store-released by thread B, then after the load thread A is guaranteed to see the updates that thread B has done prior to the store.
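(A tiny example of that guarantee, expressed with the primitives discussed in this thread; it assumes atomic_write32 has release and atomic_read32 has acquire semantics, which is exactly the property under discussion, and the producer/consumer functions are made up for illustration.)

#include <boost/cstdint.hpp>

// Assumed primitives (see atomic.hpp); declarations only.
boost::uint32_t atomic_read32(volatile boost::uint32_t *mem);
void atomic_write32(volatile boost::uint32_t *mem, boost::uint32_t val);

int payload = 0;                      // ordinary shared data
volatile boost::uint32_t ready = 0;   // flag written/read atomically

// Thread B: publish the data, then store-release the flag.
void producer()
{
   payload = 42;
   atomic_write32(&ready, 1u);        // release: 'payload' becomes visible first
}

// Thread A: load-acquire the flag; once it reads 1, 'payload' is guaranteed
// to be the value written before the release store.
int consumer()
{
   while (atomic_read32(&ready) == 0u)
   {  /* spin */  }
   return payload;                    // guaranteed to see 42
}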

Peter Dimov wrote:
Markus Schöpflin:
Peter,
I have read your paper referenced above and a few of the other papers dealing with atomic ops, but I failed to find a clear definition of what 'acquire' and 'release' semantics are supposed to mean. Could you point me into the right direction, please? Also, in your paper you're referring to [intro.concur], where could I find this?
I think that the relevant paper at the moment is
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2334.htm
Basically, if a thread A load-acquires a memory location that has been store-released by thread B, then after the load thread A is guaranteed to see the updates that thread B has done prior to the store.
Thank you. Markus

Peter Dimov wrote:
Markus Schöpflin:
Peter,
I have read your paper referenced above and a few of the other papers dealing with atomic ops, but I failed to find a clear definition of what 'acquire' and 'release' semantics are supposed to mean. Could you point me into the right direction, please? Also, in your paper you're referring to [intro.concur], where could I find this?
I think that the relevant paper at the moment is
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2334.htm
Basically, if a thread A load-acquires a memory location that has been store-released by thread B, then after the load thread A is guaranteed to see the updates that thread B has done prior to the store.
Peter, I have been reading your paper again and I'm puzzled by the following sentence (section 'Fences'): "Release, ensures that all preceding operations in program order are performed after all subsequent stores in program order;". Shouldn't that read that all preceding operations are performed _before_ all subsequent stores, instead of after? Markus

Peter Dimov wrote:
Markus Schöpflin:
Ion Gaztañaga wrote:
Markus Schöpflin escribió:
1) Currently it is not specified whether an atomic operation implies a memory barrier or not. This should be explicitly stated.
They should imply a barrier. Since this was an internal header, I haven't documented anything. Now that people are contributing, at least, I should add a comment to the header.
OK, I need to add memory barriers to the code then.
You need to know what kind of memory synchronization is implied. Acquire for atomic load, release for atomic store and acquire+release is a reasonably safe bet in situations such as this one where the author isn't quite sure. :-)
Last I looked at the various implementations, most of the routines did not provide these guarantees, though.
Your use of volatile is also a bit suspect. Volatile operations may be atomic without implying an acquire (release) constraint, both for the hardware and for the compiler.
On Alpha, you'll probably need to add a memory barrier after the loads, a memory barrier before the stores, and one before and one after the read/modify/write operations. This would require that the compiler is smart enough to recognize the barrier and not move code across.
OK, I finally got around to adding memory barriers to the atomic primitives on Alpha. The corresponding check-in can be found here: http://svn.boost.org/trac/boost/changeset/40967 I hope I got the acquire/release things sorted out correctly. Ion, could you perhaps check if the semantics now match the intended usage? BTW, I added a copyright note to the top of the file, IIUC I'm required to do this. If not, feel free to remove it again. Regards, Markus

Markus Schöpflin escribió:
I hope I got the acquire/release things sorted out correctly. Ion, could you perhaps check if the semantics now match the intended usage?
I don't know much about acquire/release semantics, but I think your changes are safe.
BTW, I added a copyright note to the top of the file, IIUC I'm required to do this. If not, feel free to remove it again.
I don't know if you are required to do it, but I'm glad you've added it.
Regards, Markus
Thanks for your help! Ion

Hi Markus,
Thank you for continuing to support Boost on Tru64.
I've asked Rich Peterson to review your implementation. I don't know yet when (and if) Rich will be able to do it.
I think
#elif defined(__osf__)
should be:
#elif defined(__osf__) && defined(__DECCXX)
to not break compilation with gcc.
I'm a bit surprised you did not use __ATOMIC_EXCH_LONG in atomic_write32(). I'm also not sure why you decided to implement atomic_cas32() in asm language instead of using __CMP_STORE_LONG:

/*
** Compare, Store Longword/Quadword
** If *source matches old_value, store new_value in *dest, returning
** 0 if no match or if compare and store were not interlocked.
** NOTE: Memory Barrier only within the LDx_L/STx_C sequence.
*/
int __CMP_STORE_LONG(volatile void *__source, int __old_value,
                     int __new_value, volatile void *__dest);

For atomic_cas32(), __source and __dest would be the same. See /usr/lib/cmplrs/cxx/V7.1-006/include/cxx/string_ref for how we use atomic builtins in the RW library.
Anyway, Rich is an expert in this stuff and, hopefully, he will be able to review your implementation.
Thanks, Boris

Boris Gubenko wrote:
Hi Markus,
Thank you for continuing to support Boost on Tru64.
I've asked Rich Peterson to review your implementation. I don't know yet when (and if) Rich will be able to do it.
That would be really great if someone knowledgeable could review the implementation.
I think
#elif defined(__osf__)
should be:
#elif defined(__osf__) && defined(__DECCXX)
to not break compilation with gcc.
You are right. I will add it.
I'm a bit surprised you did not use __ATOMIC_EXCH_LONG in atomic_write32().
Two reasons. 1) I didn't think of it. 2) The guarantees given by volatile seemed OK. Do you think I should change it to use __ATOMIC_EXCH_LONG?
I'm also not sure why you decided to implement atomic_cas32() in asm language instead of using __CMP_STORE_LONG:
[...] Because cas32 returns the old value, but __CMP_STORE_LONG does not.
Anyway, Rich is an expert in this stuff and, hopefully, he will be able to review your implementation.
Looking forward to it. Markus

Markus Schoepflin wrote:
Boris Gubenko wrote:
I'm a bit surprised you did not use __ATOMIC_EXCH_LONG in atomic_write32().
Two reasons. 1) I didn't think of it. 2) The guarantees given by volatile seemed OK.
Do you think I should change it to use __ATOMIC_EXCH_LONG?
I do. To see the difference, you can compare the code generated for foo() and bar() in x.cxx below. Note ldl_l/stl_c in bar().

x.cxx
-----
#include <machine/builtins.h>

void foo (volatile int *mem, int val) { *mem = val; }

void bar (volatile int *mem, int val) { __ATOMIC_EXCH_LONG(mem, val); }
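(In other words, the suggested change would presumably boil down to something like the sketch below; whether the extra ldl_l/stl_c retry loop is actually wanted for a plain store is exactly the question raised in the follow-up.)

#include <machine/builtins.h>
#include <boost/cstdint.hpp>

// Sketch: atomic_write32 via the Tru64 builtin instead of a plain volatile
// store; the old value returned by __ATOMIC_EXCH_LONG is simply discarded.
inline void atomic_write32(volatile boost::uint32_t *mem, boost::uint32_t val)
{  __ATOMIC_EXCH_LONG(mem, val);  }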
I'm also not sure why you decided to implement atomic_cas32() in asm language instead of using __CMP_STORE_LONG:
[...]
Because cas32 returns the old value, but __CMP_STORE_LONG does not.
I see. Thanks, Boris

Boris Gubenko wrote:
Markus Schoepflin wrote:
[...]
Do you think I should change it to use __ATOMIC_EXCH_LONG?
I do. To see the difference, you can compare the code generated for foo() and bar() in x.cxx below. Note ldl_l/stl_c in bar().
x.cxx
-----
#include <machine/builtins.h>

void foo (volatile int *mem, int val) { *mem = val; }
        .globl  __7foo__FPVii
        .ent    __7foo__FPVii
0000 __7foo__FPVii:
        .frame  $sp, 0, $26
        .prologue 0
        .context full
0000    trapb
0004    stl     val, (r16)
0008    ret     (r26)
        .end    __7foo__FPVii

Here the compiler generates a trap barrier followed by a store instruction. As of chapter 5.2.2 of the Alpha architecture handbook, the access is guaranteed to be performed in a single atomic operation.
void bar (volatile int *mem, int val) { __ATOMIC_EXCH_LONG(mem, val); }
        .globl  __7bar__FPVii
        .ent    __7bar__FPVii
0010 __7bar__FPVii:
        .frame  $sp, 0, $26
        .prologue 0
0010 L$2:
        .context full
0010    mov     val, r0
0014    ldl_l   r1, (r16)
0018    stl_c   r0, (r16)
001C    unop
0020    beq     r0, L$2
0024    ret     (r26)

Here the compiler generates a 'load locked' and 'store conditionally' sequence, wrapped by a loop repeated until the load/store has succeeded. I don't see why this should give me any advantage over the previous, when all I want is an atomic store, and I am not interested in the previous value. Could you please tell me?
Also, I now have two more questions, which you can probably answer:
1) Why is a trap barrier created in the first case, but not in the second?
2) According to the Alpha architecture handbook, branch prediction predicts backward branches to be taken, and it is recommended not to implement the load/store like above. (See documentation for STx_C, chapter 4.2.5.) Is this no longer true?
Thank you for your help, Markus

Markus Schoepflin wrote:
void foo (volatile int *mem, int val) { *mem = val; }
        .globl  __7foo__FPVii
        .ent    __7foo__FPVii
0000 __7foo__FPVii:
        .frame  $sp, 0, $26
        .prologue 0
        .context full
0000    trapb
0004    stl     val, (r16)
0008    ret     (r26)
        .end    __7foo__FPVii
Here the compiler generates a trap barrier followed by a store instruction. As of chapter 5.2.2 of the Alpha architecture handbook, the access is guaranteed to be performed in a single atomic operation.
I'm not sure what trapb has to do with the issue at hand. trapb is not a memory barrier. According to the "Tru64 Unix Assembly Language Programmer's Guide", trapb "Guarantees that all previous *arithmetic* [emphasis mine] instructions are completed, without incurring any arithmetic traps, before any instructions after the trapb instruction are issued." Besides, I'm not sure it is guaranteed that the compiler will always generate trapb in a function like foo() above.
void bar (volatile int *mem, int val) { __ATOMIC_EXCH_LONG(mem, val); }
        .globl  __7bar__FPVii
        .ent    __7bar__FPVii
0010 __7bar__FPVii:
        .frame  $sp, 0, $26
        .prologue 0
0010 L$2:
        .context full
0010    mov     val, r0
0014    ldl_l   r1, (r16)
0018    stl_c   r0, (r16)
001C    unop
0020    beq     r0, L$2
0024    ret     (r26)
Here the compiler generates a 'load locked' and 'store conditionally' sequence, wrapped by a loop repeated until the load/store has succeeded. I don't see why this should give me any advantage over the previous, when all I want is an atomic store, and I am not interested in the previous value. Could you please tell me?
I did not pay attention to the fact that atomic_write32() is a void function. If it were returning the value of the memory location to be updated, then interlocked memory instructions would be necessary. For a void function, perhaps, just 'stl' is fine, meaning that your implementation of atomic_write32() is fine. One advantage of using __ATOMIC_EXCH_LONG I can see is that it enforces proper alignment of its first argument (by aborting the process if it is not properly aligned). Still, looking at atomic.hpp, I'm not sure why on some other architectures atomic_write32() is implemented using special instructions like:
winapi::interlocked_exchange((volatile long*)mem, val);
or
atomic_xchg32(mem, val);
Also, I now have two more questions, which you can probably answer:
1) Why is a trap barrier created in the first case, but not in the second?
2) According to the Alpha architecture handbook, branch prediction predicts backward branches to be taken, and it is recommended not to implement the load/store like above. (See documentation for STx_C, chapter 4.2.5.) Is this no longer true?
Unfortunately, I cannot answer either of these questions; hopefully, somebody more knowledgeable in this area will. As for trapb, as I said before, I don't think it has anything to do with the issue.
Thank you for your help, Markus
Thank you for the interesting discussion and for all your efforts. Much appreciated. Boris
P.S. I'm disconnecting shortly (leaving for the Rhode Island marathon to be held tomorrow; I won't have access to a computer until Sunday).
participants (6)
- Boris Gubenko
- Ion Gaztañaga
- Markus Schöpflin
- Markus Schöpflin
- Peter Dimov
- Phil Endecott