You are right. I fixed it and here is the new implementation:

inline void atomic_increment( int * pw )
{
    _Asm_fetchadd(_FASZ_W, _SEM_REL, pw, +1, _LDHINT_NONE);
}

inline int atomic_decrement( int * pw )
{
    int r = static_cast<int>(_Asm_fetchadd(_FASZ_W, _SEM_REL, pw, -1, _LDHINT_NONE));
    if (1 == r)
    {
        _Asm_mf();
    }
    return r - 1;
}

inline int atomic_conditional_increment( int * pw )
{
    int v = *pw;
    for (;;)
    {
        if (0 == v)
        {
            return 0;
        }
        _Asm_mov_to_ar((_Asm_app_reg)_AREG_CCV, v,
            (_Asm_fence)(_UP_CALL_FENCE | _UP_SYS_FENCE | _DOWN_CALL_FENCE | _DOWN_SYS_FENCE));
        int r = static_cast<int>(_Asm_cmpxchg((_Asm_sz)_SZ_W, (_Asm_sem)_SEM_ACQ, pw,
            v + 1, (_Asm_ldhint)_LDHINT_NONE));
        if (r == v)
        {
            return r + 1;
        }
        v = r;
    }
}

-----Original Message-----
From: Peter Dimov [mailto:pdimov@mmltd.net]
Sent: Wednesday, July 25, 2007 5:55 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Support for sp_counted_base for HP Itanium aCC compiler

I'm not familiar with IA64 or the HP intrinsics, but a few quick comments:

Baruch Zilber wrote:
inline void atomic_increment( int * pw )
{
    _Asm_mf();
    static_cast<int>(_Asm_fetchadd(_FASZ_W, _SEM_REL, (void*)pw, +1, _LDHINT_NONE) + 1);
}
The mf is redundant; _SEM_REL has the same effect. This should probably be

inline void atomic_increment( int * pw )
{
    _Asm_fetchadd(_FASZ_W, _SEM_REL, pw, +1, _LDHINT_NONE);
}
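For readers without an aCC toolchain, the same operation can be sketched with C++11 std::atomic. This is a hypothetical portable equivalent, not the HP intrinsic: fetch_add with release ordering plays the role of the fetchadd4.rel instruction that _SEM_REL requests.

```cpp
#include <atomic>

// Hypothetical portable sketch using C++11 <atomic>, not the aCC intrinsic.
// fetch_add with release ordering is the analogue of fetchadd4.rel; the
// returned old value is discarded, as in atomic_increment above.
inline void atomic_increment( std::atomic<int> * pw )
{
    pw->fetch_add( 1, std::memory_order_release );
}
```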
inline int atomic_decrement( int * pw )
{
    _Asm_mf();
    return (static_cast<int>(_Asm_fetchadd(_FASZ_W, _SEM_REL, (void*)pw, -1, _LDHINT_NONE) - 1) - 1);
}
The leading mf is redundant here, too. In addition, an acquire barrier in the
zero case is missing, and there are two -1's where probably just one is
needed. In pseudocode, atomic_decrement needs to be:

int r = fetchadd4.rel( pw, -1 );
if( r == 1 ) ld4.acq( pw ); // or mf
return r - 1;

So, a wild guess:

inline int atomic_decrement( int * pw )
{
    int r = (int)_Asm_fetchadd( _FASZ_W, _SEM_REL, pw, -1, _LDHINT_NONE );
    if( r == 1 ) _Asm_mf();
    return r - 1;
}

We might be able to replace the _Asm_mf with _Asm_ld, but I'm not sure
whether it will generate ld4.acq.
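The ordering described here can also be sketched portably with C++11 std::atomic (an illustrative equivalent, not the aCC code): the release decrement orders the owner's prior writes before the count drops, and the acquire fence on the last-owner path pairs with those releases before the object is destroyed.

```cpp
#include <atomic>

// Hypothetical portable sketch of the decrement ordering above.
// fetch_sub(release) corresponds to fetchadd4.rel; the acquire fence in
// the r == 1 case plays the role of the ld4.acq / mf on the last owner.
inline int atomic_decrement( std::atomic<int> * pw )
{
    int r = pw->fetch_sub( 1, std::memory_order_release ); // returns the old value
    if( r == 1 )
    {
        std::atomic_thread_fence( std::memory_order_acquire );
    }
    return r - 1; // the new value, as in the pseudocode
}
```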
inline int atomic_conditional_increment( int * pw )
{
    return _Asm_mov_to_ar((_Asm_app_reg)_AREG_CCV, *pw,
        (_Asm_fence)(_UP_CALL_FENCE | _UP_SYS_FENCE | _DOWN_CALL_FENCE | _DOWN_SYS_FENCE)),
        _Asm_mf(),
        (_Asm_cmpxchg((_Asm_sz)4, (_Asm_sem)_SEM_REL, pw, *pw + 1, (_Asm_ldhint)_LDHINT_NONE));
}
This doesn't look correct to me. In pseudocode, I believe that it needs to be:

int v = *pw;

for( ;; )
{
    if( v == 0 ) return 0;

    int r = cmpxchg( pw, v /*old*/, v+1 /*new*/ );
    if( r == v ) return r+1;

    v = r;
}

The above code seems to implement just the cmpxchg primitive. The mf is
redundant in this case, too.
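The CAS loop above can be sketched with C++11 compare_exchange_weak (a hypothetical portable rendering, not the aCC _Asm_cmpxchg). On failure, compare_exchange_weak reloads the expected value with the current contents, which replaces the explicit v = r step of the pseudocode.

```cpp
#include <atomic>

// Hypothetical portable sketch of the conditional-increment CAS loop.
// Returns 0 if the count was already 0 (do not resurrect a dead object),
// otherwise the incremented value.
inline int atomic_conditional_increment( std::atomic<int> * pw )
{
    int v = pw->load( std::memory_order_relaxed );
    for( ;; )
    {
        if( v == 0 ) return 0;

        // On failure, v is updated to the current value, as v = r does above.
        if( pw->compare_exchange_weak( v, v + 1,
                std::memory_order_acquire, std::memory_order_relaxed ) )
        {
            return v + 1;
        }
    }
}
```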