Patch to boost/detail/interlocked.hpp

I am trying to change my call_once implementation for boost.thread, so it doesn't require windows.h to be included. In doing so, I tried to use boost/detail/interlocked.hpp, and discovered that it doesn't work on gcc-mingw-4.0.1, since the imported functions are declared in boost::detail namespace, and the macros don't qualify the names.

Attached is a patch for the file that changes the macros to refer to the imported functions by fully qualified names.

Please can someone check and commit this for me.

Anthony
--
Anthony Williams
Software Developer
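As a rough illustration of the problem and the proposed fix (a sketch only - the actual declarations and macro names in boost/detail/interlocked.hpp may differ slightly from what is shown here):

    // Sketch only - not the verbatim contents of boost/detail/interlocked.hpp.
    // When <windows.h> is not included, the header declares the imported
    // functions inside namespace boost::detail:
    namespace boost
    {
    namespace detail
    {
    extern "C" __declspec(dllimport) long __stdcall InterlockedIncrement(long volatile *);
    }
    }

    // A macro that expands to the bare name only resolves inside boost::detail,
    // which is what breaks on gcc-mingw-4.0.1:
    //   #define BOOST_INTERLOCKED_INCREMENT InterlockedIncrement
    //
    // The patch instead makes the macro expand to the fully qualified name, so
    // it can be used from any namespace (e.g. from Boost.Thread's call_once):
    #define BOOST_INTERLOCKED_INCREMENT ::boost::detail::InterlockedIncrement

    long counter = 0;

    int main()
    {
        return BOOST_INTERLOCKED_INCREMENT(&counter) == 1 ? 0 : 1;
    }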

Anthony Williams wrote:
I am trying to change my call_once implementation for boost.thread, so it doesn't require windows.h to be included. In doing so, I tried to use boost/detail/interlocked.hpp, and discovered that it doesn't work on gcc-mingw-4.0.1, since the imported functions are declared in boost::detail namespace, and the macros don't qualify the names.
Attached is a patch for the file that changes the macros to refer to the imported functions by fully qualified names.
Please can someone check and commit this for me.
Apologies for not being able to respond earlier.

I have some (very) minor comments about your changes. Qualifying _Interlocked* with :: appears to work as intended - although I haven't tested it extensively - but looks very odd for someone reading the code; these identifiers are reserved and there shouldn't be any namespace clashes, so it might be best to leave them unqualified.

BOOST_INTERLOCKED_READ doesn't really belong in interlocked.hpp (macro vs inline aside). The aim of this header is only to provide the Interlocked* functions as specified and documented by Microsoft without including <windows.h>; it is not meant to introduce new unspecified and undocumented functionality.

Finally, I believe that for correct double-checked locking you only need a load with acquire barrier on the fast path - which maps to an ordinary load on x86(-64) and to ld.acq on IA-64 - and by using a fully locked cmpxchg you're introducing a performance penalty (the philosophical debate of whether InterlockedCompareExchange is guaranteed to enforce memory ordering when the comparison fails aside.)

"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
I am trying to change my call_once implementation for boost.thread, so it doesn't require windows.h to be included. In doing so, I tried to use boost/detail/interlocked.hpp, and discovered that it doesn't work on gcc-mingw-4.0.1, since the imported functions are declared in boost::detail namespace, and the macros don't qualify the names.
Attached is a patch for the file that changes the macros to refer to the imported functions by fully qualified names.
Please can someone check and commit this for me.
Apologies for not being able to respond earlier.
I have some (very) minor comments about your changes. Qualifying _Interlocked* with :: appears to work as intended - although I haven't tested it extensively - but looks very odd for someone reading the code; these identifiers are reserved and there shouldn't be any namespace clashes, so it might be best to leave them unqualified.
OK. Changed.
BOOST_INTERLOCKED_READ doesn't really belong in interlocked.hpp (macro vs inline aside). The aim of this header is only to provide the Interlocked* functions as specified and documented by Microsoft without including <windows.h>; it is not meant to introduce new unspecified and undocumented functionality.
Fair enough. I'll move them elsewhere. I used macros rather than inline functions, for consistency with the rest of the INTERLOCKED stuff. Maybe inline functions are more appropriate, since these are users of the INTERLOCKED functions rather than direct mappings.

Moved to boost/thread/detail/interlocked_read_win32.hpp.
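For the record, a minimal sketch of the two styles being weighed here - this is not the actual contents of boost/thread/detail/interlocked_read_win32.hpp, just an illustration built on the BOOST_INTERLOCKED_* macros:

    // Sketch only - not the actual interlocked_read_win32.hpp.
    #include <boost/detail/interlocked.hpp>

    // Macro form, consistent with the rest of the BOOST_INTERLOCKED_* names:
    #define BOOST_INTERLOCKED_READ(x) BOOST_INTERLOCKED_COMPARE_EXCHANGE((x), 0, 0)

    // Inline-function form, arguably better suited to something that is a
    // user of the interlocked functions rather than a direct mapping:
    namespace boost
    {
    namespace detail
    {
    inline long interlocked_read(long volatile * x)
    {
        // Compare *x against 0 and, if equal, store 0 (a no-op write),
        // returning the original value - i.e. a fully locked read.
        return BOOST_INTERLOCKED_COMPARE_EXCHANGE(x, 0, 0);
    }
    }
    }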
Finally, I believe that for correct double-checked locking you only need a load with acquire barrier on the fast path - which maps to an ordinary load on x86(-64) and to ld.acq on IA-64 - and by using a fully locked cmpxchg you're introducing a performance penalty (the philosophical debate of whether InterlockedCompareExchange is guaranteed to enforce memory ordering when the comparison fails aside.)
Is there an intrinsic function for that? I couldn't find one, which is why I left it at InterlockedCompareExchange. I guess it could use InterlockedCompareExchangeAcquire, which reduces the locking penalty.

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
BOOST_INTERLOCKED_READ doesn't really belong in interlocked.hpp (macro vs inline aside). The aim of this header is only to provide the Interlocked* functions as specified and documented by Microsoft without including <windows.h>; it is not meant to introduce new unspecified and undocumented functionality.
Fair enough. I'll move them elsewhere. I used macros rather than inline functions, for consistency with the rest of the INTERLOCKED stuff. Maybe inline functions are more appropriate, since these are users of the INTERLOCKED functions rather than direct mappings.
Moved to boost/thread/detail/interlocked_read_win32.hpp.
Where is BOOST_INTERLOCKED_READ being used, by the way? I don't follow the thread_rewrite branch closely, but a quick glance didn't reveal anything. The semantics of InterlockedRead are presumably those of a fully fenced read? Few lock-free algorithms need that.
Finally, I believe that for correct double-checked locking you only need a load with acquire barrier on the fast path - which maps to an ordinary load on x86(-64) and to ld.acq on IA-64 - and by using a fully locked cmpxchg you're introducing a performance penalty (the philosophical debate of whether InterlockedCompareExchange is guaranteed to enforce memory ordering when the comparison fails aside.)
Is there an intrinsic function for that? I couldn't find one, which is why I left it at InterlockedCompareExchange. I guess it could use InterlockedCompareExchangeAcquire, which reduces the locking penalty.
No, there is no documented way to implement ld.acq using the Windows API. A volatile read appears to work properly on all Windows targets/compilers, and there are probably thousands of lines of existing code that depend on it, but this wasn't specified anywhere.

The newer MSVC 8 documentation finally promises that a volatile read has acquire semantics and that a volatile store has release semantics, even on IA-64, and the compiler also seems to understand these reordering constraints.

http://msdn2.microsoft.com/en-us/library/12a04hfd

The Intel compiler seems to have an option, serialize-volatile, that appears to be on by default; so it seems to also enforce acq/rel volatiles.

As I see it, the implementation options are (1) use a volatile read, live dangerously, be ridiculed by Alexander Terekhov, (2) use inline assembly (painful), (3) use a fully-locked implementation and suffer the performance consequences - my preference is InterlockedExchangeAdd with zero.

Either way, the actual helper function should be named atomic_load_acq and specified to promise acquire semantics, in my opinion.
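A minimal sketch of how the options above might look, assuming the helper name atomic_load_acq suggested here and the BOOST_INTERLOCKED_* macros; the version guard and the choice of fallback are illustrative, not committed Boost code:

    // Sketch only.
    #include <boost/detail/interlocked.hpp>

    namespace boost
    {
    namespace detail
    {
    inline long atomic_load_acq(long volatile * x)
    {
    #if defined(_MSC_VER) && _MSC_VER >= 1400
        // Option (1): MSVC 8 documents volatile reads as having acquire
        // semantics, so a plain volatile load suffices on that compiler.
        return *x;
    #else
        // Option (3): a fully locked read-modify-write that adds zero - at
        // least as strong as an acquire load, at the cost of a locked bus op.
        return BOOST_INTERLOCKED_EXCHANGE_ADD(x, 0);
    #endif
    }
    }
    }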

"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
BOOST_INTERLOCKED_READ doesn't really belong in interlocked.hpp (macro vs inline aside). The aim of this header is only to provide the Interlocked* functions as specified and documented by Microsoft without including <windows.h>; it is not meant to introduce new unspecified and undocumented functionality.
Fair enough. I'll move them elsewhere. I used macros rather than inline functions, for consistency with the rest of the INTERLOCKED stuff. Maybe inline functions are more appropriate, since these are users of the INTERLOCKED functions rather than direct mappings.
Moved to boost/thread/detail/interlocked_read_win32.hpp.
Where is BOOST_INTERLOCKED_READ being used, by the way? I don't follow the thread_rewrite branch closely, but a quick glance didn't reveal anything. The semantics of InterlockedRead are presumably those of a fully fenced read? Few lock-free algorithms need that.
It's used in thread/detail/lightweight_mutex_win32.hpp, thread/detail/read_write_mutex_win32.hpp and thread/detail/condition_win32.hpp. I'm using it to ensure that a read from a variable is either before or after an interlocked_exchange or interlocked_increment, not midway through. I figured that if one use of a variable was interlocked, the others had better be too. Maybe I'm wrong; I haven't thought about it *that* hard.
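Illustrative only (not code quoted from the thread_rewrite branch): the pattern described above pairs interlocked writers with an interlocked read, here spelled as a no-op compare-exchange:

    // Sketch only.
    #include <boost/detail/interlocked.hpp>

    namespace
    {
        long active_count = 0; // shared between threads
    }

    // Writer side: the counter is always modified with an interlocked op.
    void enter()
    {
        BOOST_INTERLOCKED_INCREMENT(&active_count);
    }

    // Reader side: the concern voiced above is that a plain read might not be
    // cleanly ordered before or after a concurrent interlocked increment, so
    // the read is made interlocked as well.
    bool is_busy()
    {
        return BOOST_INTERLOCKED_COMPARE_EXCHANGE(&active_count, 0, 0) != 0;
    }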
Finally, I believe that for correct double-checked locking you only need a load with acquire barrier on the fast path - which maps to an ordinary load on x86(-64) and to ld.acq on IA-64 - and by using a fully locked cmpxchg you're introducing a performance penalty (the philosophical debate of whether InterlockedCompareExchange is guaranteed to enforce memory ordering when the comparison fails aside.)
Is there an intrinsic function for that? I couldn't find one, which is why I left it at InterlockedCompareExchange. I guess it could use InterlockedCompareExchangeAcquire, which reduces the locking penalty.
No, there is no documented way to implement ld.acq using the Windows API. A volatile read appears to work properly on all Windows targets/compilers, and there are probably thousands of lines of existing code that depend on it, but this wasn't specified anywhere.
The newer MSVC 8 documentation finally promises that a volatile read has acquire semantics and that a volatile store has release semantics, even on IA-64, and the compiler also seems to understand these reordering constraints.
http://msdn2.microsoft.com/en-us/library/12a04hfd
The Intel compiler seems to have an option, serialize-volatile, that appears to be on by default; so it seems to also enforce acq/rel volatiles.
As I see it, the implementation options are (1) use a volatile read, live dangerously, be ridiculed by Alexander Terekhov, (2) use inline assembly (painful), (3) use a fully-locked implementation and suffer the performance consequences - my preference is InterlockedExchangeAdd with zero.
Either way, the actual helper function should be named atomic_load_acq and specified to promise acquire semantics, in my opinion.
Thank you for the details. I don't fancy either of the first two options, as I don't know IA-64 or AMD64 assembly, and I don't feel safe relying on volatile semantics unless it's really guaranteed correct on all supported compilers.

Is InterlockedExchangeAdd faster/more reliable in some way than InterlockedCompareExchange?

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk