Re: [boost] [lockfree] review

30 Aug 2011

      Alexander Terekhov <terekhov <at> web.de> writes:
...
Hans Boehm wrote:
[...]
...
...
...
plus a number of real lock-free algorithms (which do matter) don't
work.
I'm really interested to know the exact number/names of algorithms.
I asked Doug Lea this question 3 or 4 years ago when this was still being 
actively discussed.  He's more of an expert on practical concurrent 
algorithms.  I got back a bunch of answers.  One simple and common example is 
the algorithm for releasing a queued lock.  I need to set the lock state to 
unlocked and then check the queue for waiters.  If those are reordered, I get 
lost wakeup problems.  His assessment was that such cases are surprisingly 
common, though there are clearly also a bunch of scenarios, like lazy 
initialization via double-checked locking, for which acquire-release atomics 
suffice.
...
[... "based on Dekker's example" ...]
Yes, acquire/release is free from implicit store-load barrier (same as
with TSO and even IBM 370).
Note that computer hardware vendors never provided and still do not
provide implicit store-load barrier for hardware.
Note that we're not actually requiring a full store-load barrier in the 
implementation.  We only need a barrier that orders atomic stores and atomic 
loads, which could perhaps be appreciably cheaper, though on current 
architectures it isn't.
...
In this respect I really like "Figure 8":
http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf
(Figure 8: Simple categorization of relaxed models. ...)
The situation is really quite different when you're talking about 1995 
hardware ISAs as opposed to programming languages like C++.  At the 1995 
hardware level, you can no longer distinguish data and synchronization 
accesses, so you would also add the overhead to data accesses.  And whatever 
you promise is unlikely to be visible to the programmer anyway, since compiler 
reordering is likely to invalidate whatever properties you tried to guarantee 
at the hardware level.  Providing full sequential consistency at the hardware 
level without compiler changes is likely to buy you essentially nothing, and 
do so at significant expense either in hardware complexity and/or performance.

For what it's worth, Sarita Adve is both an author of the report you cite and 
the original and perhaps strongest advocate for the "sequential consistency 
for data-race-free programs" programming model.
...
Why the heck is C++'s default mode for atomics so special in this
respect?
At the programming language level, this isn't at all unique.  I believe Ada 83 
atomic accesses were sequentially consistent.  Certainly Java volatiles are.
...
regards,
alexander.