
Alexander Terekhov <terekhov <at> web.de> writes:
Hans Boehm wrote: [...]
plus a number of real lock-free algorithms (which do matter) don't work.
I'm really interested to know the exact number/names of algorithms.
I asked Doug Lea this question 3 or 4 years ago when this was still being actively discussed. He's more of an expert on practical concurrent algorithms. I got back a bunch of answers. One simple and common example is the algorithm for releasing a queued lock. I need to set the lock state to unlocked and then check the queue for waiters. If those are reordered, I get lost wakeup problems. His assessment was that such cases are surprisingly common, though there are clearly also a bunch of scenarios, like lazy initialization via double-checked locking, for which acquire-release atomics suffice.
[... "based on Dekker's example" ...]
Yes, acquire/release is free from implicit store-load barrier (same as with TSO and even IBM 370).
Note that computer hardware vendors never provided and still do not provide implicit store-load barrier for hardware.
Note that we're not actually requiring a full store-load barrier in the implementation. We only need a barrier that orders atomic stores and atomic loads, which could perhaps be appreciably cheaper, though on current architectures it isn't.
In this respect I really like "Figure 8":
http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf (Figure 8: Simple categorization of relaxed models. ...)
The situation is really quite different when you're talking about 1995 hardware ISAs as opposed to programming languages like C++. At the 1995 hardware level, you can no longer distinguish data and synchronization accesses, so you would also add the overhead to data accesses. And whatever you promise is unlikely to be visible to the programmer anyway, since compiler reordering is likely to invalidate whatever properties you tried to guarantee at the hardware level. Providing full sequential consistency at the hardware level without compiler changes is likely to buy you essentially nothing, and do so at significant expense either in hardware complexity and/or performance. For what it's worth, Sarita Adve is both an author of the report you cite and the original and perhaps strongest advocate for the "sequential consistency for data-race-free programs" programming model.
Why the heck is C++'s default mode for atomics so special in this respect?
At the programming language level, this isn't at all unique. I believe Ada 83 atomic accesses were sequentially consistent. Certainly Java volatiles are.
regards, alexander.