
Alexander Terekhov <terekhov <at> web.de> writes:
Hans Boehm wrote: [...]
For what it's worth, Sarita Adve is both an author of the report you cite
and
the original and perhaps strongest advocate for the "sequential consistency for data-race-free programs" programming model.
I'm not against the "sequential consistency for data-race-free programs" programming model for programs using locks. On PPC, for example, such programs don't even need hwsync. For programs with lock-free atomics, OTOH, the races (concurrent accesses to the same locations, with loads competing with concurrent stores) are a feature, not a bug, and SC is simply way too expensive (e.g. it needs hwsync on PPC) to be the default mode for lock-free atomics: C/C++ is "you don't pay for what you don't need".
regards, alexander.
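
[As a concrete illustration of the trade-off Alexander describes, here is a minimal C++11 sketch of a lock-free publication pattern; the variable names are invented for this example. With the explicit release/acquire orderings shown, a PPC compiler can use the cheaper lwsync instead of the full hwsync that the default seq_cst operations would require. The defaults would also be correct here, just more expensive on such targets.]

#include <atomic>

int payload;                       // ordinary data handed off between threads
std::atomic<bool> ready(false);    // lock-free atomic flag

void producer() {
    payload = 42;
    // Release store: orders the payload write before the flag update.
    // The default form, ready.store(true), is seq_cst and needs a full
    // barrier (hwsync) on PPC; release only needs lwsync there.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire load pairs with the release store above.
    while (!ready.load(std::memory_order_acquire))
        ;                          // spin until the flag is set
    return payload;                // guaranteed to read 42
}
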
The question is whether the "sequential consistency for data-race-free programs" guarantee should extend to programs using atomic load, store, and RMW operations. The C++ committee, including me, came to the conclusion that the answer needs to be yes: there are many cases in which the use of atomics is fairly straightforward and useful, and it should be possible to use them without leaving this relatively simple programming model. By doing so, you get a safe programming model by default.

Since we do have explicit ordering primitives, you still have the option of paying only for what you need. But 90%, or probably 99%, of programmers will not know what they need here. And that's fine.

This is entirely consistent with many other C++ design decisions. The default operator new allocates memory that can live as long as the process, even though that's more expensive than allocating memory local to the current stack frame or thread, and often one of those latter two options would be sufficient. But it would be nasty to make one of those the default behavior.

The overhead of enforcing sequential consistency is unfortunately, at the moment, very platform-specific. On x86, it's increasingly minor, since it's possible to confine the added cost to stores and, as far as I can tell, that added cost is becoming much less than the cost of a coherence miss. And if your performance is limited by the cost of stores to shared variables, you are fairly likely to see lots of coherence misses anyway, so there is a hand-wavy argument that this is likely to be a minor perturbation. On other architectures, the costs are unfortunately larger, but my impression is that they are decreasing everywhere, as architects pay more attention to synchronization costs.

Hans
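
[To make the x86 point concrete, here is a classic store-buffering sketch; the names x, y, r1, r2 are invented for this illustration. With the default seq_cst operations, the surprising outcome r1 == 0 && r2 == 0 is forbidden, matching the simple interleaving intuition, and typical x86 compilers pay for this only on the stores (xchg, or mov plus mfence) while the seq_cst loads remain ordinary mov instructions. With weaker orderings such as release/acquire, that outcome becomes possible, which is exactly the kind of surprise the default is meant to rule out.]

#include <atomic>

std::atomic<int> x(0), y(0);
int r1, r2;

void thread1() {
    x.store(1);        // default seq_cst store: xchg (or mov+mfence) on x86
    r1 = y.load();     // default seq_cst load: plain mov on x86
}

void thread2() {
    y.store(1);
    r2 = x.load();
}

// Under seq_cst, at least one thread must observe the other's store,
// so r1 == 0 and r2 == 0 cannot both hold after the two threads finish.
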