Re: [Boost-users] Subject:[atomic] why atomic<>::store method can not support memory_order_consum memory_order_acquire and memory_order_acq_rel
On Thu, Jul 23, 2015 at 12:00 AM, "class7class@163.com" wrote:

> I found that the store() method does not support several of the memory_orders
> mentioned in the subject, and the other methods do not support all
> memory_orders either, such as:
> 1. the load method does not support memory_order_release and
> memory_order_acq_rel
> 2. the compare_exchange_strong method does not support
> memory_order_release and memory_order_acq_rel
> 3. the compare_exchange_weak method does not support memory_order_release
> and memory_order_acq_rel
>
> I really want to know why the methods of atomic do not
> support all memory_orders.
>
> If I need the methods of atomic<> to support all memory_orders, what can I
> do?

You are asking about the area of C++ that the fewest people in the world
understand, namely relaxed atomics. Before I share my limited understanding
of the issue, I have one piece of advice:
- Use mutexes such as boost::mutex to ensure that variables shared among
threads are never accessed concurrently, and you'll be fine.
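The mutex advice could look like the following minimal sketch. It uses std::mutex from the standard library rather than boost::mutex so that it is self-contained; the Counter type and run_locked_counter are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state: a plain (non-atomic) value guarded by a mutex.
struct Counter {
    std::mutex m;
    long value = 0;

    void increment() {
        std::lock_guard<std::mutex> lock(m);  // released at end of scope
        ++value;
    }

    long get() {
        std::lock_guard<std::mutex> lock(m);
        return value;
    }
};

// Starts `threads` workers that each increment the counter `per_thread` times.
long run_locked_counter(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    Counter c;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&c, per_thread] {
            for (int i = 0; i < per_thread; ++i) c.increment();
        });
    }
    for (auto& w : workers) w.join();
    return c.get();
}
```

Every access to value happens with the lock held, so no two threads ever touch it at the same time and no increment is lost.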
If your circumstances force you to ignore the above advice (as mine do, e.g.,
when you cannot afford the scheduling overhead of mutexes), here is
another piece of advice which may save your life:
- Make all your variables that can be accessed concurrently from
multiple threads (with at least one thread writing) atomic. (This makes
your program "data race free".)
- Always use the default memory_order_seq_cst memory ordering for
accesses to your atomics; never use anything else.
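As a rough illustration of these two rules, here is a sketch using std::atomic (boost::atomic offers an equivalent interface). The function name run_atomic_counter is hypothetical. No memory_order argument is passed anywhere, so every operation uses the default memory_order_seq_cst.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Starts `threads` workers that each add 1 to a shared atomic `per_thread` times.
long run_atomic_counter(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    std::atomic<long> counter(0);  // shared and written concurrently, so atomic
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1);  // default ordering: memory_order_seq_cst
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();  // default ordering: memory_order_seq_cst
}
```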
If your circumstances force you to ignore the above advice as well (mine do not; I
only studied this because I had not heard the second piece of advice before.
You would have to be writing for old ARM (i.e., cores without the LDA and STL
instructions) or POWER to have a real reason to use relaxed
atomics, and I write for neither), here is what I have learnt.
tl;dr
Boost.Atomic doesn't support every memory ordering on every operation because the C++11
standard doesn't either. The C++ standard doesn't support them all
because not all of them make sense in the "memory model" that C++
supports. This is not likely to change in the near future, i.e., unless some
genius devises a novel memory model that changes the status quo.
Here "memory model" means what "loads" from memory are allowed to return.
Because of compiler and hardware optimizations, writes (including atomic
writes) to memory locations may be reordered, so a load doesn't always
return "the last value stored". Special hardware instructions are
necessary to ensure the particular ordering that the programmer needs. The
easiest model for understanding a program is "All memory operations by all
threads look as if they are done one after another, interleaved among
threads". But it proves too hard for hardware to provide
reasonable performance under this model.
The C++ memory model without relaxed atomics is essentially "All *atomic*
memory operations by all threads look as if they are done one after
another, interleaved among threads. All other memory operations of each
thread look as if they are completely done between adjacent atomic memory
operations. In return, the programmer guarantees not to write a data race. The
compiler relies on this assumption in all its work, and your program can break
in all sorts of mysterious ways if you break this rule." This is essentially the
illusion provided by sequential consistency. Everybody should want it.
Except some don't, because on some architectures it is essentially
impossible to provide sequential consistency efficiently. To provide sequential
consistency, one has to ensure that (1) the compiler doesn't reorder the
statements written by the programmer, which is under the compiler
writer's control; and (2) the hardware doesn't reorder the instructions generated by
the compiler, which is damn hard. It usually means the compiler must
insert memory fences before and after memory accesses (i.e., tell the CPU
"don't return to me until everything I have done up to this point is visible to
all other CPUs" or "don't return to me until everything other CPUs have done up to
this point is visible to me"), which is very slow. It is particularly bad
because on these architectures, fences must be inserted not just for
atomic stores, but for the supposedly fast atomic loads as well. (Recent
architectures do better because they tie the ordering requirement to the access
of a particular memory location: the "particular memory location"
limits the scope that the "fences" have to cover, and this makes a big
difference in performance.)
That's why relaxed atomics (i.e., memory_order_acquire, etc.) exist in the
standard. They allow those architectures to perform better than with full
sequential consistency in some common cases. Because different
architectures require different sorts of fences, it doesn't make sense for
these orderings to mean "insert a fence after the load". Instead, the C++ standard
takes a more programmer-centric view when defining relaxed memory orderings:
- If a thread X stores a value with release semantics into an atomic
variable, and another thread Y loads this value with acquire semantics from
that same atomic variable, then everything done by thread X before this
releasing store is guaranteed to be visible to thread Y after this
acquiring load.
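The release/acquire rule above is often shown as "message passing". Below is a minimal sketch, assuming std::atomic in place of boost::atomic; the names payload, ready and message_passing are hypothetical. Thread X writes plain data and then does a releasing store; thread Y spins on an acquiring load and only then reads the data.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready(false);  // the atomic variable both threads agree on

// Returns the value thread Y observed in `payload`.
int message_passing() {
    payload = 0;
    ready.store(false);
    int seen = -1;

    std::thread x([] {
        payload = 42;                                  // done before the store
        ready.store(true, std::memory_order_release);  // releasing store
    });
    std::thread y([&seen] {
        // Acquiring load: loop until the value stored by X is read.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // the write of 42 is guaranteed to be visible here
    });

    x.join();
    y.join();
    assert(ready.load());
    return seen;
}
```

The guarantee only applies because Y actually reads the value that X stored into the same atomic variable. If ready were stored and loaded with memory_order_relaxed instead, the read of payload would be a data race.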
No other semantics are attached to acquire and release. This is
why acquiring stores and releasing loads are meaningless: they are not
assigned any semantics. But the deeper reason for the lack is that the
computer industry has not found a way to give additional guarantees without
incurring hefty performance overheads. And at the same time, sequentially
consistent atomics are not so expensive on the newer architectures.
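To make this concrete, here is a small sketch of orderings that, as far as I understand the C++11 standard, each kind of operation accepts. It uses std::atomic; try_claim and demo_orderings are hypothetical names. Note that compare_exchange takes two orderings: the restriction to "no release, no acq_rel" applies only to the failure ordering, because a failed compare-exchange is just a load.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: claims a slot by moving `state` from 0 to 1.
bool try_claim(std::atomic<int>& state) {
    int expected = 0;
    return state.compare_exchange_strong(
        expected, 1,
        std::memory_order_acq_rel,   // success: a read-modify-write happened
        std::memory_order_acquire);  // failure: only a load happened
}

bool demo_orderings() {
    std::atomic<int> state(0);

    // Stores accept relaxed, release, or seq_cst.
    state.store(0, std::memory_order_release);

    // Loads accept relaxed, consume, acquire, or seq_cst.
    int before = state.load(std::memory_order_acquire);

    bool first = try_claim(state);   // 0 -> 1 succeeds
    bool second = try_claim(state);  // already 1, so this fails
    return before == 0 && first && !second;
}
```

Passing memory_order_acquire to store(), or memory_order_release to load(), violates a precondition stated in the standard, so there is nothing meaningful to demonstrate for those combinations.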
As for what to do if you want more than the above release-acquire
guarantee: Simple, just use sequentially consistent atomic operations.
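One well-known case where release/acquire is not enough is the "store buffering" pattern, sketched below with std::atomic (count_both_zero is a hypothetical name). Each thread stores to one variable and then loads the other. With the default memory_order_seq_cst, all four operations fit into one interleaving, so at least one thread must see the other's store. With only release stores and acquire loads, both loads would be allowed to return 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering pattern `iterations` times and counts how often
// both threads read 0. With seq_cst operations the count must stay at 0.
int count_both_zero(int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x(0), y(0);
        int r1 = -1, r2 = -1;

        std::thread a([&] {
            x.store(1);     // default ordering: memory_order_seq_cst
            r1 = y.load();  // default ordering: memory_order_seq_cst
        });
        std::thread b([&] {
            y.store(1);
            r2 = x.load();
        });
        a.join();
        b.join();

        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```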
Regards,
Isaac To