On Sun, Jun 1, 2014 at 11:58 AM, Andrey Semashev
Hi,
I'm reviewing (again) the Boost.Atomic code and struggling to understand the consume order, in particular what it should mean on architectures other than DEC Alpha.
I read the explanation here:
http://en.cppreference.com/w/cpp/atomic/memory_order
but the point eludes me. Take ARM for example and the explanation in the "Release-Consume ordering" section. The producer thread allocates the string and stores the pointer with a release operation, so that the pointer, the string contents and the 'data' integer are visible to other threads.
Now the consumer thread reads the pointer with a consume operation. According to the explanation in the article, on ARM the consume operation need not issue any specific fences to be able to use the pointer and the string body. In that case, the consume operation becomes equivalent to relaxed (plus prohibiting compiler optimizations). But is there a guarantee that the string body will be visible to the consumer? Shouldn't the consume operation be promoted to acquire instead?
ARM and many other RMO architectures (like PPC and unlike Alpha) guarantee that a load and a later load that depends on it won't be reordered, so, together with the release operation on the writer side, the load_consume guarantees the visibility of the string body. The exact definition of a load dependency (basically, the address of the dependent load is computed as a function of the value returned by the earlier load) is given at the instruction level and is quite tricky to recover at the level of the C++ language. C++11 tried to do it, but according to some the current wording is very hard to implement and, in some cases, both not strong enough and too strict. In the meantime GCC (and a few other compilers) punts on load_consume and simply promotes it to load_acquire. Note that x86, a TSO machine, has even stronger guarantees: any load is a load_acquire.
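To make the pattern concrete, here is a minimal sketch of the release/consume pairing in standard C++11. The function name publish_and_consume and the local names are made up for illustration; this is not Boost.Atomic code, and it only shows the dependent access (dereferencing the loaded pointer), which is the part consume is meant to cover. With a compiler that promotes consume to acquire the code behaves the same, just with a stronger ordering than strictly needed.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

// Hypothetical helper: publishes a heap-allocated string through an atomic
// pointer and returns what the consumer thread read through that pointer.
std::string publish_and_consume() {
    std::atomic<std::string*> ptr{nullptr};
    std::string seen;

    std::thread producer([&] {
        // Build the string first, then publish the pointer with release
        // so the string body is visible to whoever observes the pointer.
        std::string* p = new std::string("Hello");
        ptr.store(p, std::memory_order_release);
    });

    std::thread consumer([&] {
        std::string* p = nullptr;
        // Spin until the producer has published the pointer.
        while (!(p = ptr.load(std::memory_order_consume))) {
            std::this_thread::yield();
        }
        // The read of *p carries a dependency on the consume load: its
        // address is computed from the loaded value. That dependency is
        // what orders this read after the producer's writes.
        seen = *p;
    });

    producer.join();
    consumer.join();

    // Both threads are joined, so a relaxed load is enough for cleanup.
    delete ptr.load(std::memory_order_relaxed);
    return seen;
}
```

Calling publish_and_consume() from main should return "Hello". Data that is not reached through the loaded pointer (the 'data' integer in the cppreference example) gets no such guarantee from consume alone.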
I guess that's the ultimate question: how should consume ordering be handled on conventional architectures?
That's hard to do without compiler help, unfortunately. Compilers have started doing some quite aggressive optimisations (like value speculation and PGO) that can break load dependencies. The Linux kernel, for example, gets by by explicitly disabling those optimisations, not doing PGO, and targeting a specific compiler. See N2664 and the recent epic thread on gcc-dev.
HTH,
-- gpd