
Hello Tony,

On Thursday, 18. November 2010 05:13:03 Gottlob Frege wrote:
The CPU (not just the compiler, but the CPU, memory subsystem, etc) can reorder that as:
First Thread:

    flag = function_complete_flag_value;
    important_data = new ...;

Second Thread:

    register temp = important_data; // start memory read early, so that
                                    // we don't wait for reads to complete
    if (flag == function_complete_flag_value)
        use_important_data(temp);
See the problem? important_data may be read before it is ready.
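To make the required ordering concrete, here is a minimal sketch using C++11 std::atomic with release/acquire ordering. This is an assumption for illustration only: the code discussed in this thread uses Boost's interlocked primitives, not std::atomic, and the names ImportantData, writer, reader and run_publish_demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the names used in the thread.
struct ImportantData { int value; };

ImportantData* important_data = nullptr;  // plain, non-atomic payload
std::atomic<bool> flag{false};            // plays the role of "flag"

// First thread: initialize the data, then publish the flag.
void writer() {
    important_data = new ImportantData{42};
    // Release store: the write to important_data above may not be
    // reordered below this store.
    flag.store(true, std::memory_order_release);
}

// Second thread: wait for the flag, then use the data.
int reader() {
    // Acquire load: the read of important_data below may not be
    // hoisted above this load.
    while (!flag.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return important_data->value;
}

// Runs both threads once and returns the value the reader observed.
int run_publish_demo() {
    int seen = 0;
    std::thread second([&seen] { seen = reader(); });
    std::thread first(writer);
    first.join();
    second.join();
    delete important_data;
    important_data = nullptr;
    flag.store(false);
    return seen;
}
```

The release store pairs with the acquire load: once the reader sees the flag set, it is also guaranteed to see the fully initialized data. Both halves are needed; a barrier on the writer side alone does not stop the early read shown above.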
Thanks for the detailed explanation! As you pointed out, the second thread will mess up without the barrier. Time to fix my code :)

One more small thing: IMHO the write reordering in the first thread can't happen because of the memory barrier created by interlockedExchange(). The first thread translates to this:

    important_data = new ...; // the init call
    lock                      // create memory barrier:
                              // - Don't allow reordering
                              //   from below the barrier
                              // - Finish all outstanding writes
    flag = function_complete_flag_value;
Also one more (silly?) question: "flag" is not a volatile variable. Does boost::detail::interlocked_read_acquire() make sure the value doesn't get cached inside a register? IMHO we lock the mutex and we might still hold a cached, old value of "flag" in a register. -> Do we need an interlocked read here, too? Or mark the flag type "volatile"?
volatile is almost useless in threaded programming. It is typically both insufficient for threads (i.e. no memory barrier) and at the same time superfluous - when inside a mutex - as the mutex handles the barrier for you.
True that.
volatile only helps with the compiler - it doesn't control what the CPU might then do to your instructions (like reorder them, do speculative execution, etc.), so it doesn't help much with threads running on separate CPUs. And MS's version of volatile (which _does_ enforce memory barriers) is non-standard. So don't use it in portable code, i.e. don't use it at all.
Let me illustrate it a bit more:

- register temp = interlocked_read(&flag) // fetch from mem location xyz
  -> Init flag not set, so execute init code:
- create mutex (also creates memory barrier)
- Another thread 2 already entered the mutex, executes the init code and does an interlocked write of "flag". Then it leaves the mutex so thread 1 can continue.
- Thread 1 re-reads the flag without interlocked read or volatile: The compiler recognizes it's the same memory location xyz and uses the cached value from "register temp". So we would redo the initialization.

Don't we need an interlocked read here, too? Or does the mutex/memory barrier ensure the compiler isn't allowed to do register caching?
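The scenario above can be sketched with C++11 primitives. This is a hedged illustration, not the Boost implementation: std::atomic and std::mutex stand in for the interlocked calls and the mutex in the thread, and init_done, init_mutex, init_calls, call_once_sketch and run_once_demo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::atomic<bool> init_done{false};  // hypothetical stand-in for "flag"
std::mutex init_mutex;
int init_calls = 0;                  // only modified while holding init_mutex

void call_once_sketch() {
    // Fast path: acquire load, so a thread that sees "done" also sees
    // everything the initializing thread wrote before setting the flag.
    if (init_done.load(std::memory_order_acquire)) {
        return;
    }

    std::lock_guard<std::mutex> guard(init_mutex);

    // Second check inside the mutex. Locking the mutex is an acquire
    // operation, so the compiler may not reuse the value from the first
    // load; the flag is loaded again here. Relaxed ordering is enough
    // because the mutex already orders this load against the store made
    // by the thread that held the mutex before us.
    if (!init_done.load(std::memory_order_relaxed)) {
        ++init_calls;  // the "init code"
        init_done.store(true, std::memory_order_release);
    }
}

// Starts thread_count threads that all race to initialize and returns
// how many times the init code actually ran.
int run_once_demo(int thread_count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back(call_once_sketch);
    }
    for (auto& t : threads) {
        t.join();
    }
    return init_calls;
}
```

In this sketch the flag is an atomic object, so every access is an explicit load or store and no volatile qualifier is involved. Whether a plain (non-atomic, non-volatile) re-read would be safe in the Boost code depends on what the compiler is told about the mutex call, which this example does not try to answer.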
P.S. I will hopefully be doing another talk on this stuff at BoostCon in May - you should go!
Nice! Too bad "www.boostcon.com" currently issues a "500 - Internal Server Error" :o)

Best regards,
Thomas Jarosch