
Tobias Schwinger:
AFAIK (not claiming to be an expert in this field, however) it takes a rather atypical processor architecture to pull load instructions out of their regular execution path (logically) before a conditional branch, no?
No. It's not atypical at all to execute loads speculatively, many instructions in advance. The CPU doesn't know whether it will take the branch; it predicts it. Stores that depend on a conditional branch aren't reordered on a PPC, but loads, to the best of my knowledge, can be. You need an 'isync' instruction after the branch to discard the speculatively executed loads. I'm not a PPC expert either. Feel free to not use barriers on PPC. :-)
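To make the load-side problem concrete, here is a minimal sketch in standard C++ (std::atomic, so not what the code under discussion uses). The names payload, ready, writer, reader and run_once are made up for illustration. The reader branches on a relaxed load of the flag and only then issues an acquire fence; without that fence, nothing stops the read of payload from being satisfied speculatively before the flag is seen as set. As far as I know, compilers for PowerPC typically implement the acquire side with lwsync or with a compare, branch and isync sequence after the load, but treat that mapping as an assumption and not something this sketch depends on.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                   // plain data published by the writer
std::atomic<bool> ready{false};    // flag the reader branches on

void writer()
{
    payload = 42;
    // Release store: the write to payload may not be reordered after it.
    ready.store(true, std::memory_order_release);
}

int reader()
{
    // Relaxed load: on its own it does not order the later read of payload.
    while (!ready.load(std::memory_order_relaxed))
        std::this_thread::yield();

    // Acquire fence after the conditional branch: the read of payload below
    // may not be satisfied by a value loaded before the flag was observed.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

int run_once()
{
    // Reset the shared state before the writer thread is started.
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    std::thread t(writer);
    int seen = reader();
    t.join();

    assert(seen == 42);  // guaranteed by the release store / acquire fence pair
    return seen;
}
```

Dropping the fence makes the read of payload a data race, even though it will appear to work on x86 most of the time.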
Hasn't Anthony Williams already implemented a header-only call_once? I'm not sure I see a reason to reinvent that particular wheel. Once boost::mutex is made header-only, there'd be no need for lightweight_mutex either and I'll be able to retire it as well.
Where can I find it?
His latest work is here: http://www.justsoftwaresolutions.co.uk/threading/index.html Odd that he's not watching this thread.
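For anyone unfamiliar with the call_once idiom mentioned above, a minimal sketch follows. It uses std::call_once from the standard library purely as a stand-in; it is not Anthony Williams's implementation, and its interface may differ from his. The names init_flag, init_count, init_resource, use_resource and run_threads are hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical names, for illustration only.
std::once_flag init_flag;   // records whether initialization has already run
int init_count = 0;         // counts how often init_resource was executed

void init_resource()
{
    ++init_count;           // safe: call_once runs this in one thread only
}

void use_resource()
{
    // Every caller goes through call_once; exactly one of them runs
    // init_resource, the others wait until it has completed.
    std::call_once(init_flag, init_resource);
}

int run_threads(int n)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(use_resource);
    for (auto& t : threads)
        t.join();
    return init_count;      // 1, no matter how many threads raced
}
```

The point of the idiom is that callers need no mutex of their own for one-time initialization, which is why a header-only call_once would cover this use of lightweight_mutex.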