
Anthony Williams wrote:
> Sorry, I misread what you wrote above, and looked at the /implementation/ of pthread_once2_np. I agree that it's not that hard to write call_once in terms of pthread_once2_np, but that isn't what POSIX provides.

Not a direct response to Anthony, who already knows most of this, just a general explanation of the choices behind N2178:

The idea behind the _POSIX_CXX09_EXTENSIONS part of N2178 is precisely to allow a pthread implementor (on Windows, that's us) to avoid these sources of inefficiency.

The join2 functions serve as a similar example. On Windows, they are trivially implementable with no overhead using WaitForSingleObject on the thread HANDLE. On non-extended POSIX, the situation is more complicated. The join2 functions also illustrate another point where I believe N2178 is superior to N2184: they expose the more capable Windows join model at the C++ level instead of the more limited pthread_join. It's trivial to make an n2184::thread provide the same semantics under Windows, but nobody would be able to take advantage of them portably.
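
To make the "more complicated" part concrete, here is a rough sketch of what a join that can time out and be repeated tends to cost when all you have is a one-shot, untimed join underneath. It is not the N2178 join2 interface and not how any particular implementation does it; the class name waitable_thread and its members try_join, timed_join and join are hypothetical, chosen only for illustration. The completion flag, mutex and condition variable are exactly the extra state that a wait on a Windows thread HANDLE makes unnecessary.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical helper, not part of N2178 or N2184.
// Emulates a join that can time out and can be called more than once,
// on top of a primitive (std::thread::join) that allows neither.
class waitable_thread {
public:
    explicit waitable_thread(std::function<void()> body)
        : worker_([this, body] {
              body();
              {
                  std::lock_guard<std::mutex> lock(mutex_);
                  done_ = true;  // record completion for all current and future waiters
              }
              finished_.notify_all();
          }) {}

    // The single "real" join happens here, after the body has run.
    ~waitable_thread() { worker_.join(); }

    waitable_thread(const waitable_thread&) = delete;
    waitable_thread& operator=(const waitable_thread&) = delete;

    // Non-blocking: has the thread body finished?
    bool try_join() {
        std::lock_guard<std::mutex> lock(mutex_);
        return done_;
    }

    // Waits up to `timeout`; returns true if the thread body finished.
    template <class Rep, class Period>
    bool timed_join(std::chrono::duration<Rep, Period> timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return finished_.wait_for(lock, timeout, [this] { return done_; });
    }

    // Blocks until the thread body finished; safe to call repeatedly.
    void join() {
        std::unique_lock<std::mutex> lock(mutex_);
        finished_.wait(lock, [this] { return done_; });
    }

private:
    // Declared before worker_ so they are constructed before the thread starts.
    std::mutex mutex_;
    std::condition_variable finished_;
    bool done_ = false;
    std::thread worker_;
};

// Usage: a timed join that expires, followed by repeated successful joins.
bool demo_repeatable_join() {
    std::mutex gate_mutex;
    std::condition_variable gate;
    bool release = false;

    waitable_thread t([&] {
        std::unique_lock<std::mutex> lock(gate_mutex);
        gate.wait(lock, [&] { return release; });  // block until released
    });

    // The worker is still blocked, so this wait must time out.
    bool timed_out_first = !t.timed_join(std::chrono::milliseconds(10));

    {
        std::lock_guard<std::mutex> lock(gate_mutex);
        release = true;
    }
    gate.notify_one();

    t.join();  // first successful join
    // Joining again is harmless and returns immediately.
    bool joined_again = t.timed_join(std::chrono::milliseconds(0)) && t.try_join();

    return timed_out_first && joined_again;
}
```

With a thread HANDLE, each of these three operations would reduce to a single WaitForSingleObject call with a zero, finite or infinite timeout, and any number of threads could make that call any number of times.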