
Right-oh. Here are two implementations of call_once (attached), for starters.
once.hpp uses a semaphore, whereas once_mutex.hpp uses a mutex, the same as the existing boost::thread implementation.
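For comparison, a mutex-based call_once usually follows the double-checked pattern sketched below. This is not the attached once_mutex.hpp: it is a minimal sketch written against C++11 std::mutex and std::atomic rather than Boost.Thread, and mutex_once_flag, mutex_call_once and count_mutex_runs are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical flag type for illustration (not the attached once_mutex.hpp).
struct mutex_once_flag {
    std::atomic<bool> done{false};
    std::mutex m;
};

// Mutex-based call_once with double-checked locking: callers that see
// "done" return after a single acquire load; everyone else serialises on
// the mutex and re-checks, so f runs at most once. If f throws, done stays
// false, the lock_guard releases the mutex, and the next caller retries.
template <class F>
void mutex_call_once(mutex_once_flag& flag, F f) {
    if (flag.done.load(std::memory_order_acquire))
        return;                                     // fast path: already run
    std::lock_guard<std::mutex> lock(flag.m);
    if (!flag.done.load(std::memory_order_relaxed)) {
        f();
        flag.done.store(true, std::memory_order_release);
    }
}

// Races `threads` threads on one flag; returns how many times f ran.
inline int count_mutex_runs(int threads) {
    mutex_once_flag flag;
    std::atomic<int> runs{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([&] { mutex_call_once(flag, [&] { ++runs; }); });
    for (auto& t : pool) t.join();
    return runs.load();
}
```

Every call that arrives before the flag is set pays for a lock/unlock, which is where a mutex version loses time when many threads race on the first call.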
On my machine, the test program takes 16s with the semaphore version compiled with gcc-mingw-4.0.1, and 22s compiled with MSVC 7.1. The mutex version takes 23s (gcc) and 29s (MSVC), so the semaphore version is clearly faster.
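The test program itself isn't shown here. As a rough illustration only, a harness of that general kind might time a tight loop of call_once calls on one flag. This sketch uses C++11 std::call_once and std::chrono rather than either attached implementation, and time_call_once is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical harness: time `iterations` calls to std::call_once on one
// flag. The first call runs the functor; every later call exercises the
// "already done" path, which dominates the measurement. Returns seconds.
inline double time_call_once(long iterations, int& runs) {
    std::once_flag flag;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        std::call_once(flag, [&] { ++runs; });
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Swapping std::call_once for the implementation under test gives a like-for-like single-thread figure.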
OK, how about a third version?

Pros:

* Uses only simple atomic operations, so it is easy to implement as a header-only solution using Boost's existing shared_ptr support code.
* Much faster than either of the alternatives above (see the timings below).
* No need to do anything different on CE, or to use stringstream etc.
* Exception safe, etc.
* Accepts a template functor.

Cons:

* If the functor takes a long time to execute and multiple threads are racing to call call_once, the waiting threads will wait less efficiently than they would on a mutex. This should be a very rare occurrence, though.

Here are my timings:

// Semaphore method:
// Elapsed time for one thread=4.816
// Elapsed time for multiple threads=0.05
//
// Mutex method:
// Elapsed time for one thread=5.387
// Elapsed time for multiple threads=0.05
//
// Atomic method:
// Elapsed time for one thread=0.01
// Elapsed time for multiple threads=0.06

So the atomic method is only about 500 times faster in the single-thread run of your rather pathological test case :-)

I always did wonder why call_once wasn't implemented this way, but never got around to asking. Could be I've missed something really obvious, of course...?

Thoughts?

John.
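The attachment for the atomic version isn't shown, so the following is only a guess at the general shape of such a scheme, assuming C++11 std::atomic in place of Boost's shared_ptr atomic support code. atomic_once_flag, atomic_call_once and count_atomic_runs are hypothetical names. The flag moves from 0 (not run) to 1 (running) to 2 (done); completed calls cost one acquire load, and threads that arrive while the functor is running spin with std::this_thread::yield(), which is the inefficiency listed under Cons.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical three-state flag: 0 = not run, 1 = running, 2 = done.
struct atomic_once_flag {
    std::atomic<int> state{0};
};

// Atomic-only call_once sketch: one thread wins the 0 -> 1 CAS and runs f;
// the others yield until the state becomes 2 (or drops back to 0).
template <class F>
void atomic_call_once(atomic_once_flag& flag, F f) {
    for (;;) {
        int s = flag.state.load(std::memory_order_acquire);
        if (s == 2)
            return;                                 // fast path: one load
        if (s == 0) {
            int expected = 0;
            if (flag.state.compare_exchange_strong(expected, 1,
                                                   std::memory_order_acquire)) {
                try {
                    f();
                } catch (...) {
                    // Reset so a later caller can retry, then propagate.
                    flag.state.store(0, std::memory_order_release);
                    throw;
                }
                // Release pairs with the acquire load above, so callers
                // that see 2 also see everything f wrote.
                flag.state.store(2, std::memory_order_release);
                return;
            }
            continue;                               // lost the race: re-read
        }
        std::this_thread::yield();                  // s == 1: f is running
    }
}

// Races `threads` threads on one flag; returns how many times f ran.
inline int count_atomic_runs(int threads) {
    atomic_once_flag flag;
    std::atomic<int> runs{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([&] { atomic_call_once(flag, [&] { ++runs; }); });
    for (auto& t : pool) t.join();
    return runs.load();
}
```

Because a throwing functor puts the flag back to 0, a later call simply tries again, which is one plausible way to meet the exception-safety point above.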