Boost threadsafe singleton?

I'm reviewing the serialization library with an eye to making it threadsafe. I think that a lightweight threadsafe singleton would be very useful in this effort. There are tons of cyberbits spilled on this topic - but much to my surprise and disappointment, there is no boost library for this purpose. Someone please tell me it's here but that I just haven't found it. Robert Ramey

Robert Ramey wrote:
I think that a lightweight threadsafe singleton would be very useful in this effort.
Recently found the same. Do you want a Singleton with thread-safe initialization? Or one with automatically thread-safe access to its members? <...>
Someone please tell me it's here but that I just haven't found it.
The archive at http://tinyurl.com/35vlvb contains both of the variants mentioned above. Regards, Tobias

Thanks for pointing this out. It looks to me like this would require linking with the thread library. I was thinking more along the lines of something built with the code in boost/detail/lightweight_mutex. Robert Ramey

Tobias Schwinger wrote:
Robert Ramey wrote:
I think that a lightweight threadsafe singleton would be very useful in this effort.
Recently found the same. Do you want a Singleton with thread-safe initialization? Or one with automatically thread-safe access to its members?
<...>
Someone please tell me it's here but that I just haven't found it.
The archive at
contains both of the variants mentioned above.
Regards, Tobias

Robert Ramey wrote:
Thanks for pointing this out.
It looks to me like this would require linking with the thread library.
I was thinking more along the lines of something built with the code in boost/detail/lightweight_mutex
'lightweight_mutex' can't be safely initialized statically and there is no "lightweight_once". Without either of them, the Singleton wouldn't work properly with shared libraries - and neither would the libraries using it. I'd have to go for the native threading APIs or extend the "lightweight suite". I put it on my list, for now. Regards, Tobias

I've thought about this some more and looked at the code in your singleton as well as that of the threading library that your code depends upon. I'm motivated by the fact that things are more complex than I anticipated and your package is "boost ready" with tests, documentation, and appropriate boost macro dependencies. This has convinced me that I don't want to re-do any of this and just want to depend on what you (collectively) have done. This means biting the bullet and creating a dependency on the thread library. I hope whoever is maintaining the thread library won't let me down.

Soooooooooooo, now I've got some real serious questions. I'm going to need a threadsafe singleton with the following features:

a) no extra features.

b) if used in an environment which doesn't have threads, it should compile down to almost nothing and not require linking with the threading library.

c) good documentation - mostly done as far as I can tell. I would like to see the singleton library have a tutorial section showing examples in a more "chatty" style. I'm aware of the necessity of "formal" documentation - but it's only really useful after one grasps the functionality and necessity of the package in more intuitive terms.

Now assuming that the above is OK (I can live without c), then I have the real problem. Suppose I want to use this to make the serialization library threadsafe. The singleton package is not currently in boost. In fact, it's not even on the review schedule. As I see it I have a couple of options:

a) Don't do anything; just wait until boost has a threadsafe singleton. I see no progress being made here. Currently there is no singleton implementation in the review queue.

b) Hack away with some ad hoc version built on lightweight_mutex. The problem here is that there are no tests, no documentation, no promise to keep it maintained, and no promise not to change the interface in the future. (I like the size/speed and header-only aspect.)

c) Incorporate your threadsafe singleton as part of the serialization library. See the "misc" section of the serialization library documentation, which has a number of "standalone" utilities that were needed to make the serialization library work. This could be made to work. The problem is that I'm going to have a huge hassle unless it's in namespace boost::serialization::singleton, which is pretty crazy. Hopefully something like this gets approved eventually, but then I'd have to go back and change the namespace references in the serialization library. Seems silly to me to have to go through this. It would also be pretty confusing, as there is another singleton in the "memory pool" library - though that one emphatically says it's not a general purpose solution. It would also mean that all the tests (which MUST be run) would appear as part of the serialization library. This is also very confusing and inappropriate.

d) Just insert the singleton into boost/detail. I don't like this for the same reason I don't like depending upon boost/detail/lightweight_mutex: it creates a dependency on a component which has no tests, no documentation, and no promise to keep the interface stable. This is a big problem for the components that are already in there. I run tests on utf8_codecvt_facet as part of the serialization library.

As a practical matter, I'll probably just end up doing d), as any other way just takes too long. Oh well. Robert Ramey

Robert Ramey wrote:
c) good documentation - mostly done as far as I can tell. I would like to see the singleton library have a tutorial section showing examples in a more "chatty" style. I'm aware of the necessity of "formal" documentation - but it's only really useful after one grasps the functionality and necessity of the package in more intuitive terms.
Yes, a brief tutorial is needed (figuring from a question on the usage I received via email). I'm thinking about a text with some simple code snippets for "here's how to basically use it", "you might use multiple inheritance to make some existing class a singleton", and "you might use non-public inheritance to create a purely static interface".
The singleton package is not currently in boost. In fact, it's not even on the review schedule. As I see it I have a couple of options:
a) Don't do anything; just wait until boost has a threadsafe singleton. I see no progress being made here. Currently there is no singleton implementation in the review queue.
Well, my implementation was uploaded just a few weeks ago. Anyway, we might as well just request formal review - and, as the crowdedness has lightened, it won't take that long, especially if you volunteer to manage that review ;-).
b) Hack away with some ad hoc version built on lightweight_mutex. The problem here is that there are no tests, no documentation, no promise to keep it maintained, and no promise not to change the interface in the future. (I like the size/speed and header-only aspect.)
AFAICT, our "lightweight toolbox" is still insufficient to implement a thread-safe Singleton - I might be missing something, though. How would you initialize 'lightweight_mutex' when you can't know that ctors are run in static context (as within a shared library)? Maybe it's possible to make 'detail::atomic_count' an aggregate and provide a macro for initialization (just as pthread does for its synchronization primitives). Then it would be trivial to implement a 'lightweight_once' on top of it... Thanks for your comments. Regards, Tobias
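The aggregate-plus-initializer-macro idea Tobias describes can be sketched in modern C++ terms. This is only an illustration of the technique, not the eventual Boost interface: std::atomic stands in for a detail::atomic_count-like primitive, and all names, including the macro, are assumptions.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical "lightweight_once" built on an aggregate flag, in the
// spirit of pthread_once. All identifiers here are illustrative.
struct lightweight_once_flag
{
    std::atomic<int> state; // 0 = untouched, 1 = running, 2 = done
};

// Aggregate initialization happens at compile time, so the flag lives in
// the data segment and is valid even before any constructors run - which
// sidesteps the shared-library problem discussed above.
#define LIGHTWEIGHT_ONCE_INIT { {0} }

template<class F>
void lightweight_once(lightweight_once_flag& flag, F f)
{
    int expected = 0;
    if (flag.state.compare_exchange_strong(expected, 1))
    {
        f();                  // first caller runs the initializer
        flag.state.store(2);  // publish completion
    }
    else
    {
        while (flag.state.load() != 2) {} // later callers spin until done
    }
}

// Demonstration: a lazily initialized value guarded by the flag.
static lightweight_once_flag g_flag = LIGHTWEIGHT_ONCE_INIT;
static int g_value = 0;

int get_value()
{
    lightweight_once(g_flag, []{ g_value = 42; });
    return g_value;
}
```

A production version would block rather than spin, but the key property - a once flag that needs no runtime construction - is visible even in this toy form.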

Tobias Schwinger wrote:
I'm thinking about a text with some simple code snippets for "here's how to basically use it", "you might use multiple inheritance to make some existing class a singleton", and "you might use non-public inheritance to create a purely static interface".
Along with small tests and demos, please. Very cool ideas which would never have occurred to me, as I've been too focused on an expedient rather than a clever solution.
Well, my implementation was uploaded just a few weeks ago. Anyway, we might as well just request formal review - and, as the crowdedness has lightened, it won't take that long, especially if you volunteer to manage that review ;-).
I'm biased because I need something RIGHT NOW. Of course that's not true, but it turns out I started out addressing the issues that the serialization library currently has with multi-threading and dynamic loading/unloading of DLLs. So if I don't have it now, I lose my whole train of thought.
AFAICT, our "lightweight toolbox" is still insufficient to implement a thread-safe Singleton - I might be missing something, though. How would you initialize 'lightweight_mutex' when you can't know that ctors are run in static context (as within a shared library)?
Maybe it's possible to make 'detail::atomic_count' an aggregate and provide a macro for initialization (just as pthread does for its synchronization primitives). Then it would be trivial to implement a 'lightweight_once' on top of it...
Please start thinking about this. I don't see why this can't be a header-only library. Of course in my case, the expanded headers (instantiated code) will be compiled into the serialization library as an implementation detail. I would much prefer that to having to link in another library. Robert Ramey

Hello Robert, Wednesday, August 22, 2007, 12:43:03 AM, you wrote:
Tobias Schwinger wrote:
AFAICT, our "lightweight toolbox" is still insufficient to implement a thread-safe Singleton - I might be missing something, though. How would you initialize 'lightweight_mutex' when you can't know that ctors are run in static context (as within a shared library)?
Maybe it's possible to make 'detail::atomic_count' an aggregate and provide a macro for initialization (just as pthread does for its synchronization primitives). Then it would be trivial to implement a 'lightweight_once' on top of it...
Please start thinking about this. I don't see why this can't be a header-only library. Of course in my case, the expanded headers (instantiated code) will be compiled into the serialization library as an implementation detail. I would much prefer that to having to link in another library.
Just came across this thread. I had a need of lightweight_call_once in my Boost.FSM library and implemented it. It is not implemented as an internal part of the library, but rather as a common tool, like lightweight_mutex. It can be found here: http://tinyurl.com/yjozfn I hope it will make it to Boost after the library review. -- Best regards, Andrey mailto:andysem@mail.ru

Andrey Semashev wrote:
Hello Robert,
Wednesday, August 22, 2007, 12:43:03 AM, you wrote:
Tobias Schwinger wrote:
AFAICT, our "lightweight toolbox" is still insufficient to implement a thread-safe Singleton - I might be missing something, though. How would you initialize 'lightweight_mutex' when you can't know that ctors are run in static context (as within a shared library)?
Maybe it's possible to make 'detail::atomic_count' an aggregate and provide a macro for initialization (just as pthread does for its synchronization primitives). Then it would be trivial to implement a 'lightweight_once' on top of it...
Please start thinking about this. I don't see why this can't be a header-only library. Of course in my case, the expanded headers (instantiated code) will be compiled into the serialization library as an implementation detail. I would much prefer that to having to link in another library.
Just came across this thread. I had a need of lightweight_call_once in my Boost.FSM library and implemented it. It is not implemented as an internal part of the library, but rather as a common tool, like lightweight_mutex.
Something you'd like to brush up as a Boost X-File ;-)? See http://article.gmane.org/gmane.comp.lib.boost.devel/162951
It can be found here:
Thanks! It's great you actually wrote a reusable tool. I finally found some time to review your code. Here are my (hopefully not too discouraging) comments:

The pthreads implementation seems to be using a global Mutex, which is inefficient, because it causes concurrent initializations (that might have nothing to do with each other) to be queued. To make things worse, that Mutex is initialized with 'pthread_once'. Also, some platforms will not call 'mutex_destroyer' within a dynamic library (you probably know)... The "trigger" could contain the mutex, and the macro for initialization would contain PTHREAD_MUTEX_INITIALIZER, so its creation can be done at compile time by setting up the appropriate bytes in the data segment (interestingly, you use a similar technique for the "no atomics" variant).

Other implementations use "while (check) sleep;" stuff, which seems sorta awkward to me. Can't we use "proper" synchronization? Win32 provides 'InitOnceExecuteOnce', which seems to do pretty much all we need (it even takes a parameter to get the state in, and boost::function plus a downcast from 'PVOID' will do for the type erasure). After putting another guard around it (to avoid the dynamic call and the construction of the boost::function) we're all done. I don't know too much about other threading platforms, but I'm sure there are similar means.

We could also use a counter for the guard and a Semaphore (if it's more handy to do so for some platform) to notify threads waiting for initialization to complete:

    if (is_init(trigger)) return;
    if (atomic_inc(cnt) > 1) // <-- gate point
    {
        // go to sleep, unless we missed that initialization has finished
        if (! is_init(trigger)) sem.down();
        atomic_dec(cnt);
    }
    else
    {
        client_func();
        // make sure no further threads enter the gate (threads that
        // miss the change and enter anyway will leave again)
        make_called(trigger);
        // continue all threads after the gate
        for (int n; !! (n = cnt - 1) ;)
        {
            sem.up(n);
            yield(); // <-- might or might not be needed
        }
        // postcondition: sem >= 0
    }

For some platforms (such as x86) memory access is atomic, so atomic operations are just a waste of time for simple read/write operations such as the 'is_init' and 'set_called' stuff.

There's some code that throws exceptions with pretty, formatted error messages: So we're out of resources and execute a whole bunch of code to format an error message... That code might run into the same problem we're trying to report, so probably throwing something lightweight (such as an enum) is a more appropriate choice (and also gets rid of some header dependencies).

Regards, Tobias
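To make the "trigger contains the mutex" suggestion concrete, here is a minimal sketch in plain pthreads. The type and macro names are hypothetical, not taken from Andrey's code: each trigger carries its own mutex, statically initialized with PTHREAD_MUTEX_INITIALIZER, so there is no global mutex and no pthread_once bootstrap.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical per-trigger once primitive: the mutex is set up at
// compile time in the data segment, so it is valid even when no
// constructors run in static context (the shared-library case).
struct once_trigger
{
    pthread_mutex_t lock;
    int done;
};

#define ONCE_TRIGGER_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

void run_once(once_trigger* t, void (*fn)())
{
    pthread_mutex_lock(&t->lock);   // contended only around the first call
    if (!t->done)
    {
        fn();
        t->done = 1;
    }
    pthread_mutex_unlock(&t->lock);
}

// Example client state for demonstration purposes.
static once_trigger g_trigger = ONCE_TRIGGER_INIT;
static int g_call_count = 0;
static void init_fn() { ++g_call_count; }
```

A production version would also avoid taking the mutex on the fast path once initialization is complete, e.g. with the counter/semaphore gate sketched above; this fragment shows only the static-initialization idea.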

It's become clear to me that this is an important and way non-trivial subject that I'm not going to get into the details of. Sooo - here's what I'm going to do. I'm going to make / use a very simple lightweight singleton which has an interface similar or identical to your proposal. I can modify all the places in the serialization library which now implement this functionality in an ad-hoc way so that they depend on this single header. There are a number of such cases - at least 7!! So this will make my code clearer and shorter. It won't make the serialization library thread-safe, however. But when or if this ever gets sorted out, it will be trivial to just change a few #includes and I'll be done. If anyone can't wait for a Boost threadsafe singleton, they can substitute my trivial one for one of their own choosing. That is, I'm offloading this issue to anyone else. Robert Ramey
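The "trivial singleton" described here might look something like the following sketch. The get_*_instance interface is an assumption about the proposal's shape, and this version is deliberately not thread-safe - it is a placeholder to be swapped out later.

```cpp
#include <cassert>

// Minimal placeholder singleton: same usage pattern as a full-featured
// one, but no locking, so it is only safe before threads are spawned.
template<class T>
class trivial_singleton
{
public:
    static T& get_mutable_instance()
    {
        static T instance;  // constructed on first use (not thread-safe)
        return instance;
    }
    static const T& get_const_instance()
    {
        return get_mutable_instance();
    }
};

// Example payload standing in for one of the serialization library's
// internal type-info maps (name is illustrative).
struct example_registry
{
    int entries = 0;
};
```

Swapping in a thread-safe implementation later is then just a matter of changing which header defines the singleton template, exactly as the message above anticipates.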

Robert Ramey wrote:
If anyone can't wait for a Boost threadsafe singleton, they can substitute my trivial one for one of their own choosing.
That's a step in the right direction!
That is, I'm offloading this issue to anyone else
How about a different way to think about the problem? As I recall, the problem with the thread-safety of the serialization library relates to setting up the type info maps. If you could somehow create a way that one could invoke the setup of this map perhaps by a function call, that would solve the problem, yes? Thanks for looking at the problem in any case. Sohail

Sohail Somani wrote:
Robert Ramey wrote:
If anyone can't wait for a Boost threadsafe singleton, they can substitute my trivial one for one of their own choosing.
That's a step in the right direction!
That is, I'm offloading this issue to anyone else
How about a different way to think about the problem?
As I recall, the problem with the thread-safety of the serialization library relates to setting up the type info maps. If you could somehow create a way that one could invoke the setup of this map perhaps by a function call, that would solve the problem, yes?
That sounds good to me. If the serialization library is doing something non-trivial at static initialization or destruction time, then it can't be used in many embedded environments that either don't perform static destruction or require an explicit initialization of their runtime library, free store/heap, etc. I realize this probably isn't even a concern to most people, but it is for me, because I like using Boost libraries in my embedded development instead of rolling my own solutions. Selfishly yours, Michael Marcin

Michael Marcin wrote:
As I recall, the problem with the thread-safety of the serialization library relates to setting up the type info maps. If you could somehow create a way that one could invoke the setup of this map perhaps by a function call, that would solve the problem, yes?
Actually that's what happens now under the covers. This occurs in a number of different places - each in its own way. This can be factored out into a common singleton implementation. Replacing this common one with a threadsafe version will make the serialization library truly threadsafe.
That sounds good to me. If the serialization library is doing something non-trivial at static initialization or destruction time, then it can't be used in many embedded environments that either don't perform static destruction or require an explicit initialization of their runtime library, free store/heap, etc.
Then all you'll have to do is replace the trivial singleton implementation with your own, which addresses your particular situation. Robert Ramey

Robert Ramey wrote:
Michael Marcin wrote:
As I recall, the problem with the thread-safety of the serialization library relates to setting up the type info maps. If you could somehow create a way that one could invoke the setup of this map perhaps by a function call, that would solve the problem, yes?
Actually that's what happens now under the covers. This occurs in a number of different places - each in its own way. This can be factored out into a common singleton implementation. Replacing this common one with a threadsafe version will make the serialization library truly threadsafe.
That sounds good. Will you let us know when you have something to look at? Thanks, Sohail

Michael Marcin wrote:
That sounds good to me. If the serialization library is doing something non-trivial at static initialization or destruction time, then it can't be used in many embedded environments that either don't perform static destruction or require an explicit initialization of their runtime library, free store/heap, etc.
Those are pretty much the same constraints that apply to most UNIXes for dynamic libraries. So if you want to write a portable library, you'd better not use non-trivial static initialization or destruction anyway. Regards, Tobias

Hello Tobias, Friday, August 24, 2007, 4:19:43 AM, you wrote:
Just came across this thread. I had a need of lightweight_call_once in my Boost.FSM library and implemented it. It is not implemented as an internal part of the library, but rather as a common tool, like lightweight_mutex.
Something you'd like to brush up as a Boost X-File ;-)?
See
Hmm, I'm not sure of the purpose of this project. Is it supposed to pass several tools under its umbrella to boost via fast-track review?
The pthreads implementation seems to be using a global Mutex, which is inefficient, because it causes concurrent initializations (that might have nothing to do with each other) to be queued. To make things worse, that Mutex is initialized with 'pthread_once'.
Yes, but consider that this code will be executed only once. The rest of the execution time this mutex is useless. As mutexes may actually take some system resources (not sure whether it's true or not on the wide variety of platforms out there), having a separate mutex for every call_once is a direct waste of them.
Also, some platforms will not call 'mutex_destroyer' within a dynamic library (you probably know)...
No, I'm not aware of this. Could you elaborate, please? Which platforms are those?
The "trigger" could contain the mutex and the macro for initialization would contain PTHREAD_MUTEX_INITIALIZER, so its creation can be done at compile time by setting up the appropriate bytes in the data segment (interestingly, you use a similar technique for the "no atomics variant").
In the "no atomics case" I had no other choice as I needed a mutex to safely read the once flag.
Other implementations use "while (check) sleep;" stuff, which seems sorta awkward to me. Can't we use "proper" synchronization?
The fundamental problem arises here - I need to safely create a synchronization object. Non-POSIX APIs don't provide things like PTHREAD_MUTEX_INITIALIZER or I didn't find them in the docs. And, besides that, the aforementioned drawback with waste of resources comes up again.
Win32 provides 'InitOnceExecuteOnce', which seems to do pretty much all we need (it even takes a parameter to get the state in, and boost::function plus a downcast from 'PVOID' will do for the type erasure). After putting another guard around it (to avoid the dynamic call and the construction of the boost::function) we're all done.
InitOnceExecuteOnce is available only on Vista and later. I'd prefer not to introduce such constraints on the execution platform. Although there could be an alternative implementation for WinAPI, for those who will not run their apps on XP, for example.
I don't know too much about other threading platforms, but I'm sure there are similar means. We could also use a counter for the guard and a Semaphore (if it's more handy to do so for some platform) to notify threads waiting for initialization to complete:
See my note above. I can't safely create a single semaphore or mutex by an API-function call (which was not shown in your code sample). You may look at my dancing around creating a semaphore in the BeOS implementation to get a feel for the problem.
For some platforms (such as x86) memory access is atomic, so atomic operations are just a waste of time for simple read/write operations as the 'is_init' and 'set_called' stuff.
The point is not only in atomic reads and writes, but in performing memory barriers too. Otherwise the result of executing the once functor might not be seen by other CPUs.
There's some code that throws exceptions with pretty, formatted error messages: So we're out of resources and execute a whole bunch of code to format an error message... That code might run into the same problem we're trying to report, so probably throwing something lightweight (such as an enum) is a more appropriate choice (and also gets rid of some header dependencies).
Well, you may be right here. I could try to reduce memory allocations in error handling. But the only possible problem I see there is memory depletion. In that case you'll get std::bad_alloc, which adheres to the declared interface of the implementation. So, strictly speaking, if you have enough memory you get a detailed error description. If not, you get bad_alloc. -- Best regards, Andrey mailto:andysem@mail.ru

Andrey Semashev wrote:
Hello Tobias,
Friday, August 24, 2007, 4:19:43 AM, you wrote:
Just came across this thread. I had a need of lightweight_call_once in my Boost.FSM library and implemented it. It is not implemented as an internal part of the library, but rather as a common tool, like lightweight_mutex.
Something you'd like to brush up as a Boost X-File ;-)?
See
Hmm, I'm not sure of the purpose of this project. Is it supposed to pass several tools under its umbrella to boost via fast-track review?
Sort of. It's just an idea, so far. Its purpose is to avoid lots of fast-track reviews (and reviewing overhead) for utility components by grouping them into a "pseudo library", thus encouraging developers to brush up / factor out useful stuff.
The pthreads implementation seems to be using a global Mutex, which is inefficient, because it causes concurrent initializations (that might have nothing to do with each other) to be queued. To make things worse, that Mutex is initialized with 'pthread_once'.
Yes, but consider that this code will be executed only once. The rest of the execution time this mutex is useless.
Consider the deadlock if 'once' is used recursively to initialize different resources... Further, it's quite unintuitive that a trivial initialization might get slowed down by one in another thread that takes a lot of time.
As mutexes may actually take some system resources (not sure whether it's true or not on the wide variety of platforms out there), having a separate mutex for every call_once is a direct waste of them.
You can call 'pthread_mutex_destroy' once you're done with the mutex, to free up any system resources it may have acquired.
Also, some platforms will not call 'mutex_destroyer' within a dynamic library (you probably know)...
No, I'm not aware of this. Could you elaborate, please? Which platforms are those?
No ctors/dtors are run in static context for shared libraries on most UNIX platforms.
The "trigger" could contain the mutex and the macro for initialization would contain PTHREAD_MUTEX_INITIALIZER, so its creation can be done at compile time by setting up the appropriate bytes in the data segment (interestingly, you use a similar technique for the "no atomics variant").
In the "no atomics case" I had no other choice as I needed a mutex to safely read the once flag.
Other implementations use "while (check) sleep;" stuff, which seems sorta awkward to me. Can't we use "proper" synchronization?
The fundamental problem arises here - I need to safely create a synchronization object. Non-POSIX APIs don't provide things like PTHREAD_MUTEX_INITIALIZER or I didn't find them in the docs.
I see. Would it be an option to use 'yield' instead of 'sleep'?
And, besides that, the aforementioned drawback with waste of resources comes up again.
Again, explicit disposal will do the trick.
Win32 provides 'InitOnceExecuteOnce', which seems to do pretty much all we need (it even takes a parameter to get the state in, and boost::function plus a downcast from 'PVOID' will do for the type erasure). After putting another guard around it (to avoid the dynamic call and the construction of the boost::function) we're all done.
InitOnceExecuteOnce is available only on Vista and later. I'd prefer not to introduce such constraints on the execution platform. Although there could be an alternative implementation for WinAPI, for those who will not run their apps on XP, for example.
Bummer! Would've been too easy...
I don't know too much about other threading platforms, but I'm sure there are similar means. We could also use a counter for the guard and a Semaphore (if it's more handy to do so for some platform) to notify threads waiting for initialization to complete:
See my note above. I can't safely create a single semaphore or mutex by an API-function call (which was not shown in your code sample). You may look at my dancing around creating a semaphore in the BeOS implementation to get a feel for the problem.
I (maybe falsely) assumed that one could obtain one, statically. I'm aware that "by-call initialization" is problematic (as we'd need 'once', once again ;-)).
For some platforms (such as x86) memory access is atomic, so atomic operations are just a waste of time for simple read/write operations as the 'is_init' and 'set_called' stuff.
The point is not only in atomic reads and writes, but in performing memory barriers too. Otherwise the result of executing the once functor might not be seen by other CPUs.
Then the memory barriers will suffice for x86, correct? As this code is executed on every call, any superfluous bus-locking should be avoided. Alternatively, doing an "uncertain read" to check whether we might need initialization before setting up the read barrier might be close enough to optimal.
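The "uncertain read" idea can be illustrated with modern std::atomic primitives (which postdate this thread; the names and the single-threaded slow path below are purely illustrative, not code from either library under discussion):

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> g_done{0};
int g_payload = 0;

void do_init()
{
    g_payload = 7;
    g_done.store(1, std::memory_order_release); // publish the payload
}

int read_payload()
{
    // Fast path: a relaxed ("uncertain") read - on x86 this compiles to
    // a plain load with no bus locking.
    if (g_done.load(std::memory_order_relaxed) != 0)
    {
        // Only now pay for the read barrier that pairs with the
        // release store in do_init().
        std::atomic_thread_fence(std::memory_order_acquire);
        return g_payload;
    }
    // Slow path: a real implementation would funnel through a proper
    // once/mutex here; for this sketch we just initialize directly.
    do_init();
    return g_payload;
}
```

The point of the pattern is that the synchronizing fence is only executed when initialization may actually be pending, so the steady-state cost per call is a single ordinary load.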
There's some code that throws exceptions with pretty, formatted error messages: So we're out of resources and execute a whole bunch of code to format an error message... That code might run into the same problem we're trying to report, so probably throwing something lightweight (such as an enum) is a more appropriate choice (and also gets rid of some header dependencies).
Well, you may be right here. I could try to reduce memory allocations in error handling. But the only possible problem I see there is memory depletion. In that case you'll get std::bad_alloc, which adheres to the declared interface of the implementation. So, strictly speaking, if you have enough memory you get a detailed error description. If not, you get bad_alloc.
Depending on 'lexical_cast', 'iostream' and 'string' still slightly bugs me, though. Another potential issue: It seems Win32 and MacOS variants are currently not exception-safe. That is, the initialization routine isn't rerun if it has thrown the first time 'once' was called. Regards, Tobias

Hello Tobias, Sunday, August 26, 2007, 1:37:17 PM, you wrote:
Hmm, I'm not sure of the purpose of this project. Is it supposed to pass several tools under its umbrella to boost via fast-track review?
Sort of. It's just an idea, so far.
Its purpose is to avoid lots of fast-track reviews (and reviewing overhead) for utility components by grouping them into a "pseudo library", thus encouraging developers to brush up / factor out useful stuff.
In this particular case the tool will be reviewed during the Boost.FSM review (if it will be, since it's not for public use anyway), so including it in X-Files won't reduce the number of reviews. On the other hand, if Boost.FSM is rejected but there is interest in this tool, I would gladly extract it to the X-Files project.
Yes, but consider that this code will be executed only once. The rest of the execution time this mutex is useless.
Consider the deadlock if 'once' is used recursively to initialize different resources...
The mutex is recursive.
Further, it's quite unintuitive that a trivial initialization might get slowed down by one in another thread that takes a lot of time.
You can call 'pthread_mutex_destroy' once you're done with the mutex, to free up any system resources it may have acquired.
I'll think of it. The first thing that comes to mind is that I'd have to count the threads that are hanging locked on the mutex, since destroying it right away would leave those threads with undefined behavior.
Also, some platforms will not call 'mutex_destroyer' within a dynamic library (you probably know)...
No, I'm not aware of this. Could you elaborate, please? Which platforms are those?
No ctors/dtors are run in static context for shared libraries on most UNIX platforms.
That's quite a surprise for me. I didn't encounter such behavior on Linux (Red Hat). Do you have any workaround for this? I'm thinking of GCC-specific attributes for this purpose, but that's one step away from portability.
The fundamental problem arises here - I need to safely create a synchronization object. Non-POSIX APIs don't provide things like PTHREAD_MUTEX_INITIALIZER or I didn't find them in the docs.
I see. Would it be an option to use 'yield' instead of 'sleep'?
The "yield" function is not guaranteed to switch execution context; it may return immediately. If a lower-priority thread entered the once functor, you may spin for a relatively long time in a "yield" loop instead of just letting the lower-priority thread finish its job.
For some platforms (such as x86) memory access is atomic, so atomic operations are just a waste of time for simple read/write operations as the 'is_init' and 'set_called' stuff.
The point is not only in atomic reads and writes, but in performing memory barriers too. Otherwise the result of executing the once functor might not be seen by other CPUs.
Then the memory barriers will suffice for x86, correct? As this code is executed on every call, any superfluous bus-locking should be avoided.
Actually, I got the impression that barriers themselves have a major performance impact. Besides, not all compilers support barrier intrinsics.
Alternatively, doing an "uncertain read" to check whether we might need initialization before setting up the read barrier might be close enough to optimal.
Well, that's a tricky point. I'm not an expert in threading issues, but it's not obvious to me whether a memory barrier should act regardless of its scope. For example:

void foo(int& x, int& y)
{
    if (x == 0)
    {
        read_memory_barrier();
        y = 10;
        x = 1;
        write_memory_barrier();
    }

    // use y
}

Now, is it guaranteed that those barriers are in effect regardless of the value of x? I think not. Either the compiler may reorder statements in such a way that y is used before the "if" statement, or the same thing may be done by the CPU, since the barrier instructions may not be executed.
Well, you may be right here. I could try to reduce memory allocations in error handling. But the only possible problem I see there is memory depletion. In that case you'll get std::bad_alloc, which adheres to the declared interface of the implementation. So, strictly speaking, if you have enough memory you get a detailed error description. If not, you get bad_alloc.
Depending on 'lexical_cast', 'iostream' and 'string' still slightly bugs me, though.
Ok, I'll change the code that formats the error string not to use lexical_cast. But it will still depend on std::string since it's in the exception class.
Another potential issue: It seems Win32 and MacOS variants are currently not exception-safe. That is, the initialization routine isn't rerun if it has thrown the first time 'once' was called.
Yep, thanks for spotting that. I'll fix that in a couple of days and update the library archive in the Vault. I'll post here a notification when it's done. -- Best regards, Andrey mailto:andysem@mail.ru

Andrey Semashev wrote:
Hello Tobias,
Sunday, August 26, 2007, 1:37:17 PM, you wrote:
Hmm, I'm not sure of the purpose of this project. Is it supposed to pass several tools under its umbrella to boost via fast-track review?
Sort of. It's just an idea, so far.
Its purpose is to avoid lots of fast-track reviews (and reviewing overhead) for utility components by grouping them into a "pseudo library", thus encouraging developers to brush up / factor out useful stuff.
In this particular case the tool will be reviewed during the Boost.FSM review (if it will, since it's not for public use anyway), so including it in X-Files won't reduce the amount of reviews. On the other hand, if Boost.FSM is rejected but there is interest to this tool, I would gladly extract it to the X-Files project.
If your library gets accepted, LWCO is accepted as an implementation detail. AFAIK you need at least a fast-track review to make it a public thing (not sure that's what you want, though). However, I'd at least very much welcome a test suite for LWCO.
Yes, but consider that this code will be executed only once. The rest of the execution time this mutex is useless.
Consider the deadlock if 'once' is used recursively to initialize different resources...
The mutex is recursive.
Sorry, missed it.
Further, it's quite unintuitive that a trivial initialization might get slowed down by one in another thread that takes a lot of time.
You can call 'pthread_mutex_destroy' once you're done with the mutex to free up any system resources it may have acquired.
I'll think about it. The first thing that comes to mind is that I'd have to count the threads that are blocked on the mutex, since destroying it right away would leave those threads with undefined behavior.
It might be possible to use the flag for the counter...
Also, some platforms will not call 'mutex_destroyer' within a dynamic library (you probably know)...
No, I'm not aware of this. Could you elaborate, please? Which platforms are those?
No ctors/dtors are run in static context for shared libraries on most UNIX platforms.
That's quite a surprise for me. I didn't encounter such behavior on Linux (Red Hat). Do you have any workaround for this? I'm thinking of GCC-specific attributes for this purpose, but that's one step away from portability.
AFAIK there's not much one can do about it except for adding a function for explicit disposal.
The fundamental problem arises here: I need to safely create a synchronization object. Non-POSIX APIs don't provide things like PTHREAD_MUTEX_INITIALIZER, or at least I didn't find them in the docs.
I see. Would it be an option to use 'yield' instead of 'sleep'?
The "yield" function is not guaranteed to switch execution context; it may return immediately. If a lower-priority thread entered the once functor, you may spin for a relatively long time in a "yield" loop instead of just letting the lower-priority thread finish its job.
I figured something like that. Does 'Sleep' guarantee preemption - or does it depend on the argument and the resolution of the system timer?
For some platforms (such as x86) memory access is atomic, so atomic operations are just a waste of time for simple read/write operations such as the 'is_init' and 'set_called' stuff.
The point is not only in atomic reads and writes, but in performing memory barriers too. Otherwise the result of executing the once functor might not be seen by other CPUs.
Then the memory barriers will suffice for x86, correct? As this code is executed on every call, any superfluous bus-locking should be avoided.
Actually, I got the impression that barriers themselves have a major performance impact. Besides, not all compilers support barrier intrinsics.
Alternatively, doing an "uncertain read" to check whether we might need initialization before setting up the read barrier might be close enough to optimal.
Bad wording on my side. Substitute "Alternatively" with "Additionally".
Well, that's a tricky point. I'm not an expert in threading issues, but it's not obvious to me whether a memory barrier should act regardless of its scope. For example:

void foo(int& x, int& y)
{
    if (x == 0)
    {
        read_memory_barrier();
        y = 10;
        x = 1;
        write_memory_barrier();
    }

    // use y
}

Now, is it guaranteed that those barriers are in effect regardless of the value of x? I think not. Either the compiler may reorder statements in such a way that y is used before the "if" statement, or the same thing may be done by the CPU, since the barrier instructions may not be executed.
That's not quite what I meant:

// 'initialized' started being false
if (initialized)
{
    // 'initialized' is true for sure
}
else
{
    // we can't know 'initialized' is still false, so let's
    // synchronize and check again
    read_memory_barrier();
    if (initialized)
    {
        // 'initialized' is true
    }
    else
    {
        // 'initialized' is false
    }
}

Now we only have to cross the barrier during (and immediately after) initialization.
Well, you may be right here. I could try to reduce memory allocations in error handling. But the only possible problem I see there is memory depletion. In that case you'll get std::bad_alloc, which adheres to the declared interface of the implementation. So, strictly speaking, if you have enough memory you get a detailed error description. If not, you get bad_alloc.
Depending on 'lexical_cast', 'iostream' and 'string' still slightly bugs me, though.
Ok, I'll change the code that formats the error string not to use lexical_cast. But it will still depend on std::string since it's in the exception class.
Getting rid of 'lexical_cast' and 'iostream' seems good savings, already...
Another potential issue: It seems Win32 and MacOS variants are currently not exception-safe. That is, the initialization routine isn't rerun if it has thrown the first time 'once' was called.
Yep, thanks for spotting that. I'll fix that in a couple of days and update the library archive in the Vault. I'll post here a notification when it's done.
Looking forward to it! Regards, Tobias

On 8/26/07, Tobias Schwinger <tschwinger@isonews2.com> wrote:
Andrey Semashev wrote:
Hello Tobias,
Sunday, August 26, 2007, 1:37:17 PM, you wrote:
Hmm, I'm not sure of the purpose of this project. Is it supposed to pass several tools under its umbrella to boost via fast-track review?
Sort of. It's just an idea, so far.
Its purpose is to avoid lots of fast-track reviews (and reviewing overhead) for utility components by grouping them into a "pseudo library", thus encouraging developers to brush up / factor out useful stuff.
In this particular case the tool will be reviewed during the Boost.FSM review (if it will, since it's not for public use anyway), so including it in X-Files won't reduce the amount of reviews. On the other hand, if Boost.FSM is rejected but there is interest to this tool, I would gladly extract it to the X-Files project.
If your library gets accepted, LWCO is accepted as an implementation detail. AFAIK you need at least a fast-track review to make it a public thing (not sure that's what you want, though).
I don't like the idea of important and *exposed* items being 'accepted as an implementation detail'. An init_once isn't an easy thing to write. When people review Boost.FSM, are they going to take a close look at the 'once' implementation? Probably not. They are going to focus on the central items related to the stated purpose of the library.
However, I'd at least very much welcome a test suite for LWCO.
Unless carefully reviewed, all threading code has bugs. Even if tested. It is the nature of threaded code. It is *extremely* hard to test in such a way that all possible cases are tried. I had a bug where we missed a read barrier in a lock-free allocator. With 10-20 testers hammering on the product, the bug appeared about once a month, if you left it running large projects overnight. (And, after narrowing it down to which pointer was wrong it still took hours of staring at about 4 lines of code to figure out what was going on.)
Yes, but consider that this code will be executed only once. The rest of the execution time this mutex is useless.
Consider the deadlock if 'once' is used recursively to initialize different resources...
The mutex is recursive.
Sorry, missed it.
Further, it's quite unintuitive that a trivial initialization might get slowed down by one in another thread that takes a lot of time.
You can call 'pthread_mutex_destroy' once you're done with the mutex to free up any system resources it may have acquired.
I'll think about it. The first thing that comes to mind is that I'd have to count the threads that are blocked on the mutex, since destroying it right away would leave those threads with undefined behavior.
It might be possible to use the flag for the counter...
The fundamental problem arises here: I need to safely create a synchronization object. Non-POSIX APIs don't provide things like PTHREAD_MUTEX_INITIALIZER, or at least I didn't find them in the docs.
I see. Would it be an option to use 'yield' instead of 'sleep'?
The "yield" function is not guaranteed to switch execution context; it may return immediately. If a lower-priority thread entered the once functor, you may spin for a relatively long time in a "yield" loop instead of just letting the lower-priority thread finish its job.
I figured something like that. Does 'Sleep' guarantee preemption - or does it depend on the argument and the resolution of the system timer?
Under Windows, Sleep(0) only relinquishes the processor to threads of >= priority. Sleep(1) will relinquish it to any thread.
For some platforms (such as x86) memory access is atomic, so atomic operations are just a waste of time for simple read/write operations such as the 'is_init' and 'set_called' stuff.
The point is not only in atomic reads and writes, but in performing memory barriers too. Otherwise the result of executing the once functor might not be seen by other CPUs.
Then the memory barriers will suffice for x86, correct? As this code is executed on every call, any superfluous bus-locking should be avoided.
Actually, I got the impression that barriers themselves have a major performance impact. Besides, not all compilers support barrier intrinsics.
Alternatively, doing an "uncertain read" to check whether we might need initialization before setting up the read barrier might be close enough to optimal.
Bad wording on my side. Substitute "Alternatively" with "Additionally".
Well, that's a tricky point. I'm not an expert in threading issues, but it's not obvious to me whether a memory barrier should act regardless of its scope. For example:

void foo(int& x, int& y)
{
    if (x == 0)
    {
        read_memory_barrier();
        y = 10;
        x = 1;
        write_memory_barrier();
    }

    // use y
}

Now, is it guaranteed that those barriers are in effect regardless of the value of x? I think not. Either the compiler may reorder statements in such a way that y is used before the "if" statement, or the same thing may be done by the CPU, since the barrier instructions may not be executed.
That's not quite what I meant:

// 'initialized' started being false
if (initialized)
{
    // 'initialized' is true for sure

'initialized' is true for sure, but, without a read barrier, we can't be sure that the objects initialized are seen that way by the current processor. I.e.:

    // initialized, so use object:
    object.foo(); // crash, because object not seen as initialized

}
else
{
    // we can't know 'initialized' is still false, so let's
    // synchronize and check again
    read_memory_barrier();
    if (initialized)
    {
        // 'initialized' is true
    }
    else
    {
        // 'initialized' is false
    }
}

Now we only have to cross the barrier during (and immediately after) initialization.
Tony

Gottlob Frege wrote:
On 8/26/07, Tobias Schwinger <tschwinger@isonews2.com> wrote:
I don't like the idea of important and *exposed* items being 'accepted as an implementation detail'.
Yep.
However, I'd at least very much welcome a test suite for LWCO.
Unless carefully reviewed, all threading code has bugs. Even if tested. It is the nature of threaded code. It is *extremely* hard to test in such a way that all possible cases are tried.
I hope you're not saying one shouldn't test multi-threaded code just because it's hard to test ;-).
That's not quite what I meant:
// 'initialized' started being false
if (initialized) { // 'initialized' is true for sure
'initialized is true for sure, but, without a read barrier, we can't be sure that the objects initialized are seen that way by the current processor.
It depends on what processor that is. See e.g. http://ridiculousfish.com/blog/archives/2007/02/17/barrier/ for a discussion. Most processors have linear write buffers, and if 'initialized' is seen as true, the object has been written, too. Regards, Tobias

Tobias Schwinger:
Most processors have linear write buffers and if 'initialized' is seen as true the object has been written, too.
The write buffer in thread 1 doesn't affect the reads in thread 2, which can still be reordered. True if by "most processors" you mean "x86", though, absent compiler optimizations. To be on the safe side one needs:

if( atomic_load_acquire( &initialized ) != 0 )
{
    // access object
}

in thread 2, and

// initialize object
atomic_store_release( &initialized, 1 );

in thread 1.

Hasn't Anthony Williams already implemented a header-only call_once? I'm not sure I see a reason to reinvent that particular wheel. Once boost::mutex is made header-only, there'd be no need for lightweight_mutex either, and I'll be able to retire it as well.
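(Restated with C++11 atomics, which postdate this thread, the acquire/release protocol described above looks like the sketch below; 'payload' and the function names are illustrative, not from any Boost source:)

```cpp
#include <atomic>

static int payload = 0;                  // the object being initialized
static std::atomic<int> initialized(0);  // the flag guarding it

// "Thread 1": initialize the object, then release-store the flag so
// the write to payload is published before the flag becomes visible.
void init_thread()
{
    payload = 42;
    initialized.store(1, std::memory_order_release);
}

// "Thread 2": acquire-load the flag before touching payload. If the
// acquire sees 1, the matching release guarantees payload reads 42.
int reader_thread()
{
    if (initialized.load(std::memory_order_acquire) != 0)
        return payload;
    return -1; // not yet initialized
}
```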

Hello Peter, Monday, August 27, 2007, 8:33:11 PM, you wrote:
Hasn't Anthony Williams already implemented a header-only call_once? I'm not sure I see a reason to reinvent that particular wheel. Once boost::mutex is made header-only, there'd be no need for lightweight_mutex either and I'll be able to retire it as well.
I see only Win32 implementation in the thread_rewrite branch. Besides, it looks like he uses quite a heavy solution using a mutex. Once the new lighter version of Boost.Thread is released I'd happily switch to it too. -- Best regards, Andrey mailto:andysem@mail.ru

Peter Dimov wrote:
Tobias Schwinger:
Most processors have linear write buffers and if 'initialized' is seen as true the object has been written, too.
The write buffer in thread 1 doesn't affect the reads in thread 2, which can still be reordered.
True if by "most processors" you mean "x86", though, absent compiler optimizations.
Well, the article I referenced states that in the particular case we have been discussing one can get away without a second read barrier for PPC as well. Did you catch the full context (as the code was truncated in the previous post)? AFAIK (not claiming to be an expert in this field, however) it takes a rather atypical processor architecture to pull load instructions out of their regular execution path (logically) before a conditional branch, no?
Hasn't Anthony Williams already implemented a header-only call_once? I'm not sure I see a reason to reinvent that particular wheel. Once boost::mutex is made header-only, there'd be no need for lightweight_mutex either and I'll be able to retire it as well.
Where can I find it? Regards, Tobias

Tobias Schwinger:
AFAIK (not claiming to be an expert in this field, however) it takes a rather atypical processor architecture to pull load instructions out of their regular execution path (logically) before a conditional branch, no?
No. It's not atypical at all to execute loads speculatively, many instructions in advance. The CPU doesn't know whether it will take the branch, it predicts it. Stores that depend on a conditional branch aren't reordered on a PPC, but loads - to the extent of my knowledge - can be. You need an 'isync' instruction after the branch to discard the speculatively executed loads. I'm not a PPC expert either. Feel free to not use barriers on PPC. :-)
Hasn't Anthony Williams already implemented a header-only call_once? I'm not sure I see a reason to reinvent that particular wheel. Once boost::mutex is made header-only, there'd be no need for lightweight_mutex either and I'll be able to retire it as well.
Where can I find it?
His latest work is here: http://www.justsoftwaresolutions.co.uk/threading/index.html Odd that he's not watching this thread.

"Peter Dimov" <pdimov@pdimov.com> writes:
Hasn't Anthony Williams already implemented a header-only call_once? I'm not sure I see a reason to reinvent that particular wheel. Once boost::mutex is made header-only, there'd be no need for lightweight_mutex either and I'll be able to retire it as well.
I have implemented a header-only call_once for Windows --- on the thread_rewrite branch in SVN. I plan to get to pthreads sometime in the near future. The win32 implementation of mutex is also header-only, and I plan to add a header-only pthreads implementation too. Anthony -- Anthony Williams Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Hello Anthony, Tuesday, August 28, 2007, 12:17:45 PM, you wrote:
"Peter Dimov" <pdimov@pdimov.com> writes:
Hasn't Anthony Williams already implemented a header-only call_once? I'm not sure I see a reason to reinvent that particular wheel. Once boost::mutex is made header-only, there'd be no need for lightweight_mutex either and I'll be able to retire it as well.
I have implemented a header-only call_once for Windows --- on the thread_rewrite branch in SVN. I plan to get to pthreads sometime in the near future.
The win32 implementation of mutex is also header-only, and I plan to add a header-only pthreads implementation too.
Would you be interested to take a look at my implementation? Maybe after some cleaning up it could be moved to your official library? -- Best regards, Andrey mailto:andysem@mail.ru

Andrey Semashev <andysem@mail.ru> writes:
Hello Anthony,
Tuesday, August 28, 2007, 12:17:45 PM, you wrote:
"Peter Dimov" <pdimov@pdimov.com> writes:
Hasn't Anthony Williams already implemented a header-only call_once? I'm not sure I see a reason to reinvent that particular wheel. Once boost::mutex is made header-only, there'd be no need for lightweight_mutex either and I'll be able to retire it as well.
I have implemented a header-only call_once for Windows --- on the thread_rewrite branch in SVN. I plan to get to pthreads sometime in the near future.
The win32 implementation of mutex is also header-only, and I plan to add a header-only pthreads implementation too.
Would you be interested to take a look at my implementation? Maybe after some cleaning up it could be moved to your official library?
Sorry for the delay in replying. I've looked at the implementation in FSM.zip from the vault --- is that the one you meant? --- and with a quick glance it looks like you've opted for a check/sleep/check/sleep loop for threads that are waiting for another thread to finish running the routine. This is a bad idea. Blocking of this nature should be done by waiting on an OS primitive rather than with a wait loop. Anthony -- Anthony Williams Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

-----Original Message----- From: Anthony Williams <anthony_w.geo@yahoo.com> To: boost@lists.boost.org Date: Thu, 20 Sep 2007 16:46:11 +0100 Subject: Re: [boost] lightweight_once
Andrey Semashev <andysem@mail.ru> writes:
Hello Anthony,
Tuesday, August 28, 2007, 12:17:45 PM, you wrote:
Would you be interested to take a look at my implementation? Maybe after some cleaning up it could be moved to your official library?
Sorry for the delay in replying.
I've looked at the implementation in FSM.zip from the vault --- is that the one you meant? --- and with a quick glance it looks like you've opted for a check/sleep/check/sleep loop for threads that are waiting for another thread to finish running the routine. This is a bad idea. Blocking of this nature should be done by waiting on an OS primitive rather than with a wait loop.
Why is it that bad? This is safer, since there is no opportunity to get an error when constructing the threading primitive, it doesn't use system resources like kernel objects, and it solves the fundamental problems of creating and destroying those threading primitives at run time. And it will be run only once after all, so performance is not an issue.

Andrey Semashev <andysem@mail.ru> writes:
-----Original Message----- From: Anthony Williams <anthony_w.geo@yahoo.com> To: boost@lists.boost.org Date: Thu, 20 Sep 2007 16:46:11 +0100 Subject: Re: [boost] lightweight_once
Andrey Semashev <andysem@mail.ru> writes:
Hello Anthony,
Tuesday, August 28, 2007, 12:17:45 PM, you wrote:
Would you be interested to take a look at my implementation? Maybe after some cleaning up it could be moved to your official library?
Sorry for the delay in replying.
I've looked at the implementation in FSM.zip from the vault --- is that the one you meant? --- and with a quick glance it looks like you've opted for a check/sleep/check/sleep loop for threads that are waiting for another thread to finish running the routine. This is a bad idea. Blocking of this nature should be done by waiting on an OS primitive rather than with a wait loop.
Why is it that bad? This is safer, since there is no opportunity to get an error when constructing the threading primitive, it doesn't use system resources like kernel objects, and it solves the fundamental problems of creating and destroying those threading primitives at run time. And it will be run only once after all, so performance is not an issue.
I think that performance *is* an issue, even though this will only be run once per thread. A check/sleep polling loop is a bad idea, as it consumes CPU time that could be spent actually running the once routine (or another thread that doesn't need to wait). By waiting on an OS primitive, the OS can take the thread out of the schedule until the primitive is ready to be acquired. Not only that, but a check/sleep loop forces a latency of at least the specified sleep time on the waiting thread. If the initialization being waited for only takes a few microseconds (or less --- if it's just a simple initialization it might take only a few nanoseconds), then waiting a whole millisecond is an unnecessary delay. POSIX provides pthread_once. We should use it. The Windows Vista functions look to supply a similar facility, and do at least allow the passing of a parameter to the routine without using TSS. Anthony -- Anthony Williams Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
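(For reference, the facility argued for here is essentially what C++11 later standardized as std::call_once, which blocks waiters on an OS primitive rather than polling. A minimal usage sketch; the names are illustrative:)

```cpp
#include <mutex>

static std::once_flag init_flag;
static int shared_value = 0;

void init_shared() { shared_value = 7; }

int get_shared()
{
    // Runs init_shared exactly once; concurrent callers block until it
    // finishes instead of spinning in a check/sleep loop. If init_shared
    // throws, the flag is re-armed and a later caller retries.
    std::call_once(init_flag, init_shared);
    return shared_value;
}
```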

Anthony, Anthony Williams wrote:
POSIX provides pthread_once. We should use it. The Windows Vista functions look to supply a similar facility, and do at least allow the passing of a parameter to the routine without using TSS.
did you catch my comments on this one (elsewhere in this thread)? <cite> The pthread-based implementation seems overly simplified: It won't compile for C++ functions that are not 'extern "C"' with some compilers. Further, what about exceptions from within pthread_once? </cite> I agree that TSS is not a good alternative, however (but I don't think it's needed for an improved pthread-based implementation). Regards, Tobias

Tobias Schwinger <tschwinger@isonews2.com> writes:
Anthony,
Anthony Williams wrote:
POSIX provides pthread_once. We should use it. The Windows Vista functions look to supply a similar facility, and do at least allow the passing of a parameter to the routine without using TSS.
did you catch my comments on this one (elsewhere in this thread)?
No. Thanks for repeating them here.
<cite> The pthread-based implementation seems overly simplified: It won't compile for C++ functions that are not 'extern "C"' with some compilers. Further, what about exceptions from within pthread_once? </cite>
I haven't written a pthread-based implementation yet, so I'm not sure what you're referring to. Current boost call_once is not header-only, and doesn't support arbitrary callable objects (like my new Windows version does) --- only functions of a fixed signature.
I agree that TSS is not a good alternative, however (but I don't think it's needed for an improved pthread-based implementation).
I'm interested in what you expect an improved pthread-based implementation to look like. I'm thinking it will look very much like Peter's extended pthread_once that took an additional parameter (from N2178) --- i.e. using TSS to pass the parameter to the routine passed to pthread_once, which can then use this parameter data to invoke the user-supplied callable object. Have you got a better way? Anthony -- Anthony Williams Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk Registered in England, Company Number 5478976. Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Anthony Williams wrote:
Tobias Schwinger <tschwinger@isonews2.com> writes:
Anthony,
Anthony Williams wrote:
POSIX provides pthread_once. We should use it. The Windows Vista functions look to supply a similar facility, and do at least allow the passing of a parameter to the routine without using TSS. did you catch my comments on this one (elsewhere in this thread)?
No. Thanks for repeating them here.
<cite> The pthread-based implementation seems overly simplified: It won't compile for C++ functions that are not 'extern "C"' with some compilers. Further, what about exceptions from within pthread_once? </cite>
I haven't written a pthread-based implementation yet, so I'm not sure what you're referring to.
I see. I was referring to the one in the thread_rewrite branch, which is basically just pthread_once. Well, I should've tripped over the interface differences, though :-)...
Current boost call_once is not header-only, and doesn't support arbitrary callable objects (like my new Windows version does) --- only functions of a fixed signature.
I agree that TSS is not a good alternative, however (but I don't think it's needed for an improved pthread-based implementation).
I'm interested in what you expect an improved pthread-based implementation to look like. I'm thinking it will look very much like Peter's extended pthread_once that took an additional parameter (from N2178) --- i.e. using TSS to pass the parameter to the routine passed to pthread_once, which can then use this parameter data to invoke the user-supplied callable object. Have you got a better way?
For someone implementing pthread? Sure, just use the stack. For someone implementing a standard runtime? A piece of assembler (or a tricky cast construct) to pick that value from the caller's stack plus some means to keep the compiler from messing it up will most probably do. (Not considering exception propagation as it seems to me it can be implemented on top of it). For us trying to implement a nicer 'call_once' in a portable way? We probably should not be using 'pthread_once' and just do basically the same thing as on Windows (using an aggregate-initialized mutex instead of a named one). Regards, Tobias

Anthony Williams wrote:
it looks like you've opted for a check/sleep/check/sleep loop for threads that are waiting for another thread to finish running the routine. This is a bad idea. Blocking of this nature should be done by waiting on an OS primitive rather than with a wait loop.
Why is it that bad? This is safer, since there is no opportunity to get an error when constructing the threading primitive, it doesn't use system resources like kernel objects, and it solves the fundamental problems of creating and destroying those threading primitives at run time. And it will be run only once after all, so performance is not an issue.
I think that performance *is* an issue, even though this will only be run once per thread.
A check/sleep polling loop is a bad idea, as it consumes CPU time that could be spent actually running the once routine (or another thread that doesn't need to wait). By waiting on an OS primitive, the OS can take the thread out of the schedule until the primitive is ready to be acquired.
Not only that, but a check/sleep loop forces a latency of at least the specified sleep time on the waiting thread. If the initialization being waited for only takes a few microseconds (or less --- if it's just a simple initialization it might take only a few nanoseconds), then waiting a whole millisecond is an unnecessary delay.
POSIX provides pthread_once. We should use it.
Do have a look at the analysis that I did for my ARM atomic shared_ptr code: http://thread.gmane.org/gmane.comp.lib.boost.devel/164564/focus=164893

If the probability of contention is very low, then on average adding even one instruction to the non-contended case, or occupying more icache space with yield() calls, may slow the program down more than yielding on contention would speed it up. The probability of contention depends crucially on the duration of the critical section, and I imagine that this could vary enormously for "once" functions, i.e. anything from a couple of instructions to seconds. So it might be worthwhile having different types of "once" for these different cases - and the same could also be said of mutexes.

Take care with the pthreads option. I spent a while trying to understand what the Linux pthreads implementation (in glibc) does (for ARM), and it eventually boils down to much the same as I had written. However it's almost an order of magnitude slower, and I believe that's because it involves a couple of function calls while mine is inline. Since pthreads is a C API, I think that the function call overhead is inevitable. So I have put investigating replacing the pthreads mutexes used by boost.threads with asm on my to-do list (though it may never reach the top).

Having said all that, does anyone really worry much about "once" performance? It's not like shared_ptr, where code that uses it may be doing atomic reference count changes fairly continuously.

Regards, Phil.

Why is it that bad? This is safer since there is no opportunity to get an error when constructing the threading primitive, it doesn't use system resources like kernel objects, and it solves the fundamental problems of creating and destroying those threading primitives at run time. And it will only run once after all, so performance is not an issue.
I think that performance *is* an issue, even though this will only be run once per thread.
A check/sleep polling loop is a bad idea, as it consumes CPU time that could be spent actually running the once routine (or another thread that doesn't need to wait). By waiting on an OS primitive, the OS can take the thread out of the schedule until the primitive is ready to be acquired.
That's true, but consider that: a) sleeping happens only on contention; b) contention is unlikely to happen unless the once routine takes a long time. Besides, I'd like to note that spinning in a sleeping loop for a couple of rounds may be equivalent (in wasted CPU cycles) to constructing the synchronization object, locking/unlocking it, and destroying it. These are all kernel calls and are all expensive. Given that sleeping may never happen at all, the dance with the synchronization object looks even less efficient to me in the majority of cases.
Not only that, but a check/sleep loop forces a latency of at least the specified sleep time on the waiting thread. If the initialization being waited for only takes a few microseconds (or less --- if it's just a simple initialization it might take only a few nanoseconds), then waiting a whole millisecond is an unnecessary delay.
With all due respect, I doubt that. It all comes down to the scheduler resolution. Blocking on a mutex doesn't mean you'll wake right when it's unlocked. You may end up with the same latency as sleeping for a tiny period.
POSIX provides pthread_once. We should use it. The Windows Vista functions look to supply a similar facility, and do at least allow the passing of a parameter to the routine without using TSS.
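For reference, pthread_once takes a plain no-argument routine, which is why passing state means globals or TSS, as noted above. A minimal POSIX sketch (the identifiers here are illustrative):

```cpp
#include <pthread.h>

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int resource  = 0;   // state must live in a global: the routine takes no parameters
static int run_count = 0;

// pthread_once wants a plain function with C linkage and no arguments.
extern "C" void init_resource(void) {
    ++run_count;
    resource = 42;
}
```

A caller simply does pthread_once(&once_control, init_resource); any number of concurrent or repeated calls run the routine exactly once.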
I strongly disagree. The fact that something exists out there doesn't mean we ought to use it, especially if it doesn't suit us. I agree with Tobias that it is a really bad idea to propagate exceptions through an OS API like pthread_once or InitOnceExecuteOnce; that alone makes such APIs of little worth for C++. Finally, I'd like to underline it once more: in my experience (and, as I see in other postings, not only mine) the expense of the first call_once execution is negligible. The main requirements are reliability (it won't fail for some reason unless your code fails) and minimal resource usage after the first execution (especially CPU, memory, and system resources like kernel objects). I haven't seen a case where it was otherwise.
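The exception behaviour being asked for here - a throwing routine counts as "not run", so a later call retries - is exactly what was later standardized as std::call_once. A sketch in C++11 terms (so it postdates this thread; the initializer is hypothetical):

```cpp
#include <mutex>
#include <stdexcept>

std::once_flag init_flag;
int run_count = 0;

// Illustrative initializer. If it throws, the once_flag is left unset
// ("exceptional" execution), so the next call_once runs it again.
void init(bool fail) {
    ++run_count;
    if (fail)
        throw std::runtime_error("init failed");
    // normal one-time initialization would go here
}
```

A C API like pthread_once cannot offer this contract, since a C++ exception crossing it is undefined behaviour.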

Peter Dimov wrote:
Tobias Schwinger:
Most processors have linear write buffers and if 'initialized' is seen as true the object has been written, too.
The write buffer in thread 1 doesn't affect the reads in thread 2, which can still be reordered.
True if by "most processors" you mean "x86", though, absent compiler optimizations.
Well, the article I referenced states that in the particular case we have been discussing one can get away without a second read barrier for PPC as well. AFAIK (not claiming to be an expert in this field, however) reordering of this kind (that is, crossing branches) is unlikely to happen with most processor architectures (seems it would take redundant pipelines executing both alternatives of a branch - probably an obsolete concept given effective branch prediction).
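The reader-side reordering under discussion is what an acquire load on the flag/pointer rules out. In later C++11 notation the double-checked pattern being debated looks roughly like this (a sketch; 'widget' and all names are illustrative):

```cpp
#include <atomic>
#include <mutex>

struct widget { int value = 42; };

std::atomic<widget*> instance{nullptr};
std::mutex init_mutex;

widget* get_instance() {
    // Acquire load: if we see a non-null pointer, we also see the writes
    // that constructed the object - this is the "second read barrier"
    // that the discussion says some architectures let you skip.
    widget* p = instance.load(std::memory_order_acquire);
    if (!p) {
        std::lock_guard<std::mutex> lk(init_mutex);
        p = instance.load(std::memory_order_relaxed);  // mutex already synchronizes
        if (!p) {
            p = new widget;
            // Release store: publishes the fully constructed object.
            instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

On x86 the acquire load compiles to a plain load, which is consistent with the observation that one can often "get away" without an explicit barrier there.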
Hasn't Anthony Williams already implemented a header-only call_once? I'm not sure I see a reason to reinvent that particular wheel. Once boost::mutex is made header-only, there'd be no need for lightweight_mutex either and I'll be able to retire it as well.
Interesting. I very much like the clean-looking code, and using a named mutex seems better to me than calling 'Sleep(1)' in a loop on Windows. But again, concurrent calls to 'call_once' block each other. It might make sense to add another variant that (possibly at a higher cost) does not have this limitation... The pthread-based implementation seems overly simplified: it won't compile for C++ functions that are not 'extern "C"' with some compilers. Further, what about exceptions from within pthread_once? The coverage of the test suite seems a bit thin: no recursive application of 'once', no tests with exceptions being thrown, ... All in all, it's a step forward but not mature enough to call it a "wheel" that is reinvented by an alternative implementation ;-). Regards, Tobias

Hello Gottlob,
In this particular case the tool will be reviewed during the Boost.FSM review (if it is reviewed at all, since it's not for public use anyway), so including it in X-Files won't reduce the number of reviews. On the other hand, if Boost.FSM is rejected but there is interest in this tool, I would gladly extract it into the X-Files project.
If your library gets accepted, LWCO is accepted as an implementation detail. AFAIK you need at least a fast-track review to make it a public thing (not sure that's what you want, though).
I don't like the idea of important and *exposed* items being 'accepted as an implementation detail'.
An init_once isn't an easy thing to write. When people review Boost.FSM, are they going to take a close look at the 'once' implementation? Probably not. They are going to focus on the central items related to the stated purpose of the library.
This lightweight_once is not a public component and is not a part of Boost.FSM interface. No users will ever see it unless they dig into the library implementation. And I had no intent to make it public anyway - there is Boost.Thread implementation for that purpose. The only reason why I implemented it is that I want my library to be header-only. Therefore I don't see much sense in asking a separate review for something that is an implementation detail of some another library. The fact that I tried to make this detail general enough to be able to be reused somewhere else (e.g. in another library) doesn't mean that it is public.
However, I'd at least very much welcome a test suite for LWCO.
Unless carefully reviewed, all threading code has bugs. Even if tested. It is the nature of threaded code. It is *extremely* hard to test in such a way that all possible cases are tried.
Agreed. -- Best regards, Andrey mailto:andysem@mail.ru

Hello Tobias,
Another potential issue: It seems Win32 and MacOS variants are currently not exception-safe. That is, the initialization routine isn't rerun if it has thrown the first time 'once' was called.
Yep, thanks for spotting that. I'll fix that in a couple of days and update the library archive in the Vault. I'll post here a notification when it's done.
Looking forward to it!
Ok, I reworked the lightweight call_once implementations for most of the supported platforms. Changes include:
- Removed lexical_cast and iostreams.
- Added Vista API support. This implementation does not emulate a mutex by sleeping, thus improving performance.
- Fixed exception safety issues.
- Explicit memory barrier usage is now enabled by default (if the appropriate support is discovered during compilation).
The implementation is uploaded to the Vault and available through the same URL: http://tinyurl.com/yjozfn Since there were major changes, I would really appreciate it if someone experienced took a look at it. -- Best regards, Andrey mailto:andysem@mail.ru

The serialization library uses Spirit to parse XML. For thread safety, BOOST_SPIRIT_THREADSAFE should be defined while building the serialization library. It will cause linking to the thread library anyway. "Robert Ramey" <ramey@rrsd.com> wrote in message news:fabbr2$35q$1@sea.gmane.org...
Thanks for pointing this out.
It looks to me that this would require linking with the thread library.
I was thinking more along the lines of something built with the code in boost/detail/lightweight_mutex
Robert Ramey
Tobias Schwinger wrote:
Robert Ramey wrote:
I think that very useful in this effort would be lightweight threadsafe singleton.
Recently found the same. Do you want a Singleton with thread-safe initialization? Or one with automatically thread-safe access to its members?
<...>
Someone please tell me its here but that I just haven't found it.
The archive at
contains both of the variants mentioned above.
Regards, Tobias

Sergey Skorniakov wrote:
The serialization library uses Spirit to parse XML. For thread safety, BOOST_SPIRIT_THREADSAFE should be defined while building the serialization library. It will cause linking to the thread library anyway.
Hmmm, I ran all my tests here and the threading library isn't even built, much less linked in. Even when I use <threading>multi in the Jamfile. So I don't know what to say about this. Robert Ramey

I'm not sure that BOOST_SPIRIT_THREADSAFE is really required in the case of serialization - I'm not very familiar with spirit. But when I build boost with the following user-config jam file: using msvc : 8.0 : : <cxxflags>"-wd4996 -wd4103 -Zp4 -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -D_SECURE_SCL=0 -DBOOST_SPIRIT_THREADSAFE -DBOOST_THREAD_USE_DLL " <linkflags>"-LIBPATH:%BOOSTROOT%/lib" ; the resulting boost_serialization*.dll depends on boost_thread*.dll "Robert Ramey" <ramey@rrsd.com> wrote in message news:faeuaf$vle$1@sea.gmane.org...
Sergey Skorniakov wrote:
The serialization library uses Spirit to parse XML. For thread safety, BOOST_SPIRIT_THREADSAFE should be defined while building the serialization library. It will cause linking to the thread library anyway.
Hmmm, I run all my tests here and the threading library isn't even built, much less linked in. Even when I use <threading>multi in the Jamfile. So I don't know what to say about this.
Robert Ramey

It could be that, since serialization isn't currently threadsafe anyway, no one builds with this switch and the issue doesn't come up. I would have hoped that this would be automatically taken care of by the following sequence of conditions: bjam <threading>multi => config/BOOST_HAS_THREADS => inclusion of a reference to the threading library => link with the threading library. Also, I haven't looked at the threading library in enough depth to understand why the whole library can't be a header-only library. After all, isn't it basically a wrapper around native OS primitives? Robert Ramey Sergey Skorniakov wrote:
I'm not sure that BOOST_SPIRIT_THREADSAFE is really required in the case of serialization - I'm not very familiar with spirit. But when I build boost with the following user-config jam file:
using msvc : 8.0 : : <cxxflags>"-wd4996 -wd4103 -Zp4 -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -D_SECURE_SCL=0 -DBOOST_SPIRIT_THREADSAFE -DBOOST_THREAD_USE_DLL " <linkflags>"-LIBPATH:%BOOSTROOT%/lib" ;
the resulting boost_serialization*.dll depends on boost_thread*.dll

It could be that, since serialization isn't currently threadsafe anyway, no one builds with this switch and the issue doesn't come up.
Yes. But I had patched the library to make it thread-safe and also defined BOOST_SPIRIT_THREADSAFE to be on the safe side. And I was wrong - BOOST_SPIRIT_THREADSAFE is not required here. It should be used only if a grammar instance is shared between threads.
I would have hoped that this would be automatically taken care of by the following sequence of conditions:
bjam <threading>multi => config/BOOST_HAS_THREADS =>inclusion of reference to threading library => link with threading library.
No, as mentioned in the documentation (http://www.boost.org/libs/spirit/doc/grammar.html#multithreading).
Also, I haven't looked at the threading library in enough depth to understand why the whole library can't be a header-only library. After all, isn't it basically a wrapper around native OS primitives?
I think the whole threading library can't be header-only, at least under Windows, because it should handle TSS cleanup, and the simplest and safest way to achieve that automatically is to process DLL_THREAD_ATTACH / DLL_THREAD_DETACH notifications in DllMain. However, I see no reason why simple mutexes can't reside completely in headers.
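The last point - that a simple mutex needs no compiled library because every member can be inline - can be sketched as a header-only wrapper over the native primitive. POSIX is shown here purely for illustration; TSS cleanup is the part that needs DllMain on Windows, not this.

```cpp
#include <pthread.h>

// Illustrative header-only mutex: everything is inline, so no separately
// compiled library is needed to use it.
class header_only_mutex {
    pthread_mutex_t m_;
public:
    header_only_mutex()  { pthread_mutex_init(&m_, nullptr); }
    ~header_only_mutex() { pthread_mutex_destroy(&m_); }
    void lock()   { pthread_mutex_lock(&m_); }
    void unlock() { pthread_mutex_unlock(&m_); }
    // Non-copyable: a pthread_mutex_t must not be duplicated.
    header_only_mutex(const header_only_mutex&) = delete;
    header_only_mutex& operator=(const header_only_mutex&) = delete;
};
```

The statically initializable variants (PTHREAD_MUTEX_INITIALIZER) are what the earlier messages about shared libraries and static initialization are concerned with; a wrapper like this still initializes at construction time.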
Robert Ramey
Sergey Skorniakov wrote:
I'm not sure that BOOST_SPIRIT_THREADSAFE is really required in the case of serialization - I'm not very familiar with spirit. But when I build boost with the following user-config jam file:
using msvc : 8.0 : : <cxxflags>"-wd4996 -wd4103 -Zp4 -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -D_SECURE_SCL=0 -DBOOST_SPIRIT_THREADSAFE -DBOOST_THREAD_USE_DLL " <linkflags>"-LIBPATH:%BOOSTROOT%/lib" ;
the resulting boost_serialization*.dll depends on boost_thread*.dll
participants (12)
- Andrey Semashev
- Anthony Williams
- Gottlob Frege
- Kim Barrett
- Michael Marcin
- Peter Dimov
- Phil Endecott
- Robert Ramey
- Sergey Skorniakov
- Sohail Somani
- Tobias Schwinger
- Андрей Семашев