[spirt] question regarding BOOST_SPIRIT_THREADSAFE and BOOST_SPIRIT_SINGLE_GRAMMAR_INSTANCE

I've just discovered the existence of these manifest constants in spirit and checked the documentation and code to determine what they do. Of course, figuring out what spirit code does and how it does it is not easy for the casual observer to get right, so feel free to correct any misconceptions / wrong conclusions that I've got.
My general conception is that parsing can be considered as various quasi-independent phases.
a) specification of grammar - source code preparation
b) compilation
c) instantiation/construction of grammar definition
d) usage of c) to parse a text string
e) destruction of instance.
From this I want to conclude that phase d) does not change the grammar definition, so that in a multi-threading environment the invocation of the scoped_lock would be limited to phases c) and e).
That is, I want to believe that the following example adapted from the spirit documentation would work:
    const my_grammar g; // note addition of const
    if (parse(first, last, g, space_p).full)
        cout << "parsing succeeded\n";
    else
        cout << "parsing failed\n";
The reason I ask is that the serialization library uses spirit to parse xml archives. Recently, I made changes to make the library thread-safe. I did this without using mutexes/locks by confining all non-const operations to construction and destruction and using static objects constructed/destructed at pre-main and post-main time. Although the jury is still out on this, I believe it will make the serialization library thread safe without the need for using threading primitives and libraries. I never realised that spirit wasn't thread-safe. I would like this approach to include xml_archives.
So my question is: Can I instantiate my xml grammar as a static variable at pre-execution time and permit it to be used by multiple threads without locking?
Robert Ramey
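
The approach described above (all mutation confined to construction, then read-only use from any thread) can be sketched in plain standard C++. This is an illustration only, not code from spirit or the serialization library: `keyword_table`, `g_table` and `lookup_from_threads` are hypothetical names, and the sketch assumes that the const members really do not touch any hidden mutable state.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a grammar or type table: all mutation
// happens in the constructor, every other member is const.
class keyword_table {
public:
    keyword_table() {
        ids_["boost_serialization"] = 1;
        ids_["class_id"] = 2;
        ids_["object_id"] = 3;
    }
    // Read-only lookup; returns 0 for unknown names.
    int lookup(const std::string& name) const {
        std::map<std::string, int>::const_iterator it = ids_.find(name);
        return it == ids_.end() ? 0 : it->second;
    }
private:
    std::map<std::string, int> ids_;
};

// Constructed during static initialization, before main() can start threads.
static const keyword_table g_table;

// Every thread calls only const members of g_table, so no lock is taken.
// Each thread writes to its own slot of `results`.
int lookup_from_threads(const std::string& name, std::size_t thread_count) {
    std::vector<int> results(thread_count, -1);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < thread_count; ++i) {
        threads.emplace_back([&results, &name, i] {
            results[i] = g_table.lookup(name);
        });
    }
    for (std::thread& t : threads) t.join();
    int sum = 0;
    for (int r : results) sum += r;
    return sum;
}
```

The sketch only holds if "const" also means "no internal caching or lazy initialization"; a type that fills in state on first use would need that first use to happen before any threads start.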

Robert Ramey wrote:
The reason I ask is that the serialization library uses spirit to parse xml archives. Recently, I made changes to make the library thread-safe. I did this without using mutexes/locks by confining all non-const operations to construction and destruction and using static objects constructed/destructed at pre-main and post-main time. Although the jury is still out on this, I believe it will make the serialization library thread safe without the need for using threading primitives and libraries.
What happens if one explicitly loads a dynamic library, which uses serialization? I'd guess the types serialized by that library should be added to the global table of serialized type (at least for types marked with BOOST_CLASS_EXPORT), so it's non-const operation done after main. Am I missing something? - Volodya

You're absolutely correct. So technically, the library is not thread safe. One cannot dynamically load/unload DLLs while archives are being saved or loaded. So I guess I should say it's thread safe except for a small set of specific operations which must be sequenced.
I started out looking at various ways of using mutexes. But they all led to the requirement that I link in the threading library and to a large slowdown in the save/load operations. After much consideration I felt that "aiming lower" was the correct choice in this case. This experience has made me a little wary of the mutex approach to multi-core processing and increased my interest in lock-free data structures.
And when I looked into spirit for the first time in a very long time, it occurred to me that spirit might benefit from using a similar approach. If it can't, the usage of spirit for xml input has made the serialization library again dependent on the threading library. I anxiously await more information on this.
Robert Ramey
Vladimir Prus wrote:
Robert Ramey wrote:
The reason I ask is that the serialization library uses spirit to parse xml archives. Recently, I made changes to make the library thread-safe. I did this without using mutexes/locks by confining all non-const operations to construction and destruction and using static objects constructed/destructed at pre-main and post-main time. Although the jury is still out on this, I believe it will make the serialization library thread safe without the need for using threading primitives and libraries.
What happens if one explicitly loads a dynamic library, which uses serialization? I'd guess the types serialized by that library should be added to the global table of serialized type (at least for types marked with BOOST_CLASS_EXPORT), so it's non-const operation done after main. Am I missing something?
- Volodya

on Mon Jul 28 2008, Vladimir Prus <vladimir-AT-codesourcery.com> wrote:
Robert Ramey wrote:
The reason I ask is that the serialization library uses spirit to parse xml archives. Recently, I made changes to make the library thread-safe. I did this without using mutexes/locks by confining all non-const operations to construction and destruction and using static objects constructed/destructed at pre-main and post-main time. Although the jury is still out on this, I believe it will make the serialization library thread safe without the need for using threading primitives and libraries.
What happens if one explicitly loads a dynamic library, which uses serialization? I'd guess the types serialized by that library should be added to the global table of serialized type (at least for types marked with BOOST_CLASS_EXPORT), so it's non-const operation done after main. Am I missing something?
Suggestion: handle that by deferring the availability of the registrations associated with that DLL until the user explicitly says it's safe to add them. The idea is that every DLL adds things to its local registry and in a MT application, that is only combined with the global registry via an explicit call, when presumably the user knows no threads are doing serialization. HTH, -- Dave Abrahams BoostPro Computing http://www.boostpro.com
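
The deferred-registration idea above could look roughly like the following. This is a sketch under assumptions, not the serialization library's actual registry: `type_registry`, `global_registry`, `commit_registrations` and the class name `plugin::widget` are all hypothetical, and the caller is assumed to invoke `commit_registrations` only when no thread is serializing.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical registry of exported class names.
class type_registry {
public:
    void add(const std::string& name) { names_.insert(name); }
    bool contains(const std::string& name) const {
        return names_.count(name) != 0;
    }
    std::size_t size() const { return names_.size(); }
    // Move every entry of `local` into this registry.
    void merge_from(type_registry& local) {
        names_.insert(local.names_.begin(), local.names_.end());
        local.names_.clear();
    }
private:
    std::set<std::string> names_;
};

// The table that serializing threads read; it is modified only
// inside commit_registrations().
type_registry& global_registry() {
    static type_registry registry;
    return registry;
}

// Explicit call made by the application at a moment when it knows
// that no thread is doing serialization.
std::size_t commit_registrations(type_registry& dll_local) {
    global_registry().merge_from(dll_local);
    return global_registry().size();
}

// Walks through the sequence: register locally at load time,
// then publish with an explicit call.
bool demo_deferred_registration() {
    type_registry plugin_local;           // stands in for a DLL's local registry
    plugin_local.add("plugin::widget");   // what loading the DLL would do
    bool visible_before = global_registry().contains("plugin::widget");
    commit_registrations(plugin_local);
    bool visible_after = global_registry().contains("plugin::widget");
    return !visible_before && visible_after;
}
```

Loading the DLL touches only its own local registry; the shared table changes at a single point chosen by the application.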

David Abrahams wrote:
on Mon Jul 28 2008, Vladimir Prus <vladimir-AT-codesourcery.com> wrote:
Robert Ramey wrote:
The reason I ask is that the serialization library uses spirit to parse xml archives. Recently, I made changes to make the library thread-safe. I did this without using mutexes/locks by confining all non-const operations to construction and destruction and using static objects constructed/destructed at pre-main and post-main time. Although the jury is still out on this, I believe it will make the serialization library thread safe without the need for using threading primitives and libraries.
What happens if one explicitly loads a dynamic library, which uses serialization? I'd guess the types serialized by that library should be added to the global table of serialized type (at least for types marked with BOOST_CLASS_EXPORT), so it's non-const operation done after main. Am I missing something?
Suggestion: handle that by deferring the availability of the registrations associated with that DLL until the user explicitly says it's safe to add them. The idea is that every DLL adds things to its local registry and in a MT application, that is only combined with the global registry via an explicit call, when presumably the user knows no threads are doing serialization.
How is deferring registration of serialization better/easier than deferring loading of the DLL itself? Suppose that an application, in response to a user's action, decides to load a plugin and call a function there, *now*. Then, either you need to:
- load DLL
- call a function
or:
- load DLL
- tell it to register all serialized classes
- call a function
and if this should be done while some other thread is potentially busy doing the same, I don't really see a practical difference between those two approaches.
- Volodya

David Abrahams wrote:
Suggestion: handle that by deferring the availability of the registrations associated with that DLL until the user explicitly says it's safe to add them. The idea is that every DLL adds things to its local registry and in a MT application, that is only combined with the global registry via an explicit call, when presumably the user knows no threads are doing serialization.
I have always presumed that loading of DLLs would be under the control of the user program so that he could take appropriate steps to
a) be sure that loading/unloading of DLLs wasn't occurring while in the midst of serializing something.
b) loading/unloading DLLs occurred one at a time.
It seems to me that this is enough to avoid problems. Am I missing something?
Robert Ramey

Hi Robert.
I have always presumed that loading of DLLs would be under the control of the user program so that he could take appropriate steps to
a) be sure that loading/unloading of DLLs wasn't occurring while in the midst of serializing something.
b) loading/unloading DLLs occurred one at a time.
It seems to me that this is enough to avoid problems. Am I missing something?
Sounds good to me. Just be sure to make this explicitly documented. :-)
I think Windows internally serializes DLL loading/unloading as well, but you'd have to check the DllMain() documentation in MSDN for more information on that. I seem to recall it giving a list of things you can or cannot do there, or you risk causing a deadlock.
Hope this helps.
Best regards,
Jurko Gospodnetić

Robert Ramey wrote:
David Abrahams wrote:
Suggestion: handle that by deferring the availability of the registrations associated with that DLL until the user explicitly says it's safe to add them. The idea is that every DLL adds things to its local registry and in a MT application, that is only combined with the global registry via an explicit call, when presumably the user knows no threads are doing serialization.
I have always presumed that loading of DLLs would be under the control of the user program so that he could take appropriate steps to
a) be sure that loading/unloading of DLLs wasn't occurring while in the midst of serializing something.
Is this viable? Plugins are not necessarily loaded at startup, they may be opened during normal work of a program, and I don't know how an application can reasonably check that some other thread is in "midst of serialization something". Do you suggest that users employ a global mutex that will be held when either:
- serializing anything
- loading a DLL
?
- Volodya

Vladimir Prus wrote:
Robert Ramey wrote:
David Abrahams wrote:
Suggestion: handle that by deferring the availability of the registrations associated with that DLL until the user explicitly says it's safe to add them. The idea is that every DLL adds things to its local registry and in a MT application, that is only combined with the global registry via an explicit call, when presumably the user knows no threads are doing serialization.
I have always presumed that loading of DLLs would be under the control of the user program so that he could take appropriate steps to
a) be sure that loading/unloading of DLLs wasn't occurring while in the midst of serializing something.
Is this viable?
I believe it is
Plugins are not necessarily loaded at startup, they may be opened during normal work of a program,
I've presumed that DLLs loaded at startup present no problems. So the concern would be when a user invokes load_dll from different threads.
and I don't know how an application can reasonably check that some other thread is in "midst of serialization something".
Well, that would depend upon the application. Multi-threaded applications which don't explicitly load/unload DLLs should be able to ignore this. That covers a lot (most) of the cases. The interesting case is where a user has used BOOST_CLASS_EXPORT and has not explicitly instantiated his classes at compile time. This is the case of "plug-ins".
Do you suggest that users employ a global mutex that will be held when either serializing anything or loading a DLL?
In this case one will have to ensure that load_dll and serialization occur sequentially. Not a huge problem, as one needs to load the DLL before using it anyway. Also, one will have to ensure that DLLs containing serialization code are not loaded/unloaded while serialization is underway in another thread. One way for users to do this would be through a global mutex. Whether this is the best or only way would be an issue that the application would have to address.
The idea of including polymorphic serialization along with class implementation code in a dynamically loaded DLL is a powerful one that I wanted to support. I believe I've accomplished this in the serialization library with the recent improvements in extended_type_info. I've tested this functionality "by hand" on Windows and it seems to work as I had hoped. I don't know if anyone else will actually find it useful - but hope springs eternal.
Robert Ramey

on Mon Jul 28 2008, "Robert Ramey" <ramey-AT-rrsd.com> wrote:
David Abrahams wrote:
Suggestion: handle that by deferring the availability of the registrations associated with that DLL until the user explicitly says it's safe to add them. The idea is that every DLL adds things to its local registry and in a MT application, that is only combined with the global registry via an explicit call, when presumably the user knows no threads are doing serialization.
I have always presumed that loading of DLLs would be under the control of the user program so that he could take appropriate steps to
a) be sure that loading/unloading of DLLs wasn't occurring while in the midst of serializing something.
b) loading/unloading DLLs occurred one at a time.
It seems to me that this is enough to avoid problems. Am I missing something?
No; you're probably right. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Sorry I'm late to this - been very slowly trying to catch up on my reading.
Robert Ramey wrote:
David Abrahams wrote:
Suggestion: handle that by deferring the availability of the registrations associated with that DLL until the user explicitly says it's safe to add them. The idea is that every DLL adds things to its local registry and in a MT application, that is only combined with the global registry via an explicit call, when presumably the user knows no threads are doing serialization.
I have always presumed that loading of DLLs would be under the control of the user program so that he could take appropriate steps to
There are places where this sort of thing is not under the control of the application, or of the person writing the code that uses serialisation. In our code we expose functionality through COM in a number of DLLs, so the load order and timing of the DLLs is under the control of the programs that use our objects. They can end up loading a DLL at any time, and as our COM instances are used to implement web site functionality, the timing of when COM objects are required is under the control of the users visiting the sites.
We have a similar requirement for a static library of factories for serialisation (in our case for O/RM, but the principle is the same) and when I put all of this together it seemed that the only way to do it reliably was to use a read/write mutex around accesses to the static data. This way most of the reads can still happen in parallel with the very occasional writes taking exclusive control of the structure.
a) be sure that loading/unloading of DLLs wasn't occurring while in the midst of serializing something.
b) loading/unloading DLLs occurred one at a time.
It seems to me that this is enough to avoid problems. Am I missing something?
I think that the lower level libraries don't have the information to tell this for sure -- your library users who are writing application level code can make a decision based on this, but anybody writing a library on top of yours still may not know enough to be able to control this either. Kirit
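
A read/write mutex around static data, as described in this message, might be sketched as follows. This assumes C++17 (`std::shared_mutex`); the `factory_table` class and its integer ids are hypothetical and are not taken from the poster's code base or from the serialization library.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical static table of factories, keyed by class name.
class factory_table {
public:
    // Rare write, e.g. when a newly loaded DLL registers a factory.
    // Takes exclusive ownership; returns false if the name already exists.
    bool add(const std::string& name, int id) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return ids_.insert(std::make_pair(name, id)).second;
    }
    // Frequent read; any number of readers may hold the shared lock
    // at the same time. Returns -1 for unknown names.
    int find(const std::string& name) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        std::map<std::string, int>::const_iterator it = ids_.find(name);
        return it == ids_.end() ? -1 : it->second;
    }
private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> ids_;
};

// Function-local static: initialized on first use, thread-safely (C++11).
factory_table& factories() {
    static factory_table table;
    return table;
}
```

Readers pay for one shared lock per lookup, which is the per-access cost that the alternative proposals in this thread try to avoid.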

Kirit Sælensminde wrote:
We have a similar requirement for a static library of factories for serialisation (in our case for O/RM, but the principle is the same) and when I put all of this together it seemed that the only way to do it reliably was to use a read/write mutex around accesses to the static data. This way most of the reads can still happen in parallel with the very occasional writes taking exclusive control of the structure.
I looked into this very carefully. I could find no way to do it without a very large performance penalty. It would also require inclusion of the boost threading library - which might conflict with other libraries. This price would be paid by all users - unless there was a way to turn it off - another complication. It just wasn't worth it. I take great pains to avoid adding so many disjoint features to the library that it becomes too bloated and too hard to understand.
The current system requires that DLLs not be loaded/unloaded while an archive is open. I presumed that such a restriction would be easy to live by. If for some case it's not, you could make a semaphore which you would acquire for the actions dll load/dll unload/archive load/save. Of course you'll have to worry about deadlocks, but you're probably already taking steps to avoid that.
It would be possible to make an archive adaptor which could be added to any archive (through multiple derivation) and which would automate the semaphore acquisition. If all your code used these enhanced archives (and your dll loading/unloading was similarly enhanced), you could "fire and forget" with almost no performance penalty, as the semaphores would be accessed only once per archive open/close rather than for every access.
If this suggestion isn't good enough for you, I'm sorry, it's the best I can do.
Robert Ramey
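
The archive adaptor suggested above could be sketched like this. It is a guess at the shape of such an adaptor, not something the serialization library provides: `serialization_mutex`, `serialization_lock_holder`, `locked_archive`, `toy_text_oarchive` and `load_dll_locked` are hypothetical names, and a plain `std::mutex` stands in for the "semaphore". With this simple lock only one archive can be open at a time, and a thread must not open a second locked archive while it still holds the first.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical process-wide lock shared by archive use and DLL loading.
std::mutex& serialization_mutex() {
    static std::mutex m;
    return m;
}

// Base class that holds the lock for its whole lifetime.
class serialization_lock_holder {
protected:
    serialization_lock_holder() : lock_(serialization_mutex()) {}
private:
    std::unique_lock<std::mutex> lock_;
};

// Adaptor added through multiple derivation. The lock holder is listed
// first, so the lock is taken before the archive is constructed and
// released only after the archive has been destroyed.
template <class Archive>
class locked_archive : private serialization_lock_holder, public Archive {
public:
    template <class Stream>
    explicit locked_archive(Stream& stream)
        : serialization_lock_holder(), Archive(stream) {}
};

// Toy archive standing in for a real one; individual saves do no locking.
class toy_text_oarchive {
public:
    explicit toy_text_oarchive(std::ostream& os) : os_(os) {}
    void save(int value) { os_ << value << ' '; }
private:
    std::ostream& os_;
};

// DLL loading takes the same lock; the platform call itself
// (LoadLibrary, dlopen, ...) is supplied by the caller.
template <class LoadFn>
bool load_dll_locked(LoadFn load) {
    std::lock_guard<std::mutex> guard(serialization_mutex());
    return load();
}

std::string demo_locked_save() {
    std::ostringstream out;
    {
        locked_archive<toy_text_oarchive> ar(out);  // lock acquired once
        ar.save(1);
        ar.save(2);
        ar.save(3);
    }                                               // lock released once
    return out.str();
}
```

The lock is touched once per archive open and once per close, regardless of how many values are saved in between.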

On Fri, Aug 29, 2008 at 10:17 AM, Robert Ramey <ramey@rrsd.com> wrote:
We have a similar requirement for a static library of factories for serialisation (in our case for O/RM, but the principle is the same) and when I put all of this together it seemed that the only way to do it reliably was to use a read/write mutex around accesses to the static data. This way most of the reads can still happen in parallel with the very occasional writes taking exclusive control of the structure.
[snip]
It would be possible to make an archive adaptor which could be added to any archive (through multiple derivation) and which would automate the semaphore acquisition. If all your code used these enhanced archives (and your dll loading/unloading was similarly enhanced), you could "fire and forget" with almost no performance penalty, as the semaphores would be accessed only once per archive open/close rather than for every access.
Sorry if I'm too late, but I second that request. Why wouldn't the library provide such an adaptor? I would really like to see a library that "just works" (tm), out of the box. If not with configuration macros, then at least with a specific toolset. I am sure there are cases where performance matters less than thread safety and plugin support.

Robert Ramey wrote:
My general conception is that parsing can be considered as various quasi-independent phases.
a) specification of grammar - source code preparation
b) compilation
c) instantiation/construction of grammar definition
d) usage of c) to parse a text string
e) destruction of instance.
From this I want to conclude that phase d) does not change the grammar definition, so that in a multi-threading environment the invocation of the scoped_lock would be limited to phases c) and e).
That is, I want to believe that the following example adapted from the spirit documentation would work:
const my_grammar g; // note addition of const
if (parse(first, last, g, space_p).full)
    cout << "parsing succeeded\n";
else
    cout << "parsing failed\n";
The reason I ask is that the serialization library uses spirit to parse xml archives. Recently, I made changes to make the library thread-safe. I did this without using mutexes/locks by confining all non-const operations to construction and destruction and using static objects constructed/destructed at pre-main and post-main time. Although the jury is still out on this, I believe it will make the serialization library thread safe without the need for using threading primitives and libraries.
Alas, it's not as simple as that. The grammar itself is thread-safe. It's the instantiation of the grammar::definition that is not, and that requires BOOST_SPIRIT_THREADSAFE and hence reliance on Boost.Threads. (FWIW, Spirit-2 solves the thread-safety problems and does away with Threads. I won't expect you to switch to Spirit2 any time soon though.)
A potential solution is to "prime" the grammar at pre-main. There's no direct way to do it ATM, but you can do a dummy parse to achieve the effect (e.g. have a static object that does that in its constructor).
Another potential solution (one that Spirit2 uses) is to instantiate the grammar::definition directly on the stack and call its "start" rule directly. Also, if you do it this way, don't use closures (I recall you are using ASTs, right?). Spirit-1 has problems with thread-safety in 2 areas: grammar-definitions and closures.
Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net
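
The "priming" idea can be illustrated without Spirit. In the sketch below, `digits_grammar` is a hypothetical toy class whose internal definition is created lazily on the first parse, without any synchronization; it only mimics the situation described above and does not reproduce how Spirit-1 stores grammar definitions. A static object then performs a dummy parse in its constructor, so the lazy step runs during static initialization, before threads exist.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy grammar accepting non-empty strings of decimal digits.
class digits_grammar {
    struct definition {
        bool accepts(char c) const {
            return std::isdigit(static_cast<unsigned char>(c)) != 0;
        }
    };
public:
    digits_grammar() : definition_(0) {}
    ~digits_grammar() { delete definition_; }
    digits_grammar(const digits_grammar&) = delete;
    digits_grammar& operator=(const digits_grammar&) = delete;

    bool parse(const std::string& text) const {
        const definition& d = get_definition();
        if (text.empty()) return false;
        for (std::size_t i = 0; i < text.size(); ++i)
            if (!d.accepts(text[i])) return false;
        return true;
    }
    bool primed() const { return definition_ != 0; }
private:
    // Unsynchronized lazy creation: unsafe if two threads get here first
    // at the same time, harmless once the definition already exists.
    const definition& get_definition() const {
        if (!definition_) definition_ = new definition;
        return *definition_;
    }
    mutable definition* definition_;
};

static const digits_grammar g_grammar;

// Static object whose constructor does a dummy parse. It is defined
// after g_grammar in the same translation unit, so it is initialized
// after it.
struct grammar_primer {
    grammar_primer() { g_grammar.parse("0"); }
};
static const grammar_primer g_primer;
```

After priming, later calls to `parse` only read the already-created definition. Whether a dummy parse is sufficient for a real Spirit-1 grammar depends on details of the library that this sketch does not model.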
participants (7)
- Andrey Semashev
- David Abrahams
- Joel de Guzman
- Jurko Gospodnetić
- Kirit Sælensminde
- Robert Ramey
- Vladimir Prus