[serialization] shared libraries & extended_type_info registration

Hi,
I am having trouble with extended_type_info registration when using the BOOST_CLASS_EXPORT macro with shared libraries. An assertion forbids registering the same extended_type_info more than once. This case occurs when the serialization of derived classes is defined in different libraries. For example, consider these two libraries:
* the first one defines a class and its serialization scheme
* the second one defines a class derived from the class in the first library, and its serialization scheme
A program that uses these two libraries to serialize classes raises an assertion at runtime because the base class is registered twice:
[ file serialization/extended_type_info.cpp line 74
// make sure that attempt at registration is done only once
assert(lookup(eti) == m_self->m_map.end()); ]
Here is the detailed layout of the libraries.
LIB1
* files Base.hpp, Base.cpp: define a base class (no serialization code here)
* file BaseSerilization.hxx: contains the definition of the non-intrusive template function serialize (no BOOST_CLASS_EXPORT macro here)
* file BaseSerilizationInstanciation.cpp: explicitly instantiates the previous function with a fixed Archive class; this file also contains the macro BOOST_CLASS_EXPORT(Base)
LIB2
* files Derivated.hpp, Derivated.cpp: define a class derived from Base (no serialization code here)
* file DerivatedSerilization.hxx: contains the definition of the non-intrusive template function serialize (no BOOST_CLASS_EXPORT macro here)
** #include "lib1/BaseSerilization.hxx"
** in the function serialize: use of base_object< lib1::Base >
* file DerivatedInstanciation.cpp: explicitly instantiates the previous function with a fixed Archive class; this file also contains the macro BOOST_CLASS_EXPORT(Derivated)
A program using these two libraries raises the assertion at serialization/extended_type_info.cpp line 74. The macro BOOST_CLASS_EXPORT(Base) registers the extended_type_info of Base. The macro BOOST_CLASS_EXPORT(Derivated) registers the extended_type_info of Base *and* the void_caster registered in serialize(... Derivated ...) registers the Base class *again*, raising the assertion.
So I have some questions:
1. Is my design correct? My program works fine (Linux and Windows) in release mode.
2. Why is multiple registration of the same extended_type_info forbidden?
3. What is the benefit of using std::multiset instead of std::set for the tkmap::m_map type, given that its elements are unique?
Regards,
Vincent Agnus
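The failure described above can be modelled without Boost. The following is only a simplified sketch: `TypeInfo`, `KeyLess`, `Registry` and `try_register` are hypothetical names invented for illustration, not the library's types. It assumes each shared library carries its own instance describing `Base`, and that the registry looks entries up by key, so the second instance is seen as already registered.

```cpp
#include <cstddef>
#include <cstring>
#include <set>

// Hypothetical stand-in for extended_type_info: one instance per
// (type, shared library), identified here by a string key.
struct TypeInfo {
    const char* key;
};

// Order by key, so two instances describing the same type are equivalent.
struct KeyLess {
    bool operator()(const TypeInfo* a, const TypeInfo* b) const {
        return std::strcmp(a->key, b->key) < 0;
    }
};

class Registry {
public:
    // Returns false in the situation where the quoted assert would fire:
    // an equivalent entry is already present.
    bool try_register(const TypeInfo* eti) {
        if (map_.find(eti) != map_.end())
            return false;
        map_.insert(eti);
        return true;
    }
    std::size_t size() const { return map_.size(); }

private:
    std::multiset<const TypeInfo*, KeyLess> map_;
};

// Assumption: each library ends up with its own object for Base.
static const TypeInfo base_from_lib1{"lib1::Base"};
static const TypeInfo base_from_lib2{"lib1::Base"};
static const TypeInfo derived_from_lib2{"lib2::Derivated"};
```

Registering `base_from_lib1` and `derived_from_lib2` succeeds in this model, while a later attempt with `base_from_lib2` is rejected. In a release build the real assert is compiled out, which would be consistent with the program appearing to work in release mode.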

There are two issues with the serialization library:
a) It's not currently thread safe.
b) Serialization code can be duplicated in different shared libraries. It's not clear what this implies.
The first is on a path to a final solution. It will depend upon the existence of a thread-safe lazily initialized singleton, which will have to be provided by the user until boost has such a thing (if that ever happens). I'm still thinking about the second: what it means and the best way to address it.
Robert Ramey

On Fri, 2007-05-10 at 08:29 -0700, Robert Ramey wrote:
The first (problem: thread-safety) is on a path to a final solution. It will depend upon the existence of a thread-safe lazily initialized singleton, which will have to be provided by the user until boost has such a thing (if that ever happens).
Awesome! Hands up everyone who has one! /me raises hand
Sohail

On Fri, 5 Oct 2007 08:29:50 -0700 "Robert Ramey" wrote:
There are two issues with the serialization library
a) It's not currently thread safe b) Serialization code can be duplicated in different shared libraries. It's not clear what this implies.
The first is on a path to a final solution. It will depend upon the existence of a thread-safe lazily initialized singleton, which will have to be provided by the user until boost has such a thing (if that ever happens).
Hi Robert,
I'd like to use Boost serialization in threaded applications and am thrilled to hear that you have a solution in mind. Are there any (possibly simple?) rules that you could state that would guarantee thread-safe use with existing Boost serialization versions?
I've run some simple examples using Boost serialization with threads, and it seems to work correctly if all the objects to be serialized and the serialization streams are owned (writable) by exactly one thread. Are there any guarantees that a one-thread-owns-it-all or perhaps some other approaches are safe with the current implementation?
Many thanks!
Ed
-- Edward H. Hill III, PhD | ed@eh3.com | http://eh3.com/

Ed Hill wrote:
Are there any (possibly simple?) rules that you could state that would guarantee thread-safe use with existing Boost serialization versions?
*** Just refrain from creating more than one archive at a time. Actually, since the need for multiple open archives is not a common scenario, this has been a problem that only very few people have actually had. Of course this is small consolation for those that actually have it.
Are there any guarantees that a one-thread-owns-it-all or perhaps some other approaches are safe with the current implementation?
*** Having one thread own all serialization would work, as would using your own semaphore to guarantee the above.
Robert Ramey
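The "only one archive at a time" rule can be enforced with a single process-wide lock. This is a minimal sketch under stated assumptions: `Point`, `archive_mutex` and `save_guarded` are hypothetical names, a `std::mutex` is used in place of the semaphore mentioned above, and a `std::ostringstream` merely plays the role of the archive. With Boost, the archive object would be constructed and destroyed inside the same locked scope.

```cpp
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical user type standing in for a serializable class.
struct Point {
    int x;
    int y;
};

// One lock for the whole process: whoever holds it may have an archive open.
inline std::mutex& archive_mutex() {
    static std::mutex m;  // function-local static, initialised on first use
    return m;
}

// Stand-in for "open an archive, write the object, close the archive".
std::string save_guarded(const Point& p) {
    std::lock_guard<std::mutex> lock(archive_mutex());
    std::ostringstream os;  // plays the role of the archive in this sketch
    os << p.x << ' ' << p.y;
    return os.str();        // os is destroyed before the lock is released
}
```

Every thread that serializes would call `save_guarded` (and a matching load function taking the same lock), so no two archives are ever alive simultaneously.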

Hi Robert,
Thank you for your answer. My application is single-threaded, and the duplicated serialization code is guaranteed to be identical. The code currently forbids this design. What about replacing lines 74 and 75 of extended_type_info.cpp:
74 assert(lookup(eti) == m_self->m_map.end());
75 m_self->m_map.insert(eti);
by
if ( lookup(eti) == m_self->m_map.end() ) { m_self->m_map.insert(eti); }
? With this new behavior the same type_info can be registered several times. With this modification my design works fine. I can send you a code sample if you want to investigate the problem.
Best regards,
Vincent Agnus
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

I am aware of this issue and its solution. I am not convinced that this solution doesn't have some detrimental side effects, or that it won't create some very difficult-to-find problem. That's where it stands now.
Robert Ramey

Vincent Agnus wrote:
Thank you for your answer. My application is single-threaded, and the duplicated serialization code is guaranteed to be identical. The code currently forbids this design. What about replacing lines 74 and 75 of
extended_type_info.cpp : 74 assert(lookup(eti) == m_self->m_map.end()); 75 m_self->m_map.insert(eti);
by
if ( lookup(eti) == m_self->m_map.end() ) { m_self->m_map.insert(eti); }
? With this new behavior the same type_info can be registered several times.
With this modification my design works fine. I can send you a code sample if you want to investigate the problem.
This modification will just lead to hard-to-find bugs. Take a look at how this information is used in the library. For example, void_caster_compare uses extended_type_info::operator<, and it will work incorrectly (the result will be as if the derived class were not registered), because identical type infos will be treated as different. Comparison of extended_type_info is also used in basic_archive_impl::helper_compare, in basic_serializer::operator<, and maybe in some other places. Thus, to support multiple registration, you should provide an extended_type_info::operator< implementation under which duplicated type infos from different libraries are equivalent. I have added a doubly linked list of extended_type_info to extended_type_info, and I append duplicated type infos to the tail of this list. operator< is implemented as a comparison of list heads (the first registered type info). basic_serializer::operator< also needs to be modified, because it compares addresses of type infos instead of using extended_type_info::operator<.
Sergey Skorniakov
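The objection can be illustrated with a small model. This is a sketch only: `TypeInfo`, `ByAddress`, `ByKey` and `found` are hypothetical names, and the real comparisons in the library may differ in detail. It assumes two libraries each hold their own object describing the same type, and shows that a container ordered by address does not find one duplicate when only the other was stored, whereas an ordering that treats duplicates as equivalent does.

```cpp
#include <cstring>
#include <functional>
#include <set>

// Hypothetical stand-in for extended_type_info.
struct TypeInfo {
    const char* key;
};

// Ordering by object address: two objects for the same type are unrelated.
struct ByAddress {
    bool operator()(const TypeInfo* a, const TypeInfo* b) const {
        return std::less<const TypeInfo*>()(a, b);
    }
};

// Ordering by key: duplicates from different libraries are equivalent.
struct ByKey {
    bool operator()(const TypeInfo* a, const TypeInfo* b) const {
        return std::strcmp(a->key, b->key) < 0;
    }
};

// Assumption: each library has its own object for the same type.
static const TypeInfo base_in_lib1{"lib1::Base"};
static const TypeInfo base_in_lib2{"lib1::Base"};

// Stores `registered` only, then looks up `probe` under the given ordering.
template <class Compare>
bool found(const TypeInfo* registered, const TypeInfo* probe) {
    std::set<const TypeInfo*, Compare> s;
    s.insert(registered);
    return s.count(probe) == 1;
}
```

Here `found<ByAddress>(&base_in_lib1, &base_in_lib2)` is false and `found<ByKey>(&base_in_lib1, &base_in_lib2)` is true, which is the kind of mismatch that silently skipping the second registration would leave behind.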

This is my concern as well. I am interested in addressing this in a definitive way. But it's not trivial to do a good job which can be proven to work while at the same time not adding a performance hit to every user of the package.
Robert Ramey

I think that the performance penalty of my solution (http://lists.boost.org/boost-users/att-28092/multireg.zip) is negligible. If a type is registered only once, I see three things that are not free:
1) Operations on extended_type_info are slightly slower. If a type is registered only once, the cost is just checking one pointer (m_prev) against zero.
2) The size of extended_type_info is bigger (two additional pointers).
3) The performance of basic_serializer::operator< is the most affected, because a comparison of two pointers becomes a comparison of extended_type_info. However, it is still indistinguishable against the background of stream I/O performance.
And, finally, all the code for multi-registration support can easily be compiled out with #ifdef.
Unfortunately, multiple registration of types is not the only problem when serialization code is spread across several DLLs; there are also some problems with tracking. For example, this problem: http://lists.boost.org/boost-users/2007/08/30275.php translates directly to the situation with multiple shared libraries. Different DLLs/EXEs can in this case be considered different versions of the serialization code. I failed to find a cheap way to fix this issue at the library level. As a temporary workaround (for MSVC only) I developed a utility that checks the symbols exported from EXEs/DLLs and finds suspicious situations (a type serialized by pointer and by value (tracking enabled) in one module and only by value (tracking disabled) in another one), but such an approach is ugly and not portable.
Sergey Skorniakov
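The linked-list idea described in this message might look roughly as follows. This is a sketch based only on the description in the thread, not on the contents of multireg.zip: `TypeInfo`, `head`, `link_duplicate` and `less_than` are hypothetical names, and comparing list heads by address is an assumption made to keep the example short. Only the member name `m_prev` comes from the message; `m_next` is assumed to be the second extra pointer.

```cpp
#include <functional>

// Hypothetical stand-in for extended_type_info with two extra pointers.
struct TypeInfo {
    const char* key;
    TypeInfo* m_prev = nullptr;  // earlier registration of the same type
    TypeInfo* m_next = nullptr;  // later registration of the same type

    explicit TypeInfo(const char* k) : key(k) {}

    // First registered object for this type. For a type registered once,
    // the loop body never runs: the cost is one null check on m_prev.
    const TypeInfo* head() const {
        const TypeInfo* p = this;
        while (p->m_prev != nullptr)
            p = p->m_prev;
        return p;
    }
};

// Appends a duplicate registration to the tail of the list headed by `first`.
void link_duplicate(TypeInfo& first, TypeInfo& dup) {
    TypeInfo* tail = &first;
    while (tail->m_next != nullptr)
        tail = tail->m_next;
    tail->m_next = &dup;
    dup.m_prev = tail;
}

// Ordering defined on list heads: all duplicates compare equivalent.
bool less_than(const TypeInfo& a, const TypeInfo& b) {
    return std::less<const TypeInfo*>()(a.head(), b.head());
}
```

With three objects for the same type linked together, any two of them compare as equivalent under `less_than`, while an unrelated type still orders strictly before or after them.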
participants (5)
- Ed Hill
- Robert Ramey
- Sergey Skorniakov
- Sohail Somani
- Vincent Agnus