Serialization: BOOST_CLASS_EXPORT changes between 1.38 and 1.52

I'm upgrading a large data analysis project from Boost 1.38 to 1.52 (see this 2008 BoostCon talk for details: http://www.icecube.umd.edu/~troy/talks/icefishing.pdf). Everything has gone remarkably smoothly, with a minimal amount of #if, except for one thing: a change in the semantics of BOOST_CLASS_EXPORT.
The software is organized into a number of shared libraries, each of which has serializable classes that generally inherit from a shared base class. The derived classes are almost always [de]serialized through pointers to that base class, to and from a custom archive type. With 1.38, the BOOST_CLASS_EXPORT directive was kept in the implementation file for each serializable class. This still almost works with 1.52, but it can result in strange behavior that took me two days to track down: some of our classes would throw unregistered-class exceptions on [de]serialization. Which ones depended on circumstances: compiler version, linker, and operating system.
The reason turns out to be a change in the handling of the class GUID. The GUID is associated with the pointer serializer by BOOST_CLASS_EXPORT, but now apparently also by the first instantiated code that [de]serializes the class through that archive directly, based on whichever specialization of boost::serialization::guid<T>() is currently in scope. The unspecialized default returns NULL. Once NULL is set as the GUID, it stays that way, regardless of a proper BOOST_CLASS_EXPORT later. In our case this happened because of def_pickle() calls in Boost.Python bindings that used the same classes and archive type.
This can be fixed by putting BOOST_CLASS_EXPORT_KEY in all the appropriate header files. Adding it is a sizeable amount of work (hundreds of classes), but I'm more concerned that the mechanism is very fragile for templates, and I'm not sure what to do there. For example, we have a generic serializable container. Any specialization of it needs to be registered with BOOST_CLASS_EXPORT, which is fine, but anyone using that specialization also needs to include a header file with BOOST_CLASS_EXPORT_KEY() in it for that particular specialization. In other words, you can't just use container<Foo> and expect it to work reliably. Moreover, if you use it *once* without having seen the right in-scope macro, it will break serialization of that type globally if you happen to have linked against a library that made this mistake. Probably. Depending on initialization order.
So my questions:
1. Is it possible for things that would return NULL GUIDs to try harder and look in a global registry instead of silently breaking things? This kind of global lookup was how 1.38 always worked, and it seems considerably less fragile.
2. Is there a way to handle BOOST_CLASS_EXPORT_KEY() sanely for templates, without the risk of silent serialization failures (in all instances of that class) that depend on global initialization order?
3. Is it possible to change the GUID set in the extended type info object of a pointer_[i/o]serializer at runtime, after the class has been added to the export registry?
4. Are there any suggested mechanisms for local hacks, given that we control the archive implementation, to implement 1-3 without changes to Boost?
-Nathan
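To make the failure mode concrete, here is a toy model in standard C++. It is not Boost's implementation; `type_record`, `record_table`, `get_record`, and `has_key` are hypothetical names. One record exists per type, and its key is fixed by whichever caller creates it first, so an early caller that had no key in scope (say, code reached through a def_pickle() binding) locks in a null key that a later, correct export cannot repair:

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <typeindex>
#include <typeinfo>

// Hypothetical toy model, not Boost code: one record per type.
struct type_record {
    const char* key;  // nullptr models "no GUID specialization was in scope"
};

// One table for the whole program (a function-local static avoids
// static-initialization-order problems within this sketch).
inline std::map<std::type_index, type_record>& record_table() {
    static std::map<std::type_index, type_record> table;
    return table;
}

// Returns T's record, creating it with `key` only if none exists yet.
// The first caller wins; keys passed by later callers are silently ignored.
template <class T>
const type_record& get_record(const char* key) {
    auto& table = record_table();
    auto it = table.find(std::type_index(typeid(T)));
    if (it == table.end()) {
        it = table.emplace(std::type_index(typeid(T)), type_record{key}).first;
    }
    return it->second;
}

// A lookup that finds a null key corresponds to the
// "unregistered class" failure when serializing through a base pointer.
template <class T>
bool has_key() {
    return get_record<T>(nullptr).key != nullptr;
}
```

In this model the outcome depends only on which call runs first, which is why such failures vary with link and load order rather than with anything visible in the source.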

Nathan Whitehorn wrote:
The reason turns out to be changes in handling of the class GUID. This is associated with the pointer serializer by BOOST_CLASS_EXPORT -- but also now apparently by the first instantiated code [de]serializing the class through that archive directly based on the presence of a template specialization of boost::serialization::guid<T>() currently in scope, which is NULL by default. Once NULL is set as the GUID, it stays that way, regardless of a proper BOOST_CLASS_EXPORT later. This occurred in our case as a result of def_pickle() calls in Boost.Python bindings that used the same classes and archive type.
This can be fixed by BOOST_CLASS_EXPORT_KEY in all appropriate header files. Adding it is a sizeable amount of work (hundreds of classes) but I'm more concerned that the mechanism is very fragile in the case of templates and I'm not sure what to do there. For example, we have a generic serializable container. Any specialization of it needs to be registered with BOOST_CLASS_EXPORT, which is fine, but anyone using that specialization also needs to include a header file with BOOST_CLASS_EXPORT_KEY() in it for that particular specialization. In other words, you can't just use container<Foo> and expect it to work reliably and, moreover, if you use it *once* without having seen the right in-scope macro, it will break serialization of that type globally if you happen to have linked against a library that made this mistake. Probably. Depending on initialization order.
It's been a while since I did this, but I'll try to respond to the best of my recollection. BOOST_CLASS_EXPORT in its original form created a lot of problems. This is/was especially true in the case of DLLs, where instances of the "guid record" were created every time a class was referred to. This created a bunch of "dangling" guid records. The fix was to make a clear distinction between declaring a key and instantiating the guid record. This sounds simple and obvious as I explain it here, but in practice it took a while to figure out exactly what to do. On top of that, there is the issue of getting the right stuff instantiated, which required a bunch of weird TMP to implement. Much of this code was contributed by our own TMP guru, David Abrahams. The current situation is "more correct". So though I appreciate what a pain it is to change all the BOOST_CLASS_EXPORT to BOOST_CLASS_EXPORT_KEY, I think this is the best solution, as it will make your system better and less dependent on the "quirky" behavior of BOOST_CLASS_EXPORT.
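The distinction between declaring a key and instantiating the guid record can be sketched as a toy model in plain C++. `export_key`, `guid_registrar`, `TOY_EXPORT_KEY`, and `TOY_EXPORT_IMPLEMENT` are hypothetical stand-ins for the roles played by BOOST_CLASS_EXPORT_KEY and BOOST_CLASS_EXPORT_IMPLEMENT; Boost's real machinery is considerably more involved. The key is a compile-time specialization that every user of the type must see, while the record is created by a single static object in a single file:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model of the KEY / IMPLEMENT split (not Boost's macros).
// Default: no key is visible for T.
template <class T>
struct export_key {
    static constexpr const char* value = nullptr;
};

// KEY half: belongs in the header, so every translation unit that uses T
// sees the same specialization. Must be used at namespace scope.
#define TOY_EXPORT_KEY(T, NAME)                        \
    template <>                                        \
    struct export_key<T> {                             \
        static constexpr const char* value = NAME;     \
    };

// Program-wide table: key -> number of records created for it.
inline std::map<std::string, int>& guid_records() {
    static std::map<std::string, int> table;
    return table;
}

template <class T>
struct guid_registrar {
    guid_registrar() {
        // Using IMPLEMENT without a visible KEY is a compile-time error here.
        static_assert(export_key<T>::value != nullptr,
                      "IMPLEMENT used for a type whose KEY is not visible");
        ++guid_records()[export_key<T>::value];
    }
};

// IMPLEMENT half: appears in exactly one .cpp file per type.
#define TOY_EXPORT_IMPLEMENT(T, VAR) static guid_registrar<T> VAR;

// Example usage (in a real project, split between circle.hpp and circle.cpp).
struct Circle {};
TOY_EXPORT_KEY(Circle, "Circle")
TOY_EXPORT_IMPLEMENT(Circle, circle_guid)
```

Because only one file expands the IMPLEMENT half, exactly one record exists for "Circle", whereas the key can be seen from as many headers as needed without creating extra records.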
So my questions: 1. Is it possible for things that would return GUIDs of NULL to try harder and look in a global registry instead of silently breaking things? This kind of global lookup was how 1.38 always worked and it seems considerably less fragile.
The old method did instantiation by default, so it worked in some cases where the current one won't. So it seems "less fragile". But I think that's sort of an illusion. It does so at the cost of gratuitous instantiations, which are often harmless, though non-optimal. But the real problem is that it left this out of the hands of the programmer, which could lead to silent and surprising behavior. Now we have a situation where this behavior can't happen: we have to explicitly plan for it. I believe this leads to less surprising programs, albeit at the cost of some surprising behavior at build time.
2. Is there a way to handle BOOST_CLASS_EXPORT_KEY() sanely in the case of templates without the risk of silent serialization failures -- in all instances of that class -- that depend on global initialization order?
I believe the best way to do this is to do an explicit instantiation in a .cpp file which includes the header containing BOOST_CLASS_EXPORT_KEY() and invokes BOOST_CLASS_EXPORT_IMPLEMENT(). Once compiled, this can be added to a library or DLL. This results in one and only one instance of the class serialization code existing in the program, rather than multiple ones (in the case of DLLs). That means less code and, better yet, it eliminates the possibility that the mainline module and the DLL have different versions of the code, which would be agony of the worst type to track down.
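The explicit-instantiation half of this advice is plain C++ and independent of Boost. A sketch, with a hypothetical `container` template standing in for a serializable class and `describe()` standing in for `serialize()`: the header declares the member without defining it, `extern template` (C++11) tells other translation units not to instantiate it themselves, and one implementation file explicitly instantiates each supported T. Both "files" are shown in one block so it compiles on its own:

```cpp
#include <cassert>
#include <string>
#include <vector>

// ---- container.hpp (sketch) ----
template <typename T>
class container {
public:
    void add(const T& v) { items_.push_back(v); }
    std::string describe() const;  // declared here, defined only in container.cpp
private:
    std::vector<T> items_;
};
// Explicit instantiation declarations: code that includes the header uses
// the copies compiled into the library instead of generating its own.
extern template class container<int>;
extern template class container<long>;

// ---- container.cpp (sketch) ----
template <typename T>
std::string container<T>::describe() const {
    std::string out;
    for (const T& v : items_) {
        out += std::to_string(v) + ";";
    }
    return out;
}
// Explicit instantiation definitions: the single place code for each T is generated.
template class container<int>;
template class container<long>;
```

With this layout, using container<double>::describe() anywhere fails at link time because no definition was ever generated for it: a build-time error rather than a runtime one, in line with the trade-off described in the thread.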
3. Is it possible to change the GUID set in the extended type info object of a pointer_[i/o]serializer at runtime after the class has been added to the export registry?
I have never considered this. I don't see what this would be used for. The singleton class table is never modified after it is constructed (before main is called). This is necessary for the serialization library to be thread-safe.
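The thread-safety argument can be illustrated with a toy registry. `type_registry` is hypothetical (Boost does not expose a class like this): entries are added only during start-up, after which the table is frozen and read-only, so concurrent lookups need no locking. Rewriting an entry's GUID after registration would reintroduce a writer into a structure that readers assume is immutable:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical toy registry: written only during start-up, then frozen.
class type_registry {
public:
    static type_registry& instance() {
        static type_registry r;
        return r;
    }
    void add(const std::string& key, const std::string& type_name) {
        if (frozen_) throw std::logic_error("registry is frozen: " + key);
        table_.emplace(key, type_name);
    }
    // e.g. called at the start of main(), before any threads are created
    void freeze() { frozen_ = true; }
    // Read-only lookup: with no writers after freeze(), concurrent calls are safe.
    const std::string* find(const std::string& key) const {
        auto it = table_.find(key);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    type_registry() = default;
    std::map<std::string, std::string> table_;
    bool frozen_ = false;
};
```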
4. Are there any suggested mechanisms for local hacks, given that we control the archive implementation, to implement 1-3 without changes to boost?
To re-summarize my suggestion above:
a) change all the headers to use BOOST_CLASS_EXPORT_KEY().
b) make a small *.cpp file for each header which includes the header and invokes BOOST_CLASS_EXPORT_IMPLEMENT().
c) add your small *.cpp file to your library, either static library or DLL.
d) while you're at it, you might want to consider moving the serialize, save, and load functions for the class into the *.cpp file and not making them inline. This will eliminate any code bloat generated by the serialization library. If your DLLs are dynamically loadable, they will only occupy memory when the classes they refer to are actually being used at runtime. (Just don't load/unload the DLLs while multi-threading; use a mutex!)
It seems you've touched upon the issue of serializing template classes. This was also touched upon in a previous email. Currently we have to explicitly instantiate any templates we want to serialize. Automatic instantiation of template-generated classes using some combination of enable_if, partial specialization, and who knows what else is interesting to consider, but likely much trickier than first meets the eye. Also, our "guid" is a string which can only be processed at runtime. Replacing it with a "guid" generated at compile time from the class name might make some things possible which weren't before. This is sort of irrelevant to your current situation, but I like to keep the pot boiling.
Robert Ramey
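One possible reading of a "guid generated at compile time from the class name" is a constexpr hash of the name string. The sketch below uses 64-bit FNV-1a purely for brevity; `fnv1a64` and `vector_double_id` are hypothetical, and nothing here is part of Boost. Unlike a full name string, a hash can collide, so a real scheme would still need a collision check:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration: 64-bit FNV-1a, evaluated at compile time.
// Written as a single-return recursive function so it is valid C++11 constexpr.
constexpr std::uint64_t fnv1a64(const char* s,
                                std::uint64_t h = 14695981039346656037ULL) {
    return *s == '\0'
               ? h
               : fnv1a64(s + 1, (h ^ static_cast<unsigned char>(*s)) * 1099511628211ULL);
}

// Because the value is a constant expression, it can be checked at build time.
static_assert(fnv1a64("I3Vector<double>") != fnv1a64("I3Vector<int>"),
              "distinct class names should produce distinct ids");

constexpr std::uint64_t vector_double_id = fnv1a64("I3Vector<double>");
```

A compile-time id like this could, for example, be used as a template argument or compared in a static_assert, which a runtime string cannot.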
-Nathan

On 02/07/13 10:36, Robert Ramey wrote:
Nathan Whitehorn wrote:
[trimmed]
So my questions: 1. Is it possible for things that would return GUIDs of NULL to try harder and look in a global registry instead of silently breaking things? This kind of global lookup was how 1.38 always worked and it seems considerably less fragile.
The old method did instantiation by default, so it worked in some cases where the current one won't. So it seems "less fragile". But I think that's sort of an illusion. It does so at the cost of gratuitous instantiations, which are often harmless, though non-optimal. But the real problem is that it left this out of the hands of the programmer, which could lead to silent and surprising behavior. Now we have a situation where this behavior can't happen: we have to explicitly plan for it. I believe this leads to less surprising programs, albeit at the cost of some surprising behavior at build time.
Thanks for the explanation! What I'm running into is surprising behavior at *run time* despite a build that apparently works. We've managed to avoid all the compile and link time redefinition issues so far by some combination of planning and luck.
2. Is there a way to handle BOOST_CLASS_EXPORT_KEY() sanely in the case of templates without the risk of silent serialization failures -- in all instances of that class -- that depend on global initialization order?
I believe the best way to do this is to do an explicit instantiation in a .cpp file which includes the header containing BOOST_CLASS_EXPORT_KEY() and invokes BOOST_CLASS_EXPORT_IMPLEMENT(). Once compiled, this can be added to a library or DLL. This results in one and only one instance of the class serialization code existing in the program, rather than multiple ones (in the case of DLLs). That means less code and, better yet, it eliminates the possibility that the mainline module and the DLL have different versions of the code, which would be agony of the worst type to track down.
This is basically what we were already doing: for the reasons you mention, we never had BOOST_CLASS_EXPORT() or (with the exception of some templated things) inlineable serialize() routines in header files. What I'm concerned about is a situation like this:
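Main library:
Header:
template<typename T>
class I3Vector : public std::vector<T>, public I3FrameObject { // I3FrameObject is our base class
private:
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive& ar, const unsigned int version);
};
BOOST_CLASS_EXPORT_KEY(I3Vector<T>) for a variety of T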
Implementation:
3. Is it possible to change the GUID set in the extended type info object of a pointer_[i/o]serializer at runtime after the class has been added to the export registry?
I have never considered this. I don't see what this would be used for. The singleton class table is never modified after it is constructed (before main is called). This is necessary for the serialization library to be thread-safe.
What I was hoping to do is to replace (still before main is called) any possible NULL GUIDs for a class with a non-NULL one if the relevant extended_type_info ever gets instantiated with a non-NULL GUID (which would require some changes to how the extended_type_info_typeid constructor works, but that's a separate issue). The idea would be that the GUID attached to a BOOST_CLASS_EXPORT(), when it runs, would be the final word on the matter instead of a leftover NULL from an instantiation in some place that didn't know about the GUID.
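The proposed policy, in which a non-null GUID overrides an earlier null one while a second, different non-null GUID is still rejected, can be sketched as a toy registry. The names are hypothetical and this is not how Boost behaves; it also ignores the thread-safety and per-module questions raised in the replies:

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>

// Hypothetical sketch of the proposed "export has the final word" policy.
struct type_record {
    const char* key = nullptr;  // nullptr: no GUID seen yet
};

inline std::map<std::type_index, type_record>& record_table() {
    static std::map<std::type_index, type_record> table;
    return table;
}

template <class T>
const type_record& get_record(const char* key) {
    // operator[] default-constructs a record with a null key on first use.
    type_record& rec = record_table()[std::type_index(typeid(T))];
    if (rec.key == nullptr) {
        rec.key = key;  // a non-null key fills in a missing one
    } else if (key != nullptr && std::strcmp(rec.key, key) != 0) {
        throw std::logic_error("conflicting GUIDs registered for one type");
    }
    return rec;
}
```

Because callers hold references into the same record, code that obtained the record while its key was still null sees the upgraded key afterwards.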
4. Are there any suggested mechanisms for local hacks, given that we control the archive implementation, to implement 1-3 without changes to boost?
To re-summarize my suggestion above:
a) change all the headers to use BOOST_CLASS_EXPORT_KEY().
b) make a small *.cpp file for each header which includes the header and invokes BOOST_CLASS_EXPORT_IMPLEMENT().
c) add your small *.cpp file to your library, either static library or DLL.
d) while you're at it, you might want to consider moving the serialize, save, and load functions for the class into the *.cpp file and not making them inline. This will eliminate any code bloat generated by the serialization library. If your DLLs are dynamically loadable, they will only occupy memory when the classes they refer to are actually being used at runtime. (Just don't load/unload the DLLs while multi-threading; use a mutex!)
It seems you've touched upon the issue of serializing template classes. This was also touched upon in a previous email. Currently we have to explicitly instantiate any templates we want to serialize. Automatic instantiation of template-generated classes using some combination of enable_if, partial specialization, and who knows what else is interesting to consider, but likely much trickier than first meets the eye. Also, our "guid" is a string which can only be processed at runtime. Replacing it with a "guid" generated at compile time from the class name might make some things possible which weren't before. This is sort of irrelevant to your current situation, but I like to keep the pot boiling.
Thanks for the suggestions. They mostly reflect what we were already doing -- people get beaten with a stick if they try to instantiate serialize methods in header files or multiple times and they are usually kept in implementation files for that reason. We've mostly given up on automatic instantiation, but kept it easy to extend if you want to (see the problem with I3Vector above) -- although it would be amazing if you figured out how to do it. -Nathan

Nathan Whitehorn wrote:
On 02/07/13 10:36, Robert Ramey wrote:
Nathan Whitehorn wrote:
What I'm concerned about is a situation like this:
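Main library:
Header:
template<typename T>
class I3Vector : public std::vector<T>, public I3FrameObject { // I3FrameObject is our base class
private:
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive& ar, const unsigned int version);
};
BOOST_CLASS_EXPORT_KEY(I3Vector<T>) for a variety of T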
Implementation:
Second library (a reasonably standard part of the software):
Header:
BOOST_CLASS_EXPORT_KEY(I3Vector<A>) for some other type A
Implementation:
The above looks good to me.
Third library (written by someone else as an add-on):
Implementation:
Serialize an I3Vector<A> *without* including the second library's header
Hmmm, so library three includes its own BOOST_CLASS_EXPORT_IMPLEMENT(I3Vector<A>)? As a static library, no problem. As a DLL, extra guid records are created. Probably not a problem. If library three doesn't include BOOST_CLASS_EXPORT_IMPLEMENT(I3Vector<A>), it would likely use the one from library two, which depending on circumstances could produce unexpected results.
<aside> At one point I included code to trap multiple guid records of the same type. The code is still in there, with a big comment. It enforced the ODR regarding guid types. But I had to comment it out because users felt it was too difficult to limit instantiations and depended upon things "just working" even with multiple instantiations. </aside>
What happens here is that library #3 will usually appear to work (and will certainly compile and link without issue) because all the serialization instantiation was done in library #2. *However*, if library 3 is loaded before library 2, both library 3's *and* library 2's attempts to serialize I3Vector<A> will start failing with an unregistered class exception.
I'm not really sure about that. The guid table permits multiple records (though I think it shouldn't; see the aside above). So I would expect that library 3 would work if library 2 was already loaded and fail if it wasn't. Again, nasty behavior to try to resolve.
This is because extended_type_info_typeid<A> is a singleton and the first instance of it (as well as the now competing definitions of the classes if not fully inlined) came from library 3 where the GUID template specialization had not happened.
extended_type_info_typeid<A> is a singleton, BUT the concept of a singleton in the presence of dynamically loaded DLLs is another surprise! It turns out that there is a separate table of guids for each DLL, since static variables are created at the module level, not at the program level. This is why one can dynamically unload a DLL with static variables in it. To summarize:
- a DLL or the mainline is called a module;
- for each module there is one static table of guids;
- for each instantiation in each module, a guid record is added to the appropriate static table;
- when a record is looked up, it's looked up in the guid table which corresponds to the module where the lookup is invoked, and it's not always obvious where the lookup is happening (e.g. class base in a DLL, class derived : public base in some other module, etc.).
And of course, just when you've got it all figured out, someone raises the issue of multi-threading and thread safety in the presence of dynamically loaded DLLs (hint: unload DLLs in the reverse order of loading them, another thing to keep track of). Note that the standard says nothing, as far as I can tell, about the existence, lifetime, location, etc. of DLLs. In fact, it says nothing at all about DLLs, so one has to sort of guess what all the compilers do. This isn't as bad as it sounds, as the requirements more or less dictate how things have to work, but it takes a while to come to that conclusion.
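The per-module model described above can be simulated in a few lines. This is a hypothetical illustration of that description only, not real dynamic linking: whether template statics are actually shared between shared objects depends on the platform, symbol visibility, and dlopen flags such as RTLD_LOCAL versus RTLD_GLOBAL. `loaded_module`, `register_guid`, and `can_load` are invented names:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical simulation: each module owns its own static guid table, and
// a lookup consults only the table of the module where the lookup code runs.
struct loaded_module {
    std::string name;
    std::map<std::string, std::string> guid_table;  // guid -> registered type

    void register_guid(const std::string& guid, const std::string& type) {
        guid_table[guid] = type;
    }
    // Stands in for an archive load executed by code compiled into this module.
    bool can_load(const std::string& guid) const {
        return guid_table.find(guid) != guid_table.end();
    }
};
```

In this model the same guid can be known to one module and unknown to another, so whether a load succeeds depends on which module performs it.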
This is an awful problem to have to debug, especially when libraries 2 and 3 are written by unrelated parties and the behavior happens to depend on the user's choice of load order (the libraries come in through RTLD at runtime). It means that all possible I3Vector<T> need to have keys exported in a common header somewhere that you can't possibly avoid including if you ever use an I3Vector<T>. Otherwise, everything can break everywhere if you happen not to have included that header wherever the first occurrence of the type happens to be, from the perspective of the (potentially dynamic) linker.
I'm sympathetic, but I don't know what I can do to help. Basically, I think it's a problem bigger than serialization; it just happens to come up here more frequently. Since it's infrequent, one is never really prepared for it. I would step back and consider the more general problem. If I ship a DLL which includes an implementation of I3Vector<A>, then I should find a way to prohibit or discourage users from redefining it. If a user does this, there is a risk of accidentally using parts of the implementation from library 2 and library 3 together without knowing it. Maybe it's a question of putting library 2 in a private namespace, or having a "trapping header" which declares all of library 2's interface with static asserts that fire if something is re-instantiated. I don't see a magic solution, just tricky work. Of course, this is why we earn the big bucks.
<aside> Note this has already happened to utf8-codecvt. Due to some inconsistencies in library function signatures across standard library implementations, in some cases code will import functions from the standard library which are defined inside of utf8-codecvt, a Boost library. This manifests itself as a test failure in the serialization library. This took a while to find. </aside>
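One way to read the "trapping header" idea is as a trait shipped by library 2 that lists the instantiations it already exports, plus a guard that a project-wide export macro could expand to. Everything here is hypothetical (`exported_by_lib2`, `may_export_here`, and the empty `I3Vector`/`A` stand-ins). Note the limitation: it only traps translation units that include the header, so it is a guard rail rather than a guarantee:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the types in the thread.
template <class T> struct I3Vector {};
struct A {};

// Shipped in library 2's trapping header: the instantiations it exports.
template <class T>
struct exported_by_lib2 : std::false_type {};
template <>
struct exported_by_lib2<I3Vector<A>> : std::true_type {};

// Guard an export macro could expand to: refuses, at compile time, to
// create a second export for a type that library 2 already provides.
template <class T>
constexpr bool may_export_here() {
    static_assert(!exported_by_lib2<T>::value,
                  "already exported by library 2: link against it instead");
    return true;
}
```

Calling `may_export_here<I3Vector<A>>()` fails to compile, while other instantiations pass through unchanged.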
What I was hoping to do is to replace (still before main is called) any possible NULL GUIDs for a class with a non-NULL one if the relevant extended_type_info ever gets instantiated with a non-NULL GUID (which would require some changes to how the extended_type_info_typeid constructor works, but that's a separate issue). The idea would be that the GUID attached to a BOOST_CLASS_EXPORT(), when it runs, would be the final word on the matter instead of a leftover NULL from an instantiation in some place that didn't know about the GUID.
I don't think we create a NULL guid; I think we just fail to create one (forgive me if I'm misremembering). I understand the appeal of trying to make the code more clever to address the situation. But I think the code is already on the verge of being so clever that it's in danger of being too hard to understand how to use (much less make work and verify that it does!). To me this is more hiding the source of the problem (violation of the ODR in DLLs) than fixing it (removing the multiple instantiations). Basically, I think the best long-term solution is to bite the bullet and restructure the code somewhat to avoid the problem. One thing that might be useful is to re-enable, or make optional, the trap which throws an exception when the ODR is violated at runtime as the DLLs are loaded. I remember looking into making this an option, but I think it was too deep to make it easily accessible. And besides, most people who encountered the trap considered it a bug in the library. Though it might surprise some on this list, I can't fight every battle, so I just commented it out and moved on.
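An optional version of that trap might look like the following toy sketch. `guid_table` and its `strict` switch are hypothetical and not Boost's code: in strict mode a second registration of the same guid throws when the second module registers it (that is, at load time), while in lenient mode the first record is kept and the duplicate is only counted:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch of an optional duplicate-guid (ODR) trap.
class guid_table {
public:
    explicit guid_table(bool strict) : strict_(strict) {}

    void add(const std::string& guid, const std::string& module) {
        auto it = owner_.find(guid);
        if (it != owner_.end()) {
            if (strict_) {
                throw std::runtime_error("guid '" + guid + "' registered by both " +
                                         it->second + " and " + module);
            }
            ++duplicates_;
            return;  // lenient mode: keep the first record
        }
        owner_.emplace(guid, module);
    }
    int duplicates() const { return duplicates_; }

private:
    bool strict_;
    std::map<std::string, std::string> owner_;  // guid -> first registering module
    int duplicates_ = 0;
};
```

Making strictness a constructor argument keeps the lenient default for users who rely on duplicates "just working", while projects such as the one in this thread could opt in to failing loudly.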
Thanks for the suggestions. They mostly reflect what we were already doing -- people get beaten with a stick if they try to instantiate serialize methods in header files or multiple times and they are usually kept in implementation files for that reason.
It's mostly a question of scale. For a smaller, simple program, inline is just fine, and very simple and convenient. But once one moves into dynamically loaded/unloaded code, there is a whole new set of issues to consider, and they are not obvious.
We've mostly given up on automatic instantiation, but kept it easy to extend if you want to (see the problem with I3Vector above) -- although it would be amazing if you figured out how to do it.
lol. When I set the goal of building a system which would permit non-intrusive serialization (i.e. no special base class), I totally mis-underestimated how difficult it would be to create such a system. Looking back at what we've created, I'm actually personally amazed how far we've been able to come towards a system which mostly "just works".
Robert Ramey
-Nathan
participants (2)
- Nathan Whitehorn
- Robert Ramey