crash in boost serialization (1.44)

Hello,
I've noticed a crash (a null pointer access) inside Boost.Serialization (version 1.44). I use Visual Studio 2005, and the code being serialized is in a DLL. I have a small Visual Studio solution that reproduces the crash. I guess it is a bug and should cause a crash in other environments as well.
I have the following classes:
    INode (interface)
    Node (abstract, derived from INode)
    LeafNode (derived from Node)
    TableNode (derived from Node)
The TableNode has a list of INode*, the children. A TableNode itself can be a child of a TableNode.
All classes are inside a DLL and correctly exported (I get no "unregistered class" exceptions).
If I create a TableNode with some entries, including another TableNode, and serialize it as an object, I get the crash when trying to deserialize it later. If I do the same but serialize the parent TableNode as a pointer, it works.
I guess the serialization framework does not recognize that a TableNode gets serialized through a pointer, and therefore the object tracking does not work correctly.
I can send the example solution if needed.
The crash is in basic_iarchive.cpp, line 456 (bpis_ptr is null):
    if(! tracking){
        bpis_ptr->load_object_ptr(ar, t, co.file_version);
    }
Best regards,
Rico

I had the same problem. I could pinpoint it to the fact that when serialization is in a DLL, the i/oserializer and pointer_i/oserializer are created for each DLL that uses serialization of a class (and in the main exe if there is code there that uses serialization for this class). The singletons are always DLL-exported, never imported.
To make a long story short, in the serialization libs, the method basic_iarchive_impl::register_type(const basic_iserializer & bis) is called several times and may overwrite the bpis_ptr that was set earlier.
To fix this, I had to modify basic_iarchive.cpp and rebuild boost.serialization. I changed the line:
    coid.bpis_ptr = bis.get_bpis_ptr();
to
    if (coid.bpis_ptr == 0) coid.bpis_ptr = bis.get_bpis_ptr();
But there are a few issues with serialization methods when they are implemented inside a DLL and called from outside.
Guy
-- Guy Prémont, D.Sc. Architecte logiciel senior / Senior software architect CM Labs Simulations Inc. http://www.cm-labs.com/ Tel. 514-287-1166 ext. 237
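The overwrite described in this message can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: the types PointerSerializer, Serializer, and CobjectId and the functions below are hypothetical stand-ins that borrow names from the thread; they are not Boost's actual classes, and the registration order (DLL first, executable second) is assumed for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a pointer serializer owned by one module.
struct PointerSerializer { int module_id; };

// Hypothetical stand-in for a per-module serializer singleton.
struct Serializer {
    // Null when the owning module never serializes the type through a pointer.
    const PointerSerializer* bpis;
    const PointerSerializer* get_bpis_ptr() const { return bpis; }
};

// Hypothetical stand-in for the archive's per-class bookkeeping entry.
struct CobjectId {
    const PointerSerializer* bpis_ptr = nullptr;
};

// Behaviour described in the thread: each registration overwrites the slot.
inline void register_overwrite(CobjectId& coid, const Serializer& bis) {
    coid.bpis_ptr = bis.get_bpis_ptr();
}

// Guarded variant: only fill the slot while it is still empty.
inline void register_guarded(CobjectId& coid, const Serializer& bis) {
    if (coid.bpis_ptr == nullptr) coid.bpis_ptr = bis.get_bpis_ptr();
}

// Registers the DLL's serializer first and the executable's second
// (assumed order), then reports whether a usable pointer serializer remains.
inline bool pointer_serializer_survives(bool guarded) {
    static const PointerSerializer dll_pointer_serializer{1};
    const Serializer from_dll{&dll_pointer_serializer};
    const Serializer from_exe{nullptr};  // exe only serializes by value

    CobjectId coid;
    if (guarded) {
        register_guarded(coid, from_dll);
        register_guarded(coid, from_exe);
    } else {
        register_overwrite(coid, from_dll);
        register_overwrite(coid, from_exe);
    }
    return coid.bpis_ptr != nullptr;
}
```

In this model the unguarded version ends with a null slot, which corresponds to the null bpis_ptr dereference reported at the top of the thread, while the guarded version keeps the DLL's pointer serializer. The model says nothing about whether the guard is safe in the real library.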
-----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users- bounces@lists.boost.org] On Behalf Of rico.cadetg@noser.com Sent: Wednesday, December 08, 2010 11:20 AM To: boost-users@lists.boost.org Subject: [Boost-users] crash in boost serialization (1.44)
Best regards, Rico _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

Have you passed this on to Robert? Any downside to doing this? Thanks, Jeff

First of all, I'm impressed with your understanding of the subtleties of the library implementation. This is not easy to achieve.
I noticed this message and in fact marked it as important (an exceedingly rare occurrence for me). I wanted to think about it some. My current thought is that this is a bad idea. The problem occurs when the same code is found in more than one execution module (DLL or main). I feel that, besides being wasteful, there is the potential that the code gets out of sync, e.g. when a newer version of the program invokes an older version of a DLL. There is code in the library which traps this condition, but I had to comment it out because avoiding this condition required more reorganization of user code than users could handle. So now we have the case where one includes a potentially very difficult to find error in one's program in order to avoid organizing one's code so that the situation can't happen. I'm sure that this can occur in other scenarios which use template code in multiple execution modules, but it seems that it's come up more frequently with the serialization library.
The problem with the above fix is that it leaves the source of the problem in the code, with the potential that it could arise somewhere else in a way that neither I nor, I think, anyone else can foresee. So I still recommend that users organize their code so that this situation cannot occur.
However, whenever I do this, someone will always say "What's wrong with doing it this way? What could go wrong?" The answer is "I don't know - and will never know". The answer to the answer is "well, then it must be OK". Which doesn't follow as far as I'm concerned.
So, what I plan to do is to re-enable the trap which detects violation of the ODR, and permit an explicit override on the part of the user. The idea is that I'll be able to avoid being responsible for practices that I can't recommend.
Robert Ramey
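The "trap with an explicit override" idea can be illustrated with a toy registry. This is a minimal sketch, not the library's mechanism: the class SerializerRegistry, its allow_duplicates flag, and the module names are all hypothetical, and the thread does not say how the real trap is implemented or how the override would be spelled.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical registry: remembers which module first provided a serializer
// for a given type key and traps a second module doing the same.
class SerializerRegistry {
public:
    explicit SerializerRegistry(bool allow_duplicates = false)
        : allow_duplicates_(allow_duplicates) {}

    // Records that `module` holds a serializer for `type_key`.
    // Throws std::logic_error if a different module already registered one,
    // unless the user explicitly allowed duplicates.
    void register_serializer(const std::string& type_key,
                             const std::string& module) {
        auto it = owner_.find(type_key);
        if (it == owner_.end()) {
            owner_[type_key] = module;
            return;
        }
        if (it->second != module && !allow_duplicates_) {
            throw std::logic_error("serializer for " + type_key +
                                   " defined in both " + it->second +
                                   " and " + module);
        }
        // Duplicates allowed: the first registration stays in place.
    }

    const std::string& owner(const std::string& type_key) const {
        return owner_.at(type_key);
    }

private:
    bool allow_duplicates_;
    std::map<std::string, std::string> owner_;
};

// Registers the same type from two modules and reports whether it was trapped.
inline bool duplicate_is_trapped(bool allow_duplicates) {
    SerializerRegistry registry(allow_duplicates);
    registry.register_serializer("TableNode", "nodes.dll");
    try {
        registry.register_serializer("TableNode", "main.exe");
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

With the default setting the second registration is reported as an error; with the override the program continues and the user accepts the risk, which is the trade-off described in the message above.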

In my opinion, the actual reason for the problem is the duplication (in DLLs) of i/oserializer and pointer_i/oserializer for the same class. Those should be exported and instantiated only in the DLL that contains the actual serialization code.
In the current implementation, the various serializers are instantiated, through a singleton, at the point of use. If two classes from two different DLLs have a member of a certain type, both DLLs will instantiate singletons for serializers.
When you say that the same code is found in more than one execution module, you are not talking about the T::serialize(Archive&, int) for each class, are you? Because that code is indeed in only one DLL. The code that goes from
    ar & t; // DLL A
to
    t->serialize(ar,version); // DLL B
is all generated by templates. In this case, it would be in DLL A. Any DLL, or application, that serializes a type T will contain that code. I think the problem lies in that murky area.
I'm switching to boost 1.45 now (was using 1.40). Maybe the change in implementation will alleviate a few of these problems.
Thanks
Guy

Guy Prémont wrote:
> In my opinion, the actual reason for the problem is the duplication (in DLLs) of i/oserializer and pointer_i/oserializer for the same class.
Agreed.
> Those should be exported and instantiated only in the DLL that contains the actual serialization code.
As far as I know - and I spent a lot of time on this - there is no way to do this with current compilers.
> In the current implementation, the various serializers are instantiated, through a singleton, at the point of use. If two classes from two different DLLs have a member of a certain type, both DLLs will instantiate singletons for serializers.
This is the behavior of all current compiler/linker combinations. It is not addressable from within a library or application.
> When you say that the same code is found in more than one execution module, you are not talking about the T::serialize(Archive&, int) for each class, are you? Because that code is indeed in only one DLL. The code that goes from ar & t; // DLL A to t->serialize(ar,version); // DLL B is all generated by templates. In this case, it would be in DLL A. Any DLL, or application, that serializes a type T will contain that code. I think the problem lies in that murky area.
Any time you use ar << t in more than one runtime module, you'll get multiple implementations generated. The only way to avoid this is to use a different idiom:
in the header:
    class mytype { ... template<class Archive> void serialize(Archive &ar, const unsigned int version); ... };
in the DLL:
    template<> void mytype::serialize(text_iarchive & ar, const unsigned int version){ ... ar >> ... ... }
and even that might not be enough, since one has to watch the classes from which mytype is derived.
Robert Ramey
> I'm switching to boost 1.45 now (was using 1.40). Maybe the change in implementation will alleviate a few of these problems.
I doubt it.
Robert Ramey
> Thanks
> Guy
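The "declare in the header, define in one module" idiom suggested above can be sketched without Boost. The example below is an illustration under assumptions: ToyTextOArchive is a hypothetical stand-in for a real archive class, the members of mytype are invented, and the header/source split is only marked by comments because everything sits in one file. It uses a template definition followed by an explicit instantiation, which is one common way to realize the idiom and may differ from what was meant in the message.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for an output archive; not a Boost class.
class ToyTextOArchive {
public:
    // Writes the value followed by a single space.
    template <class T>
    ToyTextOArchive& operator&(const T& value) {
        out_ << value << ' ';
        return *this;
    }
    std::string str() const { return out_.str(); }

private:
    std::ostringstream out_;
};

// ---- would live in the header: declaration only, no body ----
class mytype {
public:
    mytype(int id, double weight) : id_(id), weight_(weight) {}

    template <class Archive>
    void serialize(Archive& ar, const unsigned int version);

private:
    int id_;
    double weight_;
};

// ---- would live in exactly one source file of the DLL ----
template <class Archive>
void mytype::serialize(Archive& ar, const unsigned int /*version*/) {
    ar & id_ & weight_;
}

// Explicit instantiation: the only place where code for this
// archive type is generated.
template void mytype::serialize<ToyTextOArchive>(ToyTextOArchive&,
                                                 const unsigned int);

// Helper that serializes a sample object and returns the archive text.
inline std::string demo() {
    mytype value(7, 2.5);
    ToyTextOArchive ar;
    value.serialize(ar, 0);
    return ar.str();
}
```

Because the header carries no function body, a client module that includes it cannot generate its own copy of serialize for that archive; it has to link against the instantiation exported by the DLL. As noted in the reply above, this alone may not be enough when base classes are involved.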

Well... something that worked fine in 1.40 does not seem to work anymore. I have a base class in one DLL and a derived class in another. An object in Base.dll serializes a vector of pointers to the base class. When reading the file, it complains about an unregistered class when encountering a class that is registered through the derived class's DLL. I may have done something wrong when upgrading to 1.45, but it seems the list of keys is not common, although it should be a static of boost_serialization.dll.
Guy

I've included tests for this scenario so I believe it should work. Of course it's quite possible that my test isn't exhaustive. It's also possible that I've overlooked something regarding this subject. Robert Ramey

My program consists of only one DLL, which contains all classes and their serialization code.
I implemented the serialization the way Robert suggested:
in the header:
    class mytype { ... template<class Archive> void serialize(Archive &ar, const unsigned int version); ... };
in the DLL:
    template<> void mytype::serialize(text_iarchive & ar, const unsigned int version){ ... ar >> ... ... }
There is no serialization code in any header. I just call the serialization from the main program, but there is no duplicated code.
What do I have to change to avoid the crash?

And you still have the crash? Maybe I'm wrong about the cause. Or maybe my advice doesn't go far enough.
Using something like
    ar << x // where x is of type mytype *
in the mainline while the DLL contains similar code might also create problems. I'd have to think about this some more.
Robert Ramey

Yes, I still have the crash.
I have a very small sample solution that causes the crash. I have only one DLL containing one interface (INode) and one concrete class (Node). The Node contains a list of INode*. If I call the serialization of one Node (as an object) from the main program, it crashes.
(1)
    Node table;
    ...
    outputArchive << (const Node&)table;
    Node table2;
    ...
    inputArchive >> table2; // crashes
If the main program contains the serialization of a Node* too, it works.
(2)
    Node* pTable = ...;
    ...
    outputArchive << pTable;
    Node* pTable2;
    ...
    inputArchive >> pTable2; // the code (1) works without a crash now
I'd like you to have a look at my sample solution. Can I send it to you?
Rico

Sure, send me a zip file.
Robert Ramey

It is exactly the setup I have and it also produces the crash. The way I understand what is happening, it is the invocation of serialization in the main program that causes singletons to be instantiated in the main executable, in addition to those in the DLL, even though the actual serialization code is only in the DLL. The fix I did in basic_iarchive::register_type, posted previously, prevents an existing pointer_iserializer from being overwritten by a NULL. It is somewhat hacky, as it does not address the cause of the problem, but it is an effective fix.
There is an implicit assumption that serialization code that is in a DLL will only be invoked by code in that DLL. I think it is too limiting; maybe some other symbols need to be exported besides void T::serialize().
Guy

Guy Prémont wrote:
It is exactly the setup I have and it also produces the crash. The way I understand what is happening, it is the invocation of serialization in the main program that causes singleton to be instantiated in the main executable, in addition to those in the DLL. Even though the actual serialization code is only the DLL. The fix I did in basic_iarchive::register_type, posted previously, prevents the overriding of an existing pointer_iserializer by a NULL. It is somewhat hacky as it does not address the cause of the problem, but it is an effective fix.
We've got different problems. You're interested in getting your application to work, while my concern is getting to the root of the problem. If I add your "fix" without really knowing is going on, I end up hiding the problem which will only show up again in a form even harder to discover. In this particular example, I want to know why the singleton is geting created in the main line if there is no serialization code there. When I see this, I can suggest a fix for the user, but more importantly perhaps figure out a way to trap this compile time with a static_error or static_warning. If I can't do that, it will give me another reason to enable the trapping at runtime. (with user option to override).
There is an implicit assumption that serialization code that is in a DLL will only be invoked by code in that DLL.
Not quite. There is an implicit requirement that serialization code be defined in only one place.
I think it is too limiting,
I agree that it is limiting. But that's not the same as being a bad idea. It's also limiting to inhibit linkage to both static and dynamic runtime libraries, yet linking to both is a bad idea. So trapping and prohibiting this behavior makes one's programs more robust. That is, not everything that might be doable is a good idea to do.
maybe some other symbols need to be exported, beside void T::serialize().
I've looked at this in some detail, and I don't think that there is a simple universal fix which won't break some user programs in a way which is impossible to track down. To summarize, I want to trap this behavior. I realize that this breaks a lot of programs using DLLs. I would argue that they are likely broken anyway (at a minimum they suffer from code bloat). But I also recognise that overriding this trap is the most expedient solution in many cases. Robert Ramey
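
For readers following the disagreement above, here is a minimal sketch of what the one-line guard does and does not do. It is not the Boost.Serialization source: PointerSerializer, ClassEntry, register_unguarded and register_guarded are hypothetical stand-ins, and the sketch assumes (as described in the thread) that a later registration may arrive with a null pointer serializer.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT the real Boost.Serialization types.
struct PointerSerializer { int id; };

struct ClassEntry {
    // Plays the role of bpis_ptr in the thread: null until a pointer
    // serializer has been registered for the class.
    const PointerSerializer* bpis_ptr = nullptr;
};

// Unguarded registration: the last caller wins, even when it passes null.
inline void register_unguarded(ClassEntry& entry, const PointerSerializer* p) {
    entry.bpis_ptr = p;
}

// Guarded registration, modelled on the change described in the thread:
// keep an existing pointer and only fill in a missing one.
inline void register_guarded(ClassEntry& entry, const PointerSerializer* p) {
    if (entry.bpis_ptr == nullptr) {
        entry.bpis_ptr = p;
    }
}
```

With the unguarded version, a second registration that carries a null pointer wipes out the first one, which matches the null bpis_ptr seen at load time. The guarded version keeps the first pointer, but it says nothing about why two registrations happen, which is the objection raised in the reply above.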

Guy Prémont wrote:
I know that what I proposed is not a cure for the actual problem. But in my case, the application must work... ;) One way to implement the real fix would be to have a global repository of (i|o)serializer and pointer_(i|o)serializer for each archive type. Each singleton, as it is created, could be registered into another singleton, held by the serialization library. I think the key is to have a central and unique access point for these objects, instead of relying on the compiler to create singletons appropriately, which in the case of shared libraries does not always produce the right result. Do .so files in Linux builds have the same problems? I haven't tried it, but I guess they do.
In this particular example, I want to know why the singleton is getting created in the main line if there is no serialization code there. When I see this, I can suggest a fix for the user, but more importantly perhaps figure out a way to trap this at compile time with a static_error or static_warning. If I can't do that, it will give me another reason to enable the trapping at runtime (with a user option to override).
From what I understand, the singleton creation for serialization of a type (Node here) is a side effect of calling on serialization for that type:
Node table2; ... inputArchive >> table2;
Calling serialization of Node* will, in turn, create the singleton for pointer_iserializer. I don't think there is any other way of doing it. Adding an indirection level for the serializers, i.e. putting them in a repository, will avoid keeping track of several singletons for the same type that lead to conflicting behaviours.
To summarize, I want to trap this behavior. I realize that this breaks a lot of programs using DLLs. I would argue that they are likely broken anyway (at a minimum they suffer from code bloat). But I also recognise that overriding this trap is the most expedient solution in many cases.
I haven't found the trap that you mentioned a few times. Could you give me a pointer to it? I'd like to enable it and see what it says in my application. Thanks, -- Guy Prémont, D.Sc. Architecte logiciel senior / Senior software architect CM Labs Simulations Inc. http://www.cm-labs.com/ Tel. 514-287-1166 ext
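
The repository idea in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: Serializer, SerializerRepository, register_serializer and find are hypothetical names, the string key stands in for whatever type identification the library would really use, and the sketch assumes the repository is compiled into exactly one shared module (a function-local static in code that is compiled into every module would give each module its own copy again).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical serializer record; the module name is only for illustration.
struct Serializer { std::string module; };

// Hypothetical central access point. Every module registers its serializers
// here and then uses whatever the repository hands back.
class SerializerRepository {
public:
    static SerializerRepository& instance() {
        static SerializerRepository repo;
        return repo;
    }

    // Registers `s` for `type_key` unless one is already present.
    // Returns the serializer that is registered after the call.
    const Serializer* register_serializer(const std::string& type_key,
                                          const Serializer* s) {
        // std::map::insert leaves an existing entry untouched.
        auto result = table_.insert(std::make_pair(type_key, s));
        return result.first->second;
    }

    // Returns the registered serializer, or nullptr if there is none.
    const Serializer* find(const std::string& type_key) const {
        auto it = table_.find(type_key);
        return it == table_.end() ? nullptr : it->second;
    }

private:
    SerializerRepository() = default;
    std::map<std::string, const Serializer*> table_;
};
```

The design choice here is "first registration wins": a second module that instantiates its own serializer for the same key gets the first module's serializer back instead of replacing it.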

Guy Prémont wrote:
One way to implement the real fix would be to have a global repository of (i|o)serializer and pointer_(i|o)serializer for each archive type.
I've found no way to make a singleton which is global across execution modules. ...
Does .so in linux builds have the same problems? I haven't tried it, but I guess it does.
I'd be very, very surprised if it didn't.
In this particular example, I want to know why the singleton is getting created in the main line if there is no serialization code there. When I see this, I can suggest a fix for the user, but more importantly perhaps figure out a way to trap this at compile time with a static_error or static_warning. If I can't do that, it will give me another reason to enable the trapping at runtime (with a user option to override).
From what I understand, the singleton creation for serialization of a type (Node here) is a side effect of calling on serialization for that type.
More or less. Actually it's a side-effect of exporting a type.
Calling serialization of Node* will, in turn, create the singleton for pointer_iserializer. I don't think there is any other way of doing it. Adding an indirection level for the serializer, i.e. putting them in a repository, will avoid keeping track of several singletons for the same type that leads to conflicting behaviours.
To summarize, I want to trap this behavior. I realize that this breaks a lot of programs using DLLs. I would argue that they are likely broken anyway (at a minimum they suffer from code bloat). But I also recognise that overriding this trap is the most expedient solution in many cases.
I haven't found the trap that you mentioned a few times. Could you give me a pointer to it? I'd like to enable it and see what it says in my application.
It's in (I think) basic_serializer_map.?pp. It's commented out. Robert Ramey

Thanks for the tip. I re-activated the trap... and indeed I found some serialization code that was in several DLLs. However, even though I fixed the problems I found, the trap did spring up again at code that I know is not duplicated across DLLs. I still think that the singleton creation is not only a side-effect of exporting a type (using BOOST_CLASS_EXPORT_IMPLEMENT) but also happens when using the serialization for a given type (à la "ar & t"). It is quite complex behavior and I don't understand it completely: from my understanding, I expected a lot more trapping to happen. -- Guy Prémont, D.Sc. Architecte logiciel senior / Senior software architect CM Labs Simulations Inc. http://www.cm-labs.com/ Tel. 514-287-1166 ext. 237

As I suppose it's obvious by now, this is uncharted territory. So I appreciate your efforts to clarify this. Eventually I want to include code to improve handling of this issue and hopefully you'll be able to provide some insight. Here are some other things to look at.
a) You can set your debugger to trap each time an entry is added to a serializer_map singleton. A backtrace should (?) lead you to the source which gets added twice.
b) If you haven't already, you might want to look at the tests of DLLs in the test suite. I don't know if that will help as they are pretty simple but it wouldn't hurt to look.
Robert Ramey

Hi,
I've had another crash related to basic_iarchive, bpis_ptr being NULL, when
using serialization in DLLs. I found two occurrences of this bug. The basic
reason is the same, singletons are created in each DLL that serializes the
type, but the occurrence happens at different places in the basic_iarchive
code.
Case 1:
DLL_1 has serialization code for VxVector3 (amongst other classes).
DLL_1 creates singletons iserializer

On Fri, Dec 10, 2010 at 12:35 PM, Robert Ramey wrote:
In the current implementation, the various serializers are instantiated, through a singleton, at the point of use. If two classes from two different DLLs have a member of a certain type, both DLLs will instantiate singletons for serializers.
This is the behavior of all current compiler/linker combinations.
Mostly just on Windows, though Linux has its own quirks that come up when you use dlopen().
It is not addressable from within a library or application.
Well, yes and no. Given the current structure of Boost.Serialization, it's definitely a problem that only the library integrator can solve, by structuring his application appropriately. However, if you were willing to change the structure of Boost.Serialization, you could have all those DLLs link to a common DLL which is where the singletons are "registered" (you could create them there for a reduction in overall code size, but I assume you want to keep that DLL as lightweight as possible). You'd have to explicitly *not* deliver that common DLL as a static library. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Dave Abrahams wrote:
Mostly just on Windows, though Linux has its own quirks that come up when you use dlopen().
Hmmm - that is a huge surprise to me. My normal method is to develop on Windows (I love the MS IDE/Debugger), then run tests with cygwin, then upload to the trunk and watch the Linux results. The last round of development included tests of DLLs and everything worked as expected, so I had assumed that I had the correct understanding of what compilers and linkers do under Linux. Also, I reasoned that since each DLL (shared library) is built and linked independently, the behavior I observed in Windows, where each DLL initializes its own statics, would have to be the same for all shared library setups. I concede this is a guessing game - but I'm not sure what else to do.
It is not addressable from within a library or application.
Well, yes and no. Given the current structure of Boost.Serialization, it's definitely a problem that only the library integrator can solve, by structuring his application appropriately.
That's my understanding as well.
However, if you were willing to change the structure of Boost.Serialization, you could have all those DLLs link to a common DLL which is where the singletons are "registered" (you could create them there for a reduction in overall code size, but I assume you want to keep that DLL as lightweight as possible). You'd have to explicitly *not* deliver that common DLL as a static library.
Just to clarify things (maybe): The serialization library fits into one library (the wide character version fits in another but depends upon the first one). As I remember, the DLL contains no static initializers and therefore no singletons. The problem comes when a user puts serialization code inside his own DLL. This instantiates template code which in turn provokes the creation of certain static objects which are used for things like tracking of some object types. There is a set of static objects for each execution module (dll or exe). When serialization is invoked from different modules, a "singleton" is created for each module and this can lead to ambiguities. I'm sure you know this, I'm just stating it for the benefit of others who might be reading this thread. So, I see the conflict coming not from the serialization DLL but rather from conflict between the user's execution modules. I can't see how any restructuring of the serialization library could address this. My advice is that the user's DLLs should/must be structured to avoid this problem. I believe that in doing this, the user's code will likely avoid repetition of other code between his modules as well (assuming he's using templated code). So likely it's a good thing anyway. My preferred solution is to:
a) clarify/document the circumstances under which this can occur.
b) enable code (now commented out) to trap some (or maybe all) of these conflicts when a DLL is loaded.
c) provide a method to suppress the traps for users who prefer to do so and, in my view, take their chances.
That's the way I see it now. Robert Ramey
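
Points b) and c) can be pictured with a small sketch. It is not the commented-out code in the library, which is not reproduced in this thread; TrappingRegistry, register_type and set_trap_enabled are hypothetical names, module names are plain strings chosen for illustration, and an exception stands in for whatever form the real trap would take.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical registry that records which execution module first
// instantiated a serializer for a given type.
class TrappingRegistry {
public:
    explicit TrappingRegistry(bool trap_enabled = true)
        : trap_enabled_(trap_enabled) {}

    // Point c): lets a user switch the trap off and take their chances.
    void set_trap_enabled(bool on) { trap_enabled_ = on; }

    // Returns true if this is the first registration for `type_key`.
    // Point b): a registration of the same type from a different module
    // throws while the trap is enabled.
    bool register_type(const std::string& type_key, const std::string& module) {
        auto it = owner_.find(type_key);
        if (it == owner_.end()) {
            owner_[type_key] = module;
            return true;
        }
        if (it->second != module && trap_enabled_) {
            throw std::logic_error("serializer for " + type_key +
                                   " already registered by " + it->second +
                                   ", registered again by " + module);
        }
        return false;  // duplicate tolerated; the first owner is kept
    }

private:
    bool trap_enabled_;
    std::map<std::string, std::string> owner_;
};
```

Repeated registration from the same module is treated as harmless in this sketch; only a second module triggers the trap, and disabling the trap falls back to keeping the first owner.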
participants (6): Dave Abrahams, Guy Prémont, Jeff Flinn, Rico, rico.cadetg@noser.com, Robert Ramey