[Serialization] possible solns to vector of vectors ptr corruption
Hi Robert,
Any thoughts about my ideas for fixing the vector of vectors ptr corruption issue? I've reiterated my earlier msg below. If you don't have time to investigate, I would be willing to work on fixing the problem, if you don't mind me bugging you with questions occasionally.
regards, Dan Notestein
Original Message: ----------------------- Robert Ramey wrote:
Dan Notestein wrote:
The other bug appears to be in the way moveable_objects_recent and moveable_objects_end are being set prior to calling reset_object_address. My assumption is the intent here is to modify the addresses of "trackable" sub-objects contained within the vector element being moved, so that ptrs will be hooked back up correctly, but the moveable ptrs are being set up in such a way that the vector elements (TObject) of the sub-vector (ObjectVector) are getting their addresses modified when the stack version of TObjectContainer is copied to the vector.
This seems wrong, because the TObjects are allocated on the heap, so their addresses should not be updated in the object_vector_id table when the ObjectVector is copied. The end result is that I'm getting bad pointers to the TObjects after deserialization.
TObjects are de-serialized to the heap then added to the vector. Since the tracking saves the address TObject is serialized to, the address would be on the stack. reset_object_address sets the tracked address to the heap address after the item is appended to the vector.
I thought I considered the case of a vector of vectors - but maybe not. I would expect that the objects get their addresses fixed up twice - once when they are moved from the stack to the heap and once when the vector container itself is moved from the stack to the heap. It's possible that there is something missing.
Yes, the TObjects are fixed up twice, but I think the second fixup is an error because vector<> doesn't really contain the data elements, it contains a ptr to the data elements. When the vector container is "copied" from the stack to the heap, the newly allocated internal vector element array will be randomly allocated from somewhere else in the heap, so using an offset based off the address of the new copy of the vector container won't work.
I can think of two possible theoretical solutions to the problem:
1) Use the container::swap function to exchange internal data ptrs of the stack and heap copy of the vector. This gives the heap vector the ownership of the stack vector's elements. This seems the most efficient technique since it avoids reallocating and copy constructing the elements. There's also no need to fixup the ptrs to the elements using this technique.
2) Continue to use the copy constructor for the vector when transferring the vector from stack to heap, and use the internal array ptr inside the new copy of the vector to fixup the elements. This definitely seems like the lesser of the two options.
I would have tried to make a fix for this, but the algorithm for handling the moveable ptrs that define the range of addresses to fixup was a little complicated for me to follow. In particular, I wasn't sure how to make the fixups take place so that truly "contained" data gets fixed up, while avoiding fixing up the data that is merely "pointed" to by data members (and hence shouldn't be fixed up when the container is copied). One idea I had was to keep around the size of the data objects being moved, so that the reset_address function could look for ptrs in the moveable_ptr range that have addresses between old_address and old_address+object_size. But I also figured there was a good chance that there is already information available somewhere on which items in the object_id table are "contained" vs "pointed to".
best regards, Dan Notestein
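
The size-based idea at the end of this message can be sketched roughly as follows. This is only an illustration using the standard library, not the Boost.Serialization implementation: ToyTable, Holder and reset_contained_addresses are hypothetical names, and the real object_id table certainly holds more than raw addresses. The sketch assumes that "contained" can be approximated as "lies within the moved object's own bytes".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the archive's table of tracked object addresses.
struct ToyTable {
    std::vector<const void*> addresses;
};

// Example object: `contained` lives inside the object, `pointed_to` does not.
struct Holder {
    int contained;
    int* pointed_to;
};

// Re-point only the entries that fall inside the moved object's own bytes,
// i.e. in [old_address, old_address + object_size). Entries outside that
// range (data merely pointed to) are left alone. Returns the number fixed.
std::size_t reset_contained_addresses(ToyTable& table,
                                      const void* old_address,
                                      const void* new_address,
                                      std::size_t object_size) {
    assert(old_address != nullptr && new_address != nullptr);
    const auto old_lo = reinterpret_cast<std::uintptr_t>(old_address);
    const auto new_lo = reinterpret_cast<std::uintptr_t>(new_address);
    std::size_t fixed = 0;
    for (const void*& entry : table.addresses) {
        const auto addr = reinterpret_cast<std::uintptr_t>(entry);
        if (addr >= old_lo && addr - old_lo < object_size) {
            // Keep the same offset relative to the start of the moved object.
            entry = reinterpret_cast<const void*>(new_lo + (addr - old_lo));
            ++fixed;
        }
    }
    return fixed;
}
```

With a Holder copied from one location to another, only the entries for the Holder itself and its `contained` member would be re-pointed; the entry for whatever `pointed_to` refers to stays where it was, which is the behaviour the vector element array needs.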
Dan Notestein wrote:
Hi Robert,
Any thoughts about my ideas for fixing the vector of vectors ptr corruption issue? I've reiterated my earlier msg below. If you don't have time to investigate, I would be willing to work on fixing the problem, if you don't mind me bugging you with questions occasionally.
That would be just great!!!
It would be helpful to make a test case using one of the tests in the package as a model. Or, better and easier, just enhance test_vector to include this problem case.
This is a tricky situation and you've already made the investment to understand it. Offhand, I would think that the swap solution you describe would be the best.
Of course nothing is simple. This immediately opens up the possibility that other collections of pointers to other collections will have problems too. So that should be considered also.
It's making my head hurt.
Robert Ramey
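
As a rough illustration of why the swap approach avoids the second fixup, the sketch below uses only std::vector and is not taken from the library. The helper names inside_vector_object, transfer_by_copy and transfer_by_swap are made up for this example. It relies on two standard guarantees: a copy of a vector gets its own element buffer, while swap hands the existing buffer over without moving the elements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct TObject {
    int value;
};
using ObjectVector = std::vector<TObject>;

// True if p points into the bytes of the vector object itself, as opposed
// to the separately allocated element array it manages.
bool inside_vector_object(const ObjectVector& v, const void* p) {
    const auto lo = reinterpret_cast<std::uintptr_t>(&v);
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= lo && addr - lo < sizeof(ObjectVector);
}

// Option 2 from the thread: copy construction allocates a new element
// array, so every element ends up at a new address.
ObjectVector transfer_by_copy(const ObjectVector& from_stack) {
    return ObjectVector(from_stack);
}

// Option 1 from the thread: swap gives final_home ownership of the
// existing element array; pointers to the elements stay valid.
void transfer_by_swap(ObjectVector& from_stack, ObjectVector& final_home) {
    final_home.swap(from_stack);
}
```

For a non-empty vector, data() never points inside the vector object, data() of a copy differs from data() of the original, and data() of the swap target equals data() of the source before the swap. That last property is what makes the element addresses recorded during loading remain correct.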
After thinking about it some more, I think we can avoid the whole
need for a fixup in the vector case. The trick is to put the object
in its final container first, then deserialize using that location, thus
avoiding the need to call reset_object_address for any vector of vectors
data structure.
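
To make the idea concrete, here is a minimal sketch of "size the container first, then load each element in place". It is not the proposed patch and does not use the Boost.Serialization API: ToyTracker and the load overloads are hypothetical, a whitespace-separated text stream stands in for the archive, and the element type is assumed to be default constructible.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for the archive's record of where objects were loaded.
struct ToyTracker {
    std::vector<const void*> addresses;
};

struct TObject {
    int value = 0;
};

// Load one TObject directly at its final address and record that address.
void load(std::istream& is, ToyTracker& tracker, TObject& obj) {
    is >> obj.value;
    tracker.addresses.push_back(&obj);
}

// Load a sequence: read the count, size the vector once, then load every
// element in place. The vector is not resized again afterwards, so no
// element moves and no recorded address needs a later fixup.
template <class T>
void load(std::istream& is, ToyTracker& tracker, std::vector<T>& v) {
    std::size_t count = 0;
    is >> count;
    v.clear();
    v.resize(count);
    for (T& element : v) {
        load(is, tracker, element);  // recurses for nested vectors
    }
}
```

Loading the stream "2 2 10 20 1 30" into a std::vector<std::vector<TObject>> with this sketch records three addresses, each equal to the address of the corresponding element in the finished structure.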
The above technique won't work for the non-sequential containers,
unfortunately, as their insertion point is set by the element's data contents.
I do have some ideas about how to handle the non-sequential containers
and also for making more complex data structures serializable in general,
such as enabling ptrs to data to be stored before storing
containers that directly "contain" the data and enabling ptrs into
the internal contents of objects, but the techniques for this
will take a bit of time to implement and I needed to get a fix for the
vector case ASAP, as it's holding up some work stuff.
Anyways, here's the code change I'm proposing for the
vectors of vectors case:
// sequential container input
template
Hello all,
I'm currently trying to make a library serializable and would like to be
as non-intrusive as possible.
Unfortunately this library (ODE) uses its own forms of linked lists to
manage its objects and the links include a "pointer-to-pointer-to-type"
(type**).
How do I go about serializing these? Why doesn't this compile?
#include <fstream>
#include
Jan Tinkhof wrote:
Hello all,
I'm currently trying to make a library serializable and would like to be as non-intrusive as possible.
Unfortunately this library (ODE) uses its own forms of linked lists to manage its objects and the links include a "pointer-to-pointer-to-type" (type**).
How do I go about serializing these? Why doesn't this compile?
#include <fstream>
#include
struct mystruct { int i; };
template<class Archive>
void serialize(Archive& ar, mystruct& ms, const unsigned int version) { ar & ms.i; }
BOOST_CLASS_TRACKING(mystruct, boost::serialization::track_always)
int main( int argc, char* argv[], char* envp[] )
{
    std::ofstream ofilestream("filename.txt");
    boost::archive::text_oarchive oarchive(ofilestream);
    mystruct a;
    mystruct* pa = &a;
    // *** replace this
    // mystruct** ppa = &pa;
    // *** with this
    mystruct ** const ppa = &pa;
    oarchive << ppa;
}
*** read the rationale section on why this trap occurs.
Robert Ramey
Robert Ramey wrote: [snip]
    // *** replace this
    // mystruct** ppa = &pa;
    // *** with this
    mystruct ** const ppa = &pa;
oarchive << ppa;
}
*** read the rationale section on why this trap occurs.
Robert Ramey
Thanks for your reply, Robert. :)
However, I had already dealt with the trap by removing it from the code. (Yes, I know it is not recommended. Sorry for misleading you here.)
Either way, once I get past the trap, I still have another error to deal with. It looks like this on my compiler (VC 8.0):
d:\projekt\boost_1_33_1\boost\serialization\access.hpp(109) : error C2228: left of '.serialize' must have class/struct/union
        type is 'mystruct *'
        did you intend to use '->' instead?
d:\projekt\boost_1_33_1\boost\serialization\serialization.hpp(81) : see reference to function template instantiation 'void boost::serialization::access::serialize(Archive &,T &,const unsigned int)' being compiled
        with
        [
            Archive=boost::archive::text_oarchive,
            T=mystruct *
        ]
[snip 16 more template expansions]
This only happens with the pointer-to-pointer type, serializing the regular pointer works ok.
participants (3): Dan Notestein, Jan Tinkhof, Robert Ramey