[serialization] Runtime Exception While Loading Old Serialized XML Files w/ 1.38

Hello list,

I have a bunch of XML files that were serialized using Boost.Serialization (the standard XML archive) and worked fine until we upgraded to the Boost trunk a few days ago. Unfortunately, I can't tell exactly which repository revision was used to generate the archives, but I know they were custom builds of the pre-release 1.37 trunk, and the file_version attribute in the XML is set to 4. Also, we use MSVC 9.0 SP1 for building everything.

As soon as we upgraded to the trunk, we started getting runtime exceptions (invalid archive format) on some of our larger files. I think I have tracked it down to an incompatibility between version 4 and version 5 files that the newer build fails to take into account. The details, as far as I could find out, follow.

In our data structures there is a std::list of pointers to objects previously serialized in the same serialization operation (which means these repeated pointers are stored as a reference to the previous object and the data is not duplicated). Using the old library, when this list was serialized, apparently the "item_version" field was not written (only the "count" field and the "item"s were). The old library apparently didn't expect to read it either, so everything worked out nicely. The new library, however, expects an "item_version" field and throws when it doesn't find one.

As I trace into the serialization library, I can see that there is logic for newer libraries to handle archive format changes over time. My question is whether this discrepancy is actually a bug in either the older or the newer code, whether failing to handle the format change is a bug, or whether this issue has already come up and been dealt with in some other way (and if so, how?).

I should also mention that it is quite possible the problem is in our code, but I have no idea how. I would appreciate any help in locating it. Any pointers, help, clues and general wisdom are greatly appreciated.

-yzt
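The reference mechanism mentioned above (a pointer to an object that was already saved in the same operation is written as a reference rather than as a second copy) can be illustrated with a small model. This is a simplified sketch using only the C++ standard library, not Boost.Serialization's actual implementation; `Component`, `TrackingWriter`, `save_pointer`, and the XML-like output format are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the poster's polymorphic Component type.
struct Component {
    std::string name;
};

// Simplified model of object tracking (not Boost's implementation): the first
// time a given pointer is saved, the full object is written together with a
// new object_id; every later save of the same pointer writes only an
// object_id_reference to that id, so the data is not duplicated.
class TrackingWriter {
public:
    std::string save_pointer(const Component* p) {
        const auto it = ids_.find(p);
        if (it != ids_.end())
            return "<item object_id_reference=\"_" + std::to_string(it->second) + "\"></item>";
        const std::size_t id = next_id_++;
        ids_.emplace(p, id);
        return "<item object_id=\"_" + std::to_string(id) + "\"><name>" + p->name +
               "</name></item>";
    }

private:
    std::unordered_map<const Component*, std::size_t> ids_;  // address -> assigned id
    std::size_t next_id_ = 0;
};
```

For example, saving `&a`, `&b`, and then `&a` again produces a full `_0` element, a full `_1` element, and finally `<item object_id_reference="_0"></item>`. The real archive also records a `class_id_reference` for polymorphic pointers, which this sketch omits.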

First of all, any failure to read any archive created under a previous version of the library is a bug. Thanks for pointing this out.

This is likely fixable, but we would need a little more information:

a) I'm not sure what version 4 and version 5 refer to; please expand upon this.
b) I'm not sure what type the "item_version" refers to. Is it the list type or (more likely) the type which is being pointed to?
c) What type is being pointed to?
d) Can we assume no changes in the serialization traits?

Robert Ramey
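On the library side, one conceivable fix is a loader that treats `item_version` as optional: read it when the next element is `<item_version>`, and assume 0 otherwise. The sketch below is a hypothetical, standard-library-only model (`Element`, `ElementReader`, and `load_collection_header` are made-up names). It is not Boost's code and does not claim to be how the library actually resolved this report.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A flattened archive: (tag, text) pairs standing in for consecutive XML elements.
using Element = std::pair<std::string, std::string>;

struct CollectionHeader {
    std::size_t count = 0;
    unsigned item_version = 0;
};

// Hypothetical reader used only to illustrate the idea; not Boost's API.
class ElementReader {
public:
    explicit ElementReader(std::vector<Element> elems) : elems_(std::move(elems)) {}

    // Strict read: throws if the next element is not <tag>, which mirrors the
    // "invalid archive format" failure described in the thread.
    std::string expect(const std::string& tag) {
        if (!next_is(tag))
            throw std::runtime_error("invalid archive format: expected <" + tag + ">");
        return elems_[pos_++].second;
    }

    bool next_is(const std::string& tag) const {
        return pos_ < elems_.size() && elems_[pos_].first == tag;
    }

    // Tolerant read of a collection header: <count> is required, while
    // <item_version> is read if present and otherwise assumed to be 0.
    CollectionHeader load_collection_header() {
        CollectionHeader h;
        h.count = std::stoul(expect("count"));
        if (next_is("item_version"))
            h.item_version = static_cast<unsigned>(std::stoul(expect("item_version")));
        return h;
    }

private:
    std::vector<Element> elems_;
    std::size_t pos_ = 0;
};
```

Peeking at the element name only works because XML archives are tagged. Text and binary archives carry no element names, so a real fix there would have to key off something else, for example the library version that Boost input archives report through `get_library_version()`.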

Robert Ramey wrote:
First of all, any failure to read any archive created under a previous version of the library is a bug. Thanks for pointing this out.
Thanks for your interest. Since my last post, I have verified that the problem was indeed fixable (or rather, circumventable) by adding an "item_version" tag where it was missing. I'll explain more below.
This is likely fixable, but we would need a little more information.
I'm sorry if my first description was vague. I'll try to be more precise, but I'm afraid I'm quite unfamiliar with the boostian terminology!
a) I'm not sure what version 4 and version 5 refer to; please expand upon this.
OK. At the beginning of a serialized XML file, after the "<?xml..." declaration and the "<!DOCTYPE..." line, there's a "boost_serialization" tag with two attributes: "signature", which is "serialization::archive", and "version". I was referring to this version, which used to be "4" when my files were serialized and is now "5" with the updated libraries from the trunk.
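Since the header's `version` attribute is the only visible difference between the old and new files, a patching tool may want to read it first, for instance to touch only files written with version 4. A minimal sketch, assuming the header layout described above (`signature` before `version`); the function name `archive_version` is hypothetical, and a regular expression is used for brevity instead of a full XML parser.

```cpp
#include <optional>
#include <regex>
#include <string>

// Returns the value of the "version" attribute of the <boost_serialization>
// element, or std::nullopt if no such element is found. Assumes the attribute
// layout described above (signature first, then version).
std::optional<unsigned> archive_version(const std::string& xml) {
    static const std::regex header(
        R"re(<boost_serialization\s+signature="serialization::archive"\s+version="(\d+)")re");
    std::smatch m;
    if (!std::regex_search(xml, m, header))
        return std::nullopt;
    return static_cast<unsigned>(std::stoul(m[1].str()));
}
```

On a file produced by the old build this would return 4; on data saved after the upgrade it would return 5.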
b) I'm not sure what type the "item_version" refers to. Is it the list type or (more likely) the type which is being pointed to? c) What type is being pointed to?
Well, I can't say I'm positive either! Basically, the relevant part of my type structure looks like this:

//BEGIN_CODE
class Component { ... };
class CompA : public Component { ... };
class CompB : public Component { ... };
class CompC : public Component { ... };

class Object {
    ...
    std::map<std::string, Component *> name_comp_table;
    std::vector<Component *> interesting_comps;
};
//END_CODE

Basically, I serialize instances of Object. The interesting_comps member holds pointers to Components that already exist in name_comp_table. The CompX classes are what is actually stored in Objects, through pointers to their polymorphic base class Component. It is in the serialization of the interesting_comps member of the Object class that this possible bug happens. Since all the objects pointed to by the elements of the interesting_comps vector have already been serialized, the serialization system (rightfully) just stores an "object_id_reference" in the file, like this:

//BEGIN_DATA
...
<interesting_comps class_id="12" tracking_level="0" version="0">
    <count>10</count>
    <item class_id_reference="13" object_id_reference="_13"></item>
    <item class_id_reference="15" object_id_reference="_14"></item>
    <item class_id_reference="16" object_id_reference="_15"></item>
    <item class_id_reference="17" object_id_reference="_16"></item>
    <item class_id_reference="17" object_id_reference="_17"></item>
    <item class_id_reference="17" object_id_reference="_18"></item>
    <item class_id_reference="17" object_id_reference="_19"></item>
    <item class_id_reference="19" object_id_reference="_22"></item>
    <item class_id_reference="20" object_id_reference="_23"></item>
    <item class_id_reference="21" object_id_reference="_25"></item>
</interesting_comps>
...
//END_DATA

I think the above lacks an "item_version" tag after the "count" tag, like so:

//BEGIN_DATA
...
    <count>10</count>
    <item_version>0</item_version>
    <item class_id_reference="13" object_id_reference="_13"></item>
...
//END_DATA

In fact, I wrote a script that went through all my serialized XML files, checked whether this "item_version" tag existed, and added it if it didn't. This solved my problem completely, and now the files load as they did before. I also found that the example above was the only place where this behavior occurred (I have several hundred of these Objects per XML file on average; it happened in all of them and nowhere else). I also use serialization for several other STL containers, including vectors and maps, but none of them contains pointers to previously serialized data, and none of them misbehaves. I can post real header files and real serialized data if you want, but I doubt they'll be useful to anyone because of the excessive noise. I may be able to reproduce the problem using simpler data structures if really needed (however, since my immediate problem is solved and this project is on a tight schedule, that may take a few days).
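The script itself was not posted. A rough C++ equivalent of what is described might look like the following, assuming the patch is limited to the `interesting_comps` element named above and that the missing value is 0 (as in the expected output). `add_missing_item_version` is a hypothetical name, and plain string searching is a simplification of real XML handling.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the workaround described above: inside every
// <tag ...> element, if the </count> closing tag is not directly followed
// (ignoring whitespace) by <item_version>, insert <item_version>0</item_version>.
// Assumptions: the missing value is 0, and <tag> elements are not nested.
std::string add_missing_item_version(std::string xml, const std::string& tag) {
    const std::string open = "<" + tag;
    const std::string close = "</" + tag + ">";
    const std::string count_end = "</count>";
    const std::string marker = "<item_version>";
    const std::string patch = "<item_version>0</item_version>";

    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        // Make sure we matched the whole element name, not a longer one.
        const std::size_t name_end = pos + open.size();
        if (name_end < xml.size() && xml[name_end] != ' ' && xml[name_end] != '>') {
            pos = name_end;
            continue;
        }
        std::size_t end = xml.find(close, pos);
        if (end == std::string::npos) break;  // unterminated element: give up

        std::size_t c = xml.find(count_end, pos);
        if (c != std::string::npos && c < end) {
            c += count_end.size();
            const std::size_t next = xml.find_first_not_of(" \t\r\n", c);
            if (next == std::string::npos || xml.compare(next, marker.size(), marker) != 0) {
                xml.insert(c, patch);
                end += patch.size();  // the closing tag moved right
            }
        }
        pos = end + close.size();
    }
    return xml;
}
```

Usage: read each file into a `std::string`, call `add_missing_item_version(xml, "interesting_comps")`, and write the result back. Because elements that already contain `<item_version>` are skipped, running it twice on the same file leaves the file unchanged.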
d) Can we assume no changes in the serialization traits?
I guess so. During these tests, I did not touch any of my own code. I only replaced older Boost header and library files with freshly checked-out and built ones.
Robert Ramey
Again thanks for your time and interest. -yzt

participants (2)
- Robert Ramey
- Yaser Zhian