[multi_index] announce: serialization support

Hi,
I've just uploaded an upgrade of Boost.MultiIndex that provides serialization support at:
http://groups.yahoo.com/group/boost/files/multi_index_120104.zip
Docs are not yet fully upgraded, but there's some material on serialization support on the advanced topics page, plus a test case. I'd be grateful if those interested in Boost.MultiIndex could give this a try and report problems, missing features, improvements, etc. The implementation is a little complex, so some source code reviewing would be appreciated.
If no major objections are raised, I'll commit this to CVS in a few days. Oh, and kudos to Robert for bringing in such a powerful library as Boost.Serialization!
Thank you,
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

"Joaquín Mª López Muñoz" <joaquin@tid.es> wrote in message news:41ADE6BA.7E4EEAF4@tid.es...
Hi, I've just uploaded an upgrade of Boost.MultiIndex that provides serialization support at:
http://groups.yahoo.com/group/boost/files/multi_index_120104.zip
Docs are not yet fully upgraded, but there's some material on serialization support on the advanced topics page, plus a test case. I'd be grateful if those interested in Boost.MultiIndex can give this a try, and report about problems, missing features, improvements, etc. The implementation is a little complex, so some source code reviewing would be appreciated.
I did peruse the source code some. Here are some random observations.
First of all, on the multi-index package:
a) I was surprised at how much was involved in realizing what seems to me such a simple idea. The fact that I'm surprised at this is actually no surprise, as I'm almost always surprised at how much work it takes to actually finish anything.
b) This is an incredibly professional job, to a very high standard. The documentation contributes greatly to this impression.
c) I believe that this is the reason this very ambitious undertaking "sailed through" the Boost review process (unlike most others). Aspiring library authors should study this as an example.
Re the serialization aspect:
I have to say I was a little disappointed that the implementation of serialization wasn't more transparent. I really couldn't follow the line from the serialization interface to the implementation in the time I was willing to spend, so I don't feel I can verify the implementation other than by testing. This always makes me feel slightly uncomfortable. This isn't really a criticism and I'm not suggesting any changes. Only in reviewing the code did I become aware for the first time of how much is required, so maybe it can be no other way.
I'm a little disappointed at how much effort was required to implement serialization for this container. My hope was that implementing serialization for any class would be easier. Of course this is not a typical case, so it's not a huge thing. I'm curious whether any of the complexity was a result of some requirement of the serialization package itself.
I didn't spend a lot of time on this, so feel free to take any of the above with a grain of salt.
Robert Ramey

Robert Ramey wrote: [snip]
I did peruse the source code some. Here are some random observations.
First of all on the multi-index package:
[snip] You'll make me blush :) thank you.
c) I believe that this is the reason the very ambitious undertaking "sailed through" the boost review process - (unlike most others). Aspiring library authors should study this as an example.
I think I have to give here due credit to Pavel, who acted as my private reviewer for almost a year. I objected to many of his criticisms, but he definitely never let me go sloppy. I think this mentorship role should be stressed more here at Boost.
Re the serialization aspect.
I have to say I was a little disappointed that the implementation of serialization wasn't more transparent. I really couldn't follow the line from the serialization interface to the implementation in the time I was willing to spend. So I don't feel I can verify the implementation other than by testing. This always makes me feel slightly uncomfortable. This isn't really a criticism and I'm not suggesting any changes. Only in reviewing the code did I become aware for the first time how much is required, so maybe it can be no other way.
I'm a little disappointed at how much effort was required to implement serialization for this container. My hopes were that implementation of serialization for any class would be easier. Of course this is not a typical case so it's not a huge thing. I'm curious if any of the complexity was a result of some requirement of the serialization package itself.
Yes and no. The Boost.Serialization interface forces me to do things in weird ways, but I'm not sure this can be improved (I have a suggestion, though, please read on). Let me elaborate. Loading an element into a (any) container involves the following ops:
  load_construct_data(element);
  ar>>element;
  container.insert(element);
So, the element cannot be restored *in-place*, i.e., directly inside the container, as it is the container itself that controls object creation thru its allocator. This is a restriction of containers rather than of any serialization package.
But now comes the problem. From a data structure point of view, you can consider a multi_index_container as a bunch of elements plus N different rearrangements of these, one per index. These rearrangements are archived more or less as sequences of pointers to the elements. As a first approach:
  save elements
  for each index{
    for(iterator it=index::begin...index::end){
      ar<<&(*it); // save a pointer to the element
    }
  }
But this scheme does not work, because at loading time object tracking is tied to the element as first constructed, and not to its copy inside the container:
  load_construct_data(element);
  ar>>element; // Loaded pointers will be pointing here
  container.insert(element);
Got it? This accounts for some of the complexity in the implementation of multi_index serialization. Basically, what I'm doing is to serialize both the element and its position in the container (the latter being done in index_node_base.hpp). The position thing is merely a marker, i.e. it does nothing but force Boost.Serialization to track subsequent pointers to the right address. In fact, its serialize() memfun does nothing.
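The address mismatch described above can be reproduced without any serialization library. The following is only a minimal standalone sketch: it does not use Boost.Serialization, the function name tracked_address_is_in_container is hypothetical, and the three statements merely stand in for load_construct_data, ar>>element and container.insert.

```cpp
#include <cassert>
#include <list>
#include <string>

// Mimics the three loading steps and reports whether the address seen while
// "loading" the element is the address of the element held by the container.
inline bool tracked_address_is_in_container(std::list<std::string>& container,
                                            const std::string& archived_value)
{
    std::string element;                    // stands in for load_construct_data(element)
    element = archived_value;               // stands in for ar >> element
    const std::string* tracked = &element;  // the address a tracking scheme would remember
    container.push_back(element);           // stands in for container.insert(element): stores a copy
    assert(container.back() == archived_value);  // the copy has the same value...
    return tracked == &container.back();         // ...but it is a different object
}
```

The function returns false: the temporary and the container's copy are distinct objects, so any pointer resolved against the temporary's address does not refer to the element stored in the container.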
In pseudocode:
  // saving
  save elements
  for(each element){
    save position(element) // does not emit info
  }
  save indices as pointers to the positions
  // loading
  load elements
  for(each element){
    load position(element) // instructs Boost.Serialization about
                           // where subsequent pointers have to be tracked to
  }
  load indices
I hope I made myself clear. The problem is not particular to multi_index_containers; it'll also pop up in any situation involving pointers to elements in a container. I can work around the problem because I have direct access to the representation of multi_index_container (the position thing), but when serializing pointers to STL container elements there's no way around it AFAICS. I think Boost.Serialization can be extended to offer better support for this thru one of these mechanisms (or both):
1. Allow the user to "retrack" an object, i.e. to instruct Boost.Serialization at loading time that pointers to an object have to be displaced to a user-defined address.
2. Define a special entity (a la make_nvp) that serves to serialize external objects, i.e.
  ar<<make_external(obj);
  ar<<&obj;
  ...
  ar>>make_external(obj); // obj is preexistent
  ar>>obj_ptr; // will be pointing to obj
In the pseudocode above, obj is not really serialized, nor does Boost.Serialization attempt to construct it at loading time, yet it is possible to serialize pointers to it. Have I made myself clear? I'm aware the explanation is fuzzy, but I hope you got my point. Otherwise, please let me know so that I can try to express myself more clearly.
As for the rest of the complexity in the implementation of multi_index_serialization, it has to do with some algorithms to code indices as compactly as possible, basically by archiving "diff" subsequences with respect to the base sequence. This stuff is in index_matcher, index_loader and index_saver. This complexity is in no way related to Boost.Serialization.
Sorry for the long post,
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
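The save/load pseudocode above can be approximated by a toy model. This is only a sketch under loud assumptions: it is not the Boost.MultiIndex implementation and does not use Boost.Serialization. The names toy_container, toy_archive, save and load are hypothetical, and positions are written as plain ordinals instead of relying on the archive's pointer tracking. What it does show is the key point of the scheme: index entries are resolved against the addresses of the elements as they sit inside the container.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <vector>

// Toy container: elements live in a std::list (stable addresses) and one
// extra "index" holds pointers to them in some other order.
struct toy_container {
    std::list<int> elements;
    std::vector<const int*> index;
};

// Toy archive: elements in base order, plus the index stored as positions
// (ordinals in the base order) rather than as raw addresses.
struct toy_archive {
    std::vector<int> elements;
    std::vector<std::size_t> index_positions;
};

inline toy_archive save(const toy_container& c)
{
    toy_archive ar;
    std::map<const int*, std::size_t> position_of;  // element address -> ordinal
    for (const int& e : c.elements) {
        position_of[&e] = ar.elements.size();       // "save position(element)"
        ar.elements.push_back(e);
    }
    for (const int* p : c.index)                    // "save indices as pointers to the positions"
        ar.index_positions.push_back(position_of.at(p));
    return ar;
}

inline void load(const toy_archive& ar, toy_container& c)
{
    c.elements.clear();
    c.index.clear();
    std::vector<const int*> address_at;             // ordinal -> address inside the container
    for (int value : ar.elements) {
        c.elements.push_back(value);                // the element is copied into the container
        address_at.push_back(&c.elements.back());   // "load position": record the in-container address
    }
    for (std::size_t pos : ar.index_positions) {    // "load indices"
        assert(pos < address_at.size());
        c.index.push_back(address_at[pos]);
    }
}
```

Because load records each address only after the element has been inserted, the restored index points into the new container rather than at any temporary used during loading.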

"Joaquín Mª López Muñoz" <joaquin@tid.es> wrote in message news:41AE1469.543AD33E@tid.es...
Robert Ramey wrote:
c) I believe that this is the reason the very ambitious undertaking "sailed through" the boost review process - (unlike most others). Aspiring library authors should study this as an example.
I think I have to give here due credit to Pavel, who acted as my private reviewer for almost a year. I objected to many of his criticisms, but he definitely never let me go sloppy. I think this mentorship role should be stressed more here at Boost.
I would note that Pavel played the same role regarding the serialization library. He pointed out innumerable errors and suggested corrections, most of which I implemented. I doubt this is as widely recognized as it should be, as most of it occurred in private communications. As a reward (or punishment) we have the serialization library working with Borland compilers.
I'm still considering the observations regarding container serialization. This was originally significantly different. After a spirited exchange, and though I had some reservations, I came to implement the current one. I'm sure that the last word has been written on this.
Robert Ramey

On Wed, 1 Dec 2004 12:57:29 -0800, Robert Ramey <ramey@rrsd.com> wrote:
"Joaquín Mª López Muñoz" <joaquin@tid.es> wrote in message news:41AE1469.543AD33E@tid.es...
I think I have to give here due credit to Pavel, who acted as my private reviewer for almost a year. I objected to many of his criticisms, but he definitely never let me go sloppy. I think this mentorship role should be stressed more here at Boost.
I would note that Pavel played the same role regarding the serialization library. He pointed innumerable errors and suggested corrections - most of
Are you guys perhaps talking about Pavol (Droba) and not Pavel (?) -- Caleb Epstein caleb dot epstein at gmail dot com

I'm referring to Pavel Vozenilek.
"Caleb Epstein" <caleb.epstein@gmail.com> wrote in message news:989aceac04120113255c883a24@mail.gmail.com...
Are you guys perhaps talking about Pavol (Droba) and not Pavel (?)

On Wed, Dec 01, 2004 at 04:25:23PM -0500, Caleb Epstein wrote:
On Wed, 1 Dec 2004 12:57:29 -0800, Robert Ramey <ramey@rrsd.com> wrote:
"Joaquín Mª López Muñoz" <joaquin@tid.es> wrote in message news:41AE1469.543AD33E@tid.es...
I think I have to give here due credit to Pavel, who acted as my private reviewer for almost a year. I objected to many of his criticisms, but he definitely never let me go sloppy. I think this mentorship role should be stressed more here at Boost.
I would note that Pavel played the same role regarding the serialization library. He pointed innumerable errors and suggested corrections - most of
Are you guys perhaps talking about Pavol (Droba) and not Pavel (?)
Not as far as I know... They were talking about Pavel Vozenilek, who really deserves some credit.
Regards,
Pavol

"Joaquín Mª López Muñoz" <joaquin@tid.es> wrote in message news:41AE1469.543AD33E@tid.es...
I hope I made myself clear. The problem is not particular to multi_index_containers, it'll also pop up in any situation involving pointers to elements in a container.
If you guarantee that the container itself is always serialized before your indices, then de-serialization of the indices would automatically be reduced to providing the original (tracked) pointer. In such a case, I would think the whole issue would never appear and that the implementation would be very straightforward.
Robert Ramey
participants (4)
- Caleb Epstein
- Joaquín Mª López Muñoz
- Pavol Droba
- Robert Ramey