[serialization] Attaching arbitrary data to an archive

Noah Roberts

1 Jun 2010

5:35 p.m.

Sometimes during serialization an object needs to know a lot more than it does during normal operation. The way I've dealt with this in the past is to subclass the archive I want to use and add setters/getters for the data that various objects down the line are going to need to store and retrieve themselves. Is there a better way though, something that's built into the library already maybe? I don't see anything but I thought I'd ask.

--
http://crazyeddiecpp.blogspot.com/
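A minimal sketch of the subclassing approach described in this message. It deliberately uses a toy archive instead of a real Boost.Serialization archive; `toy_oarchive`, `context_oarchive`, `set_base_path`, `asset` and `save` are all hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Toy stand-in for an output archive. NOT a Boost.Serialization class.
class toy_oarchive {
public:
    explicit toy_oarchive(std::ostream& os) : os_(os) {}
    toy_oarchive& operator<<(const std::string& s) { os_ << s << ' '; return *this; }
private:
    std::ostream& os_;
};

// Subclass that carries extra context for objects further down the line.
class context_oarchive : public toy_oarchive {
public:
    using toy_oarchive::toy_oarchive;
    void set_base_path(std::string p) { base_path_ = std::move(p); }  // setter
    const std::string& base_path() const { return base_path_; }       // getter
private:
    std::string base_path_;
};

// Hypothetical object that needs the extra context only while being saved.
struct asset {
    std::string full_path;
};

// Stores the path relative to the base path attached to the archive.
inline void save(context_oarchive& ar, const asset& a) {
    const std::string& base = ar.base_path();
    std::string rel = a.full_path;
    if (rel.compare(0, base.size(), base) == 0) rel.erase(0, base.size());
    ar << rel;
}
```

The drawback raised in the question is visible here: `save` is tied to the concrete `context_oarchive` type, so every new piece of context means another subclass or another pair of accessors.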


Emil Dotchevski

1 Jun

6:41 p.m.


On Tue, Jun 1, 2010 at 10:35 AM, Noah Roberts <roberts.noah@gmail.com> wrote:

...

Sometimes during serialization an object needs to know a lot more than it does during normal operation. The way I've dealt with this in the past is to subclass the archive I want to use and add setter/getters for the data that various objects down the line are going to need to store and retrieve themselves.

Is there a better way though, something that's built into the library already maybe? I don't see anything but I thought I'd ask.

In my own serialization library I have a system similar to the one used in Boost Exception that allows exception objects to carry arbitrary data: all my archive types implement this functionality. For example, this has allowed my shared_ptr serialization implementation to be decoupled from the serialization library itself, not to mention things like reading and writing DirectX textures (which need a D3D device object), etc.

Perhaps boost::serialization can benefit from a similar approach. I can upload some source code if there is interest.

Emil Dotchevski
Reverge Studios, Inc.
http://www.revergestudios.com/reblog/index.php?n=ReCode
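The library mentioned in this message is not shown in the thread, so the following is only a guess at the general shape of such a mechanism: a mix-in that lets any archive carry values keyed by a tag type, loosely in the spirit of tagged data attached to exception objects. Every name here (`data_carrier`, `attach`, `find`, `device_tag`, `fake_device`, `texture`, `load_texture`) is an assumption made for illustration.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Mix-in that lets an archive carry arbitrary values, one slot per tag type.
class data_carrier {
public:
    template <class Tag>
    void attach(typename Tag::type value) {
        slots_[std::type_index(typeid(Tag))] = std::move(value);
    }

    // Returns a pointer to the stored value, or nullptr if nothing is attached.
    template <class Tag>
    typename Tag::type* find() {
        auto it = slots_.find(std::type_index(typeid(Tag)));
        if (it == slots_.end()) return nullptr;
        return std::any_cast<typename Tag::type>(&it->second);
    }

private:
    std::map<std::type_index, std::any> slots_;
};

// Toy archive. NOT a Boost.Serialization class.
struct toy_iarchive : data_carrier {};

// Hypothetical stand-in for a resource such as a graphics device.
struct fake_device { int id; };
struct device_tag { using type = fake_device*; };

struct texture { int device_id = -1; };

// Loading needs a device, which is looked up on the archive itself.
template <class Archive>
bool load_texture(Archive& ar, texture& t) {
    fake_device** dev = ar.template find<device_tag>();
    if (dev == nullptr || *dev == nullptr) return false;  // no device attached
    t.device_id = (*dev)->id;
    return true;
}
```

With this shape the archive knows nothing about devices or textures; only the code that attaches the value and the code that reads it need to agree on the tag.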

Robert Ramey

7:04 p.m.


Noah Roberts wrote:

...

Sometimes during serialization an object needs to know a lot more than it does during normal operation. The way I've dealt with this in the past is to subclass the archive I want to use and add setter/getters for the data that various objects down the line are going to need to store and retrieve themselves.

Is there a better way though, something that's built into the library already maybe? I don't see anything but I thought I'd ask.

I needed this to implement serialization for shared_pointers. For this reason, if you look into text_iarchive.hpp you'll see the "shared_pointer_helper". That is, I subclassed the "naked_text_iarchive" with multiple inheritance. This works well for me, except for one thing: I've now built a dependency on shared_ptr into ...archive.

Some time ago, I had an undocumented concept, "attach runtime helper". I saw this as "polluting" the otherwise "pristine" archive code with something that didn't belong there, so I factored it out and made it specific to shared_pointer_helper. I see now that the second decision was a mistake and that I should have left it generic after I factored it out.

So the definitive solution is:

a) replace shared_pointer_helper with a "generic runtime helper".
b) adjust shared_pointer_helper interface to be an instance of the above
c) document all this.

Feel free to take this on.

Robert Ramey
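Steps a) and b) can be illustrated with a sketch. This is not the library's actual code or interface; `helper_host`, `get_helper`, `shared_ptr_helper_sketch` and `adopt` are hypothetical names, and the archive is a toy. The idea shown is a generic slot that creates one helper object per helper type on first use, with shared pointer tracking as just one such helper.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>

// Generic runtime helper support: the archive owns at most one helper per
// helper type and creates it lazily on first request.
class helper_host {
public:
    template <class Helper>
    Helper& get_helper() {
        std::shared_ptr<void>& slot = helpers_[std::type_index(typeid(Helper))];
        if (!slot) slot = std::make_shared<Helper>();
        return *static_cast<Helper*>(slot.get());
    }

private:
    std::map<std::type_index, std::shared_ptr<void>> helpers_;
};

// One particular helper: remembers which raw addresses already have an owner,
// so that repeated loads of the same object share ownership.
struct shared_ptr_helper_sketch {
    std::map<const void*, std::shared_ptr<void>> seen;

    template <class T>
    std::shared_ptr<T> adopt(T* raw) {
        assert(raw != nullptr);
        auto it = seen.find(raw);
        if (it == seen.end()) {
            std::shared_ptr<T> sp(raw);  // first sighting: take ownership
            seen.emplace(raw, sp);
            return sp;
        }
        return std::static_pointer_cast<T>(it->second);  // share existing owner
    }
};

// Toy archive. It depends only on the generic host, not on shared_ptr.
struct toy_iarchive : helper_host {};
```

In this arrangement the archive type no longer mentions shared_ptr at all; the dependency lives in the helper, which is only instantiated by code that actually serializes shared pointers.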

Stefan Strasser

7:07 p.m.

Quoting Robert Ramey <ramey@rrsd.com>:

...

Noah Roberts wrote:

...
Sometimes during serialization an object needs to know a lot more than it does during normal operation. The way I've dealt with this in the past is to subclass the archive I want to use and add setter/getters for the data that various objects down the line are going to need to store and retrieve themselves.

Is there a better way though, something that's built into the library already maybe? I don't see anything but I thought I'd ask.

My solution is pretty similar, but I think somewhat cleaner, at least in my case, because the serialization can depend on the kind of archive that is used: instead of the serialize() functions extracting the additional data they need from the archive, I introduced a new serialization primitive.

In my case, the serialize() function needed to serialize a type, one out of many, so on load() it needed to know what types to consider -> stored in the archive. I also overloaded the archives, but instead of getters/setters there is something like:

    void load(type_selection &type){ ... }

This decouples the info stored in the archive from the individual serialize() functions. Also, if the archive doesn't implement type_selection as a primitive, its serialize() function is called, so you can react adequately when an archive is used that isn't subclassed.

Robert, this seems very similar to pointer serialization to me. If a pointer is loaded, info stored in the archive is required (the pointer tracking map). Same for derived type serialization. Maybe pointer tracking/derived types can one day be as optional as the shared_ptr mix-in you described is.

I still use something like this just to avoid the construction overhead of a Boost.Serialization archive when it's not needed:

    class archive{
        archive &operator<<(int t){
            //don't need Boost.Serialization for this
            ...
        }
        archive &operator<<(T *t){
            //need Boost.Serialization for this:
            if(!sarchive) sarchive=in_place(...
            *sarchive << t;
        }
    private:
        optional<archive::binary_oarchive...> sarchive;
    }

Stefan
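The "primitive with fallback" idea from this message could look roughly like the sketch below. The dispatch mechanism (a C++17 detection trait plus `if constexpr`) is an assumption, since the message does not say how the choice is made, and Boost.Serialization has its own dispatch that is not reproduced here. `load_value`, `has_native_load`, `plain_iarchive`, `selecting_iarchive` and the members of `type_selection` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// The special "primitive": which concrete types should be considered on load.
struct type_selection {
    std::vector<std::string> candidates;
    bool filled_by_archive = false;
};

// Fallback used when the archive does not treat type_selection as a primitive.
template <class Archive>
void serialize(Archive&, type_selection& sel) {
    sel.candidates.clear();  // react adequately: nothing is known here
    sel.filled_by_archive = false;
}

// Detects whether Archive has a member load(T&).
template <class Archive, class T, class = void>
struct has_native_load : std::false_type {};

template <class Archive, class T>
struct has_native_load<
    Archive, T,
    std::void_t<decltype(std::declval<Archive&>().load(std::declval<T&>()))>>
    : std::true_type {};

// Prefer the archive's own primitive handling, otherwise call serialize().
template <class Archive, class T>
void load_value(Archive& ar, T& value) {
    if constexpr (has_native_load<Archive, T>::value) {
        ar.load(value);
    } else {
        serialize(ar, value);
    }
}

// Toy archive that knows nothing about type_selection.
struct plain_iarchive {};

// Toy archive that stores the extra info and implements the primitive.
struct selecting_iarchive {
    std::vector<std::string> known_types;
    void load(type_selection& sel) {
        sel.candidates = known_types;
        sel.filled_by_archive = true;
    }
};
```

The individual serialize() functions never call a getter on the archive; they only load a `type_selection`, and the archive decides what that means.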

Robert Ramey

8:39 p.m.


Stefan Strasser wrote:

...

Quoting Robert Ramey <ramey@rrsd.com>:

...

I also overloaded the archives, but instead of getters/setters there is something like:

void load(type_selection &type){ ... }

this decouples the info stored in the archive from the individual serialize() functions. also if the archive doesn't implement type_selection as a primitive its serialize() function is called, so you can react adequately when an archive is used that isn't subclass-ed.

as far as I can tell, this seems consistent with my original intention. I endeavored to minimize the handling for special types. Of course for some specific application, I can see one wanting to add his own special handling for types not otherwise serializable.

...

robert, this seems very similar to pointer serialization to me. if a pointer is loaded, info stored in the archive is required (the pointer tracking map). same for derived type serialization.

Hmmm - synchronicity here. 1.43 includes two new case studies. One is for a simple lightweight archive class meant to be used for debug logging. It only handles output and doesn't follow derivation paths for polymorphic base classes. Best part is:

a) it's HEADER ONLY, so it's convenient for use in debugging without changing the build or adding another compiled library to your project.
b) it's FREE in that you've already got the serialize functions in there. There is nothing extra to do.

But, as mentioned before, if you display a base class, that's all you'll get.

...

maybe pointer tracking/derived types can one day be as optional as the shared_ptr mix-in you described is.

I'm not sure that it would be all that difficult to implement this. The elaborate code for handling polymorphic pointers, etc. Of course if you don't want to repeat stuff, etc it could turn into a lot of work.

...

I still use something like this just to avoid the construction overhead of a Boost.Serialization archive when it's not needed:

    class archive{
        archive &operator<<(int t){
            //don't need Boost.Serialization for this
            ...
        }
        archive &operator<<(T *t){
            //need Boost.Serialization for this:
            if(!sarchive) sarchive=in_place(...
            *sarchive << t;
        }
    private:
        optional<archive::binary_oarchive...> sarchive;
    }

I'm not sure I understand the motivation for this - but then I don't have to. Note that the implementation of the serialization library relies on template metaprogramming to generate code ONLY for those features actually invoked. So, I'm not convinced of the utility of the above approach. Perhaps there's "too much" overhead in the construction of an archive - but someone would have to make a case for this assertion.

Robert Ramey

...

Stefan

Stefan Strasser

8:02 p.m.

Quoting Robert Ramey <ramey@rrsd.com>:

...

Hmmm - synchronicity here. 1.43 includes two new case studies. One is for a simple lightweight archive class meant to be used for debug logging. It only handles output and doesn't follow derivation paths for polymorphic base classes. Best part is

I'll have a look, thank you.

...

...
I still use something like this just to avoid the construction overhead of a Boost.Serialization archive when it's not needed

...

I'm not sure I understand the motivation for this - but then I don't have to. Note that the implementation of the serialization library relies on template metaprogramming to generate code ONLY for those features actually invoked. So, I'm not convinced of the utility of the above approach. Perhaps there's "too much" overhead in the construction of an archive - but someone would have to make a case for this assertion.

In libraries under construction (STLdb, Persistent and maybe even STM) we (ab)use serialization for copying, cloning, comparing, ... individual objects. So in a lot of cases an archive is constructed, used to serialize exactly ONE object and then destructed, because archives cannot be reused as their state cannot be reset (and even if their state could be reset, there would be thread-safety issues).

Consider for example:

    /// creates a deep copy of t
    template<class T>
    T copy(T const &t){
        {
            memory_oarchive ar;
            ar << t;
        }
        T tmp;
        {
            memory_iarchive ar;
            ar >> tmp;
        }
        return tmp;
    }

"clone" instantiated with a very simple type that doesn't serialize pointers results in almost no code, almost as if operator= was used, in case a deep copy doesn't differ from a shallow copy for that type. Almost any archive construction overhead is "too much" overhead here.

Stefan
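The copy-through-an-archive pattern in this message leaves out how the bytes get from the output archive to the input archive. The sketch below fills that gap with a plain string buffer, which is an assumption, and uses toy text archives built on string streams instead of real Boost.Serialization archives. `memory_oarchive_sketch`, `memory_iarchive_sketch`, `copy_via_archive` and `point` are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical value type with simple text streaming.
struct point {
    int x = 0;
    int y = 0;
};
inline std::ostream& operator<<(std::ostream& os, const point& p) {
    return os << p.x << ' ' << p.y;
}
inline std::istream& operator>>(std::istream& is, point& p) {
    return is >> p.x >> p.y;
}

// Toy in-memory output archive. NOT a Boost.Serialization class.
class memory_oarchive_sketch {
public:
    template <class T>
    memory_oarchive_sketch& operator<<(const T& t) {
        buf_ << t << '\n';
        return *this;
    }
    std::string str() const { return buf_.str(); }
private:
    std::ostringstream buf_;
};

// Toy in-memory input archive reading what the output archive produced.
class memory_iarchive_sketch {
public:
    explicit memory_iarchive_sketch(const std::string& bytes) : buf_(bytes) {}
    template <class T>
    memory_iarchive_sketch& operator>>(T& t) {
        buf_ >> t;
        return *this;
    }
private:
    std::istringstream buf_;
};

// Creates a deep copy of t by saving it and loading it back.
// One archive pair is constructed and destroyed per copied object.
template <class T>
T copy_via_archive(const T& t) {
    std::string bytes;
    {
        memory_oarchive_sketch ar;
        ar << t;
        bytes = ar.str();
    }
    T tmp;
    {
        memory_iarchive_sketch ar(bytes);
        ar >> tmp;
    }
    return tmp;
}
```

Because each call builds two archives for a single object, any fixed cost in archive construction is paid on every copy, which is the concern raised in the message.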
