On 7/8/07, Robert Ramey wrote:
Sean Cavanaugh wrote:
OK, I've spent a good chunk of my day making a set of custom archive classes, based on the portable_binary_iarchive and portable_binary_oarchive.
I have a non-virtual class hierarchy named Asset. In memory this is a cyclic graph, but my requirements are that this structure cannot be serialized all at once; it has to be broken up at boundaries. So the archiver serializes the first Asset* (which is always the root object passed into operator&). This object serializes normally, as per the boost serialization code, except that when subsequent Asset*'s are encountered
**** Which point to a previously serialized object? Or any subsequent Asset*?
I pass the pointer to an AssetManager class, translate the Asset* into a ResourceID, and serialize the ResourceID instead of the object. This is done by calling operator& directly inside save_override.
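A minimal sketch of the save-side translation described above, in standard C++ only so it runs without Boost. Everything here is hypothetical: `AssetManager::id_of`, `HandleOArchive` and its text format are illustrations, not Boost's API or the actual archive from this thread. The first Asset* passed in is written in full; every Asset* reached after that is replaced by the ResourceID the manager hands out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the types described in the message.
using ResourceID = std::uint64_t;

struct Asset {
    std::string name;
    std::vector<Asset*> links;  // edges of a possibly cyclic graph
};

// Hypothetical manager: hands out one stable ResourceID per Asset*,
// shared by every archive that uses this manager.
class AssetManager {
public:
    ResourceID id_of(const Asset* a) {
        auto it = ids_.find(a);
        if (it != ids_.end()) return it->second;
        ResourceID id = next_++;
        ids_.emplace(a, id);
        return id;
    }
private:
    std::map<const Asset*, ResourceID> ids_;
    ResourceID next_ = 1;
};

// Toy output archive: the root Asset* is written in full, and every
// later Asset* is written only as the ResourceID the manager assigns.
class HandleOArchive {
public:
    explicit HandleOArchive(AssetManager& m) : mgr_(m) {}

    HandleOArchive& operator&(const std::string& s) {
        out_ << "str:" << s << ';';
        return *this;
    }
    HandleOArchive& operator&(Asset* a) {
        assert(a != nullptr);
        if (!root_written_) {
            root_written_ = true;
            out_ << "obj{";
            *this & a->name;
            for (Asset* link : a->links) *this & link;  // these become handles
            out_ << '}';
        } else {
            out_ << "id:" << mgr_.id_of(a) << ';';
        }
        return *this;
    }
    std::string str() const { return out_.str(); }
private:
    AssetManager& mgr_;
    std::ostringstream out_;
    bool root_written_ = false;
};
```

Because the manager, not the archive, owns the IDs, an ID issued while writing one archive stays valid when the next archive is written with the same manager.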
******* Hmmm, this sounds to me exactly equivalent to what the serialization system does by default for tracked objects. Objects serialized through pointers are tracked by default. Your "ResourceID" seems to be a re-implementation of the "object id" used by the serialization library to track objects.
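To make the distinction concrete, here is a toy model of archive-scoped object tracking: the first time a pointer is seen it gets the next id and is written in full, and later occurrences write only a reference to that id. This is a simplified illustration in standard C++, not Boost's actual tracking code or archive format; `Node` and `TrackingOArchive` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

struct Node {
    int value;
    Node* next;
};

// Toy tracking archive: ids are assigned per archive instance, starting
// at 0, so two archives can use the same id for different objects.
class TrackingOArchive {
public:
    TrackingOArchive& operator&(const Node* n) {
        if (n == nullptr) {
            out_ << "null;";
            return *this;
        }
        auto it = ids_.find(n);
        if (it != ids_.end()) {  // already written: emit a reference only
            out_ << "ref:" << it->second << ';';
            return *this;
        }
        int id = next_id_++;
        ids_.emplace(n, id);
        out_ << "new:" << id << '=' << n->value << ';';
        return *this & n->next;  // recurse into the pointee's members
    }
    std::string str() const { return out_.str(); }
private:
    std::map<const Node*, int> ids_;
    int next_id_ = 0;
    std::ostringstream out_;
};
```

Writing a two-node cycle starting from `a` in one archive and from `b` in another gives id 0 to a different object in each, which is why archive-scoped ids cannot serve as cross-archive handles.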
This is true, but the serialization library's object ids only have
archive scope, which means the same ids are used for totally different
things when you use multiple archives. My ResourceIDs have to be globally
unique, in the sense that they are GUIDs or relative pathnames, which can then
be mapped to a fully qualified pathname and loaded on demand. As far as
the archive class is concerned, it's a user-defined translation (proxy-to-real
and real-to-proxy) that exists for certain types.
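A minimal sketch of a globally scoped, two-way translation table (real-to-proxy and proxy-to-real) of the kind described above. `HandleTable`, its methods and the root-directory convention are hypothetical; the point is only that the names live outside any single archive and can be mapped to a fully qualified path.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical global handle table: relative names are unique across all
// archives and can be expanded into a fully qualified path.
template <class T>
class HandleTable {
public:
    explicit HandleTable(std::string root) : root_(std::move(root)) {}

    void add(const std::string& name, T* obj) {
        names_[obj] = name;
        objects_[name] = obj;
    }
    // real-to-proxy: pointer -> globally unique name
    const std::string& to_handle(const T* obj) const {
        auto it = names_.find(obj);
        if (it == names_.end()) throw std::out_of_range("unregistered object");
        return it->second;
    }
    // proxy-to-real: name -> pointer, or nullptr if not loaded yet
    T* to_object(const std::string& name) const {
        auto it = objects_.find(name);
        return it == objects_.end() ? nullptr : it->second;
    }
    std::string full_path(const std::string& name) const {
        return root_ + "/" + name;
    }
private:
    std::string root_;
    std::map<const T*, std::string> names_;
    std::map<std::string, T*> objects_;
};
```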
So in this case it's conceptually an archive of archives, with the outermost
archive being the filesystem and the innermost being a single object (a
file). The file only contains one object, and all links to other objects
are handles (a filename). So when the innermost code is loading from the
filesystem, it knows that it wants a pointer to another object, but it only
has its name. So it has to ask the filesystem class to translate the name
into an object, which it can do because it can look up whether the object is
already loaded and return that, or literally open another file-based archive,
read it in on demand, and return that. The archive class doesn't really care
about the specifics; it just needs the means to achieve the result.
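The load-on-demand step can be sketched as a cache in front of a loader callback. `OnDemandResolver` and its `Loader` signature are assumptions for illustration; in the design described above, the loader would open another file-based archive and read the object in.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical resolver: turns a handle (a file name) into a live object,
// calling the loader only if the object is not already in memory.
template <class T>
class OnDemandResolver {
public:
    using Loader = std::function<std::unique_ptr<T>(const std::string&)>;

    explicit OnDemandResolver(Loader load) : load_(std::move(load)) {}

    T* resolve(const std::string& name) {
        auto it = cache_.find(name);
        if (it != cache_.end()) return it->second.get();  // already loaded
        std::unique_ptr<T> obj = load_(name);  // e.g. open another archive
        if (!obj) return nullptr;              // load failed: do not cache
        T* raw = obj.get();
        cache_.emplace(name, std::move(obj));
        return raw;
    }
    std::size_t loaded_count() const { return cache_.size(); }
private:
    Loader load_;
    std::map<std::string, std::unique_ptr<T>> cache_;
};
```

Resolving the same name twice returns the same pointer and runs the loader only once.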
Conceptually, then, the archive's base class code should be doing this:
for_each type X, if has_user_defined_translation
...
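One possible reading of the `has_user_defined_translation` pseudocode is a compile-time opt-in trait checked inside the archive's pointer handling. The sketch assumes C++17 (`if constexpr`); `has_user_defined_translation`, `to_handle`, `DispatchingOArchive`, `Texture` and `Stats` are all hypothetical names, not part of Boost.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// Opt-in trait: false by default, specialized for types whose pointers
// should be replaced by handles.
template <class T>
struct has_user_defined_translation : std::false_type {};

// Customization point; must be specialized for every opted-in type.
template <class T>
std::string to_handle(const T* obj);

class DispatchingOArchive {
public:
    template <class T>
    DispatchingOArchive& operator&(const T* p) {
        if constexpr (has_user_defined_translation<T>::value) {
            out_ << "handle:" << to_handle(p) << ';';  // proxy instead of object
        } else {
            out_ << "full:";
            p->save(*this);                           // ordinary path
            out_ << ';';
        }
        return *this;
    }
    DispatchingOArchive& operator&(int v) { out_ << v; return *this; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// Example user types.
struct Texture { std::string path; };
struct Stats {
    int hp;
    void save(DispatchingOArchive& ar) const { ar & hp; }
};

template <>
struct has_user_defined_translation<Texture> : std::true_type {};

template <>
std::string to_handle<Texture>(const Texture* t) { return t->path; }
```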
I also have to hard-code the full list of derived Asset types and manually provide specializations for all of them in save_override and load_override.
**** Well, since they're different, I would expect each of them to have a different serialize function. If all the serialize functions are the same, it would seem that something should be moved from the derived class to the base class.
If I use the base class, I end up slicing my class down to its base, and cannot serialize it.
**** serializing through a base class pointer solves this problem as well.
In my current, kind-of-working, hacked-up version of the archive classes, the load_override methods are nearly identical when specialized for AssetModel, AssetTexture, etc. The behavior is constant (proxy-to-real translation or vice versa) but the type is not. I can slice them here safely when saving, but not when loading (since the C++ code in the serialize method, 'ar & foo', expects a more-derived type to be filled in).
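A toy model of why serializing through a base pointer avoids slicing: the save side uses `typeid` on a polymorphic base to find the most derived type, and the load side creates that derived type from a registered factory before filling it in. As far as I know, Boost.Serialization's own support for this requires the base to be polymorphic (at least one virtual function) and the derived types to be registered, for example with `BOOST_CLASS_EXPORT` or `register_type`; the non-virtual Asset hierarchy described above would not qualify as-is. `TypeRegistry` and its text format are hypothetical and are not how Boost implements it.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <typeindex>
#include <typeinfo>

// The virtual destructor makes Asset polymorphic, so typeid(*ptr) reports
// the most derived type even when only an Asset* is in hand.
struct Asset {
    virtual ~Asset() = default;
    std::string name;
};
struct AssetModel : Asset { int vertices = 0; };
struct AssetTexture : Asset { int width = 0; };

// Each derived type is registered once with its own save/load functions;
// archive code then only ever deals in Asset*.
class TypeRegistry {
public:
    template <class T, class SaveFn, class LoadFn>
    void add(const std::string& key, SaveFn save_fn, LoadFn load_fn) {
        by_type_[std::type_index(typeid(T))] = Entry{key,
            [save_fn](std::ostream& os, const Asset& a) { save_fn(os, static_cast<const T&>(a)); }};
        by_key_[key] = [load_fn](std::istream& is) {
            std::unique_ptr<Asset> obj = std::make_unique<T>();  // the derived type
            load_fn(is, static_cast<T&>(*obj));
            return obj;
        };
    }
    void save(std::ostream& os, const Asset* a) const {
        const Entry& e = by_type_.at(std::type_index(typeid(*a)));  // dynamic type
        os << e.key << ' ' << a->name << ' ';
        e.save(os, *a);
    }
    std::unique_ptr<Asset> load(std::istream& is) const {
        std::string key, name;
        is >> key >> name;
        std::unique_ptr<Asset> obj = by_key_.at(key)(is);
        obj->name = name;
        return obj;
    }
private:
    struct Entry {
        std::string key;
        std::function<void(std::ostream&, const Asset&)> save;
    };
    std::map<std::type_index, Entry> by_type_;
    std::map<std::string, std::function<std::unique_ptr<Asset>(std::istream&)>> by_key_;
};
```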
I can't make serialize virtual, since the intrusive serialize methods
are templates, but it certainly would solve the problem if it were possible.
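The usual way around "templates can't be virtual" is to make the archive a runtime interface instead of a template parameter; Boost.Serialization's polymorphic archives (`polymorphic_oarchive` / `polymorphic_iarchive`) are built on that idea. The sketch below shows only the principle in standard C++; `OArchiveInterface`, `TextOArchive` and the Asset types are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Abstract archive: because it is not a template parameter, a member
// function taking it can be virtual.
class OArchiveInterface {
public:
    virtual ~OArchiveInterface() = default;
    virtual void save(int v) = 0;
    virtual void save(const std::string& s) = 0;
    OArchiveInterface& operator&(int v) { save(v); return *this; }
    OArchiveInterface& operator&(const std::string& s) { save(s); return *this; }
};

// One concrete archive; others could write binary, count fields, etc.
class TextOArchive : public OArchiveInterface {
public:
    void save(int v) override { out_ << v << ' '; }
    void save(const std::string& s) override { out_ << s << ' '; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

struct Asset {
    virtual ~Asset() = default;
    std::string name;
    virtual void serialize(OArchiveInterface& ar) { ar & name; }
};

struct AssetModel : Asset {
    int vertices = 0;
    void serialize(OArchiveInterface& ar) override {
        Asset::serialize(ar);  // base part first
        ar & vertices;         // then the derived part
    }
};
```

The trade-off is a virtual call per primitive, which is the price of deciding the archive type at run time.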
*** I suspect that if the other suggested changes were implemented, this problem would disappear. I don't think I've tried it, but rather than including boilerplate code in each derived class, one might try adding a "mix-in" base class which contains the serialize function.
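A minimal sketch of the suggested mix-in, using CRTP so the shared serialize boilerplate lives in one base template and each derived class supplies only its own fields. The `serialize(Archive&, const unsigned int)` signature follows Boost's intrusive convention; `AssetSerializable`, `serialize_fields` and the stand-in `TextOArchive` are hypothetical. With Boost itself, the shared part would more likely be `ar & boost::serialization::base_object<Base>(*this)`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct Asset { std::string name; };

// Mix-in: holds the serialize() boilerplate every Asset-derived class
// would otherwise repeat; Derived only supplies serialize_fields().
template <class Derived, class Base = Asset>
struct AssetSerializable : Base {
    template <class Archive>
    void serialize(Archive& ar, const unsigned int version) {
        ar & this->name;                                              // shared part
        static_cast<Derived*>(this)->serialize_fields(ar, version);  // per-type part
    }
};

struct AssetModel : AssetSerializable<AssetModel> {
    int vertices = 0;
    template <class Archive>
    void serialize_fields(Archive& ar, const unsigned int) { ar & vertices; }
};

// Minimal stand-in archive so the example runs without Boost.
struct TextOArchive {
    std::ostringstream out;
    template <class T>
    TextOArchive& operator&(const T& v) { out << v << ' '; return *this; }
};
```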
I'll play around with alternatives; I basically spent the day learning the archive templates by watching the code flow.
In addition, the bodies of all of my overrides are completely identical except for the class name (AssetModel, AssetTexture, etc.).
This means I'll be wrapping the bodies of a generic save_override and load_override in a macro, and will have to manually add all Asset-derived classes to a list of classes inside my archive class. So my archiver cannot be generic, even though I have managed to make it a template in the sense that the asset manager and base asset types are passed in as template parameters.
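Instead of a macro plus a hand-maintained list of classes, a single member template constrained with `std::is_base_of` can cover every Asset-derived type on both the save and the load side. This is a sketch under assumptions (C++17, one string handle per asset, a map of live objects); `ProxyOArchive`, `ProxyIArchive` and the `id` member are hypothetical, and wiring such a template into Boost's `save_override`/`load_override` is not shown.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Example non-virtual hierarchy.
struct Asset { std::string id; };
struct AssetModel : Asset { int vertices = 0; };
struct AssetTexture : Asset { int width = 0; };

// Only viable for pointers to classes derived from AssetBase.
template <class AssetBase>
class ProxyOArchive {
public:
    explicit ProxyOArchive(std::ostream& out) : out_(out) {}
    template <class T>
    std::enable_if_t<std::is_base_of_v<AssetBase, T>, ProxyOArchive&>
    operator&(T* p) {
        out_ << static_cast<const AssetBase*>(p)->id << ' ';  // slicing is fine when saving
        return *this;
    }
private:
    std::ostream& out_;
};

template <class AssetBase>
class ProxyIArchive {
public:
    ProxyIArchive(std::istream& in, const std::map<std::string, AssetBase*>& live)
        : in_(in), live_(live) {}
    template <class T>
    std::enable_if_t<std::is_base_of_v<AssetBase, T>, ProxyIArchive&>
    operator&(T*& p) {
        std::string handle;
        in_ >> handle;
        // Non-virtual hierarchy: static_cast is only correct if the object
        // registered under this handle really is a T.
        p = static_cast<T*>(live_.at(handle));
        return *this;
    }
private:
    std::istream& in_;
    const std::map<std::string, AssetBase*>& live_;
};
```

Because `T` is deduced from the pointer the serialize method passes in, the load side hands back the more-derived pointer the caller expects, without naming AssetModel or AssetTexture inside the archive.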
*** It looks to me like you've gotten off on the wrong foot and stuck with it.
That's hard to avoid when learning new code :) This is what the path of least resistance yielded, given the docs and examples provided by boost. Basically, this is as far as I got without having to hack directly on the existing boost code and take a crash course in the code flow and internal data structures of everything.
*** To me this is exactly the wrong approach. Now you've coupled your classes to be serialized to a specific archive. This means you won't be able to use any other archive type, and you've defeated one of the main benefits of the serialization library. Perhaps it wasn't a suitable library for your task.
The library can do what I want, because I have the source code :) Anyway, the classes aren't coupled to the archive with what I've come up with so far; it's the other way around. I definitely do not want my classes to understand archiving beyond a very basic sense of having to call operator& on most of their fields, since I plan on having several wildly different archive classes calling the serialize methods on my classes. So I have a working implementation; how do I make it better?
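To illustrate the decoupling point: one `serialize` template, written once, can be driven by archives that do completely different things with `&`. The three toy archives below (saving, loading, counting) are hypothetical stand-ins, not Boost classes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Written once; what "&" means is decided entirely by the archive.
struct Settings {
    int width = 0;
    int height = 0;
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & width;
        ar & height;
    }
};

// Toy saving archive: "&" writes the value.
struct TextOArchive {
    std::ostringstream out;
    template <class T>
    TextOArchive& operator&(const T& v) { out << v << ' '; return *this; }
};

// Toy loading archive: "&" reads into the value.
struct TextIArchive {
    std::istringstream in;
    explicit TextIArchive(const std::string& s) : in(s) {}
    template <class T>
    TextIArchive& operator&(T& v) { in >> v; return *this; }
};

// Toy analysis archive: "&" only counts the fields it is shown.
struct CountingArchive {
    int fields = 0;
    template <class T>
    CountingArchive& operator&(const T&) { ++fields; return *this; }
};
```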
**** Maybe try doing it the simplest way.
I can't see how what you want to do is different than what everyone else uses the library for. And I can't see how what you want to do is different than what the examples do.
I could get the behavior I want by altering the serialize methods, but then they would be ill-formed for other archives. I could also specialize the serialize methods for the archive in question, but then I would have to write more than one. It's the archive's job to interpret what to do when you call ar & foo.

I anticipate having more data than I can load, so I need to load and save at the object level. But I still need to write the serialize methods as if everything could fit in memory, since I plan on having other archive classes that do operate on the graph of what is loaded at runtime (e.g. to compute garbage collection). In essence, the archive classes need to be programmable for these behaviors to work:

Graph of Foo:
  Saving: save the first Foo*, translate all further Foo*'s into a user-defined handle with a user-defined function, and save that instead.
  Loading: load the first Foo*, assume all further Foo*'s are saved as user-defined handles, translate them back into live objects on demand, and use the existing caching scheme to avoid translating the same user-defined handle over and over.

Graph of Bar:
  Saving: save all Bar's, but save all occurrences of Foo*'s as handles.
  Loading: load all Bar's, but load all occurrences of Foo*'s from handles.

Garbage collecting Foo:
  'Saving': archive an array of live root-level Foo objects and build a list of all Foo*'s that are reachable through serialization. Compare this list to the full list of Foo objects, and unload the ones that are missing.
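The garbage-collecting pass can be sketched as an "archive" that saves nothing: it walks whatever `serialize` touches and records every reachable Foo* (mark), and the caller then unloads the Foo objects that were not reached (sweep). `ReachabilityArchive`, `collect` and this Foo layout are hypothetical; the point is that the same `serialize` methods used for saving drive the traversal.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <set>
#include <vector>

struct Foo {
    int payload = 0;
    std::vector<Foo*> refs;
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & payload;
        ar & refs;
    }
};

// Saves nothing: follows Foo pointers and records each one it reaches.
class ReachabilityArchive {
public:
    template <class T>
    ReachabilityArchive& operator&(const T&) { return *this; }  // plain data: ignore

    ReachabilityArchive& operator&(Foo* p) {
        // Follow a pointer only the first time it is seen, so cycles terminate.
        if (p != nullptr && reached_.insert(p).second) p->serialize(*this, 0);
        return *this;
    }
    ReachabilityArchive& operator&(const std::vector<Foo*>& v) {
        for (Foo* p : v) *this & p;
        return *this;
    }
    const std::set<Foo*>& reached() const { return reached_; }
private:
    std::set<Foo*> reached_;
};

// Mark from the roots, then unload every owned Foo that was not reached.
void collect(std::vector<std::unique_ptr<Foo>>& all, const std::vector<Foo*>& roots) {
    ReachabilityArchive ar;
    ar & roots;
    const std::set<Foo*>& live = ar.reached();
    all.erase(std::remove_if(all.begin(), all.end(),
                             [&live](const std::unique_ptr<Foo>& f) {
                                 return live.count(f.get()) == 0;
                             }),
              all.end());
}
```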