Re:[boost] Serialization: versioning and XML.

Anatoli Tubman wrote:
My boss just gave me OK to discuss this, so here goes.
A common drawback of various serialization libraries is the need to provide versioning code for each data structure to be serialized. This need not be the case for XML archives (and other formats that store NVPs). Because the program knows which data members it needs, and the names are stored in the archive, the program can leave any missing members in their current state (typically just after construction) and ignore any extra ones. After all, this is one of the ideas behind XML! More often than not this tactic is sufficient to handle versioning. When it's not, there's always the old way of explicit version checks.
The included XML archive presumes that saving and loading of class variables are synchronized. This is consistent with all the other archives, so that once the serialization for a class is specified, it is guaranteed to work with all archives with no code changes.
It seems that this should be easy to implement for boost::serialization.
It might be possible to make an archive which implements the XML archive in a different way, so that data members would be optional and in an arbitrary sequence. Clearly it would be less efficient than the current version, and it's not clear whether it would be easy or hard to do. It's probably harder than it would first appear (as most things are).
Just make two passes through the deserialization function. In the first pass, just collect information on data members to be deserialized into some kind of map keyed on tag. In the second pass, perform actual deserialization, looking at the map built in the first pass. Any tags in the archive but not in the map are ignored together with their contents.
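The two passes described above might be sketched roughly as follows. This is a minimal illustration in standard C++ only, not the Boost.Serialization API: `TaggedRecords`, `CollectingArchive`, `nvp`, `Settings` and `tolerant_load` are hypothetical names, and the archive is assumed to have been parsed already into (tag, text) pairs.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed tagged archive: (tag, text) pairs in file order.
using TaggedRecords = std::vector<std::pair<std::string, std::string>>;

// Hypothetical archive that implements both passes.
class CollectingArchive {
public:
    // Pass 1: remember how to load each named member, keyed on its tag.
    template <typename T>
    CollectingArchive& nvp(const std::string& tag, T& member) {
        loaders_[tag] = [&member](const std::string& text) {
            std::istringstream in(text);
            T value{};
            if (in >> value) member = value;  // on parse failure, leave member as is
        };
        return *this;
    }

    // Pass 2: walk the archive, load known tags, skip unknown ones.
    // Returns the number of tags that were ignored.
    std::size_t load_from(const TaggedRecords& records) const {
        std::size_t ignored = 0;
        for (const auto& rec : records) {
            auto it = loaders_.find(rec.first);
            if (it == loaders_.end()) {
                ++ignored;  // tag present in the archive but unknown to the program
                continue;
            }
            it->second(rec.second);
        }
        return ignored;
    }

private:
    std::map<std::string, std::function<void(const std::string&)>> loaders_;
};

// Example class: members keep their constructed values unless the archive has them.
struct Settings {
    int width = 80;
    int height = 24;
    double scale = 1.0;

    template <typename Archive>
    void serialize(Archive& ar) {
        ar.nvp("width", width).nvp("height", height).nvp("scale", scale);
    }
};

// Runs both passes; the object must outlive the call (the loaders hold references).
template <typename T>
std::size_t tolerant_load(T& object, const TaggedRecords& records) {
    CollectingArchive ar;
    object.serialize(ar);          // pass 1: build the tag -> loader map
    return ar.load_from(records);  // pass 2: actual deserialization
}
```

With records such as `{"scale", "2.5"}, {"obsolete", "x"}, {"width", "120"}`, the tags may arrive in any order, `height` stays at its constructed value, and `obsolete` is skipped. Nested objects and skipping the contents of unknown elements are left out of this sketch.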
This method would destroy the independence between class serialization specification and archive format. If I were going to do such a thing (which I'm not), I would build this logic into a new XML archive class. You're free to do this. (if your boss will give you permission) Of course, if such an archive can't be made, there would be no motivation to use the serialization library at all for this purpose.
Actually this question touches upon a central issue about what serialization is all about and how it conflicts with what XML is all about.
Serialization is about making an arbitrary set of data structures moveable from one context to another. The serialized data stream is a function of the data structures to be moved. That is, program code => XML definition. The main attraction of serialization is that the code which saves/loads data streams is automatically generated and kept in sync with program data structures. In the XML archive, the XML data definition is driven by the program code and data structures.
XML is about rendering data in a program-independent manner. Program code is synchronized to match the XML structure. That is, XML definition => program code. There are packages that, given an XML definition, will generate C++ code to save/load data as an XML structure. In essence, these packages are the mirror image of serialization.
So the question really is: which is the independent variable, program code and data structures or the XML definition? If it's the former, serialization to XML archives is a good choice. If it's the latter, a different approach would probably be better. As long as the code and the XML definition don't change, it doesn't make much difference. When something has to change, the question arises: do we change the XML and adjust the program to match, or vice versa?
This question also touches on another topic that comes up regularly. There is often a desire to use the serialization library to generate a specific data format. If this format is a meta-data specification such as XML, it's possible. If it's a more specific format, serialization is probably not the right approach. Again the question comes down to who's boss: is the data stream format driving the program design, or vice versa? With the serialization library, the data stream format is driven by the program data structures. Attempts to mandate a too-specific data stream format may be possible but are ultimately not worthwhile.
Robert Ramey

Robert Ramey wrote:
This method would destroy the independence between class serialization specification and archive format. If I were going to do such a thing (which I'm not), I would build this logic into a new XML archive class. You're free to do this. (if your boss will give you permission)
It absolutely certainly will be a new archive class. Sorry if it wasn't clear from the start. I just think such an archive/serialization library is (a) useful enough to be a part of boost and (b) similar enough to the existing boost library to share the namespace with it. I understand fully that this idea works only with tagged formats (not necessarily just XML); that's why I'm saying that it should probably use a different syntax or maybe a sub-namespace under boost::serialization. My boss :) would be happy to see such a library as a part of boost.
Actually this question touches upon a central issue about what serialization is all about and how it conflicts with what XML is all about.
[long quote snipped]
I'm staying within the program-structure-drives-data-format school of thought. I don't precisely understand how going in the other direction is ever useful. Okay, given a DTD, you can generate a program that could read and write a bunch of data structures; but what will it *do* with them?
-- Anatoli Tubman PTC Israel (Haifa) Tel. (+972) 4-8550035 ext. 229