Re:Re: [boost] Serialization Formal Review #2

25 Apr 2004

      Matthias Troyer wrote:
...
I have tried extending the library since I want to swap our old
serialization library used at http://alps.comp-phys.org with a Boost
serialization library as soon as possible. It was important to keep the
same archive format, since we have gigabytes of data collected over ten
years in millions of CPU-hours and need to be able to read those with
both new and old codes. With the help of Robert I managed to convince
myself that this is indeed possible.
Perhaps it is possible, but I'm doubtful that it's the best way to
address the problem.  If the legacy format is a meta-data format, it may be
possible in a way similar to the way XML was handled.  But in general it
will be neither desirable nor necessary. To see this, take an extreme example.
Suppose that over the last 10 years we had 10 different programmers working on
the project.  Each one had to save and reload data for his classes.  Now we
want to convert to the new system.  We would need a new archive class that
tracked all the idiosyncrasies in the current code.  A huge job.  And there
exists a much simpler approach.

Loading legacy data (legacy_data & ld)
======================================
std::ifstream is("old_data", std::ios::binary);
// read the first line of the old_data file
if (/* it doesn't have the serialization signature */) {
	// load the data using the legacy system.
}
else {
	// rewind, then load with the new system
	is.seekg(0);
	boost::archive::binary_iarchive ia(is);
	ia >> ld;
}

Saving data
===========
// always save using serialization.

As files are processed, they are automatically converted to the new system.
No special programming is required, since the code to load the class instances
from the legacy data format already exists: it's the legacy code!  It's free
at this point.
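
A minimal sketch of the signature check and dispatch described above, using
only the standard library. The marker text in kSignature, the 64-byte probe
window, and the names detect_format and load_any are assumptions made for
illustration; the loaders are supplied by the caller, so the existing legacy
code can be passed in unchanged.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this out
#include <string>

// Assumed marker text; check what your archive class actually writes.
const std::string kSignature = "serialization::archive";

enum class Format { Legacy, Serialized };

// Peek at the first bytes of the stream, then rewind so that whichever
// loader runs next sees the data from where it started.
Format detect_format(std::istream& is) {
    assert(is.good());
    const std::istream::pos_type start = is.tellg();
    std::string head(64, '\0');
    is.read(&head[0], static_cast<std::streamsize>(head.size()));
    head.resize(static_cast<std::size_t>(is.gcount()));
    is.clear();        // a short file sets eofbit/failbit; reset before seeking
    is.seekg(start);
    return head.find(kSignature) != std::string::npos ? Format::Serialized
                                                      : Format::Legacy;
}

// Run the loader that matches the detected format and report which one ran.
template <class LegacyLoader, class NewLoader>
Format load_any(std::istream& is, LegacyLoader load_legacy, NewLoader load_new) {
    const Format format = detect_format(is);
    if (format == Format::Serialized) {
        load_new(is);
    } else {
        load_legacy(is);
    }
    return format;
}
```

Whichever loader runs, the caller then saves with the new system only, so each
file is converted the first time it is touched.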
...
4. Documentation of archive formats, especially what class and object
information is stored. I want to be able to predict, from reading the
documentation, what exactly will be written to the archive and in which
order.
This would entail a detailed paraphrasing of the operation of the code. In the
case where there is no versioning, no tracking and no pointer serialization,
it's fairly simple.  When any of the others are present, it starts to get
pretty complex.  There is a comment in basic_archive.cpp which in fact
summarizes the format - but I'm not convinced it's something that belongs in
user documentation.

//////////////////////////////////////////////////////////////////////
//
// class_information is stored as
//
//      class_id*   // -1 for a null pointer
//      if a new class id
//      [
//          exported key - class name*
//          tracking level - always/never
//          file version
//      ]
//
//      if tracking
//      [
//          object_id
//      ]
//          
//      [   // if a new object id
//          data...
//      ]
//
//  * required only for pointers - optional for objects

This recursively defines the file format for any serialized data structure.
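
To make the layout in that comment concrete, here is a toy writer that emits
the same fields in the same order as space-separated text. It is only a
sketch: ToyWriter and its members are hypothetical, it is not the library's
code, and it always writes the class name even though the comment marks that
field as optional for non-pointer objects.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical toy writer that follows the commented layout.
struct ToyWriter {
    std::ostringstream out;
    std::set<int> seen_classes;  // class ids whose header is already written
    std::set<int> seen_objects;  // object ids whose data is already written

    // Writes one object record; returns true if the data itself was emitted.
    bool write(int class_id, const std::string& key, bool tracking,
               int version, int object_id, const std::string& data) {
        out << class_id << ' ';
        if (seen_classes.insert(class_id).second) {
            // first time this class id appears: key, tracking level, version
            out << key << ' ' << (tracking ? 1 : 0) << ' ' << version << ' ';
        }
        bool new_object = true;
        if (tracking) {
            out << object_id << ' ';
            new_object = seen_objects.insert(object_id).second;
        }
        if (new_object) {
            out << data << ' ';
        }
        return new_object;
    }

    // A null pointer is recorded as class id -1 and nothing else.
    void write_null() { out << -1 << ' '; }
};
```

Writing the same tracked object twice emits the class header and the data only
once; the second record is just the class id followed by the object id.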
...
This is essential when exchanging data with an application that
does not use this library.
I'm skeptical that this is going to be fruitful.  The "other" application is
going to have to effectively re-implement this whole library.  Why bother?
Just use this one.

To send data to "other" applications, there is always the XML archive.

I don't think that a general purpose tool such as serialization is going to
be very helpful in trying to implement some externally defined data format.
Some meta-data formats (e.g. XML and Windows ini files) are doable, but in
general it's not going to be productive.

Robert Ramey