
Matthias Troyer wrote:
I have tried extending the library since I want to replace our old serialization library, used at http://alps.comp-phys.org, with a Boost serialization library as soon as possible. It was important to keep the same archive format, since we have gigabytes of data collected over ten years in millions of CPU-hours and need to be able to read those with both new and old codes. With the help of Robert I managed to convince myself that this is indeed possible.
Perhaps it might be possible, but I'm doubtful that it's the best way to address the problem. If the legacy format is a meta-data format, it may be possible in a way similar to the way XML was handled. But in general it will be neither desirable nor necessary.

To see this, take an extreme example. Suppose that over the last 10 years we had 10 different programmers working on the project. Each one had to save and reload data for his classes. Now we want to convert to the new system. We would need a new archive class that tracked all the idiosyncrasies in the current code. A huge job. And there exists a much simpler approach.

Loading legacy data
====================

    void load(legacy_data & ld){
        std::ifstream is("old_data");
        // read first line of old_data file
        if(/* it doesn't have the serialization signature */){
            // load the data using the legacy system.
        }
        else{
            boost::archive::binary_iarchive ia(is);
            ia >> ld;
        }
    }

Save data
=========

    // save using serialization.

As files are processed they are automatically converted to the new system. No special programming is required, since the code to load the class instances from the legacy data format already exists: it's the legacy code! It's free at this point.
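The dispatch described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: `legacy_data`, `load_legacy`, `load_new`, `load_any` and the one-integer file layout are hypothetical stand-ins, and the check assumes the new format carries the marker `serialization::archive` in its first line (true of the text archive header as far as I know; a binary archive would need its own check).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type standing in for legacy_data in the post.
struct legacy_data {
    int value = 0;
};

// Assumption: new-format files contain this marker in their first line.
// Verify the exact header layout against your own archives.
const std::string kSignature = "serialization::archive";

// Hypothetical stand-in for the existing legacy reader: one integer per file.
void load_legacy(std::istream& is, legacy_data& ld) {
    is >> ld.value;
}

// Hypothetical stand-in for the new reader. With Boost this is where an
// input archive would be constructed on the stream and `ia >> ld` called.
void load_new(std::istream& is, legacy_data& ld) {
    std::string header;
    std::getline(is, header);  // skip the header line
    is >> ld.value;
}

enum class source_format { legacy, serialization };

// Peek at the first line, rewind, then hand the stream to the right reader.
source_format load_any(std::istream& is, legacy_data& ld) {
    const std::istream::pos_type start = is.tellg();
    assert(start != std::istream::pos_type(-1));  // stream must be seekable

    std::string first_line;
    std::getline(is, first_line);
    is.clear();       // reset eof/fail bits set by a short or empty file
    is.seekg(start);  // rewind so the chosen reader sees the whole file

    if (first_line.find(kSignature) == std::string::npos) {
        load_legacy(is, ld);
        return source_format::legacy;
    }
    load_new(is, ld);
    return source_format::serialization;
}
```

Saving afterwards always uses the new format, so each legacy file is converted the first time it is read and rewritten.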
4. Documentation of archive formats, especially what class and object information is stored. I want to be able to predict, from reading the documentation, what exactly will be written to the archive and in which order.
This would entail a detailed paraphrasing of the operation of the code. In the case where there is no versioning, no tracking and no pointer serialization, it's fairly simple. When any of the others are present, it starts to get pretty complex. There is a comment in basic_archive.cpp which in fact summarizes the format, but I'm not convinced it's something that belongs in user documentation.

    //////////////////////////////////////////////////////////////////////
    //
    // class_information is stored as
    //
    //      class_id*   // -1 for a null pointer
    //      if a new class id
    //      [
    //          exported key - class name*
    //          tracking level - always/never
    //          file version
    //      ]
    //
    //      if tracking
    //      [
    //          object_id
    //      ]
    //
    //      [
    //          if a new object id
    //          data...
    //      ]
    //
    // * required only for pointers - optional for objects

This recursively defines the file format for any serialized data structure.
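To make the ordering in that comment concrete, here is a toy model of the pointer case. It is not the library's implementation: `toy_archive` and `save_pointer` are hypothetical names, ids are simply assigned in order of first appearance, and every item is recorded as a string token so the sequence can be inspected.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only: records the order of items described in the comment above.
struct toy_archive {
    std::vector<std::string> tokens;        // what would be written, in order
    std::map<std::string, int> class_ids;   // class name -> class_id
    std::map<const void*, int> object_ids;  // address -> object_id

    void save_pointer(const std::string& class_name, int file_version,
                      bool tracking, const void* obj,
                      const std::string& data) {
        assert(file_version >= 0);

        // class_id, or -1 for a null pointer
        if (obj == nullptr) {
            tokens.push_back("-1");
            return;
        }
        auto c = class_ids.find(class_name);
        const bool new_class = (c == class_ids.end());
        if (new_class) {
            const int id = static_cast<int>(class_ids.size());
            c = class_ids.emplace(class_name, id).first;
        }
        tokens.push_back(std::to_string(c->second));

        // if a new class id: exported key, tracking level, file version
        if (new_class) {
            tokens.push_back(class_name);
            tokens.push_back(tracking ? "1" : "0");
            tokens.push_back(std::to_string(file_version));
        }

        // if tracking: object_id
        bool new_object = true;
        if (tracking) {
            auto o = object_ids.find(obj);
            new_object = (o == object_ids.end());
            if (new_object) {
                const int id = static_cast<int>(object_ids.size());
                o = object_ids.emplace(obj, id).first;
            }
            tokens.push_back(std::to_string(o->second));
        }

        // data, only if this object has not been written before
        if (new_object) {
            tokens.push_back(data);
        }
    }
};
```

Saving the same tracked pointer twice writes the class information and the data once; the second save emits only the class_id and object_id.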
This is essential when exchanging data with an application that does not use this library.
I'm skeptical that this is going to be fruitful. The "other" application is going to have to effectively re-implement this whole library. Why bother? Just use this one. To send data to "other" applications, there is always the XML archive. I don't think that a general-purpose tool such as serialization is going to be very helpful in trying to implement some externally defined data format. Some meta-data formats (e.g. XML and Windows ini files) are doable, but in general it's not going to be productive.

Robert Ramey