Re:Re: [boost] Serialization Formal Review #2

Matthias Troyer wrote:
I have tried extending the library since I want to swap our old serialization library used at http://alps.comp-phys.org with a Boost serialization library as soon as possible. It was important to keep the same archive format, since we have gigabytes of data collected over ten years in millions of CPU-hours and need to be able to read those with both new and old codes. With the help of Robert I managed to convince myself that this is indeed possible.
Perhaps it might be possible, but I'm doubtful that it's the best way to address the problem. If the legacy format is a meta-data format, it can be possible in a way similar to the way XML was handled. But in general it will be neither desirable nor necessary. To see this, take an extreme example. Suppose over the last 10 years we had 10 different programmers working on the project. Each one had to save and reload data for his classes. Now we want to convert to the new system. We would need a new archive class that tracked all the idiosyncrasies in the current code. A huge job. And there exists a much simpler approach.

    Loading legacy data (legacy_data & ld)
    ====================
    std::ifstream is("old_data");
    // read first line of old_data file
    // if it doesn't have the serialization signature
    //     load the data using the legacy system.
    else {
        boost::archive::binary_iarchive ia(is);
        ia >> ld;
    }

    Save data
    // save using serialization.

As files are processed they are automatically converted to the new system. No special programming is required, as the code to load the class instances from the legacy data format already exists: it's the legacy code! It's free at this point.
4. Documentation of archive formats, especially what class and object information is stored. I want to be able to predict, from reading the documentation, what exactly will be written to the archive and in which order.
This would entail a detailed paraphrasing of the operation of the code. In the case where there is no versioning, no tracking and no pointer serialization, it's fairly simple. When any of the others are present, it starts to get pretty complex. There is a comment in basic_archive.cpp which in fact summarizes the format, but I'm not convinced it's something that belongs in user documentation.

    //////////////////////////////////////////////////////////////////////
    //
    // class_information is stored as
    //
    //  class_id*   // -1 for a null pointer
    //  if a new class id
    //  [
    //      exported key - class name*
    //      tracking level - always/never
    //      file version
    //  ]
    //
    //  if tracking
    //  [
    //      object_id
    //  ]
    //
    //  [   // if a new object id
    //      data...
    //  ]
    //
    // * required only for pointers - optional for objects

This recursively defines the file format for any serialized data structure.
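To make the ordering in that comment concrete, here is a toy model of the preamble written for an object saved through a pointer. It is a simplified sketch, not the library's code: `ToyWriter`, its members and the token spellings are hypothetical, tracking is assumed to be always on, and `data` is a flat string, whereas in a real archive the data section recursively contains the same structure for each member.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model of the class_information layout; all names are hypothetical.
struct ToyWriter {
    std::vector<std::string> out;           // emitted tokens, in order
    std::map<std::string, int> class_ids;   // class name -> class_id
    std::map<const void*, int> object_ids;  // tracked address -> object_id

    // Save one object through a pointer, so class_id and key are required.
    void save_pointer(const void* obj, const std::string& class_name,
                      int version, const std::string& data) {
        if (obj == nullptr) {               // -1 stands for a null pointer
            out.push_back("class_id=-1");
            return;
        }
        auto cls = class_ids.find(class_name);
        const bool new_class = (cls == class_ids.end());
        if (new_class) {
            const int id = static_cast<int>(class_ids.size());
            cls = class_ids.emplace(class_name, id).first;
        }
        out.push_back("class_id=" + std::to_string(cls->second));
        if (new_class) {                    // written once per class
            out.push_back("key=" + class_name);
            out.push_back("tracking=always");
            out.push_back("version=" + std::to_string(version));
        }
        auto ob = object_ids.find(obj);
        const bool new_object = (ob == object_ids.end());
        if (new_object) {
            const int id = static_cast<int>(object_ids.size());
            ob = object_ids.emplace(obj, id).first;
        }
        out.push_back("object_id=" + std::to_string(ob->second));
        if (new_object) {                   // data only on first sight
            out.push_back("data=" + data);
        }
    }
};
```

Saving the same address twice emits the class header and the data only the first time; the second save reduces to a class_id and an object_id, which is why predicting the byte stream gets harder once tracking and pointers are involved.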
This is essential when exchanging data with an application that does not use this library.
I'm skeptical that this is going to be fruitful. The "other" application is going to have to effectively re-implement this whole library. Why bother? Just use this one. To send data to "other" applications, there is always the XML archive. I don't think that a general-purpose tool such as serialization is going to be very helpful in trying to implement some externally defined data format. Some meta-data formats (e.g. XML and Windows ini files) are doable, but in general it's not going to be productive.
Robert Ramey

On Apr 24, 2004, at 8:18 PM, Robert Ramey wrote:
Matthias Troyer wrote:
I have tried extending the library since I want to swap our old serialization library used at http://alps.comp-phys.org with a Boost serialization library as soon as possible. It was important to keep the same archive format, since we have gigabytes of data collected over ten years in millions of CPU-hours and need to be able to read those with both new and old codes. With the help of Robert I managed to convince myself that this is indeed possible.
Perhaps it might be possible, but I'm doubtful that it's the best way to address the problem. If the legacy format is a meta-data format, it can be possible in a way similar to the way XML was handled. As files are processed they are automatically converted to the new system. No special programming is required, as the code to load the class instances from the legacy data format already exists: it's the legacy code! It's free at this point.
In our case this is not possible as long as applications using the old format are still in use. More than a dozen people at various places around the world have written simulation programs and produced codes based on our application framework. I thus do not have full control over all the data that is around, and I need to support the legacy format for years to come. On the other hand, I have full control over the future of the serialization level of these codes. The current version of your library allows me to implement an archive compatible with the legacy format. This will allow me to easily swap my serialization library with yours, and I need to support the legacy files only through a simple archive type. Old codes will still compile and might not even notice that serialization is now handled by your library. Application developers can then, if desired, rewrite their codes to employ new features of your library, but they do not have to. If I had all the data and all codes based on our libraries under my control, I could, as you recommend, just change codes and file format. I wish to thank you for your advice and your willingness to help. What is most important for me (and maybe also useful for others) is that I can use your library and still read my old files.
4. Documentation of archive formats, especially what class and object information is stored. I want to be able to predict, from reading the documentation, what exactly will be written to the archive and in which order. This is essential when exchanging data with an application that does not use this library.
I'm skeptical that this is going to be fruitful. The "other" application is going to have to effectively re-implement this whole library. Why bother? Just use this one.
The other applications may be written in Fortran, Java or other languages, such as Python. As long as no pointers or derived classes are used, it should be fairly simple to re-implement reading and writing such files in another language.
To send data to "other" applications, there is always the XML archive.
For new applications we are already defining XML applications to exchange data, but large data sets will always be stored in a binary format and referenced from these XML files.
Matthias