
On Sep 18, 2006, at 8:50 PM, Robert Ramey wrote:
The set of types for which an array optimization can be done is different for binary, MPI, XDR, ... archives, but a common dispatch mechanism is possible, which is what we have implemented in the array::[io]archive classes.
And I think that is what I have a problem with. The "common" dispatch, as I see it implemented, presumes a known set of optimizable types. When other optimizable types are added, this set will have to grow. It seems to me that it is fundamentally not scalable. So personally, I would prefer to add the code to the derived types -- but I understand this is my preference.
No, the "optimizable types" are not the types (like std::vector or std::valarray) for which an array optimization exists, but rather the value_types of the array for which the storage can be optimized. This set depends only on the archive itself, not on the container types, and each archive can have its own lambda expression to determine whether a value_type is optimizable. Adding optimized serialization to, e.g., multi_array only means that multi_array should serialize its data through the array wrapper instead of writing a loop over all elements. This simplifies the serialization implementation for that class and automatically provides optimized serialization, without any change in the serialization library, nor any change in an archive. This is perfectly scalable, in contrast to your idea of having each archive class re-implement the serialization of all optimizable containers. I am a bit confused by your arguments above, since it was actually you who suggested the array wrapper as the least intrusive and scalable solution.
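To make the decoupling concrete, here is a minimal toy sketch of the idea (the names here are hypothetical stand-ins, not the real Boost interfaces -- Boost.Serialization's actual wrapper is created by boost::serialization::make_array). A container's serialize routine only hands its contiguous storage to a wrapper; each archive decides once, based on the value_type alone, whether to bulk-copy or fall back to element-wise saving:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the array wrapper (not the real Boost names).
template <class T>
struct array_wrapper {              // a contiguous block: pointer + count
    T* data;
    std::size_t count;
};
template <class T>
array_wrapper<T> make_array(T* p, std::size_t n) { return {p, n}; }

// A toy "binary" archive: it bulk-copies arrays whose value_type satisfies
// its own criterion (here std::is_arithmetic, as a stand-in for the
// archive-specific predicate), and falls back to element-wise saving otherwise.
struct toy_binary_oarchive {
    std::vector<unsigned char> buffer;

    template <class T>
    void save(const T& t) {         // primitive, element-wise save
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&t);
        buffer.insert(buffer.end(), p, p + sizeof(T));
    }
    // ONE overload per wrapper -- not one per container type.
    template <class T>
    void save(const array_wrapper<T>& a) {
        if constexpr (std::is_arithmetic_v<T>) {
            // optimizable value_type: one block write
            const unsigned char* p =
                reinterpret_cast<const unsigned char*>(a.data);
            buffer.insert(buffer.end(), p, p + a.count * sizeof(T));
        } else {
            // fallback: element-wise default
            for (std::size_t i = 0; i < a.count; ++i) save(a.data[i]);
        }
    }
};

// The container's serialization only names the wrapper; it knows nothing
// about which archives optimize it.
template <class Archive, class T>
void serialize_vector(Archive& ar, std::vector<T>& v) {
    ar.save(v.size());
    ar.save(make_array(v.data(), v.size()));
}
```

Adding a new container means writing one more function like serialize_vector; adding a new archive means writing one more wrapper overload. Neither side needs to know about the other.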
I would prefer to find this code as part of the array_wrapper for std::vector rather than as part of the archive class.
Again, there is no array_wrapper for std::vector; rather, the std::vector<T> serialization serializes its data through an array<T> wrapper, as you had proposed.
This would entail making the above somewhat more elaborate:

    class array<std::vector> {
        template<class Archive> void binary_serialize(...) { ... }
        template<class Archive> void mpi_serialize(...) { ... }

        template<class Archive>
        void serialize(Archive &ar, const unsigned int version) const {
            // if Archive is derived from base_binary
            //     binary_serialize(ar, version);
            // else if Archive is derived from base_mpi_archive
            //     mpi_serialize(...);
            // else
            //     array_default<T>::serialize(*this, version);
        }
    };
Ouch!!! This is just what I mean by not scalable. We already have five cases now (plain, binary, packed MPI, MPI datatype, skeleton), with two more coming soon (XDR, HDF5). Do you really want every author of a serialization function for an array-like data structure to reimplement an optimization for all of these archives????????
If by three or four logically distinct things you mean
1. the array optimization
2. the skeleton&content archive wrappers
3. the MPI archives
4. the MPI library
then my comments are:
1. is already factored out and in the serialization library. If anything remains to be done to it, there was a desire to extend the array wrappers to strided arrays, which can easily be done without touching anything in the serialization library.
Hmmm - what about MTL or ublas - don't these have their own special types for collections? I know boost::multi_array does. Wouldn't these have to be added alongside the std::valarray and std::vector already in the binary archive?
I skipped most of the above because there seems to be a fundamental misunderstanding regarding the role of the array wrapper. The array wrapper, which you had suggested yourself, was introduced to completely decouple array optimizations from specific datatypes. When implementing serialization for MTL, ublas, Blitz, or other libraries, one just uses an array wrapper to serialize contiguous arrays. An archive can then either use the element-wise default serialization of the array wrapper, or decide to overload it and implement an optimized version -- independent of which class the array wrapper came from. Thus, there is no std::vector, std::valarray, ... overload in any of the archives - not in the binary archive nor anywhere else.

What you seem to propose, both above and in the longer text I cut, is instead to re-implement the optimized serialization for all these N classes in the M different archive types that can use it (we have M=4 now with the binary, packed MPI, MPI datatype, and skeleton archives, and soon we'll do M+=2 by adding XDR and HDF5 archives). Besides leading to an M*N problem, which the array wrapper was designed to solve, this leads to intrusion problems in all classes that need to be serialized (including multi_array and all the others), which is not feasible, as we discussed last year.
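The M*N argument can be illustrated in a few lines. In this toy sketch (all names hypothetical, not Boost's), each of M archives supplies exactly one overload for the wrapper, and each of N containers makes exactly one call to it, so the total amount of code grows as M+N rather than M*N:

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>
#include <vector>

// Hypothetical wrapper naming a contiguous block of memory.
template <class T> struct array_ref { const T* p; std::size_t n; };

// M archives: ONE wrapper overload each.
struct bulk_oarchive {               // e.g. a binary archive: one block write
    std::size_t bytes = 0;
    template <class T> void save(array_ref<T> a) { bytes += a.n * sizeof(T); }
};
struct counting_oarchive {           // e.g. an element-wise fallback archive
    std::size_t elements = 0;
    template <class T> void save(array_ref<T> a) { elements += a.n; }
};

// N containers: ONE archive-agnostic wrapper call each.
template <class Ar, class T>
void serialize(Ar& ar, const std::vector<T>& v) {
    ar.save(array_ref<T>{v.data(), v.size()});
}
template <class Ar, class T>
void serialize(Ar& ar, const std::valarray<T>& v) {
    ar.save(array_ref<T>{&v[0], v.size()});   // valarray storage is contiguous
}
```

In the M*N alternative, each serialize function above would instead need a branch (or overload) per archive family, and every new archive would force edits to every container's serialization code.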
I am intrigued by the skeleton - again, the documentation doesn't really give a good idea of what it does and what else it might be used for.
The skeleton is just all the types that you treat in the archive classes and not in the primitives, while the content is everything you treat in the primitives. It is just a formalization of your serialization library's implementation details.
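One way to picture that split (a toy sketch with made-up names, based only on the description above, not on the real skeleton&content implementation): the same serialize function can be driven by two archives, one that records only structural information such as element counts (the skeleton), and one that records only the primitive values (the content). For MPI-style use, the expensive structural pass can then be done once and only the values retransmitted:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy "skeleton" archive: keeps structure (here, sizes), skips values.
struct skeleton_oarchive {
    std::vector<std::size_t> shape;
    void save_size(std::size_t n) { shape.push_back(n); }
    void save_value(int) {}                     // primitive values are skipped
};

// Toy "content" archive: keeps primitive values, skips structure.
struct content_oarchive {
    std::vector<int> values;
    void save_size(std::size_t) {}              // structure is skipped
    void save_value(int x) { values.push_back(x); }
};

// One serialize function serves both archives.
template <class Archive>
void serialize_vec(Archive& ar, const std::vector<int>& v) {
    ar.save_size(v.size());                     // handled by the archive class
    for (int x : v) ar.save_value(x);           // handled by the primitives
}
```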
So my complaints really come down to two issues.
a) I'm still not convinced you've factored, in the best way, the optimizations that can be applied to certain pairs of types and archives.
That's a separate discussion, which we seem to be repeating every few months now. It seems from today's discussion that there is now some confusion about the use of the array wrapper, which we use in just the way you originally proposed.
b) The MPI documentation doesn't make the organization of the disparate pieces very clear. It's a user-manual "cookbook", which is fine as far as it goes, but I think it's going to need more explanation of the design itself.
Most of the issues you are interested in, such as the use of serialization for the skeleton&content, are implementation details, the important points of which will be explained in a paper that is currently being written.

Matthias