
On Sep 18, 2006, at 8:50 PM, Robert Ramey wrote:
The set of types for which an array optimization can be done is different for binary, MPI, XDR, ... archives, but a common dispatch mechanism is possible, which is what we have implemented in the array::[io]archive classes.
And I think that is what I have a problem with. The "common" dispatch, as I see it implemented, presumes a known set of optimizable types. When other optimizable types are added, this set will have to grow. It seems to me that it is fundamentally not scalable. So personally, I would prefer to add the code to the derived types -- but I understand this is my preference.
No, the "optimizable types" are not the types (like std::vector or std::valarray) for which an array optimization exists, but rather the value_types of the array for which the storage can be optimized. This set depends only on the archive itself, not on the container types, and each archive can have its own lambda expression to determine whether a value_type is optimizable. Adding optimized serialization to, e.g., multi_array only means that multi_array should serialize its data through the array wrapper instead of writing a loop over all elements. This simplifies the serialization implementation for that class and automatically provides optimized serialization, without any change in the serialization library, nor any change in an archive. This is perfectly scalable, in contrast to your idea of having each archive class re-implement the serialization of all optimizable containers. I am a bit confused by your arguments above, since it was actually you who suggested the array wrapper as the least intrusive and scalable solution.
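To make the decoupling concrete, here is a minimal toy sketch of the idea (the names here are hypothetical stand-ins, not the real Boost interfaces -- Boost.Serialization's actual wrapper is created by boost::serialization::make_array). A container's serialize routine only hands its contiguous storage to a wrapper; each archive decides once, based on the value_type alone, whether to bulk-copy or fall back to element-wise saving:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the array wrapper (not the real Boost names).
template <class T>
struct array_wrapper {              // a contiguous block: pointer + count
    T* data;
    std::size_t count;
};
template <class T>
array_wrapper<T> make_array(T* p, std::size_t n) { return {p, n}; }

// A toy "binary" archive: it bulk-copies arrays whose value_type satisfies
// its own criterion (here std::is_arithmetic, as a stand-in for the
// archive-specific predicate), and falls back to element-wise saving otherwise.
struct toy_binary_oarchive {
    std::vector<unsigned char> buffer;

    template <class T>
    void save(const T& t) {         // primitive, element-wise save
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&t);
        buffer.insert(buffer.end(), p, p + sizeof(T));
    }
    // ONE overload per wrapper -- not one per container type.
    template <class T>
    void save(const array_wrapper<T>& a) {
        if constexpr (std::is_arithmetic_v<T>) {
            // optimizable value_type: one block write
            const unsigned char* p =
                reinterpret_cast<const unsigned char*>(a.data);
            buffer.insert(buffer.end(), p, p + a.count * sizeof(T));
        } else {
            // fallback: element-wise default
            for (std::size_t i = 0; i < a.count; ++i) save(a.data[i]);
        }
    }
};

// The container's serialization only names the wrapper; it knows nothing
// about which archives optimize it.
template <class Archive, class T>
void serialize_vector(Archive& ar, std::vector<T>& v) {
    ar.save(v.size());
    ar.save(make_array(v.data(), v.size()));
}
```

Adding a new container means writing one more function like serialize_vector; adding a new archive means writing one more wrapper overload. Neither side needs to know about the other.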
I would prefer to find this code as part of the array_wrapper for std::vector rather than as part of the archive class.
Again, there is no array_wrapper for std::vector; rather, the std::vector<T> serialization serializes its data through an array<T> wrapper, as you had proposed.
This would entail making the above somewhat more elaborate:

    class array<std::vector> {
        template<class Archive> void binary_serialize(...) { ... }
        template<class Archive> void mpi_serialize(...) { ... }

        template<class Archive>
        void serialize(Archive &ar, const unsigned int version) const {
            // if Archive is derived from base_binary
            //     binary_serialize(ar, version);
            // else if Archive is derived from base_mpi_archive
            //     mpi_serialize(...);
            // else
            //     array_default<T>::serialize(*this, version);
        }
    };
Ouch!!! This is just what I mean by not scalable. We already have five cases now (plain, binary, packed MPI, MPI datatype, skeleton), with two more coming soon (XDR, HDF5). Do you really want every author of a serialization function for an array-like data structure to reimplement an optimization for all of these archives????????
If by three or four logically distinct things you mean
1. the array optimization
2. the skeleton&content archive wrappers
3. the MPI archives
4. the MPI library
then my comments are:
1. is already factored out and in the serialization library. If anything remains to be done to it, there was a desire to extend the array wrappers to strided arrays, which can easily be done without touching anything in the serialization library.
Hmmm - what about MTL or ublas - don't these have their own special types for collections? I know boost::multi_array does. Wouldn't these have to be added alongside the std::valarray and std::vector already in the binary archive?
I skipped most of the above because there seems to be a fundamental misunderstanding regarding the role of the array wrapper. The array wrapper, which you had suggested yourself, was introduced to completely decouple array optimizations from specific datatypes. When implementing serialization for MTL, ublas, Blitz, or other libraries, one just uses an array wrapper to serialize contiguous arrays. An archive can then either use the element-wise default serialization of the array wrapper, or decide to overload it and implement an optimized version -- independent of which class the array wrapper came from. Thus, there is no std::vector, std::valarray, ... overload in any of the archives - not in the binary archive nor anywhere else.

What you seem to propose, both above and in the longer text I cut, is instead to re-implement the optimized serialization for all these N classes in the M different archive types that can use it (we have M=4 now with the binary, packed MPI, MPI datatype, and skeleton archives, and soon we'll do M+=2 by adding XDR and HDF5 archives). Besides leading to an M*N problem, which the array wrapper was designed to solve, this leads to intrusion problems in all classes that need to be serialized (including multi_array and all the others), which is not feasible, as we discussed last year.
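The M*N argument can be illustrated in a few lines. In this toy sketch (all names hypothetical, not Boost's), each of M archives supplies exactly one overload for the wrapper, and each of N containers makes exactly one call to it, so the total amount of code grows as M+N rather than M*N:

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>
#include <vector>

// Hypothetical wrapper naming a contiguous block of memory.
template <class T> struct array_ref { const T* p; std::size_t n; };

// M archives: ONE wrapper overload each.
struct bulk_oarchive {               // e.g. a binary archive: one block write
    std::size_t bytes = 0;
    template <class T> void save(array_ref<T> a) { bytes += a.n * sizeof(T); }
};
struct counting_oarchive {           // e.g. an element-wise fallback archive
    std::size_t elements = 0;
    template <class T> void save(array_ref<T> a) { elements += a.n; }
};

// N containers: ONE archive-agnostic wrapper call each.
template <class Ar, class T>
void serialize(Ar& ar, const std::vector<T>& v) {
    ar.save(array_ref<T>{v.data(), v.size()});
}
template <class Ar, class T>
void serialize(Ar& ar, const std::valarray<T>& v) {
    ar.save(array_ref<T>{&v[0], v.size()});   // valarray storage is contiguous
}
```

In the M*N alternative, each serialize function above would instead need a branch (or overload) per archive family, and every new archive would force edits to every container's serialization code.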
I am intrigued by the skeleton - again, the documentation doesn't really give a good idea of what it does and what else it might be used for.
The skeleton is just all the types that you treat in the archive classes and not in the primitives, while the content is everything you treat in the primitives. It is just a formalization of your serialization library's implementation details.
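One way to picture that split (a toy sketch with made-up names, based only on the description above, not on the real skeleton&content implementation): the same serialize function can be driven by two archives, one that records only structural information such as element counts (the skeleton), and one that records only the primitive values (the content). For MPI-style use, the expensive structural pass can then be done once and only the values retransmitted:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy "skeleton" archive: keeps structure (here, sizes), skips values.
struct skeleton_oarchive {
    std::vector<std::size_t> shape;
    void save_size(std::size_t n) { shape.push_back(n); }
    void save_value(int) {}                     // primitive values are skipped
};

// Toy "content" archive: keeps primitive values, skips structure.
struct content_oarchive {
    std::vector<int> values;
    void save_size(std::size_t) {}              // structure is skipped
    void save_value(int x) { values.push_back(x); }
};

// One serialize function serves both archives.
template <class Archive>
void serialize_vec(Archive& ar, const std::vector<int>& v) {
    ar.save_size(v.size());                     // handled by the archive class
    for (int x : v) ar.save_value(x);           // handled by the primitives
}
```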
So my complaints really come down to two issues.
a) I'm still not convinced you've factored, in the best way, the optimizations that can be applied to certain pairs of types and archives.
That's a separate discussion, which we seem to be repeating every few months now. It seems from today's discussion that there is now some confusion about the use of the array wrapper, which we use in just the way you originally proposed.
b) The MPI documentation doesn't make the organization of the disparate pieces very clear. It's a user-manual "cookbook", which is fine as far as it goes, but I think it's going to need more explanation of the design itself.
Most of the issues you are interested in, such as the use of serialization for the skeleton&content, are implementation details, the important points of which will be explained in a paper that is currently being written.

Matthias