[boost] array optimized serialization (was: [MPI] Review comments)

19 Sep 2006

      Hi Robert,

Since this is a discussion orthogonal to the MPI review, I have  
renamed the subject.

On Sep 19, 2006, at 12:30 AM, Robert Ramey wrote:
...
Matthias Troyer wrote:
...
On Sep 18, 2006, at 8:50 PM, Robert Ramey wrote:
...
Thus, there is no std::vector, std::valarray, ... overload in any of
the archives - not in the binary archive nor anywhere else.
Well, we're not seeing the same thing.
template<class Archive>
class basic_binary_oarchive :
    public array::oarchive<Archive>
{
...
template <class Archive>
class oarchive
 : public archive::detail::common_oarchive<Archive>
[snip - implementation details - snip]
OK, now I understand your point. Let us recall the original design  
that we had discussed and that was in the end your suggestion:

1. introduce an array wrapper array<ValueType> to be used in the  
serialize function of all classes that can profit from array  
optimizations

2. provide a default serialization for the array wrapper

3. overload the array< ValueType>  serialization in those classes  
that can optimize array serialization

4. to simplify these optimizations and avoid code duplication provide  
a wrapper like

   template <class Archive> array_oarchive<Archive> : Archive {....}

   to "add" array optimization to existing archives

I hope you do not want to change this design now?

We had next implemented just this, and placed in a separate namespace  
archive::array, without touching your binary archive. The  
save_override for array<ValueType> with all the dispatch logic you  
quoted above was implemented in this wrapper.

In addition, we provided, in the archive::array namespace, separated  
from yours, an array optimized binary archive using that wrapper.  
Then, on June 4th you sent me an e-mail, suggesting that I move the  
array<ValueType> optimization into your binary archive. I quote from  
your e-mail:

#Howevever, once we introduce the concept of wrapper, we can  
introduce the
# special wrapper type into binary_archive without changing the  
requirements
# for other archives. This is the model for xml_?archives.  So I  
think you
# should move your array type to somewhere higher - perhaps as
# basic_binary_?archive.  This would eliminate a set of special  
classes just
# for this.  In this way we all get what we want: You get enhanced
# binary_?archive - I keep the minimal set of requirements for all new
# archives.

The easiest way of achieving this without code duplication was to  
actually provide an array-optimized base class from which all  
archives using array optimization could derive, and to put all of the  
array optimization logic into this base class array::oarchive, from  
which you quoted above.

It seems now, however, that you dislike this, and that's why I  
proposed to just roll back to the state before June 4th, thus not  
touching any of your archives, and again providing our own version of  
an array optimized binary archive. You can then implement the array  
optimization as you feel like for your archives, and we have our  
mechanism for our archives. Please let me know if I should do this.
...
which just moves some of the implementation out of binary_oarchive
into oarchive.  What attracts my attention is:
// dispatch saving of arrays to the optimized version where  
supported
  template<class ValueType>
  void save_override(serialization::array<ValueType> const& x,  
unsigned int
version)
  {
Now these suggest to me that the next person who want's to handle his
wrapper
specially for certain archives will then want/have to back into
binary_oarchive
and/or oarchive to add special handling to HIS wrapper.  That is  
what I mean
by not scalable.
Actually, as you can see from your own mail of June 4th, it was *you*  
who suggested to move this save_override into the binary archive. If  
you now have doubts about this idea of yours, then we can again roll  
back to the CVS state of June 4th.
...
template<class ValueType, class Allocator>
  void save_override(std::vector<ValueType,Allocator> const &x,  
unsigned int
version)
  {
And then there is the special handling for std::vector - which  
suggests that
when
one want's to add special handling for multi_array he'll have to do
something else.
No, this will not be needed for multi_array, nor for ublas or Blitz  
arrays or any other array class which requires default constructible  
value types. The std::vector override is only a workaround until we  
have an is_default_constructible traits in Boost. If your main  
objection is to that overload, then we can just implement  
is_default_constructible and remove that workaround.
...
Finally - the usage of inheritance used above strikes me as confusing,
misleading, and unnecessary.  This may or may not be strictly an
aesthetic issue.  The only really public/protected function is the
overload for the array wrapper and the default forwarder. These
entry point functions are only called explictly by the drived class.
Basically the base class is not being used to express an "IS-A"
relationship but rather an "Implemented in terms of" relationship
as decribed in Scott Meyers Effective C++ item 41. So I would
have expected something in binary_oarchive like
template<class T>
void save_override(array<T> & t, const unsigned int version) const {
    // if array is optmizable
    save_optimized_array(t, version); // most of oarray in here
   // else
   // forward to default handling
}
This would have to be added to every archive which supports this
but then we now have to explictly forward twice anyhow so I don't
see a hugh difference.
Well, you are suggesting to just copy this same function plus the  
"  // if array is optmizable " dispatch logic into every new archive  
class that supports array optimization. Isn't heritance just the  
mechanism to avoid this copy&pasting of hundred lines of identical  
code? Again, as I said before, if you do not like this in your binary  
archive, we can easily remove it from there again and you can do your  
own implementation, while we do ours. Sha
...
...
What you
seem to propose, both above and in the longer text I cut, is to
instead re-implement the optimized serialization for all these N
classes in the M different archive types that can use it (we have M=4
now with the binary, packed MPI, MPI datatype, and skeleton archives,
and soon we'll do M+=2 by adding XDR and HDF5 archives.). Besides
leading to an M*N problem, which the array wrapper was designed to
solve, this leads to intrusion problems into all classes that need to
be serialized (including multi_array and all others), which is not
feasible as we discussed last year.
I am aware of the desire to avoid the M*N problem - that should be
pretty clear from the design of the serialization library itself -  
which
takes great pains to permit any serializable type to be saved/load
from any archive.  The problem is that when I look at the code, its
not clear that this problem is being addressed.
And its not even clear that there is an M*N problem here:
binary archives benefit from optimization of types which are
"binary serializable"
mpi_archives benefit from optimizaton of types for which there
exists a predefined MPI prmitive.
etc.
There may or may not be overlapp here.  But it seems to me
that we're trying to hard to factor where truely common
factors don't exist.
So I'm looking at
binary, packed - 3 archives - valarray, vector, C++ native arrays 3  
types- 9
overloads
and one special trait - is_binary_serializable
No, wait a moment!

If by packed you mean a packed MPI archive then  
is_binary_serializable is not the right trait.

Also, where do you get only 3 types? We will have multi_array, ublas  
matrices and vectors, MTL arrays, Blitz arrays and many more coming.  
But the array wrapper reduced this down to just one class: array.
...
mpi - 1 archive * 3 types - 3 overloads
and one special trait - is_mpi_optimizable
Again why 3 overloads? There is only one for array (and the temporary  
workaround for vector which we can remove if you want).

So, if we keep the array wrapper then we are halfways there, Having  
reduced the M*N to an M*1 problem. What we have done next is to go  
further and simplify even this M*1 problem  by introducing a base  
class that implements the dispatch logic for array. I believe that  
this is what you do not like. Thus we can easily remove it again from  
your archive, and leave the array optimization for your binary  
archive to you, while we use our method for the archives that we are  
writing (packed MPI, MPI datatype, skeleton, XDR, HDF5).
...
Hmmm - when I proposed it I had in mind that it would be used
differently - as I outlined above.  This is not anyone's fault, it  
just
didn't occur to me that it would occur to anyone to use it  
differently.
And truth is, I've really just wanted to keep ehancements/extensions
to the library orthogonal to the original concepts.  I thought the
array wrapper suggestion would accomplish that so once it was
"accepted" I got back to may paying job.
OK, since I also do not get paid for this, and your criticism is  
based on a change that you yourself proposed on June 4th, my proposal  
is to stop the discussion right here, go back to the state of June  
4th, where our optimizations were completely decoupled from your  
archives. That way you will be able to implement the array  
optimization for the binary archive in the way you like best, and we  
have our own way. Please confirm and I will do that.

Matthias