
Matthias Troyer wrote:
a) save_binary optimizations are invoked from the fast_oarchive_impl class. They only have to be specified once even though they are used in more than one variation of the binary archive. That is, if there are N types to be subjected to this treatment by M archives, there are only N overrides - regardless of M.
Indeed this reduces an NxM problem to a 2*N problem: serialization of every class that can profit from this mechanism needs to be written twice. Better than M times, but still worse than writing it once. There is a more fundamental problem, though, that I will come to later.
In your version of serialization/valarray.hpp and serialization/vector.hpp there are two functions - one for fast archives and one for other archives. That is, for every collection which might benefit from this optimization, there are two implementations. This is the key feature of your implementation, and I have preserved it in my suggestion for an alternative. Put another way, exactly the same number of functions needs to be written in both implementations - there is no difference on this point.
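For concreteness, the shape of those two functions is roughly the following (a sketch only - the exact code in the submission differs in details; the function names here are mine, and save_array is the hook the submission adds to fast archives):

    #include <cstddef>
    #include <vector>

    namespace boost { namespace serialization {

    // fast archives: one call writes the whole contiguous buffer
    template<class Archive, class T>
    void save_fast(Archive & ar, const std::vector<T> & t, const unsigned int){
        const std::size_t count = t.size();
        ar << count;
        if(count)
            ar.save_array(& t[0], count);  // the proposal's archive hook
    }

    // all other archives: the usual element-by-element loop
    template<class Archive, class T>
    void save_slow(Archive & ar, const std::vector<T> & t, const unsigned int){
        const std::size_t count = t.size();
        ar << count;
        for(std::size_t i = 0; i < count; ++i)
            ar << t[i];
    }

    }} // namespace boost::serialization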
b) save_binary optimizations can be overridden for any particular archive type. (It's not clear to me how the current submission would address such a situation.)
Actually the problem is the reverse. In my proposal, the save_array function of the archive can decide how to treat each type, while your proposal dispatches everything to save_binary.
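To make this concrete: since save_array is an ordinary overloadable member of the archive, the archive can single out element types. A hypothetical sketch (this class is illustrative, not code from my submission):

    #include <cstddef>

    class portable_oarchive_sketch {
    public:
        // default: fundamental types go out as one raw block
        template<class T>
        void save_array(const T * address, std::size_t count){
            save_binary(address, count * sizeof(T));
        }

        // but the archive may single out a type, e.g. byte-swapping
        // doubles for a portable on-disk format
        void save_array(const double * address, std::size_t count);

        void save_binary(const void * address, std::size_t count);
    };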
Currently any archive can decide how to treat any type. There is no need for, nor benefit in, a separate function to do this.
d) It doesn't require that anything in the current library be conditioned on what kind of archive is being used. Inserting such a coupling would create a lot of future maintenance work, which would be extremely unfortunate for such a special-purpose library - especially since it's absolutely unnecessary.
This coupling can be removed in my proposal just by moving the serialization of arrays out of i/oserializer.hpp and into a separate header. A coupling between archive types and serialization of arrays will be necessary at some point, and encapsulating it in a single small header file is probably the best approach.
You can hide the code in i/oserializer.hpp just by inserting the following into your own archive:

    // for C++ built-in arrays
    template<typename T, int N>
    void save_override(const T (& t)[N], int){
        // your own code here - whatever you like
        save_array(t, sizeof(t));
    }

Once you do this, the code in i/oserializer.hpp for built-in arrays is hidden and never invoked. It is effectively invisible to your code. This technique is shown in demo_fast_archive to achieve exactly this end. This is the basis of my view that the core library doesn't have to be modified to achieve your ends.
e) The implementation above could easily be improved so that it is resolved entirely at compile time. Built with a high-quality compiler (with the appropriate optimization switches set), this would result in the fastest possible code.
Same as my proposal.
No disagreement here. It is my intention in this section to show how you can implement your proposal in a less intrusive way.
g) Now, f above could also be seen as a disadvantage. That is, it might seem better to let everyone involved in the serialization of a particular collection keep their code separate. There are a couple of options here, which I will sketch out.
i) One could make a new trait, is_bitwise_serializable, whose default value is false. For each collection type one would specialize it like this:
    template<class T>
    struct is_bitwise_serializable<vector<T> > {
        ... is_fundamental<T> ...
        get_size(){
            // override the default, which is sizeof(T)
            ...
        }
    };
Now fast_oarchive_impl would contain something like:
    // here's a way to do it for all vectors in one shot
    template<class T>
    void save_override(const T & t, int){
        // if T is NOT bitwise serializable
        // - insert mpl magic required here -
        // forward call to base class
        this->Base::save_override(t, 0);
        // else -
        *(this->This()) << make_nvp("count", t.size() * sizeof(T));
        *(this->This()) << make_nvp(make_binary_object(... get_size(), & t));
        // note - the nvp wrappers are probably not necessary if we're
        // only going to apply this to binary archives
    }
This would implement the save_binary optimization for all types with the is_bitwise_serializable trait set. Of course, any class derived from fast_oarchive_impl could override this as before.
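The "insert mpl magic" placeholder above can be spelled with ordinary tag dispatch. A minimal sketch, as member functions of fast_oarchive_impl (the dispatch helper name is mine; it assumes the is_bitwise_serializable trait exposes an mpl boolean via ::type and a get_size returning the byte count):

    // select the branch at compile time from the trait
    template<class T>
    void save_override(const T & t, int){
        save_override_dispatch(t, typename is_bitwise_serializable<T>::type());
    }

    // not bitwise serializable: forward to the base class as before
    template<class T>
    void save_override_dispatch(const T & t, boost::mpl::false_){
        this->Base::save_override(t, 0);
    }

    // bitwise serializable: emit the byte count, then one binary block
    // (& t[0] assumes contiguous storage, as for vector)
    template<class T>
    void save_override_dispatch(const T & t, boost::mpl::true_){
        const std::size_t count = is_bitwise_serializable<T>::get_size(t);
        *(this->This()) << make_nvp("count", count);
        *(this->This()) << make_nvp("data", make_binary_object(& t[0], count));
    }

Both branches resolve statically, so there is no runtime dispatch cost.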
There is one serious and fundamental flaw here: whether or not a certain type can be serialized more efficiently as an array depends not only on the type, but also on the archive. Hence we need a trait taking BOTH the archive and the type, like the has_fast_array_serialization trait that I proposed.
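In outline, such a two-parameter trait looks like this (a sketch of the idea; the specializations and archive names are illustrative only):

    #include <boost/mpl/bool.hpp>
    #include <boost/type_traits/is_fundamental.hpp>

    class binary_oarchive;           // placeholder archive types,
    class portable_binary_oarchive;  // for illustration only

    // default: no (archive, type) pair uses the fast path
    template<class Archive, class T>
    struct has_fast_array_serialization : boost::mpl::false_ {};

    // a native binary archive can stream any fundamental type raw ...
    template<class T>
    struct has_fast_array_serialization<binary_oarchive, T>
        : boost::is_fundamental<T> {};

    // ... while a portable binary archive might enable it only for
    // char, and an XML archive for nothing at all
    template<>
    struct has_fast_array_serialization<portable_binary_oarchive, char>
        : boost::mpl::true_ {};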
You don't need a trait, because the code is implemented inside fast_oarchive_impl, so it won't get invoked by any other archive class.
ii) Another option would be to implement differing serializations depending upon the archive type, so that we might have:
    template<class T>
    void save(fast_oarchive_impl & ar, const std::vector<T> & t, const unsigned int){
        // if T is a fundamental type or ....
        ar << t.size();
        // save_binary takes (address, byte count); &t[0] is the
        // vector's contiguous buffer
        ar.save_binary(& t[0], t.size() * sizeof(T));
    }
This would basically be a much simpler substitute for the "fast_archive_trait" proposed by the submission.
Now we are back to an NxM problem.
No, we're not. fast_oarchive_impl is a base class from which all your other "fast" archives are derived.
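In sketch form (class names here are hypothetical, not taken from demo_fast_archive):

    // the optimization is written once, in the common base ...
    class fast_oarchive_impl
        /* : public the ordinary binary archive implementation */
    {
        // ... containing the save_override overloads shown above ...
    };

    // ... and every derived "fast" archive inherits it for free:
    // N type overrides serve all M derived archives
    class fast_binary_oarchive : public fast_oarchive_impl { /* ... */ };
    class fast_mpi_oarchive    : public fast_oarchive_impl { /* ... */ };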
But the real issue is that for many array, vector, or matrix types this approach is not feasible, since serialization there needs to be intrusive. Thus I cannot just reimplement it inside the archive; the library author of these classes needs to implement serialization. Hence, your approach will not work for MTL matrices, Blitz arrays, and other such data types.
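For example, consider a matrix class along these lines (hypothetical, standing in for an MTL or Blitz type). Its storage is private, so only its author can write the serialize member; an archive class has no way to reach the data from outside:

    #include <cstddef>

    namespace boost { namespace serialization { class access; } }

    class Matrix {
    public:
        Matrix(std::size_t r, std::size_t c)
            : rows_(r), cols_(c), data_(new double[r * c]) {}
        ~Matrix(){ delete [] data_; }
    private:
        friend class boost::serialization::access;

        // only the class author can provide this; ideally it would
        // hand the contiguous buffer to save_array/load_array
        template<class Archive>
        void serialize(Archive & ar, const unsigned int /* version */){
            ar & rows_ & cols_;
            // ... the data_ buffer ...
        }

        std::size_t rows_, cols_;
        double * data_;  // private, contiguous storage
    };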
Matthias