
Robert Ramey wrote:
Matthias Troyer wrote:
Dave's design does not change anything in your archives or serialization functions, but only adds an additional binary archive using save_array and load_array.
Hmm - that's not the way I read it. I've touched on this in another post.
As already explained elsewhere, you misread it. The archive was in a sub-namespace. Perhaps it would have been clearer if Dave had used a completely different namespace name, such as boost::array_serialization_extensions or boost::not_the_serialization_namespace?
c) The premise that one will save a lot of coding (see d) above) compared to the current method of overloading based on the pair of archive/type is overly optimistic.
Actually, I have implemented two new archive classes (MPI and XDR) that can profit from it, and it does eliminate a lot of code duplication. All of the serialization functions for types that can make use of such an optimization can be shared between all these archive types. In addition, formats such as HDF5 and netCDF have been mentioned, which could reuse the *same* serialization functions to achieve optimal performance.
There is nothing "optimistic" here since we have the actual implementations, which show that code duplication can be avoided.
OK - I can really only comment on that which I've seen.
Are we talking at cross-purposes here? Matthias is talking about sharing *serialization* functions. That is, for each data type there is only *one* serialization function that calls load/save_array (or whatever the array hook function is called...).

You seem to be disputing the code duplication issue by saying that different *archives* will not typically(*) be able to share implementations of array processing. This I completely agree with. But that is completely separate from the number of *serialization* functions that need to be written.

Matthias, it might help if you show an example of a serialization function for some vector type, and the implementation of the array processing for the MPI and XDR archives, to demonstrate the orthogonality of the serialization and archive ideas. A rough sketch of the shape such an example might take follows below.

(*) Of course there are some counter-examples. That is the idea for deriving one archive from another, is it not?

[snip]
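For illustration only: a minimal sketch of what such a shared serialization function could look like, assuming a hypothetical has_fast_array_serialization trait and a save_array() archive hook. The names are placeholders, not the actual proposed interface.

    #include <cstddef>
    #include <vector>

    // Hypothetical trait marking archives that provide an optimized
    // save_array() hook; the proposal's real names may differ.
    template <class Archive>
    struct has_fast_array_serialization { static const bool value = false; };

    template <bool Fast> struct fast_tag {};

    // Bulk path: an archive such as the MPI or XDR one can implement
    // save_array() as a single bulk operation.
    template <class Archive, class T>
    void save_elements(Archive& ar, std::vector<T> const& v, fast_tag<true>)
    {
        ar.save_array(&v[0], v.size());
    }

    // Fallback path: element-wise saves for archives without the hook.
    template <class Archive, class T>
    void save_elements(Archive& ar, std::vector<T> const& v, fast_tag<false>)
    {
        for (std::size_t i = 0; i != v.size(); ++i)
            ar << v[i];
    }

    // The single serialization function, written once and shared by *all*
    // archive types; only the archives differ in how arrays are processed.
    template <class Archive, class T>
    void save(Archive& ar, std::vector<T> const& v)
    {
        std::size_t size = v.size();
        ar << size;
        if (size != 0)
            save_elements(ar, v,
                fast_tag<has_fast_array_serialization<Archive>::value>());
    }

Each new archive type then only has to supply (or omit) save_array(); none of the per-type serialization functions need to be duplicated.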
As you can see, the overhead of the serialization library (less than 2%) is insignificant compared to the cost of doing lots of individual insertion operations into the buffer instead of one big one. The bottleneck is thus clearly the many calls to save() instead of a single call to save_array().
Well, this is interesting data. The call to save() resolves inline to a std::vector element access plus insertion of the value into the buffer. I wonder how much of this is in std::vector and how much is in the save into the buffer?
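Roughly, the two code paths being compared look like this (an illustrative sketch, not the library's actual internals):

    #include <cstddef>
    #include <vector>

    typedef std::vector<char> buffer_type;

    // Element-wise path (what many inlined save() calls amount to):
    // one element access and one small buffer insertion per element.
    void save_elementwise(buffer_type& buf, std::vector<double> const& v)
    {
        for (std::size_t i = 0; i != v.size(); ++i) {
            char const* p = reinterpret_cast<char const*>(&v[i]);
            buf.insert(buf.end(), p, p + sizeof(double));
        }
    }

    // Bulk path (what a single save_array() call amounts to):
    // one big insertion for the whole block.
    void save_array(buffer_type& buf, std::vector<double> const& v)
    {
        if (v.empty()) return;
        char const* p = reinterpret_cast<char const*>(&v[0]);
        buf.insert(buf.end(), p, p + v.size() * sizeof(double));
    }

Timing the two halves separately (element access versus buffer insertion) would answer the question of where the time goes.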
As described here, http://lists.boost.org/Archives/boost/2005/11/97156.php , the effect of using a custom buffer versus a buffer based around vector::push_back is exactly a factor of 2, irrespective of cache effects.

Matthias' benchmark showed that the time taken to serialize an array into a vector buffer is almost the same as the time taken to push_back the array in a loop (i.e. the serialization library itself introduces negligible overhead in this case). Thus, a serialization archive based on the same buffer I used in my benchmark should achieve the same factor-2 speedup.

Note that the speedup from using save_array was of the order of 30, so even with a factor-2 speedup from an optimized buffer, save_array would still be 15 times faster! (This uses the first set of data; the set for small arrays would give a more modest 3x improvement for save_array versus a custom buffer archive.)
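To make the buffer comparison concrete, here is a minimal sketch of the two append strategies (illustrative only, not the benchmark's actual code): a push_back-based buffer pays a capacity check, and possibly a reallocation, on every append, while a preallocated custom buffer appends with a memcpy and a pointer bump.

    #include <cassert>
    #include <cstddef>
    #include <cstring>
    #include <vector>

    // push_back-style buffer: every append goes through the growth machinery.
    void append_push_back(std::vector<char>& buf, void const* data, std::size_t n)
    {
        char const* p = static_cast<char const*>(data);
        buf.insert(buf.end(), p, p + n);  // capacity check, possible reallocation
    }

    // Preallocated "custom" buffer: append is a memcpy plus a pointer bump.
    class raw_buffer
    {
    public:
        raw_buffer(char* storage, std::size_t capacity)
          : pos_(storage), end_(storage + capacity) {}

        void append(void const* data, std::size_t n)
        {
            assert(pos_ + n <= end_);   // caller guarantees capacity
            std::memcpy(pos_, data, n);
            pos_ += n;
        }

    private:
        char* pos_;
        char* end_;
    };

Cheers,
Ian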