Re: [boost] [serialization] fast array serialization (10x speedup)

15 Oct 2005

On Wed, Oct 12, 2005 at 02:17:06PM +0200, Matthias Troyer wrote:
...
Oct 11, 2005, at 6:45 PM, Robert Ramey wrote:
...
I would prefer
something like the following:
class my_class {
    std::vector<int> m_vi;
    ...
};
template<class Archive>
void my_class::serialize(Archive &ar, const unsigned int version){
    // standard way
    ar & m_vi;
    // or fast way, which defaults to the standard way in appropriate cases
    save_array(ar, m_vi);
}
This
    a) keeps the stl portion of the library smaller
    b) leaves the user in control of what's going on
    c) permits development of save/load array to be on an independent
       parallel track with everything else.
I find this proposal unacceptable for the following reasons:
- it breaks the orthogonality between serialization and archives
- how the array representation of the vector gets serialized should  
be a concern of the archive and not the user
- the user has to remember to always call save_array(ar,m_vi) instead  
of just serializing the vector directly. This is quite error-prone  
and will easily lead to sub-optimal code.
- the user has to know for which classes to call save_array and for  
which it will not be needed. For vector it might be intuitive, but  
what about ublas matrix types: do you know which ublas matrix types  
can use fast array serialization and which ones cannot? Or, even  
worse, if the matrix type is a template parameter you have a worse  
problem.
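
To make the "concern of the archive and not the user" point concrete, here is a minimal sketch. It is not the Boost.Serialization API: toy_binary_oarchive, its buffer and block_writes members, and the m_names member are all made up for illustration. The archive's operator& decides on its own whether a vector can be written as one block, so the class author only ever writes "ar & member".

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Toy output archive: hypothetical, NOT the Boost.Serialization API.
class toy_binary_oarchive {
public:
    std::string buffer;            // serialized bytes
    std::size_t block_writes = 0;  // how often the fast path was taken

    // Arithmetic types: copy the object representation.
    template <class T>
    typename std::enable_if<std::is_arithmetic<T>::value,
                            toy_binary_oarchive&>::type
    operator&(const T& value) {
        append(&value, sizeof(T));
        return *this;
    }

    // Strings: length prefix followed by the characters.
    toy_binary_oarchive& operator&(const std::string& s) {
        *this & s.size();
        append(s.data(), s.size());
        return *this;
    }

    // Vectors: the archive, not the caller, picks the strategy.
    // (std::vector<bool> has no data() and would need its own overload.)
    template <class T>
    toy_binary_oarchive& operator&(const std::vector<T>& v) {
        *this & v.size();
        save_elements(v, std::is_trivially_copyable<T>());
        return *this;
    }

private:
    void append(const void* p, std::size_t n) {
        buffer.append(static_cast<const char*>(p), n);
    }

    // Fast path: one block write for contiguous, trivially copyable data.
    template <class T>
    void save_elements(const std::vector<T>& v, std::true_type) {
        if (!v.empty()) append(v.data(), v.size() * sizeof(T));
        ++block_writes;
    }

    // Standard path: element by element.
    template <class T>
    void save_elements(const std::vector<T>& v, std::false_type) {
        for (const T& element : v) *this & element;
    }
};

// User code stays uniform; nothing here knows about the fast path.
struct my_class {
    std::vector<int> m_vi;
    std::vector<std::string> m_names;  // hypothetical extra member

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & m_vi;     // archive chooses the block write
        ar & m_names;  // archive falls back to element-wise saving
    }
};
```

With this layout a different archive (say, a text or XML one) could simply not provide the block overload, and my_class::serialize would not need to change.
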
I'm not real fond of adding save_array() to the archive's interface
either.  When you first see the examples for the serialization
library:

struct Something 
{
  double x, y, z;      
  doesnt_matter_what_it_is_t d;
  template <class Archive>
  void serialize(Archive & ar, unsigned version)
  {
    ar & x;
    ar & y;
    ar & z;
    ar & d;
  }
};

You think Wow.  That's cool.  It's so clean.  And you can pass
*anything* to the archive?  It tracks the pointers and everything?
Wow.  When you later get on to nvp() stuff, the base_object<> and
export macros, you react "ah, well, it can't all be magic.  It's still
supercool", and despite these base_object<>-type caveats you can still
teach a monkey to put serialization routines into his classes (for me
this is essential).  To the monkey it makes sense that you have to
explain things like your base classes to the serialization library.
It won't make sense that you can pass the archive an int, map, or a
pointer to a variant, but for arrays you have to do something special.

If you forget an nvp() or a base_object(), your data isn't serialized
correctly, or the code won't compile.  The problems are easy to locate
as the problems appear early.  save_array() wouldn't be like that.
Things will serialize correctly but slowly, and then you have to go
digging.
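
A small sketch of why nothing would flag a forgotten save_array(): for plain data the element-by-element path and the block path produce exactly the same bytes, so output alone can't reveal which one ran. The two helper functions below are hypothetical and only stand in for the "standard way" and the "fast way"; they are not taken from any library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers, not part of any real library.

// Standard way: one append per element.
inline std::string save_elementwise(const std::vector<double>& v) {
    std::string out;
    for (const double& x : v)
        out.append(reinterpret_cast<const char*>(&x), sizeof(double));
    return out;
}

// Fast way: a single append for the whole contiguous array.
inline std::string save_block(const std::vector<double>& v) {
    std::string out;
    if (!v.empty())
        out.append(reinterpret_cast<const char*>(v.data()),
                   v.size() * sizeof(double));
    return out;
}
```

Comparing save_elementwise(v) with save_block(v) for any vector gives equal strings, so a round-trip test passes either way and only a profiler would show the difference.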

Most importantly,

  template <class Archive>
  void serialize(Archive & ar, unsigned version)
  {
    ar & make_nvp("x", x);
    ar & make_nvp("y", y);
    ar & make_nvp("z", z);
    save_array(ar, make_nvp("some_array",some_array));
  }

is just ugly.  Sorry, but it is.  It's a big wart on an otherwise
extremely finely crafted interface.  (I think the operator&() is
elegant, for the record.)
...
...
A group of its own tests - just like we have tests for all other
combinations of serializations and archives - I can hear the
howling already.  We'll have to see what to do about this.
I'll volunteer (well, I already am) to help with testing.  I'll help
out with maintenance as well (I'm all gcc/linux/mac, no overlap with
your testing).  Whatever it takes to not have to save_array().  :)
I'll also provide tests that verify that these changes are backwards
compatible.
...
...
A separate documentation section in the documentation of the
serialization library.  Similar to the miscellaneous section.  But
miscellaneous holds things that are really separate so we'll find a
good place for it.  Maybe a section titled something like "Special
Considerations When Serializing Collections" (but shorter).
I'll volunteer to help with docs, as well, though hopefully the
"special considerations for collections" would be focused on archive
authors.  I think this would be a useful exercise, once all this has
gone through and I've delivered some kind of portable binary archive
to my client.

-t

troy d. straszheim