Re: [boost] [serialization] fast array serialization (10x speedup)

15 Oct 2005

      troy d. straszheim wrote:
...
struct Something
{
 double x, y, z;
 doesnt_matter_what_it_is_t d;
 template <class Archive>
 void serialize(Archive & ar, unsigned version)
 {
   ar & x;
   ar & y;
   ar & z;
   ar & d;
 }
};
You think Wow.  That's cool.  It's so clean.  And you can pass
*anything* to the archive?  It tracks the pointers and everything?
Wow.  When you later get on to nvp() stuff, the base_object<> and
export macros, you react "ah, well, it can't all be magic.  It's still
supercool", and despite these base_object<>-type caveats you can still
teach a monkey to put serialization routines into his classes (for me
this is essential).  To the monkey it makes sense that you have to
explain things like your base classes to the serialization library.
It won't make sense that you can pass the archive an int, map, or a
pointer to a variant, but for arrays you have to do something special.
If you forget an nvp() or a base_object(), your data isn't serialized
correctly, or the code won't compile.  The problems are easy to locate
as the problems appear early.  save_array() wouldn't be like that.
Things will serialize correctly but slowly, and then you have to go
digging.
Most importantly,
 template <class Archive>
 void serialize(Archive & ar, unsigned version)
 {
   ar & make_nvp("x", x);
   ar & make_nvp("y", y);
   ar & make_nvp("z", z);
   save_array(ar, make_nvp("some_array",some_array));
 }
is just ugly.  Sorry, but it is.  It's a big wart on an otherwise
extremely finely crafted interface.  (I think the operator&() is
elegant, for the record.)
This is a very convincing argument.  That is - I'm convinced.  I very much 
liked the monkey analogy.  Not to say programmers are monkeys.  But 
serialization is something I'm using so I can get on with the true topic at 
hand, so it's important to me that it "just works" without using up my 
precious brain stack space.

Now take a look at my first idea - a fast archive adaptor which would 
overload serialization of STL vector and C array.  Ideally, applying 
the wrapper to an inappropriate adaptee would result in a compile-time 
assertion so as to preserve the monkey-proof aspect of the library.  Damn, 
now I've forgotten what the objections to it were.  I'll have to go back and 
check.
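
To make the idea concrete, here is a minimal sketch of what such an adaptor
might look like.  It is not the serialization library's code: the names
toy_binary_oarchive, supports_raw_bytes and fast_array_adaptor are all
hypothetical, and the toy archive just appends bytes to a buffer.  It only
illustrates hiding the base class save() for vectors and refusing, at
compile time, an adaptee that cannot take raw bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical toy archive: appends raw bytes to an in-memory buffer.
class toy_binary_oarchive {
public:
    std::vector<char> bytes;
    std::size_t write_calls = 0;  // how many times write() was invoked

    void write(const void* p, std::size_t n) {
        const char* c = static_cast<const char*>(p);
        bytes.insert(bytes.end(), c, c + n);
        ++write_calls;
    }

    // Primitive path: one write per object.
    template <class T>
    void save(const T& t) { write(&t, sizeof(T)); }

    // Default collection path: the count, then one save per element.
    template <class T>
    void save(const std::vector<T>& v) {
        save(v.size());
        for (const T& e : v) save(e);
    }
};

// Hypothetical trait: archives opt in if they can accept raw byte blocks.
template <class Archive>
struct supports_raw_bytes : std::false_type {};
template <>
struct supports_raw_bytes<toy_binary_oarchive> : std::true_type {};

// The adaptor: hides Base::save and adds a block-write path for vectors.
template <class Base>
class fast_array_adaptor : public Base {
    static_assert(supports_raw_bytes<Base>::value,
                  "fast_array_adaptor needs an archive that accepts raw bytes");

public:
    // Everything that is not a vector is forwarded to the base archive.
    template <class T>
    void save(const T& t) { Base::save(t); }

    // Vectors are dispatched on the element type (bool excluded because
    // std::vector<bool> has no contiguous storage).
    template <class T>
    void save(const std::vector<T>& v) {
        save_vector(v, std::integral_constant<bool,
            std::is_arithmetic<T>::value && !std::is_same<T, bool>::value>());
    }

private:
    // Arithmetic elements: the count, then a single block write.
    template <class T>
    void save_vector(const std::vector<T>& v, std::true_type) {
        Base::save(v.size());
        if (!v.empty()) this->write(v.data(), v.size() * sizeof(T));
    }

    // Anything else falls back to the base archive's element-wise path.
    template <class T>
    void save_vector(const std::vector<T>& v, std::false_type) {
        Base::save(v);
    }
};

// Usage: both paths produce identical bytes; only the write count differs.
inline void demo_fast_array() {
    const std::vector<double> v(1000, 1.5);
    toy_binary_oarchive slow;
    slow.save(v);  // element by element
    fast_array_adaptor<toy_binary_oarchive> fast;
    fast.save(v);  // one block write after the count
    assert(slow.bytes == fast.bytes);
    assert(fast.write_calls < slow.write_calls);
}
```

The user's serialize() function would not change at all; the choice of
the fast path is made once, where the archive is declared.  Wrapping an
archive that has not opted in through the trait (say, a text archive)
fails to compile with the message above.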
...
...
...
A group of its own tests - just like we have tests for all other
combinations of serializations and archives - I can hear the
howling already.  We'll have to see what to do about this.
I'll volunteer (well, I already am) to help with testing.  I'll help
out with maintenance as well (I'm all gcc/linux/mac, no overlap with
your testing).
I'll also provide tests that verify that these changes are backwards
compatible.
We will get to that.  I'm interested in incorporating your improved testing. 
But I do have one concern.  I test on Windows platforms, including Borland 
and MSVC.  These can be quite different from just testing with gcc and can 
suck up a lot of time.  It may not be a big issue here, but it means you'll 
have to be aware not to do anything toooo tricky.

Since you're interested in this, I would suggest making a few new directories 
in your personal boost/libs/serialization tree.  I see each of these 
directories having its own Jamfile, so we could invoke runtest for any 
of the test suites just by changing to the desired directory.

a) old_test - change the current test directory to this
b) test - the current test with your changes to use the unit_test library. 
You might send me the source of one of your changed tests so I can 
comment on it before too much effort is invested.
c) test_compatibility - include your backward compatibility tests
d) test_performance - I want to include a few tests to measure times for things 
like serializing different primitives, opening/closing archives, etc. 
This would be similar to the current setup so I could generate a 
table which shows which combinations of features and archives are 
bottlenecks.  The hope is that this would help detect really dumb 
oversights like recreating an xml character translation table for each xml 
character serialized!
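
As an illustration of the kind of check I have in mind for d), here is a
rough sketch.  Nothing in it comes from the library: escape_table,
escape_rebuilding, escape_cached and seconds_for are hypothetical names,
and the table is a simplified stand-in for an xml character translation
table.  It times a routine that rebuilds the table for every character
against one that builds it once.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical table mapping each byte value to its escaped form.
using escape_table = std::array<std::string, 256>;

escape_table build_escape_table() {
    escape_table t;
    for (std::size_t i = 0; i < t.size(); ++i)
        t[i] = std::string(1, static_cast<char>(i));
    t['<'] = "&lt;";
    t['>'] = "&gt;";
    t['&'] = "&amp;";
    t['"'] = "&quot;";
    t['\''] = "&apos;";
    return t;
}

// The oversight: the table is rebuilt for every character written.
std::string escape_rebuilding(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        const escape_table t = build_escape_table();
        out += t[c];
    }
    return out;
}

// The fix: the table is built once and reused.
std::string escape_cached(const std::string& in) {
    static const escape_table t = build_escape_table();
    std::string out;
    for (unsigned char c : in) out += t[c];
    return out;
}

// Wall-clock seconds taken by `repetitions` calls of f.
template <class F>
double seconds_for(F f, int repetitions) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) f();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Both routines must agree before their timings are worth comparing.
inline void check_escapers_agree() {
    const std::string s = "a<b && c>\"d\"";
    assert(escape_rebuilding(s) == escape_cached(s));
}
```

A test would call seconds_for() once per routine with the same input and
repetition count and put the two numbers side by side in the table.  The
absolute figures depend on compiler and platform, so only the ratio
between rows is meaningful.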
...
...
...
A separate documentation section in the documentation of the
serialization library.  Similar to the miscellaneous section.  But
miscellaneous holds things that are really separate so we'll find a
good place for it.  Maybe a section titled something like "Special
Considerations When Serializing Collections" (but shorter).
...
I'll volunteer to help with docs, as well, though hopefully the
"special considerations for collections" would be focused on archive
authors.  I think this would be a useful exercise.  After all this has
gone through and I've delivered some kind of portable binary archive
to my client.
It's a tiny bit premature - Archive Implementation needs at least another 
pass.  But I would envisage either one or two new sections:

a) Archive adaptors.  An adaptor is a class that can be applied to any existing 
archive in order to modify some aspects of its behavior by hiding the base 
class functions with an overloaded implementation.  This section would refer 
to the fast array archive as an example.

b) Fast array archive adaptor - a description of how to use it.

Just my thoughts

Robert Ramey