
Robert Ramey wrote:
I love hearing this - I always wanted to be associated with particle physics. LOL
Heh heh... *sigh*. Don't... don't get me started. :)
Of course you know that is straightforward - and you have the xml archives that can be used as examples. If you just want to display the information and not load it, a bunch of tags like object id, etc. can be suppressed. Personally I would just use the xml_archive and concentrate my efforts on a program that displays XML in a convenient and perhaps customizable way. I suspect you could find or make a suitable program of that nature for free or for low cost. To reiterate, I would factor the "pretty display" out of the serialization and make it customizable according to the kind of display required.
In fact, if I had nothing else to do, and had that much interest, I would make an enhanced version of xml_archive that would output TWO files: a) the xml_archive and b) an xml_schema which could be used by other programs to parse the xml_archive. Just random thoughts.
Sure, that much I've got: suppressing the object id tags, overriding save() methods for pointers so that they get "skipped", all that. I guess I'm talking more about factoring out the markup within the serialization library itself. It would be easy to just copy/paste the entire xml_ thing, rename the classes and change the tags and so forth, but of course this would be despicable nastiness. Better to do something like an nvp_archive that refers to some kind of formatting policy class, with xml_archive as nvp_archive<xml_formatting_policy>... or something like that. This could also get you the ability to do SpitAndDuctTapeNVPML, or, say, some kind of binary_nvp_archive: basically the same as XML but without all the ascii bloat. We would certainly find this handy, as we never know if we might be forced to convert to XML at some point, but there's just so much data that we can't afford the ascii bloat in our storage. But of course you can just zip the xml output, and a binary_nvp_archive is a lot more work than just factoring tags and indentation out of xml_archive... OK, I've gone off on a tangent. Never mind. And your points about where to focus the effort are well taken. Anyway, the purpose isn't visualization after the program has run; it is more like pretty(log_stream) << my_particle; in the code itself. I'm catering to printf-style debugging. This is what the Beakers (jargon for "Physicists"... think Dr. Bunsen Honeydew and his assistant...) like to do, and since I'm ripping out the old serialization method (from the ROOT analysis toolkit, which involves running your headers through a quasi-compiler which generates serialization functions, and which pukes the moment it sees anything in namespace boost...) and since the beakers will react violently to this at first, it would be good to toss them a bone as well, like "you get ToStream(ostream&) for free". Now that I see memoization_archive, I see I can give them something else, too....
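To make the nvp_archive<formatting_policy> idea concrete, here is a minimal sketch. Everything in it (nvp_oarchive, xml_format, bare_format, demo_xml) is hypothetical illustration, not Boost.Serialization API: the point is only that the markup lives in the policy and the archive template never mentions angle brackets.

```cpp
#include <sstream>
#include <string>

// Hypothetical formatting policies: the archive delegates all markup to them.
struct xml_format {
    static std::string open(const std::string& name)  { return "<" + name + ">"; }
    static std::string close(const std::string& name) { return "</" + name + ">"; }
};

struct bare_format {  // no tags at all, just name=value lines
    static std::string open(const std::string& name) { return name + "="; }
    static std::string close(const std::string&)     { return "\n"; }
};

// Sketch of the proposed nvp_archive: markup-agnostic, policy-driven.
template <class Format>
class nvp_oarchive {
    std::ostream& os_;
public:
    explicit nvp_oarchive(std::ostream& os) : os_(os) {}
    template <class T>
    nvp_oarchive& save(const std::string& name, const T& value) {
        os_ << Format::open(name) << value << Format::close(name);
        return *this;
    }
};

std::string demo_xml() {
    std::ostringstream ss;
    nvp_oarchive<xml_format> oa(ss);   // "xml_archive" as nvp_oarchive<xml_format>
    oa.save("energy", 42);
    return ss.str();
}

std::string demo_bare() {
    std::ostringstream ss;
    nvp_oarchive<bare_format> oa(ss);  // same archive code, different markup
    oa.save("energy", 42);
    return ss.str();
}
```

A binary_nvp_archive would then just be a third policy (or a policy pair for lengths/values) rather than a copy/paste of the whole xml_ machinery.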
But to focus on reformatting as you suggest, I could conceivably do it with some kind of xml-reformatting stream: make a convenience function that wraps the insertion into xml_oarchive(stringstream) and then a pass through the tag-removing reformatter. Would be a good opportunity to play with iostreams. Yeah, sounds good. OK.
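A toy version of that tag-removing pass, to show the shape of it. strip_tags is a hypothetical helper, not a Boost facility, and a real reformatter would keep indentation and drop the object-id attributes rather than all markup:

```cpp
#include <string>

// Drop everything between '<' and '>' from xml_oarchive output,
// keeping only the character data (names/values) in between.
std::string strip_tags(const std::string& xml) {
    std::string out;
    bool in_tag = false;
    for (char c : xml) {
        if (c == '<')       in_tag = true;
        else if (c == '>')  in_tag = false;
        else if (!in_tag)   out += c;
    }
    return out;
}
```

The convenience wrapper would then serialize into a stringstream via xml_oarchive and run the result through a filter like this before handing it to log_stream.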
I'm not sure I'm convinced of this.
I recommend the following when you make a new archive
a) run the code module for the new archive through Gimple LINT and fix up the obvious oversights.
b) make a file similar to text_archive.hpp in the test directory for your new archive - new_archive.hpp
c) modify the Jamfile in the serialization test directory to include your new archive
d) invoke the batch/script file run_archive_test <compiler> <new_archive.hpp>
This will run all the serialization tests against your new archive. It takes a while - but it's worth it.
Sure. I did this with variant, it works great.
I recommend the following when you make a new serializable class.
a) run the code module for the new serializable class through Gimple LINT and fix up the obvious oversights.
b) using the other tests as a basis, make a new test for your new serializable class.
c) in the course of this you may have to make additions to your new class, such as operator==, or you might not. Perhaps a global operator==(const T &lhs, const T &rhs) might be added just to the test.
d) add the test for your new class to the Jamfile in serialization/test
e) invoke the batch/shell script runtest <compiler> to generate a table of all tests including your new one. These tests will run your new class against all currently defined archives. This is important as some archives are not sensitive to some errors. For example, tagged XML can recover from some errors whereas the more efficient native binary cannot.
I've already learned from experience to have the testsuites run on all archive types automatically, if for no other reason than to catch places where you've forgotten to use make_nvp(). I'm with you. The random_iarchive is intended as a tool to be used in this process: for instance, I won't sleep well until I have seen a terabyte's worth of events get serialized in one run.... The tests have to be *big*, stressful, lots of data.
Even if you only use one particular compiler for the application you ship, I would recommend building and running all tests on at least two reasonably good, different compilers. For example, gcc 3.4? and VC 7.1 is a good combination. This will often uncover subtle ambiguities that would otherwise linger on for years inflicting programmer pain.
I have to say the one single most important thing I've learned from boost is that it's cheaper to maintain the test suite and build for several compilers than it is to debug the application. bjam (which DOES drive me crazy) is a godsend for doing this kind of thing.
Sure, you don't have to convince me of this. There's nothing more beautiful than a rigorous set of test suites. I'm a crusty UNIX guy with abysmal debugger skills, I'm dependent on them. We have a similar testing infrastructure that I've thrown together... We're a "make" shop... I wasn't sold on bjam. And running classes through all archive types, automatically, is obviously the only way to do it: I put together a few macros to accomplish this in code rather than in a bunch of build-system mechanics. One macro creates tests for one class through all archives. Not sure if they would integrate with Boost.Test so easily, though, and Boost.Test is surely more robust in various ways in case of failure. I can post 'em if you're curious.
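For flavor, here is a sketch of what such "one macro, all archives" test generation can look like. The archive structs, macro names, and run_roundtrip helper are all hypothetical stand-ins (a real version would use boost::archive::text_oarchive and friends, and hook into the test framework):

```cpp
#include <sstream>

// Stand-ins for real archive types; only their names matter for the sketch.
struct text_archive {};
struct xml_archive {};

// Round-trip helper: write src out, read it back, compare. A plain
// stringstream stands in here for the oarchive/iarchive pair.
template <class Archive, class T>
bool run_roundtrip() {
    std::stringstream ss;
    T src, dst;
    dst.value = 0;
    ss << src.value;   // stand-in for: oarchive oa(ss); oa << src;
    ss >> dst.value;   // stand-in for: iarchive ia(ss); ia >> dst;
    return src.value == dst.value;
}

// One macro stamps out a named test of CLASS against one ARCHIVE...
#define MAKE_ARCHIVE_TEST(ARCHIVE, CLASS) \
    inline bool test_##ARCHIVE##_##CLASS() { return run_roundtrip<ARCHIVE, CLASS>(); }

// ...and one macro per class fans out over every archive type, so adding
// a class to the test suite is a single line.
#define MAKE_ALL_ARCHIVE_TESTS(CLASS) \
    MAKE_ARCHIVE_TEST(text_archive, CLASS) \
    MAKE_ARCHIVE_TEST(xml_archive, CLASS)

struct particle { int value = 7; };
MAKE_ALL_ARCHIVE_TESTS(particle)   // defines test_text_archive_particle, etc.
```

The appeal of doing it in code rather than in the build system is exactly what the poster describes: one line per class, and forgetting an archive type becomes impossible.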
Another problem is that if a class contains vector<shared_ptr<Base> >, you'd like to be able to populate this with shared_ptr<Derived>, where Derived is randomly selected from the set of classes that inherit from Base. Since serialization requires these classes to be registered, it seemed to me there might be a way to do this. But maybe it's all overkill.
If you don't find the above sufficient, then it's not overkill. As I said, the pain of writing the test is nothing compared to shipping a product with a bug.
I was wondering how to accomplish it. I am in, say, template <typename T> void random_iarchive::load_override(vector<shared_ptr<T> >), with T = Base. My random_iarchive has had Base and several types Derived registered with it already. Because I know what Base is (from T), I can easily populate the vector with shared_ptr<Base>, but in order to populate it with Base and a variety of classes Derived, I have to somehow ask the archive what possibilities are registered and choose one... Forgive me if I'm way off base. The whole business of type registration in the archives is still pretty opaque to me, and my gut says that this is either impossible or overkill.
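One way around the opacity of the library's internal registration is for the random archive to keep its own side registry of factories, queried at load time. This is a sketch under that assumption (derived_registry and fill_random are hypothetical; Boost.Serialization does not expose its registered types this way):

```cpp
#include <functional>
#include <memory>
#include <random>
#include <vector>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const = 0;
};
struct DerivedA : Base { std::string name() const override { return "A"; } };
struct DerivedB : Base { std::string name() const override { return "B"; } };

// Side registry: each registered Derived contributes a factory closure.
class derived_registry {
    std::vector<std::function<std::shared_ptr<Base>()>> makers_;
public:
    template <class D>
    void register_type() {
        makers_.push_back([] { return std::make_shared<D>(); });
    }
    // Pick one registered Derived at random and construct it.
    std::shared_ptr<Base> make_random(std::mt19937& rng) const {
        std::uniform_int_distribution<std::size_t> pick(0, makers_.size() - 1);
        return makers_[pick(rng)]();
    }
};

// What load_override(vector<shared_ptr<Base>>&) could do internally:
// fill the vector with randomly chosen concrete types.
std::vector<std::shared_ptr<Base>> fill_random(const derived_registry& reg,
                                               std::size_t n) {
    std::mt19937 rng(12345);  // fixed seed so test failures are reproducible
    std::vector<std::shared_ptr<Base>> v;
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(reg.make_random(rng));
    return v;
}
```

The cost is registering each Derived twice (once with the serialization library, once with the random archive), but it sidesteps digging into extended_type_info.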
Anyhow, this random_iarchive exists (except for the Base/Derived thing, above), maybe it would make a good tutorial case for custom serialization archives, maybe people want to use it for something. I'd be more than glad to write up some tutorial material, I'm sure I'd get a lot out of it.
As I said, I'm not convinced that the random test data should be part of the archive class. But I'm certainly pleased that someone finds the serialization sufficiently useful and interesting to do stuff like this.
I have also created a root_oarchive which creates root "trees", in case anybody is working with the ROOT analysis toolkit. The way one does this "normally" is a real nightmare, and being able to wrap all that in operator<< is a huge, huge win for cleanliness and maintainability. Testament to the flexibility of the serialization library. One big thing here is that the serialization library allows you to "flatten" nested structures into tuples by keeping track of the nvp paths in a deque inside the oarchive. Kind of like xml output, but without start/end tags, and where each nvp has all of its parents prepended to it separated by some path separator character. One could conceivably create an iarchive for these things as well, I haven't bothered.
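The path-flattening trick can be shown in miniature. flat_oarchive below is a hypothetical toy, not the poster's root_oarchive: it keeps the stack of enclosing nvp names in a deque and emits "parent.child.leaf" keys instead of nested tags.

```cpp
#include <deque>
#include <map>
#include <string>

// Toy flattening archive: nested begin()/end() calls maintain the current
// nvp path; save() joins the path with '.' to produce a flat key.
class flat_oarchive {
    std::deque<std::string> path_;
public:
    std::map<std::string, std::string> out;  // flat key -> value
    void begin(const std::string& name) { path_.push_back(name); }
    void end()                          { path_.pop_back(); }
    void save(const std::string& name, const std::string& value) {
        std::string key;
        for (const auto& p : path_) key += p + ".";
        out[key + name] = value;
    }
};

std::map<std::string, std::string> demo_flatten() {
    flat_oarchive oa;
    oa.begin("event");
    oa.begin("particle");
    oa.save("px", "1.5");   // lands under "event.particle.px"
    oa.save("py", "2.0");
    oa.end();
    oa.end();
    return oa.out;
}
```

In the real archive, the begin/end bookkeeping would live in the save_start/save_end hooks the library already calls around each nvp.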
So if you want to polish this up and add it to the Files section on source forge I think it would be great.
So the attempt is to factor out the business of populating classes with random test data into an iarchive class, in an effort to thoroughly test the "real" archive classes, and so that as a user with a bunch of serializable classes, I can fill them up with random stuff and serialize them through all the various archive types until my CPU smokes, without writing fill_with_random_data() routines by hand for every one of them. Actually, now that you mention the memoization_archive, it would actually be ideal if there were an archive that could do a deep *comparison*, thus eliminating the need to write all those operator==()s. I had thought about this and deemed it impossible, but if you're talking about deep copy.... Then you've got a real full-of-data workout canned in a function for an arbitrary serializable user class:

    // for each A in xml, text, binary:
    MyHugeClass src, dst;
    random_iarchive >> src;             // src now swollen with data
    A_oarchive oa(somewhere) << src;
    A_iarchive ia(somewhere) >> dst;
    comparison_archive ca(src) << dst;  // or however that looks

From your serialization(archive) method, you get xml/txt/binary i/o, comparison and copy.
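A compilable skeleton of that canned workout, with std::stringstream standing in for the oarchive/iarchive pair (an assumption; a real version would loop over the xml_, text_ and binary_ archives, and comparison_archive remains hypothetical):

```cpp
#include <sstream>

// Round-trip an object through "storage" and report whether it survived.
// Works for any T that is streamable and equality-comparable; with the real
// library the stream inserts become archive operations and the final compare
// becomes the deep comparison_archive pass.
template <class T>
bool workout(const T& src) {
    std::stringstream storage;
    storage << src;     // A_oarchive oa(storage); oa << src;
    T dst{};
    storage >> dst;     // A_iarchive ia(storage); ia >> dst;
    return src == dst;  // comparison_archive ca(src) << dst;
}
```

The win the poster is after: one serialize() method per class buys I/O, deep copy, and deep comparison, with no hand-written fill or compare routines.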
e) memoization_archive - an archive adaptor which does a deep copy using the serialize templates. This also requires some extra help from extended_type_info.
This is big to us. I'll contact you... Thanks again, -t