[boost] Re: [serialization] flush ?

6 Mar 2005

      troy d. straszheim wrote:
...
...
Here's a use case that has been discussed before, but for which I
couldn't seem to find any solid resolution in the list archives:
class A;
some_oarchive oa;
void mutate(shared_ptr<A>& aptr_)
{
    aptr_ = shared_ptr<A>(new A);
}
shared_ptr<A> aptr;
for (int i=0; i<LARGE_NUMBER; i++)
{
    mutate(aptr);
    oa << aptr;
}
The problem is that your oarchive ends up with somewhere between two
and LARGE_NUMBER distinct A's in it.  Usually two.  Presumably the
"two" is because in this simple example only A's are getting allocated
and deallocated, and therefore there are often free A-sized blocks
conveniently lying around for reuse.  aptr gets assigned A's at two
different alternating addresses.
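
The address reuse described above can be observed without any archive at all. The following is a minimal sketch, not code from this thread: it replays the loop and records every address in a std::set that stands in for an archive's tracking table. The name distinct_addresses_seen is made up for illustration, and the count it returns depends on the allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>

struct A { int payload[4]; };

// Replays the loop above and counts how many distinct addresses an
// address-based tracker would see across n freshly allocated A's.
// (Hypothetical helper; the std::set stands in for the tracking table.)
std::size_t distinct_addresses_seen(std::size_t n)
{
    std::set<const void*> seen;
    std::shared_ptr<A> aptr;
    for (std::size_t i = 0; i < n; ++i) {
        // The new A is allocated before the previous one is released,
        // so consecutive objects never share an address, but an older
        // freed block may be handed out again.
        aptr = std::shared_ptr<A>(new A);
        seen.insert(aptr.get());
    }
    assert(seen.size() <= n);
    return seen.size();
}
```

For any n of at least 2 the result lies between 2 and n. Where it lands in that range is up to the allocator, which is why the archive contents in the example above are hard to predict.
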
For what it's worth, I believe there is a warning in the documentation against
this kind of thing. In fact, the documentation says that this will invoke a
compile time error. I just checked - it doesn't produce a compile time error
as I expected.  I found the code that does this commented out - now I don't
remember why I commented it out.

That is, the following are not recommended:

a) changing the state of data while it is in the process of being saved.
b) serializing data on the stack.  This will break the tracking, as
different objects will have the same address and be mis-identified as being
the same.

If I had nothing else to do and could figure out how to do it, I would like
to implement a warning so that if one used a << operator with a non-const
argument one would get a warning or maybe
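
One way such a check could look in present-day C++ is sketched below. This is only an illustration under stated assumptions: const_checked_oarchive is a made-up class, not part of the library, and it produces a hard compile-time error via static_assert rather than a warning. Temporaries are let through, since nothing else can modify them during the save.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical archive front end (not the library's): operator<< refuses
// to compile when handed a non-const lvalue.
class const_checked_oarchive {
public:
    template <class T>
    const_checked_oarchive& operator<<(T&& t)
    {
        typedef typename std::remove_reference<T>::type value_type;
        // T is an lvalue reference exactly when the argument is an lvalue.
        static_assert(!std::is_lvalue_reference<T>::value
                          || std::is_const<value_type>::value,
                      "saving a non-const object: it could change while being saved");
        out_ << t << ' ';
        return *this;
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// Saves two const values and returns the archived text.
std::string save_const_values()
{
    const int answer = 42;
    const std::string name = "A";
    const_checked_oarchive oa;
    oa << answer << name;   // accepted: both operands are const
    // int counter = 0;
    // oa << counter;       // rejected at compile time: non-const lvalue
    return oa.str();
}
```
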

I am in the process of adding two more flags to archives:

a) no_tracking - which will suppress tracking for all objects regardless of
the setting of their serialization traits.  My motivation for doing this was
to permit the usage of serialization for things such as debug and
transaction logs which would generate cases such as yours above.
b) no_object_creation - which will simply reload pointers rather than
re-create them.  My motivation is to permit serialization to be used to
implement the Memento pattern as described in the GoF Patterns book.
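
To make the effect of a no_tracking style flag concrete, here is a toy model. It is a sketch under assumptions, not the library's implementation: toy_log_oarchive, toy_default and toy_no_tracking are invented names. With tracking on, a second save of the same address writes only a back-reference; with tracking off, every save writes the current value, which is what a log wants.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

enum toy_flags { toy_default = 0, toy_no_tracking = 1 };  // hypothetical flags

// Toy archive: tracks saved objects by address unless toy_no_tracking is set.
class toy_log_oarchive {
public:
    explicit toy_log_oarchive(unsigned flags = toy_default) : flags_(flags) {}

    void save(const int* p)
    {
        assert(p != nullptr);
        if (!(flags_ & toy_no_tracking)) {
            auto it = seen_.find(p);
            if (it != seen_.end()) {
                out_ << "ref#" << it->second << ' ';  // already saved: back-reference only
                return;
            }
            const int id = static_cast<int>(seen_.size());
            seen_[p] = id;
        }
        out_ << "val=" << *p << ' ';
    }
    std::string str() const { return out_.str(); }
private:
    unsigned flags_;
    std::map<const void*, int> seen_;
    std::ostringstream out_;
};

// Saves one variable three times, changing it before each save.
std::string log_three_states(unsigned flags)
{
    toy_log_oarchive oa(flags);
    int state = 0;
    for (int i = 1; i <= 3; ++i) {
        state = i;
        oa.save(&state);
    }
    return oa.str();
}
```

log_three_states(toy_default) yields "val=1 ref#0 ref#0 ", so the later states are lost, while log_three_states(toy_no_tracking) yields "val=1 val=2 val=3 ".
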

I currently have these changes in my local code base.  And I've run all my
old tests and they still work.  I'm still struggling with some small issues
regarding loading to stl collections with no_object_creation.  I'm also
struggling with some issues related to these flags being runtime rather than
compile time (i.e. template instantiation) options.
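
For readers unfamiliar with the Memento use mentioned for no_object_creation, the idea can be pictured with the hand-written sketch below. It does not use the library; Settings, save_state and restore_state are invented for illustration. The point is that state is reloaded into the object that already exists, so other pointers to it stay valid.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct Settings { int volume; int brightness; };

// Captures the state of s as an opaque string (the memento).
std::string save_state(const Settings& s)
{
    std::ostringstream os;
    os << s.volume << ' ' << s.brightness;
    return os.str();
}

// Reloads the state into the existing object; nothing is newly created.
void restore_state(Settings& s, const std::string& memento)
{
    std::istringstream is(memento);
    is >> s.volume >> s.brightness;
    assert(!is.fail());
}

// Returns true if an undo through an alias pointer sees the restored state.
bool undo_keeps_identity()
{
    Settings s = {3, 7};
    Settings* alias = &s;               // some other part of the program holds this
    const std::string memento = save_state(s);
    s.volume = 11;                      // a change we later want to undo
    restore_state(s, memento);
    return alias->volume == 3 && alias->brightness == 7;
}
```
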

I still need to write tests, demos, a tutorial, and documentation.  I'm not
sure, but I think these new facilities may address the use cases raised
here.
...
...
I looked through the archives a bunch and didn't come across anything
conclusive.  It seemed that some thought this kind of use case was
pathological, but I'm not sure why.
My view has been that changing the state of an object while it is in the
process of being serialized will inevitably lead to programs that are not
provably/demonstrably correct.  The same goes for archive classes whose
behavior can be changed during the course of serialization.

Now by supporting the usage of serialization for logging, this concept will
be broken.  I'm still struggling with this.

a) the idea of serialization of mutable objects does have application in
logging-type applications. It's appealing to use it for this purpose.
b) It will break the original concept and lead to cases where errors are
introduced which are almost impossible to track down without tracing into
the implementation of the serialization library itself.  This defeats the
whole purpose of having a library in the first place.
...
...
What I'd like to be able to do is to tell the archive, "The previous
calls to operator<<() represent a 'snapshot' of the state of some
group of objects, and now I want you to forget about existent objects
because I am going to rearrange them all.  Continue to track object
types, but forget about the addresses."  I realize that this creates
the possibility for memory leaks, but if the serialization is done
through one toplevel call to operator<< on a shared_ptr whose pointee
contains pointers to a whole universe of home-cooked pointer
spaghetti, I don't see a better way to do this, and I don't see how
to clearly express what I intend via the export and tracking macro
mechanisms.
...
...
You can't close and reopen the archive in the top loop, you
get duplicate headers.
What about "no_header"?  And what would be wrong with duplicate headers
anyway?  The stream is still open and could just as well contain multiple
archives.
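
The point that one open stream can hold several archives back to back, each with its own header, can be shown with a toy format. This is a sketch only: the header token, write_archive and read_archive are invented here and say nothing about the library's actual archive layout.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

const char* const kHeader = "toy_archive_v1";  // hypothetical header token

// Writes one self-contained "archive": header, element count, elements.
void write_archive(std::ostream& os, const std::vector<int>& values)
{
    os << kHeader << ' ' << values.size();
    for (int v : values) os << ' ' << v;
    os << '\n';
}

// Reads the next archive from the stream; returns false at end of stream
// or if the header does not match.
bool read_archive(std::istream& is, std::vector<int>& values)
{
    std::string header;
    std::size_t count = 0;
    if (!(is >> header >> count) || header != kHeader) return false;
    values.resize(count);
    for (std::size_t i = 0; i < count; ++i)
        if (!(is >> values[i])) return false;
    return true;
}

// Writes three archives into one stream and counts how many read back.
std::size_t count_archives_round_trip()
{
    std::stringstream stream;
    write_archive(stream, {1, 2});
    write_archive(stream, {3});
    write_archive(stream, {});
    std::vector<int> values;
    std::size_t n = 0;
    while (read_archive(stream, values)) ++n;
    return n;
}
```

Each archive starts with a clean slate on both the writing and the reading side, so nothing tracked in one snapshot carries over into the next.
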
...
...
The list archives mention the use case of
serializing the state of some memory pool that is very likely to get
reused: I think the little example above is probably the simplest
case of this.
I'm not sure what this means.
...
...
So without asking for a sanity-check, I implemented
basic_oarchive::flush(), and some tests.  The changes to basic_oarchive
sanity is sometimes overrated.
...
...
and basic_oarchive_impl are very small.  basic_oarchive has an
internal object_set, which tracks object_ids and addresses.  I add a
num_flushed_objects member, and flush() clears out the set and adds the
number of objects flushed to this counter.  New tracked objects are
assigned object_ids starting at num_flushed_objects + object_set.size().
In this way the class_ids are reused post-flush, but object_ids are
not.  The interface is simply this:
My comments above should make it clear I wouldn't be enthusiastic about this
approach.
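
For anyone following the flush() proposal quoted above, its bookkeeping can be modelled in a few lines. This is a toy reconstruction from the description only, not the patch itself, whose interface is not reproduced here. toy_object_tracker and track() are made-up names; object_set and num_flushed_objects follow the quoted text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Toy model of the described bookkeeping (hypothetical, not the actual patch).
class toy_object_tracker {
public:
    toy_object_tracker() : num_flushed_objects_(0) {}

    // Returns the object id for address p, assigning a fresh id if p is
    // not currently tracked.
    std::size_t track(const void* p)
    {
        assert(p != nullptr);
        auto it = object_set_.find(p);
        if (it != object_set_.end()) return it->second;
        // Fresh ids continue after everything handed out before the flush.
        const std::size_t id = num_flushed_objects_ + object_set_.size();
        object_set_[p] = id;
        return id;
    }

    // Forgets all tracked addresses but remembers how many ids were used,
    // so that object ids are never reused.
    void flush()
    {
        num_flushed_objects_ += object_set_.size();
        object_set_.clear();
    }

private:
    std::map<const void*, std::size_t> object_set_;
    std::size_t num_flushed_objects_;
};
```

Tracking two objects gives ids 0 and 1. After flush(), tracking the first object again gives id 2 rather than 0, so a reused address is written out as a new object.
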

Having said that, I find it personally gratifying to find that some people
are so enamored with this library as to spend this kind of effort.  Certain
people have taken the library in "experimental" directions and I have worked
their results into the library.  Persons who have made significant
contributions are:

Pavel Vozenilek - Borland compilers and documentation
Martin Ecker - DLL versions of Serialization and serialization of classes
implemented in DLLs (plug-ins)
troy d. straszheim (you) - serialization of variant.

At the same time I endeavor to keep it from breaking under the weight of its
own success. It's a fine line.  Key "fixed points" in my requirements were
and still are:

a) Boost acceptance - I need this for my resume as I'm currently looking for
work.
b) support of all compilers on which tests are run.
c) idiot proof user interface and documentation. Ah - maybe I better say,
user interface and documentation such that one can use the library without
having to delve into its implementation.  Also, it's important to me that one
be able to use the library with a very short learning curve - say 1 hour to
get started.  Personally, I don't have much more patience than that.

Robert Ramey