
Joaquín Mª López Muñoz <joaquin@tid.es> writes:

<snip long quote>

Please don't overquote.
2. "a==b" is a C++ expression, so implying that a and b are objects living inside the same program. If I save an object a on my PC, pass the file to you and you load it a year later as b on your Linux box, what is "a==b" supposed to mean?
Exactly.
3. A serializable type can be implemented without observing the "a==b" rule: for instance, a list-like container can load the elements in reverse order --I understand this is a perfectly legitimate implementation that shouldn't be banned because of the "a==b" restriction.
I'm not sure it should be considered legit under any Archive concept that will be defined by the library. Is it a useful semantics? Beware premature generalization!
In my serialization stuff for Boost.MultiIndex I actually have a serializable type that does not conform to the equivalence rule. Its layout kinda looks like:
template<typename Value>
struct node
{
  Value v;

  template<class Archive>
  void serialize(Archive& ar,const unsigned int)
  {
    // do nothing
  }
};
I use this weird construct to make node trackable, but no contents information is dumped to the archive (that is taken care of somewhere else in the program). In case you're curious, this arises in connection with serialization of iterators.
I can't imagine why you'd need that; a hint would help me to understand better. Are you saying there's no sense in which a deserialized node<T> will be equivalent to the one that has been serialized? I realize they have different "value" members, but sometimes those kinds of differences disappear under the right concept of equivalence. For example, if Value is a pointer, we don't expect it to have the same bits.
So, yes, there are actual uses of serialization not conforming to the equivalence rule.
If so, that may kill off my argument.
I guess one can also come up with other scenarios that break the equivalence rule -- for instance, a struct where some fields are serialized while others are local.
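Something along these lines, say (a sketch only, with made-up names), where a derived value stays out of the archive and is rebuilt on loading:

#include <boost/serialization/split_member.hpp>
#include <boost/serialization/vector.hpp>
#include <numeric>
#include <vector>

struct sampler
{
  std::vector<double> data; // serialized
  double cached_mean;       // "local": never written to the archive

  template<class Archive>
  void save(Archive& ar,const unsigned int)const
  {
    ar<<data;
  }

  template<class Archive>
  void load(Archive& ar,const unsigned int)
  {
    ar>>data;
    // recompute the local field instead of loading it
    cached_mean=data.empty()?
      0.0:
      std::accumulate(data.begin(),data.end(),0.0)/data.size();
  }

  BOOST_SERIALIZATION_SPLIT_MEMBER()
};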
Okay.
...as they relay to user-provided serialize() functions.
But that's not what Robert is saying; he's saying they don't have to even do that!
IMHO an archive should guarantee that loading/saving a UDT executes the associated load/save functions.
That makes sense. I'm beginning to be convinced that you have it right.
Failing to do so would deprive the Archive concept of most of its useful purposes. A do-nothing archive (i.e., the logging example) could be covered by a more relaxed concept, if someone finds that useful.
Well, the do-nothing Saving Archive doesn't have to have a corresponding Loading Archive. It's the notion of correspondence that we're concerned with here, not necessarily an intrinsic property of Archives.
So, from my point of view, the real task of an input/output archive pair is to ensure that, when a T::serialize function is invoked on loading, the input context (i.e., the permissible >> ops on the input archive) is a replica of the output sequence.
This rule recursively descends to primitive (in the serialization sense) types, where an equivalence rule can actually be provided. My (sketchy) proposal is merely a formalization of this idea.
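To make the recursion concrete, a minimal sketch (made-up types):

struct inner
{
  int i;

  template<class Archive>
  void serialize(Archive& ar,const unsigned int)
  {
    ar&i; // primitive member: this is where equivalence can be stated
  }
};

struct outer
{
  inner in;
  double d;

  template<class Archive>
  void serialize(Archive& ar,const unsigned int)
  {
    ar&in; // recurses into inner::serialize
    ar&d;
  }
};

The input archive must allow a >> for in and then for d exactly where the output archive received the corresponding << ops, and inside inner the same holds for i.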
That's an interesting rule. So essentially you are saying that the output archive needs to record enough structure to ensure that the input archive can read the same sequence of types?
Yes.
What if the user serializes an aggregate struct X containing two ints? Is the corresponding input archive required to be able to read two ints as part of reading an X?
Not only that: X::save is actually *required* to load those two ints.
                  ^^^^                           ^^^^ ??
Consider the following sample:
#include <boost/config.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/split_member.hpp>
#include <iostream>
#include <sstream>

struct foo
{
  foo(int a=0,int b=0):a(a),b(b){}

  int a,b;

  BOOST_SERIALIZATION_SPLIT_MEMBER()

  template<class Archive>
  void save(Archive& ar,const unsigned int)const
  {
    ar<<a;
    ar<<b;
  }

  template<class Archive>
  void load(Archive& ar,const unsigned int)
  {
    ar>>a; // we do not load b!!
  }
};

int main()
{
  const foo x0(1,2),x1(3,4);

  std::ostringstream oss;
  {
    boost::archive::text_oarchive oa(oss);
    oa<<x0;
    oa<<x1;
  }

  foo y0,y1;

  std::istringstream iss(oss.str());
  boost::archive::text_iarchive ia(iss);
  ia>>y0;
  ia>>y1;

  std::cout<<"y0.a="<<y0.a<<std::endl;
  std::cout<<"y1.a="<<y1.a<<std::endl;

  return 0;
}
Note that foo::load only loads the first int. The program outputs

  y0.a=1
  y1.a=2

which is incorrect (y1.a should be 3), so the serialization of foo is not correctly implemented. For XML archive types my hunch is that the program would throw.
Okay, so it would be sufficient to add int x; ar >> x; to foo::load, right? Otherwise it seems you're treading back into the domain of equivalence.
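I.e., something like this (sketch only):

template<class Archive>
void load(Archive& ar,const unsigned int)
{
  ar>>a;
  int x;
  ar>>x; // read b's value into a dummy so the >> ops match save()'s << ops
}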
Well, of course users of Boost.Serialization (especially if they do not write any serialize function of their own but merely use the serialization capabilities of 3rd-party types) expect this fuzzy equivalence rule to hold. My point is that meeting that expectation is up to each serializable type implementer, and shouldn't be enforced by the concepts section.
That may make it hard to describe the semantics of generic code that uses Serializable types with Archives. But then, I guess people can invent a stronger concept if necessary.
If Robert does not have the time/will to pursue a more formal approach, I think the equivalence rule could be relaxed to something like:
T x, y;
// arbitrary operations on x to set its state
sar & x;
lar & y;
Postconditions:

* For primitive serializable types, y is equivalent to x.
* For pointer types, bla bla.
* Other types are expected to implement serialization in such a manner
  that y is equivalent to x, but this is not guaranteed.
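For primitive types the postcondition can be checked directly; e.g. (sketch only):

#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <cassert>
#include <sstream>

int main()
{
  int x=42,y=0;

  std::ostringstream oss;
  {
    boost::archive::text_oarchive sar(oss);
    sar&x; // x's state has already been set
  }

  std::istringstream iss(oss.str());
  boost::archive::text_iarchive lar(iss);
  lar&y;

  assert(y==x); // primitive serializable type: equivalence guaranteed
  return 0;
}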
I really prefer your operational approach now. I don't think it's hard to describe in a reasonably formal way, and the loosened equivalence you describe above really isn't worth very much.
* An input archive iar is compatible with an output archive oar if:

  1. iar allows a sequence of >> ops matching the corresponding << ops
     made upon oar (matching defined in terms of the types involved and
     the nesting depth of the call).
Is the nesting depth of the call really relevant?
  2. For primitive serializable types, the restored copies are equivalent
     to their originals (expand on this, especially with respect to
     pointers).

* A type T is serializable if it is primitive serializable, or else it
  defines the appropriate serialize (load/save) functions such that the
  sequence of >> ops in load() matches the << ops in save().
[This is not a requirement] For each serializable type, the implementor can define "equivalence" in terms of its constituent types. For instance, for std::vector:
Given a std::vector<T> out, where T is serializable, and a restored copy in, then in.size()==out.size() and each in[i] is a restored copy of out[i].
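Boost already provides std::vector serialization in <boost/serialization/vector.hpp>; the hand-rolled sketch below (made-up function names) just shows why this works out -- the size goes first, then the elements, and loading mirrors that sequence exactly:

#include <cstddef>
#include <vector>

template<class Archive,class T>
void save_vector(Archive& ar,const std::vector<T>& out)
{
  std::size_t n=out.size();
  ar<<n;
  for(std::size_t i=0;i<n;++i)ar<<out[i]; // each << has a matching >> below
}

template<class Archive,class T>
void load_vector(Archive& ar,std::vector<T>& in)
{
  std::size_t n;
  ar>>n;        // same first op as in save_vector
  in.resize(n);
  for(std::size_t i=0;i<n;++i)ar>>in[i]; // in[i] is a restored copy of out[i]
}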
I don't think this latter part is worth much. I think it might be worth defining an EquivalentSerializable concept, though.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com