[boost] [serialization docs] Ping?

23 Sep 2005

      This is in regard to the discussion of the "equivalence" of a
serialized object and the deserialized counterpart. It also touches on
the Serializable concept and some recent discussion of how classes
without default constructors can be handled, and a few other things
besides. (Sorry about the entanglement, but I'm not sure how to
separate some of these issues.) (Note that I also haven't read *all*
of the mail on this subject yet.)

The Common Lisp committee (X3J13) needed to deal with essentially the
same problem.

First, a bit of background. Because CL supports programmatic
construction of code which can then be passed to the compiler, and the
code "syntax" supports a quoting form for referring to a literal
constant, there was a need to address what it meant for various kinds
of objects to appear in such a context. Moreover, such objects may be
referred to in code that will be compiled to a file for later loading
into the same runtime environment or a completely different one.
Earlier versions of the language simply enumerated the cases for the
"built-in" types and provided a simple mechanism for the simple record
type supported by the language (things defined with defstruct, if you
care). But with the addition of object-oriented features in various
implementations, which eventually led to CLOS (the CL Object System),
it became clear that this was not sufficient.

The term that was eventually adopted was "similar" or "similar as
constants" where further disambiguation was needed. I think that would
be a good term for the serialization library to adopt, as it avoids any
implications related to operator== and the ambiguity around the word
"equivalent".

A protocol was designed to permit instances of arbitrary user-defined
classes to be saved and loaded. (Sound familiar? Note that support for
different data formats for that saving and restoring was only
addressed to the extent that different CL implementations likely had
different compiled file formats and were not required to be
compatible; something like the idea of binary / text / XML / whatever
archives was not addressed, and never even came up, so far as I can
recall. A missed opportunity there.)

The relevant part of the specification is Section 3.2.4, "Literal
Objects in Compiled Files", which can be found at:

   http://www.lisp.org/HyperSpec/Body/sec_3-2-4.html

and in the definition of make-load-form, found here:

http://www.lisp.org/HyperSpec/Body/stagenfun_make-load-form.html#make-load-f...

(Let me know if / where translation between CL terminology and C++
terminology would be helpful and I'll give it a shot.)

The CL term "externalizable object" corresponds to the Serializable
concept for the serialization library. Corresponding to the Save /
Load Archive compatibility concept, CL says:

   "The \term{file compiler} must cooperate with the \term{loader} in
   order to assure that in each case where an \term{externalizable
   object} is processed as a \term{literal object}, the \term{loader}
   will construct a \term{similar} \term{object}."

Substituting serialization library terminology into that quote:

   The saving archive must cooperate with the loading archive in order
   to assure that in each case where a serializable object is saved,
   the loading archive will construct a similar object.

I think it should be reasonably straightforward to massage this into a
statement about whether a saving archive and a loading archive are
compatible.
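As a rough sketch of what such a statement might look like when checked for a single object: a saving archive type and a loading archive type are compatible for a value if loading what was saved yields a similar value. Everything below is hypothetical. The TextOArchive / TextIArchive pair, the free save / load functions, the round_trip_similar helper and the Point type are stand-ins built on standard string streams, not the serialization library's archives. Note that the check uses a separate similar predicate rather than operator==, in the spirit of the CL terminology.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical saving archive: writes whitespace-separated text.
struct TextOArchive {
    std::ostringstream out;
    template <class T>
    TextOArchive& operator<<(const T& value) {
        out << value << ' ';
        return *this;
    }
};

// Hypothetical loading archive: reads back what TextOArchive wrote.
struct TextIArchive {
    std::istringstream in;
    explicit TextIArchive(const std::string& text) : in(text) {}
    template <class T>
    TextIArchive& operator>>(T& value) {
        in >> value;
        return *this;
    }
};

// A hypothetical serializable type with its save / load functions.
struct Point {
    int x = 0;
    int y = 0;
};
void save(TextOArchive& ar, const Point& p) { ar << p.x << p.y; }
void load(TextIArchive& ar, Point& p) {
    ar >> p.x >> p.y;
    assert(!ar.in.fail());  // a real archive would report the error instead
}

// "Similar" compares the state the round trip is meant to preserve;
// it is deliberately not spelled operator==.
bool similar(const Point& a, const Point& b) { return a.x == b.x && a.y == b.y; }

// The compatibility statement, checked for one object: whatever OArchive
// saves, IArchive must load back as a similar object.
template <class OArchive, class IArchive, class T>
bool round_trip_similar(const T& original) {
    OArchive oa;
    save(oa, original);
    IArchive ia(oa.out.str());
    T loaded{};
    load(ia, loaded);
    return similar(original, loaded);
}
```

A real compatibility statement would quantify over every serializable object, of course; a per-object check like this is only what one could actually test.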

The CL protocol for loading is a two-step process. First, a
constructor is called with some arguments. Then, optionally, an
initialization form is evaluated; it can refer to the constructed
object in order to make further modifications to it.

I *think* this protocol is strictly more powerful than that presently
specified by the serialization library. An example of a class that I
don't know how to "serialize" is a "symbol" class whose instances are
constructed lazily on lookup by name; the save/load_construct_data
mechanism is inadequate for this. (There are also object graphs
containing reference cycles that
might not be serializable but are CL externalizable, because the
reference cycle can be broken by using the two stage protocol. I
haven't looked to see whether the serialization library installs an
object in the "pointer table" (whatever it is called) at allocation
time or only after it has been initialized via deserialization.)
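Here is a minimal sketch of the symbol case. The Symbol class, its intern function and the load_symbol hook are all hypothetical, made up for illustration. The point is that a loading hook which returns a pointer can hand back the object that already exists for a given name. As I understand it, load_construct_data is instead given storage to construct into, so it has no way to say "use that existing object instead".

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical symbol class: at most one Symbol object exists per name,
// created lazily the first time that name is looked up.
class Symbol {
public:
    const std::string& name() const { return name_; }

    static Symbol* intern(const std::string& key) {
        std::unique_ptr<Symbol>& slot = table()[key];
        if (!slot) slot.reset(new Symbol(key));
        return slot.get();
    }

private:
    explicit Symbol(std::string n) : name_(std::move(n)) {}

    static std::map<std::string, std::unique_ptr<Symbol>>& table() {
        static std::map<std::string, std::unique_ptr<Symbol>> symbols;
        return symbols;
    }

    std::string name_;
};

// Hypothetical loading hook: because it returns a pointer rather than
// constructing into storage it was handed, it can return the existing
// Symbol for the saved name instead of creating a duplicate.
template <class IArchive>
Symbol* load_symbol(IArchive& ar) {
    std::string name;
    ar >> name;
    return Symbol::intern(name);
}

void check_load_returns_existing() {
    Symbol* existing = Symbol::intern("foo");
    std::istringstream saved("foo");  // stands in for a loading archive
    assert(load_symbol(saved) == existing);  // the same object, not a copy
}
```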

If I translate the CL protocol into C++ terminology, I think it would
consist of first calling a static factory function associated with the
type, passing it the archive as an argument, and then calling a member
function on the resulting object, again passing the archive as an
argument.
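To make that translation a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: the Node type, the create factory, the initialize member, the save_graph / load_graph functions and the whitespace-separated text format over standard streams all stand in for real archives and are not the serialization library's API. It also shows why the two-stage protocol can handle a reference cycle: every object is constructed and entered in the table before any initializer runs.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type whose objects may form reference cycles.
struct Node {
    std::string label;  // assumed to contain no whitespace
    Node* peer = nullptr;

    // Stage 1 (hypothetical factory): build the object from "constructor"
    // data only; no references to other objects are needed yet.
    static std::unique_ptr<Node> create(std::istream& ar) {
        auto n = std::make_unique<Node>();
        ar >> n->label;
        return n;
    }

    // Stage 2 (hypothetical initializer): patch up references, which may
    // point at any object already in the table, including a referrer.
    void initialize(std::istream& ar, const std::vector<std::unique_ptr<Node>>& table) {
        int index = -1;
        ar >> index;
        peer = index < 0 ? nullptr : table.at(static_cast<std::size_t>(index)).get();
    }
};

// Saving writes all stage-1 data first, then all stage-2 data.
void save_graph(std::ostream& ar, const std::vector<Node*>& nodes) {
    ar << nodes.size() << ' ';
    for (const Node* n : nodes) ar << n->label << ' ';
    for (const Node* n : nodes) {
        int index = -1;  // -1 encodes a null peer
        for (std::size_t i = 0; i < nodes.size(); ++i)
            if (nodes[i] == n->peer) index = static_cast<int>(i);
        ar << index << ' ';
    }
}

// Loading enters every object in the table as soon as it is constructed,
// so the initializers can resolve a cycle without a chicken-and-egg problem.
std::vector<std::unique_ptr<Node>> load_graph(std::istream& ar) {
    std::size_t count = 0;
    ar >> count;
    std::vector<std::unique_ptr<Node>> table;
    for (std::size_t i = 0; i < count; ++i) table.push_back(Node::create(ar));
    for (auto& n : table) n->initialize(ar, table);
    return table;
}

void check_cycle_round_trip() {
    Node a, b;
    a.label = "a";
    b.label = "b";
    a.peer = &b;
    b.peer = &a;  // a two-node reference cycle
    std::stringstream ar;
    save_graph(ar, {&a, &b});
    auto loaded = load_graph(ar);
    assert(loaded.size() == 2);
    assert(loaded[0]->peer == loaded[1].get());
    assert(loaded[1]->peer == loaded[0].get());
}
```

If instead each object were only registered after it had been fully initialized, loading a would require b, which would require a again; splitting construction from initialization is what breaks that loop.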


Kim Barrett