[serialization docs] Ping?

This is in regard to the discussion of the "equivalence" of a serialized object and the deserialized counterpart. It also touches on the Serializable concept and some recent discussion of how classes without default constructors can be handled, and a few other things besides. (Sorry about the entanglement, but I'm not sure how to separate some of these issues.) (Note that I also haven't read *all* of the mail on this subject yet.)
The Common Lisp committee (X3J13) needed to deal with essentially the same problem.
First, a bit of background. Because CL supports programmatic construction of code which can then be passed to the compiler, and the code "syntax" supports a quoting form for referring to a literal constant, there was a need to address what it meant for various kinds of objects to appear in such a context. And the resulting objects may be referred to in code that will be compiled to a file for later loading into the same or some completely different runtime environment.
Earlier versions of the language simply listed all the cases for the "built-in" types, plus a simple mechanism for the simple record type provided by the language (things defined with defstruct, if you care). But with the addition of OO concepts in various implementations that eventually led to CLOS (the CL Object System), it was realized that this wasn't sufficient.
The term that was eventually adopted was "similar", or "similar as constants" where further disambiguation was needed. I think that would be a good term for the serialization library to adopt, as it avoids any implications related to operator== and the ambiguity around the word "equivalent".
A protocol was designed to permit instances of arbitrary user-defined classes to be saved and loaded. (Sound familiar? Note that support for different data formats for that saving and restoring was only addressed to the extent that different CL implementations likely had different compiled file formats and were not required to be compatible; something like the idea of binary / text / XML / whatever archives was not addressed, and never even came up, so far as I can recall. A missed opportunity there.)
The relevant part of the specification is Section 3.2.4, "Literal Objects in Compiled Files", which can be found at:
http://www.lisp.org/HyperSpec/Body/sec_3-2-4.html
and in the definition of make-load-form, found here:
http://www.lisp.org/HyperSpec/Body/stagenfun_make-load-form.html#make-load-f...
(Let me know if / where translation between CL terminology and C++ terminology would be helpful and I'll give it a shot.)
The CL term "externalizable object" corresponds to the Serializable concept for the serialization library. Corresponding to the Save / Load Archive compatibility concept, CL says:
"The \term{file compiler} must cooperate with the \term{loader} in order to assure that in each case where an \term{externalizable object} is processed as a \term{literal object}, the \term{loader} will construct a \term{similar} \term{object}."
Substituting serialization library terminology into that quote:
The saving archive must cooperate with the loading archive in order to assure that in each case where a serializable object is saved, the loading archive will construct a similar object.
I think it should be reasonably straightforward to massage this into a statement about whether a saving archive and a loading archive are compatible.
The CL protocol for loading involved a two step process. First, a constructor is called with some arguments. Then, optionally, an initialization form is called, which contains references to the constructed object in order to perform additional modifications to it.
I *think* this protocol is strictly more powerful than that presently specified by the serialization library. An example of a class that I don't know how to "serialize" is a "symbol" lazily constructed on named lookup; the save/load_construct_data mechanism is inadequate for this.
(There are also object graphs containing reference cycles that might not be serializable but are CL externalizable, because the reference cycle can be broken by using the two stage protocol. I haven't looked to see whether the serialization library installs an object in the "pointer table" (whatever it is called) at allocation time or only after it has been initialized via deserialization.)
Attempting to translate the CL protocol into C++ terminology, I think it would consist of first calling a static factory function associated with the type, passing it the archive as an argument, and then calling a member function on the object, again passing the archive as an argument.
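As an illustration of the two-step protocol described above, here is a minimal C++ sketch. It is not the Boost serialization API: toy_iarchive, load_construct, load_init, intern and load_symbol are all hypothetical names, and the input format (a name followed by an integer) is an assumption made for the example. The point is only that a static factory taking the archive can hand back an already existing interned "symbol" instead of constructing a fresh object.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical toy loading archive: whitespace-separated tokens in a string.
struct toy_iarchive {
    std::istringstream in;
    explicit toy_iarchive(const std::string& data) : in(data) {}

    template <class T>
    toy_iarchive& operator>>(T& value) {
        in >> value;
        assert(!in.fail());  // the toy archive must not run out of data
        return *this;
    }
};

// A "symbol" that exists at most once per name (created lazily on lookup).
class symbol {
public:
    // Named lookup: returns the single instance for this name.
    static symbol* intern(const std::string& name) {
        static std::map<std::string, std::unique_ptr<symbol>> table;
        std::unique_ptr<symbol>& slot = table[name];
        if (!slot) slot.reset(new symbol(name));
        return slot.get();
    }

    // Step 1: static factory. Reads only what is needed to find or create
    // the object, so it may return an instance that already exists.
    static symbol* load_construct(toy_iarchive& ar) {
        std::string name;
        ar >> name;
        return intern(name);
    }

    // Step 2: member function. Fills in the remaining state.
    void load_init(toy_iarchive& ar) { ar >> value_; }

    const std::string& name() const { return name_; }
    int value() const { return value_; }

private:
    explicit symbol(const std::string& name) : name_(name), value_(0) {}
    std::string name_;
    int value_;
};

// Runs both steps against one archive.
symbol* load_symbol(const std::string& data) {
    toy_iarchive ar(data);
    symbol* s = symbol::load_construct(ar);
    s->load_init(ar);
    return s;
}
```

Under these assumptions, load_symbol("foo 42") returns the same pointer as symbol::intern("foo"), which is the property a construct-in-place mechanism does not obviously provide.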

Very interesting.
First, I'm surprised that anyone else was even looking at that thread after all this time.
It's clear that there is a strong parallel here - maybe even a one-to-one correspondence.
I've concluded that the concept of Semantic really isn't formal. It's a narrative description of what someone expects an expression to do. This will often have some unavoidable ambiguity. That doesn't make it useless; it's just that I don't think a semantic description can ever be definitive.
So I think I can just add a plausible narrative description that I think will satisfy everyone. In a year this has never come up. I suspect that this is because people who read the documentation already have an intuitive idea about what the expressions mean from reading the tutorial. Of course, there's no harm in trying to concisely explain them. But I think it's naive to think that they are going to be definitive or all-encompassing.
The C++ compiler can enforce correct usage of the concepts but it can't do the same for the semantics unless they are written so narrowly as to be of little value. We don't want to stop someone from using a class modeling the Saving Archive Concept as a debug log even if our semantic narrative says the valid expressions are meant to fulfill a different function.
Once one recognises this inherent limitation, there's not much justification for trying to specify something like equivalence in anything more than some sort of appeal to common sense and intuition. Constant class members are a good example. In one application or class they might be considered as part of some sort of equivalence relation. In a different context they might not. I don't see how you eliminate all the ambiguity from the definition without excluding useful applications of the library. Sooner or later we get to the point where we just have to assume we have a shared understanding of what words mean. The question is where one stops. This is a subjective determination.
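The debug-log case mentioned above can be sketched roughly as follows. This is an illustration only: debug_log_oarchive, point and log_point are hypothetical names, and the class supports just the `ar << x` and `ar & x` expressions, not everything the real Saving Archive concept asks for. The compiler accepts it wherever those expressions are all that is used, even though nothing that could later be loaded is ever written.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class: accepts the saving expressions but only writes a log.
class debug_log_oarchive {
public:
    explicit debug_log_oarchive(std::ostream& os) : os_(os) {
        assert(os_.good());  // the log stream must be usable
    }

    template <class T>
    debug_log_oarchive& operator<<(const T& value) {
        os_ << "[saved] " << value << '\n';
        return *this;
    }

    // operator& forwards to operator<<, as one would expect of a saving archive.
    template <class T>
    debug_log_oarchive& operator&(const T& value) {
        return *this << value;
    }

private:
    std::ostream& os_;
};

// Hypothetical user type with the usual free serialize function template.
struct point {
    int x;
    int y;
};

template <class Archive>
void serialize(Archive& ar, point& p, const unsigned int /*version*/) {
    ar & p.x;
    ar & p.y;
}

// "Saves" a point into the log and returns the log text.
std::string log_point(point p) {
    std::ostringstream out;
    debug_log_oarchive ar(out);
    serialize(ar, p, 0);
    return out.str();
}
```

The same serialize function would work unchanged with an archive that really stores data, which is why a purely syntactic concept check cannot tell the two uses apart.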
Anyway, your information was very informative in that it's clear that someone else has been here before.
Robert Ramey
Kim Barrett wrote:
This is in regard to the discussion of the "equivalence" of a serialized object and the deserialized counterpart. It also touches on the Serializable concept and some recent discussion of how classes without default constructors can be handled, and a few other things besides. (Sorry about the entanglement, but I'm not sure how to separate some of these issues.) (Note that I also haven't read *all* of the mail on this subject yet.)
The Common Lisp committee (X3J13) needed to deal with essentially the same problem.
The term that was eventually adopted was "similar", or "similar as constants" where further disambiguation was needed. I think that would be a good term for the serialization library to adopt, as it avoids any implications related to operator== and the ambiguity around the word "equivalent".
My inclination is to use the word "equivalent" along with a qualifying sentence that it is of necessity ambiguous.
A protocol was designed to permit instances of arbitrary user-defined classes to be saved and loaded. (Sound familiar? Note that support for different data formats for that saving and restoring was only addressed to the extent that different CL implementations likely had different compiled file formats and were not required to be compatible; something like the idea of binary / text / XML / whatever archives was not addressed, and never even came up, so far as I can recall. A missed opportunity there.)
"The \term{file compiler} must cooperate with the \term{loader} in order to assure that in each case where an \term{externalizable object} is processed as a \term{literal object}, the \term{loader} will construct a \term{similar} \term{object}."
Substituting serialization library terminology into that quote:
The saving archive must cooperate with the loading archive in order to assure that in each case where a serializable object is saved, the loading archive will construct a similar object.
I think it should be reasonably straightforward to massage this into a statement about whether a saving archive and a loading archive are compatible.
I can buy this. I think language similar to that - with that level of informality - will work well here.
The CL protocol for loading involved a two step process. First, a constructor is called with some arguments. Then, optionally, an initialization form is called, which contains references to the constructed object in order to perform additional modifications to it.
I *think* this protocol is strictly more powerful than that presently specified by the serialization library. An example of a class that I don't know how to "serialize" is a "symbol" lazily constructed on named lookup; the save/load_construct_data mechanism is inadequate for this.
As far as I can tell this is identical to what the serialization library does when it loads a pointer. At least to the extent that C++ and CLOS are similar.
(There are also object graphs containing reference cycles that might not be serializable but are CL externalizable, because the reference cycle can be broken by using the two stage protocol. I haven't looked to see whether the serialization library installs an object in the "pointer table" (whatever it is called) at allocation time or only after it has been initialized via deserialization.)
The serialization library handles cycles of pointers with no special efforts required. I suspect that, allowing for differences in the languages themselves, the systems are functionally identical.
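To make the question about the "pointer table" concrete, here is a small self-contained sketch of how registering an object at allocation time lets a loader rebuild a reference cycle. It is not taken from the serialization library's implementation: toy_loader, node and the input format are assumptions made for illustration. In this made-up format a pointer is written as an object id, and the first occurrence of an id is followed by that object's value and its next pointer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct node {
    int value = 0;
    node* next = nullptr;
};

// Hypothetical loader that owns every node it creates.
class toy_loader {
public:
    explicit toy_loader(const std::string& data) : in_(data) {}

    node* load_pointer() {
        int id = 0;
        in_ >> id;
        assert(!in_.fail());

        // Already seen: this is a back-reference, possibly to an object
        // whose initialization has not finished yet.
        std::map<int, node*>::iterator found = table_.find(id);
        if (found != table_.end()) return found->second;

        // Step 1: allocate and register BEFORE reading the contents.
        owned_.push_back(std::unique_ptr<node>(new node));
        node* n = owned_.back().get();
        table_[id] = n;

        // Step 2: initialize; following `next` may lead back to `n`.
        in_ >> n->value;
        assert(!in_.fail());
        n->next = load_pointer();
        return n;
    }

private:
    std::istringstream in_;
    std::map<int, node*> table_;               // object id -> address
    std::vector<std::unique_ptr<node>> owned_; // keeps the nodes alive
};
```

With the input "1 10 2 20 1", the loader builds node 1, then node 2, and resolves the final "1" to the partially initialized node 1, closing the cycle. Had the table entry been added only after step 2, the recursion would not terminate on this input. The toy format has no null pointer, so it can only describe cyclic chains.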
Attempting to translate the CL protocol into C++ terminology, I think it would consist of first calling a static factory function associated with the type,
This is part of the implementation as pointer_iserializer<T, Archive>::load_object_ptr
passing it the archive as an argument, and then calling a member function on the object, again passing the archive as an argument.
which is template<class Archive> void serialize(Archive &ar, T &t, const unsigned int version);

Robert Ramey wrote:
Very interesting.
First I'm surprised that anyone else was even looking at that thread after all this time.
It's clear that there is a strong parallel here - maybe even a one-to-one correspondence.
I've concluded that the concept of Semantic really isn't formal. It's a narrative description of what someone expects an expression to do.
Right. Coming from mathematical logic it's clear to me that usual concept definitions aren't really formal. I'd call them 'semi-formal'. If you wanted to write a truly formal specification, you'd first have to describe an abstract machine to represent C++ programs and their execution environments, because the C++ standard isn't really formal, either.
-- Jonathan Turkanis www.kangaroologic.com

Jonathan Turkanis wrote:
Robert Ramey wrote:
Very interesting.
First I'm surprised that anyone else was even looking at that thread after all this time.
It's clear that there is a strong parallel here - maybe even a one-to-one correspondence.
I've concluded that the concept of Semantic really isn't formal. It's a narrative description of what someone expects an expression to do.
Right. Coming from mathematical logic it's clear to me that usual concept definitions aren't really formal. I'd call them 'semi-formal'. If you wanted to write a truly formal specification, you'd first have to describe an abstract machine to represent C++ programs and their execution environments, because the C++ standard isn't really formal, either.
-- Jonathan Turkanis www.kangaroologic.com
Interesting: an 'abstract machine' representing the compiler is almost exactly the informal description used by the version of the standard I saw (in the "as if" rule) for itself: basically the standard is a human-readable (barely) compiler. The problem I have with the standard is that it's bloody hard to read, even when it doesn't actually increase accuracy. (Without a copy with me, I don't have a reference, sorry. It doesn't help that I would have to get it delivered from overseas to get it :-)

Simon Buchan wrote:
Jonathan Turkanis wrote:
Robert Ramey wrote:
I've concluded that the concept of Semantic really isn't formal. It's a narrative description of what someone expects an expression to do.
Right. Coming from mathematical logic it's clear to me that usual concept definitions aren't really formal. I'd call them 'semi-formal'. If you wanted to write a truly formal specification, you'd first have to describe an abstract machine to represent C++ programs and their execution environments, because the C++ standard isn't really formal, either.
Interesting: an 'abstract machine' representing the compiler is almost exactly the informal description used by the version of the standard I saw
But it's not defined precisely enough to be called 'formal' by my standards. Compare it with the definition of abstract state machines (http://www.eecs.umich.edu/gasm/), for example. Note that I'm not criticising the standard (although it certainly has some problems with lack of precision). It would be nice to have a truly formal specification, but in the case of C++ it's probably not realistic.
-- Jonathan Turkanis www.kangaroologic.com

Jonathan Turkanis wrote:
Simon Buchan wrote: <snip>
Interesting: an 'abstract machine' representing the compiler is almost exactly the informal description used by the version of the standard I saw
But it's not defined precisely enough to be called 'formal' by my standards. Compare it with the definition of abstract state machines (http://www.eecs.umich.edu/gasm/), for example.
True.
Note that I'm not criticising the standard (although it certainly has some problems with lack of precision). It would be nice to have a truly formal specification, but in the case of C++ it's probably not realistic.
I do think it's possible (If it wasn't, we wouldn't be able to write compilers for it!), but remember the standard has basically grown out of rewordings from the days of C (which did likewise back to B, etc...). I think a formal, but human-readable, grammar, kind of like EBNF for semantics, would be useful here. (From what I understand of ASMs, they are rather scary to read.)

Simon Buchan wrote:
Jonathan Turkanis wrote:
Note that I'm not criticising the standard (although it certainly has some problems with lack of precision). It would be nice to have a truly formal specification, but in the case of C++ it's probably not realistic.
I do think it's possible
I said 'realistic'
(If it wasn't, we wouldn't be able to write compilers for it!),
That's like saying: Of course it must be possible to provide a formal semantics for English -- otherwise I'd never be able to understand the instructions that come with my coffee maker :-)
but remember the standard has basically grown out of rewordings from the days of C (which did likewise back to B, etc...) I think a formal, but human-readable, grammar, kind of like EBNF for semantics, would be useful here.
It would be very useful (except I'm not sure what you mean by "kind of like EBNF for semantics")
-- Jonathan Turkanis www.kangaroologic.com

Jonathan Turkanis wrote:
Simon Buchan wrote:
Jonathan Turkanis wrote:
Note that I'm not criticising the standard (although it certainly has some problems with lack of precision). It would be nice to have a truly formal specification, but in the case of C++ it's probably not realistic.
I do think it's possible
I said 'realistic'
I should have /emphasised/ it, sorry.
(If it wasn't, we wouldn't be able to write compilers for it!),
That's like saying: Of course it must be possible to provide a formal semantics for English -- otherwise I'd never be able to understand the instructions that come with my coffee maker :-)
Can you? If so, English (or more precisely, the subset used in the instructions) can be formally defined (in fact, you can find programs that do). The problem with natural languages is that they have a symbol table with trillions of entries, and that table is dynamically generated based on the situation the compiler (i.e., you) is in.
but remember the standard has basically grown out of rewordings from the days of C (which did likewise back to B, etc...) I think a formal, but human-readable, grammar, kind of like EBNF for semantics, would be useful here.
It would be very useful (except I'm not sure what you mean by "kind of like EBNF for semantics")
Think algebra for syntax. That's EBNF. I don't know of any generalised definition of semantics (Turing machines don't count, 'cause they aren't human-readable (even the definition of Turing machines isn't!), and Lambda Calculus isn't general enough.) that would be useful, but feel free to correct me.

Simon Buchan wrote:
Jonathan Turkanis wrote:
Simon Buchan wrote:
Jonathan Turkanis wrote:
... It would be nice to have a truly formal specification, but in the case of C++ it's probably not realistic.
I do think it's possible
(If it wasn't, we wouldn't be able to write compilers for it!),
That's like saying: Of course it must be possible to provide a formal semantics for English -- otherwise I'd never be able to understand the instructions that come with my coffee maker :-)
Can you?
Okay, I was exaggerating. I should have picked a simpler example.
If so, English (or more precisely, the subset used in the instructions) can be formally defined (in fact, you can find programs that do).
I think it's possible to come pretty close, but nobody knows how to do it yet.
The problem with natural languages is that they have a symbol table with trillions of entries, and that table is dynamically generated based on the situation the compiler (i.e., you) is in.
What are the trillions of entries?
but remember the standard has basically grown out of rewordings from the days of C (which did likewise back to B, etc...) I think a formal, but human-readable, grammar, kind of like EBNF for semantics, would be useful here.
It would be very useful (except I'm not sure what you mean by "kind of like EBNF for semantics")
Think algebra for syntax. That's EBNF. I don't know of any generalised definition of semantics
I don't think there is one.
(Turing machines don't count, 'cause they aren't human-readable (even the definition of Turing machines isn't!), and Lambda Calculus isn't general enough.) that would be useful, but feel free to correct me.
I'd settle for unreadable semantics.
-- Jonathan Turkanis www.kangaroologic.com

For those of you who enjoy reading the tax code, I've uploaded into RC_1_33_0 my latest attempt to resolve this issue. It's in boost/libs/serialization/doc/
I believe this will be found to be at least a great improvement. I sure hope so. I've striven to fulfill the requirements for formal documentation as I understand them.
BTW - is there any formal specification of what constitutes formal documentation? The discussion sounded like there was, but a cursory web search didn't turn up anything. Also, careful examination of different documentation suggests that, although they have lots of similarities, there are enough differences to make me believe that there really is no universally accepted definition for this.
Robert Ramey
participants (4): Jonathan Turkanis, Kim Barrett, Robert Ramey, Simon Buchan