Re: [boost] [1.44] Beta progress?

24 Jul 2010

      Matthias Troyer wrote:
...
On 22 Jul 2010, at 18:42, Robert Ramey wrote:
...
Matthias Troyer wrote:
...
On 22 Jul 2010, at 13:39, Robert Ramey wrote:
...
...
I totally disagree with your statements that we depend on internal
details of Boost.Serialization. Boost.Serialization does publish
an (incomplete!) archive concept and you did intend that others
can extend it with new archive classes.
The published archive concept specifies the concepts that must
be fulfilled by any serializable types.  The serialization
library includes examples of archives which depend only
on the documented concepts.
The documented archive concepts don't prevent other archive
classes from including more functionality.  Indeed, facilities such
as serialization of pointers through a base class, etc demand it.
And it's true that I haven't discouraged leveraging on these
"extended archives"
I'm talking about the requirements on a minimal archive. Those are
not fully documented.
Hmmm, that would be news to me.  I compiled the trivial archive from
the documentation with serialization code.  And the trivial archive
is a model of the minimal archive.  That is, I believe that serializing
any type to a trivial archive will compile without error.
If there is a serializable type which trivial archive fails
to work with, I would be interested in hearing about it.
You misunderstand. The concepts should specify:
1) what primitive types the archive needs to be able to serialize
2) what concepts these primitive types satisfy that can be used.
Do you want to say that the trivial archive should be used to deduce
that?
#include <cstddef>
#include <boost/mpl/bool.hpp>

class trivial_oarchive {
public:
   //////////////////////////////////////////////////////////
   // public interface used by programs that use the
   // serialization library
   typedef boost::mpl::bool_<true> is_saving;
   typedef boost::mpl::bool_<false> is_loading;
   // accepts a registration request and ignores it
   template<class T> void register_type(){}
   // accepts any type and discards it
   template<class T> trivial_oarchive & operator<<(const T & t){
       return *this;
   }
   template<class T> trivial_oarchive & operator&(const T & t){
       return *this << t;
   }
   void save_binary(void *address, std::size_t count){}
};
This class satisfies the following:
1) it can deal with ANY type
2) it uses NO property of the types
correct.  And it fulfills the requirements of the archive concept.
...
Clearly, those are not the requirements for all archives.
The archive concept doesn't prohibit any model of that
concept from adding other facilities of its own choosing.  The
guarantee is that any archive which models the concept
will compile

ar << t

for any serializable type t.
...
If we next look at the text archives we'll find that
1) it can deal with any primitive type that can be streamed to an
i/ostream.
I believe that this is the same as saying that a text archive
models the archive concept as stated in the documentation.

2) it uses the streaming operator of those types

True - this isn't required by the archive concept.  And in fact
it's not true for all archives.  For example, the native binary_archive
doesn't use the streaming operators at all.  It makes all calls to
the underlying filebuf.  In fact, a binary_archive can be constructed
with either a stream (from which it just gets the filebuf) or from a
filebuf or stringbuf directly.
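
The point about bypassing the stream operators can be illustrated with a small sketch that uses only the standard library. `raw_binary_writer` is a hypothetical name, not a Boost class; the sketch assumes only that such an archive needs nothing more than a `std::streambuf` to push bytes into, and it handles arithmetic types only.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>
#include <type_traits>

// Hypothetical sketch, not the Boost binary archive: writes raw bytes
// straight to a stream buffer and never uses std::ostream's operator<<.
class raw_binary_writer {
    std::streambuf& sb_;
public:
    // Construct from a buffer directly (filebuf, stringbuf, ...).
    explicit raw_binary_writer(std::streambuf& sb) : sb_(sb) {}
    // Construct from a stream; only its buffer is kept.
    explicit raw_binary_writer(std::ostream& os) : sb_(*os.rdbuf()) {}

    void save_binary(const void* address, std::size_t count) {
        sb_.sputn(static_cast<const char*>(address),
                  static_cast<std::streamsize>(count));
    }

    // Arithmetic types are copied byte for byte.
    template <class T>
    typename std::enable_if<std::is_arithmetic<T>::value,
                            raw_binary_writer&>::type
    operator<<(const T& t) {
        save_binary(&t, sizeof(T));
        return *this;
    }
};

// Both construction paths end up writing the same bytes.
inline void example_stream_and_buffer_agree() {
    std::stringbuf buf;
    raw_binary_writer from_buf(buf);
    from_buf << 42 << 'x';

    std::ostringstream os;
    raw_binary_writer from_stream(os);
    from_stream << 42 << 'x';

    assert(buf.str() == os.str());
    assert(buf.str().size() == sizeof(int) + sizeof(char));
}
```

Nothing in the archive concept asks for this; it is simply one facility an archive is free to add.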
...
Are those the requirements? Again not. What we need is an explicit
list of all primitive types that an archive is required to support,
and a list of the properties of those types that a user can rely on.
if it can't support any serializable type it's not an archive.
all primitive types are serializable types by definition.
...
For example, all the primitive types had a default constructor
all primitive types may have a default constructor - but the archive
concept doesn't require that.  There might be some ambiguity here.
For purposes of serializability - a primitive type is one that is either
a) a C++ primitive type
or
b) marked "primitive" via a serialization trait.

the "serializable" concept doesn't require that a serializable type
have a default constructor - and in fact many do not.
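
A minimal sketch of that definition, using only the standard library. The trait `is_primitive_sketch` and the types `strong_version` and `unmarked` are hypothetical stand-ins rather than the library's actual trait machinery, and treating "C++ primitive type" as "arithmetic type" is a simplifying assumption.

```cpp
#include <type_traits>

// Hypothetical trait standing in for a "primitive" serialization trait.
// Case a): built-in arithmetic types count as primitive by default.
template <class T>
struct is_primitive_sketch : std::is_arithmetic<T> {};

// A strong typedef with no default constructor (illustrative only).
struct strong_version {
    explicit strong_version(unsigned v) : value(v) {}
    unsigned value;
};

// Case b): the type is marked "primitive" by specializing the trait.
template <>
struct is_primitive_sketch<strong_version> : std::true_type {};

// A class type that was never marked.
struct unmarked { int x; };

static_assert(is_primitive_sketch<int>::value,
              "case a: C++ primitive type");
static_assert(is_primitive_sketch<strong_version>::value,
              "case b: marked via the trait");
static_assert(!is_primitive_sketch<unmarked>::value,
              "unmarked class types are not primitive");
static_assert(!std::is_default_constructible<strong_version>::value,
              "being primitive does not imply a default constructor");
```

The last assertion is the point under discussion: being marked primitive says nothing about default construction.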
...
and I  used it
You presumed that a type marked "primitive" must have a
default constructor.  I realize that all C++ primitive types can be
default constructed - but there is no guarantee that a type
marked as serializable via "primitive" has a default constructor.
...
but now you say that was an implementation detail that I
should not have used.
agreed, I think you made an error here.
...
How can I as an archive implementor know what I
may use without risking breakages?
Of course this is the real question.

Strictly speaking the only way would be to not leverage on the
archives already written - as they add a lot of capability
beyond that which is required by the concept.

Of course, that would be a lot more work which we want to
avoid.   You wanted to leverage on a huge part of the
archive implementation which is beyond the strict archive
concept.  All the archive classes do this in different ways.
xml, text and binary have bridged an incredible breadth
of utility and functionality with very little breakage over
the years.  Honestly, I can't guarantee that breakage won't
happen from time to time.

If it makes you feel any better, in order to "fix" this
issue with version_type I had to make a few minor
changes in xml and text archives as well.  Ideally it
would seem that they could be
totally independent - in practice, sharing code
through base classes makes them "a little bit"
interdependent.  But it cuts down the work by a
large amount.  So that's the trade off we've gotten.
Huge reduction on effort, very wide applicability,
very minor interface breakage, backward compatibility
over 8 years (so far).

****
So now I've answered your specific questions - here are
a couple more random observations.

The archive concept is very "thin".  It only specifies
the calling interface - really nothing more.  It doesn't include
versioning, it doesn't say anything about pointers, it doesn't
say anything about tracking, etc, etc.  We know that these
issues are essential to a useful serialization library.  How is
it that it doesn't address this?  (I'm suspecting that this might
be your question).  If we think about this it turns out there
are different ways an archive might be implemented.  Take
pointers.  Someone might make an archive which would
just copy the raw pointer.  He would say - hey I'm just
using it for in-memory copies and I don't want "deep copies".
The exact same line of thought comes up with regards
to tracking (not good for sending data over a line),  and
all the other aspects of what we now call the serialization
library.

So I concluded that the archive concept shouldn't specify
what the archive does - only its interface.  That's why
it is as it is today. When the serialization documentation
was being disputed - it became clear that there was
no agreement on what serialization should mean - by
limiting the concept to the interface - it let me get past
this dispute and move on.  Do you see a pattern here?
It's fascinating to me that this is how boost helps software
quality.  By failing to agree, it became clear that the
only way to move on was to leave any "features" unspecified
which turned out to be the correct decision.

Besides shortening the effort considerably it has had huge
practical benefits.  First it has permitted the extension
to new kinds of archives which were totally unanticipated.
For an interesting example, look at the simple logging
archive in the documentation.  The archive is output only
and can dump any serializable type to any output stream
in a formatted way.  It's header only.  It implements the
archive concept but doesn't rely on the base class
implementations of the other archives so it's very
lightweight.  I see this as being very interesting for logging
and debugging.  (of course public reaction has been
underwhelming but that's not my point here).
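
To make the idea concrete, here is a rough sketch of an output-only logging archive that stands alone. It is not the logging archive from the documentation: `log_oarchive`, `point`, `segment` and `log_to_string` are hypothetical names, it uses only the standard library, and it assumes that class types expose a member `serialize(Archive&, unsigned)`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical output-only archive with no base classes.
class log_oarchive {
    std::ostream& os_;
public:
    explicit log_oarchive(std::ostream& os) : os_(os) {}

    // Arithmetic types are written directly, followed by a space.
    template <class T>
    typename std::enable_if<std::is_arithmetic<T>::value, log_oarchive&>::type
    operator<<(const T& t) { os_ << t << ' '; return *this; }

    log_oarchive& operator<<(const std::string& s) {
        os_ << s << ' ';
        return *this;
    }

    // Class types describe themselves through a member serialize().
    // Casting away const is a simplification made for this sketch.
    template <class T>
    typename std::enable_if<std::is_class<T>::value, log_oarchive&>::type
    operator<<(const T& t) {
        const_cast<T&>(t).serialize(*this, 0u);
        return *this;
    }

    template <class T>
    log_oarchive& operator&(const T& t) { return *this << t; }
};

// User types written against the interface only.
struct point {
    int x;
    int y;
    template <class Archive>
    void serialize(Archive& ar, unsigned /*version*/) { ar & x & y; }
};

struct segment {
    point a;
    point b;
    template <class Archive>
    void serialize(Archive& ar, unsigned /*version*/) { ar & a & b; }
};

// Convenience wrapper: dump any such type to a string.
template <class T>
std::string log_to_string(const T& t) {
    std::ostringstream os;
    log_oarchive ar(os);
    ar << t;
    return os.str();
}

inline void example_log_nested_type() {
    segment s = {{1, 2}, {3, 4}};
    assert(log_to_string(s) == "1 2 3 4 ");
}
```

The `serialize` members never mention `log_oarchive`, so the same user code would compile against any other archive that offers `operator&`.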

There are other things that one could make archives for:
a) a template "deep copy"
b) an editing archive for gui editing
c) a diff archive where two archives are compared and their difference is produced.
d) an inverse of the above.
e) c + d above would lead to a whole "archive algebra" for rolling back and forward archives.

All of the above would be permitted by the archive concept
and would work with all serializable types - without changing
any current code!

I hope that clarifies the reasoning for why things are the
way they are.

Now, taking a look at the MPI usage of serialization.

I really haven't looked at it enough to understand it
so I may be wrong about this - these are only casual
observations.

a) it seems that the "skeleton" idea depends on the
idea that the size of the data stored in the binary archive be the same
as the size of the underlying data type.  Up until now that
has been true even though there was never any explicit
guarantee to that effect.  I had to change the behavior in
order to extract myself from some other fiasco and this
"feature" was no longer true.  I think this is where the
problem started.  It's no one's fault.

b) the MPI file sends the class versions over the wire.
It doesn't need to do this.  If you look at some of the
archives there is class_optional_id which is trapped
by the archive classes and suppressed both on input
and output because that particular archive class
doesn't need it. But it's there if someone wants
to hook it (like an editing archive). I think MPI
might want to do the same thing with version_type.
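
A sketch of what "trapping and suppressing" a bookkeeping type could look like, using only the standard library. `version_tag` and `wire_oarchive` are hypothetical names, not library or MPI classes; the only assumption is that the bookkeeping value arrives as its own distinct type, so an overload can intercept it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical bookkeeping type, distinct from plain unsigned so that
// an archive can pick it out by overload resolution.
struct version_tag { unsigned value; };

// Hypothetical archive that collects the bytes to be sent.
class wire_oarchive {
    std::vector<unsigned char> bytes_;
public:
    // Ordinary arithmetic data goes onto the wire byte for byte.
    template <class T>
    typename std::enable_if<std::is_arithmetic<T>::value, wire_oarchive&>::type
    operator<<(const T& t) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&t);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
        return *this;
    }

    // The bookkeeping value is accepted, so calling code compiles
    // unchanged, but nothing is written for it.
    wire_oarchive& operator<<(const version_tag&) { return *this; }

    std::size_t size() const { return bytes_.size(); }
};

// A version followed by one int costs only the int on the wire.
inline std::size_t bytes_for_versioned_int() {
    wire_oarchive ar;
    ar << version_tag{3} << 7;
    return ar.size();
}
```

An archive that does want the value, such as an editing archive, would supply a different body for the same overload.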

c) I'm not sure how MPI uses portable binary archive
(if at all).  Seems like that might be interesting.

d) what is really needed to send data "over the
wire" is to be able to suppress tracking at the
archive level.  This would permit the same data
to be sent over and over and wouldn't presume
that the data is constant. So you wouldn't have to create
a new archive for each transaction.  I've puzzled about how
to do this without breaking the archive concept. Turns
out it's a little tricky.  And there doesn't seem to be
much demand for it - but maybe there would be if
I did it.

e) this bit of code is what created the issue
with the Sun compiler.
...
The problem comes from this line in boost/mpi/datatype_fwd.hpp:
template<typename T> MPI_Datatype get_mpi_datatype(const T& x = T());
Frankly, it's just plain wrong and should be fixed.  You might
say that you know it's wrong but it works around this
or that template or compiler quirk and it's too hard to fix.
I could accept that.  But if it's fixable, it should be fixed.
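
For what it's worth, a default argument of `T()` needs an accessible default constructor whenever the default is actually used. One possible shape of a fix is to look the datatype up from the type alone, so that no `T` is ever constructed. This is only a sketch under stated assumptions, not the actual Boost.MPI code: `datatype_id` stands in for `MPI_Datatype`, and `version_like`, `datatype_of` and `get_datatype` are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

using datatype_id = int;  // hypothetical stand-in for MPI_Datatype

// A type with no default constructor (illustrative only).
struct version_like {
    explicit version_like(unsigned v) : value(v) {}
    unsigned value;
};

// Hypothetical per-type lookup table; the primary template is left
// undefined so that unsupported types fail at compile time.
template <typename T> struct datatype_of;

template <> struct datatype_of<int> {
    static datatype_id id() { return 1; }
};
template <> struct datatype_of<version_like> {
    static datatype_id id() { return 2; }
};

// Type-only lookup: no object of type T is created.
template <typename T>
datatype_id get_datatype() { return datatype_of<T>::id(); }

// Value-based lookup for callers that already hold an object.
template <typename T>
datatype_id get_datatype(const T&) { return get_datatype<T>(); }

static_assert(!std::is_default_constructible<version_like>::value,
              "the lookup must not rely on a default constructor");
```

Both `get_datatype<version_like>()` and `get_datatype(v)` compile here, whether the constructors of the type are public or private.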

I did make the constructors of version_type public.  I had
made them private to trap errors in code where they
were constructed but not initialized.  Now errors like this
aren't trapped.  So I think you should fix this.

f) I believe that MPI uses binary_archive_base? as a basis.
You could have used a higher level class as a basis.  I don't
know that that would have made things easier or harder
but it's worth looking into.  The binary_archive is actually
very small - only a few hundred lines of code.  This could
have been cloned and edited. This might or might not
have made things more/less intertwined with the other
archive classes.  This isn't a suggestion - just an observation
that it might be worth looking into.

Robert Ramey