
Matthias Troyer wrote:
On 22 Jul 2010, at 18:42, Robert Ramey wrote:
Matthias Troyer wrote:
On 22 Jul 2010, at 13:39, Robert Ramey wrote:
I totally disagree with your statements that we depend on internal details of Boost.Serialization. Boost.Serialization does publish an (incomplete!) archive concept and you did intend that others can extend it with new archive classes.
The published archive concept specifies the concepts that must be fulfilled by any serializable types. The serialization library includes examples of archives which depend only on the documented concepts.
The documented archive concepts don't prevent other archive classes from including more functionality. Indeed, facilities such as serialization of pointers through a base class, etc., demand it. And it's true that I haven't discouraged leveraging these "extended archives".
I'm talking about the requirements on a minimal archive. Those are not fully documented.
Hmmm, that would be news to me. I compiled the trivial archive from the documentation with serialization code, and the trivial archive is a model of the minimal archive. That is, I believe that serializing any type to a trivial archive will compile without error. If there is a serializable type which the trivial archive fails to work with, I would be interested in hearing about it.
You misunderstand. The concepts should specify:
1) what primitive types the archive needs to be able to serialize
2) what concepts these primitive types satisfy that can be used.
Do you want to say that the trivial archive should be used to deduce that?
#include <cstddef>
#include <boost/mpl/bool.hpp>

class trivial_oarchive {
public:
    //////////////////////////////////////////////////////////
    // public interface used by programs that use the
    // serialization library
    typedef boost::mpl::bool_<true> is_saving;
    typedef boost::mpl::bool_<false> is_loading;
    // no-op: a trivial archive has nothing to register
    template<class T> void register_type(){}
    // accepts a value of any type and discards it
    template<class T> trivial_oarchive & operator<<(const T & t){
        return *this;
    }
    template<class T> trivial_oarchive & operator&(const T & t){
        return *this << t;
    }
    void save_binary(void *address, std::size_t count){}
};
This class satisfies the following:
1) it can deal with ANY type
2) it uses NO property of the types
Correct. And it fulfills the requirements of the archive concept.
Clearly, those are not the requirements for all archives.
The archive concept doesn't prohibit any model of that concept from adding other facilities of its own choosing. The guarantee is that any archive which models the concept will compile ar << t for any serializable type t.
If we next look at the text archives we'll find that
1) it can deal with any primitive type that can be streamed to an i/ostream.
I believe that this is the same as saying that a text archive models the archive concept as stated in the documentation.
2) it uses the streaming operator of those types
True - this isn't required by the archive concept. And in fact it's not true for all archives. For example, the native binary_archive doesn't use the streaming operators at all. It makes all calls to the underlying filebuf. In fact, a binary_archive can be constructed with either a stream (from which it just gets the filebuf) or from a filebuf or stringbuf directly.
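To make the contrast concrete, here is a minimal sketch using only the standard library. The classes text_like_oarchive and binary_like_oarchive are hypothetical stand-ins written for this illustration, not the Boost archive classes: the first requires a streaming operator for every type it is given, the second only copies raw bytes into a streambuf and never touches operator<< on a stream.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a text archive: it relies on the
// stream insertion operator of every type passed to it.
class text_like_oarchive {
    std::ostream& os_;
public:
    explicit text_like_oarchive(std::ostream& os) : os_(os) {}
    template<class T>
    text_like_oarchive& operator<<(const T& t) {
        os_ << t << ' ';
        return *this;
    }
    template<class T>
    text_like_oarchive& operator&(const T& t) { return *this << t; }
};

// Hypothetical stand-in for a binary archive: it writes raw bytes to a
// streambuf and uses no streaming operator of the saved type.
class binary_like_oarchive {
    std::streambuf* sb_;
public:
    // Construct from a buffer directly ...
    explicit binary_like_oarchive(std::streambuf* sb) : sb_(sb) {
        assert(sb_ != nullptr);
    }
    // ... or from a stream, from which only the buffer is taken.
    explicit binary_like_oarchive(std::ostream& os) : sb_(os.rdbuf()) {
        assert(sb_ != nullptr);
    }
    template<class T>
    binary_like_oarchive& operator<<(const T& t) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "this sketch only copies trivially copyable types");
        sb_->sputn(reinterpret_cast<const char*>(&t),
                   static_cast<std::streamsize>(sizeof(T)));
        return *this;
    }
    template<class T>
    binary_like_oarchive& operator&(const T& t) { return *this << t; }
};

// The same saving code works with both archives, because it uses
// only the calling interface (ar << t).
template<class Archive>
void save_sample(Archive& ar) {
    const int a = 7;
    const double b = 2.5;
    ar << a << b;
}

inline std::string text_sample() {
    std::ostringstream os;
    text_like_oarchive ar(os);
    save_sample(ar);
    return os.str();
}

inline std::size_t binary_sample_size() {
    std::stringbuf sb;
    binary_like_oarchive ar(&sb);
    save_sample(ar);
    return sb.str().size();
}
```

Both archives accept the same save_sample code; what differs is which property of the saved types each one relies on, and that is exactly the part the archive concept leaves unspecified.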
Are those the requirements? Again not. What we need is an explicit list of all primitive types that an archive is required to support, and a list of the properties of those types that a user can rely on.
If it can't support any serializable type, it's not an archive. All primitive types are serializable types by definition.
For example, all the primitive types had a default constructor
All primitive types may have a default constructor - but the archive concept doesn't require that. There might be some ambiguity here. For purposes of serializability, a primitive type is one that either a) is a C++ primitive type or b) is marked "primitive" via a serialization trait. The "serializable" concept doesn't require that a serializable type have a default constructor - and in fact many do not.
and I used it
You presumed that a type marked "primitive" must have a default constructor. I realize that all C++ primitive types have default constructors - but there is no guarantee that a type marked as serializable via "primitive" has a default constructor.
but now you say that was an implementation detail that I should not have used.
Agreed. I think you made an error here.
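The default-constructor point can be sketched in a few lines. Everything here is hypothetical and written for this illustration: version_like stands in for a type that is serializable but has no accessible default constructor, size_via_default_arg mimics the "const T& x = T()" pattern, and size_via_tag is one possible way (an assumption, not a fix anyone in this thread proposed) to pass a type without constructing a value of it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a type whose default constructor is private,
// so that "constructed but not initialized" values are trapped at compile time.
class version_like {
    unsigned int v_;
    version_like();  // private and never defined
public:
    explicit version_like(unsigned int v) : v_(v) {}
    unsigned int value() const { return v_; }
};

// The pattern under discussion: the default argument needs T().
// It only compiles for such a T if the caller always supplies an argument.
template<class T>
std::size_t size_via_default_arg(const T& x = T()) {
    (void)x;
    return sizeof(T);
}

// One possible alternative (an assumption of this sketch): carry the type
// in an empty tag, so no value of T is ever constructed.
template<class T> struct type_tag {};

template<class T>
std::size_t size_via_tag(type_tag<T>) {
    return sizeof(T);
}

inline void check_examples() {
    static_assert(!std::is_default_constructible<version_like>::value,
                  "version_like has no accessible default constructor");
    // Fine: the default argument is not used, so T() is never needed.
    assert(size_via_default_arg(version_like(3)) == sizeof(version_like));
    // Fine: no value of version_like is constructed at all.
    assert(size_via_tag(type_tag<version_like>()) == sizeof(version_like));
    // size_via_default_arg<version_like>();  // would not compile: needs T()
}
```

The commented-out call is the case that breaks: as soon as the default argument is actually needed, the code depends on a property that the serializable concept does not promise.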
How can I as an archive implementor know what I may use without risking breakages?
Of course this is the real question. Strictly speaking, the only way would be to not leverage the archives already written - as they add a lot of capability beyond that which is required by the concept. Of course, that would be a lot more work, which we want to avoid. You wanted to leverage a huge part of the archive implementation which is beyond the strict archive concept. All the archive classes do this in different ways. xml, text and binary have bridged an incredible breadth of utility and functionality with very little breakage over the years. Honestly, I can't guarantee that this won't happen from time to time. If it makes you feel any better, in order to "fix" this issue with version_type I had to make a few minor changes in xml and text archives as well. Ideally it would seem possible for them to be totally independent - in practice, sharing code through base classes makes them "a little bit" interdependent. But it cuts down the work by a large amount. So that's the trade-off we've gotten: huge reduction in effort, very wide applicability, very minor interface breakage, backward compatibility over 8 years (so far).
****
So now I've answered your specific questions - here are a couple more random observations.
The archive concept is very "thin". It only specifies the calling interface - really nothing more. It doesn't include versioning, it doesn't say anything about pointers, it doesn't say anything about tracking, etc., etc. We know that these issues are essential to a useful serialization library. How is it that it doesn't address this? (I'm suspecting that this might be your question.)
If we think about this, it turns out there are different ways an archive might be implemented. Take pointers. Someone might make an archive which would just copy the raw pointer. He would say - hey, I'm just using it for in-memory copies and I don't want "deep copies".
The exact same line of thought comes up with regard to tracking (not good for sending data over a line), and all the other aspects of what we now call the serialization library. So I concluded that the archive concept shouldn't specify what the archive does - only its interface. That's why it's as it is today.
When the serialization documentation was being disputed, it became clear that there was no agreement on what serialization should mean. Limiting the concept to the interface let me get past this dispute and move on. Do you see a pattern here? It's fascinating to me that this is how Boost helps software quality. By failing to agree, it became clear that the only way to move on was to leave any "features" unspecified, which turned out to be the correct decision.
Besides shortening the effort considerably, it has had huge practical benefits. First, it has permitted the extension to new kinds of archives which were totally unanticipated. For an interesting example, look at the simple logging archive in the documentation. The archive is output only and can dump any serializable type to any output stream in a formatted way. It's header only. It implements the archive concept but doesn't rely on the base class implementations of the other archives, so it's very lightweight. I see this as being very interesting for logging and debugging. (Of course public reaction has been underwhelming, but that's not my point here.)
There are other things that one could make archives for:
a) a template "deep copy"
b) an editing archive for gui editing
c) a diff archive where two archives are compared and their difference is produced.
d) an inverse of the above.
e) c + d above would lead to a whole "archive algebra" for rolling back and forward archives.
All of the above would be permitted by the archive concept and would work with all serializable types - without changing any current code!
I hope that clarifies the reasoning for why things are the way they are.
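As a rough illustration of how little such an archive needs, here is a standard-library-only sketch of an output-only logging archive. It is not the logging archive from the documentation: log_like_oarchive, gps_position and log_sample are hypothetical names for this example, and the dispatch between leaf values and class types is a simplification assumed for the sketch.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical output-only archive: it offers only the calling interface
// (operator<< and operator&) and shares no code with any other archive.
class log_like_oarchive {
    std::ostream& os_;

    // Leaf values (arithmetic types and std::string) are printed directly.
    template<class T>
    void save(const T& t, std::true_type) { os_ << t << ' '; }

    // Any other type is asked to describe itself through serialize().
    // serialize() is non-const by convention, so constness is cast away
    // for the call; this sketch never modifies the object.
    template<class T>
    void save(const T& t, std::false_type) {
        const_cast<T&>(t).serialize(*this, 0u);
    }

public:
    explicit log_like_oarchive(std::ostream& os) : os_(os) {}

    template<class T>
    log_like_oarchive& operator<<(const T& t) {
        typedef std::integral_constant<bool,
            (std::is_arithmetic<T>::value ||
             std::is_same<T, std::string>::value)> is_leaf;
        save(t, is_leaf());
        return *this;
    }

    template<class T>
    log_like_oarchive& operator&(const T& t) { return *this << t; }
};

// A user type written against the calling interface only.
struct gps_position {
    int degrees;
    int minutes;
    float seconds;

    template<class Archive>
    void serialize(Archive& ar, const unsigned int /* version */) {
        ar & degrees;
        ar & minutes;
        ar & seconds;
    }
};

inline std::string log_sample() {
    std::ostringstream os;
    log_like_oarchive ar(os);
    const gps_position p = {45, 30, 12.5f};
    ar << p;
    assert(os.good());
    return os.str();
}
```

The gps_position type knows nothing about the archive beyond "ar & member", which is why the same serialize function could be handed to a completely different kind of archive without being changed.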
Now, taking a look at the MPI usage of serialization. I really haven't looked at it enough to really understand it, so I may be wrong about this - these are only casual observations.
a) It seems that the "skeleton" idea depends on the size of the data stored in the binary archive being the same as the size of the underlying data type. Up until now that has been true, even though there was never any explicit guarantee to that effect. I had to change the behavior in order to extract myself from some other fiasco, and this "feature" was no longer true. I think this is where the problem started. It's no one's fault.
b) The MPI file sends the class versions over the wire. It doesn't need to do this. If you look at some of the archives, there is class_optional_id, which is trapped by the archive classes and suppressed both on input and output because that particular archive class doesn't need it. But it's there if someone wants to hook it (like an editing archive). I think MPI might want to do the same thing with version_type.
c) I'm not sure how MPI uses portable binary archive (if at all). Seems like that might be interesting.
d) What is really needed to send data "over the wire" is to be able to suppress tracking at the archive level. That would permit the same data to be sent over and over and wouldn't presume that the data is constant. So you wouldn't have to create a new archive for each transaction. I've puzzled about how to do this without breaking the archive concept. Turns out it's a little tricky. And there doesn't seem to be much demand for it - but maybe there would be if I did it.
e) This bit of code is what created the issue with the Sun compiler.
The problem comes from this line in boost/mpi/datatype_fwd.hpp:
template<typename T> MPI_Datatype get_mpi_datatype(const T& x = T());
Frankly, it's just plain wrong and should be fixed. You might say that you know it's wrong but it works around this or that template or compiler quirk and it's too hard to fix. I could accept that. But if it's fixable, it should be fixed. I did make the constructors of version_type public. I had made them private to trap errors in code where they were constructed but not initialized. Now errors like this aren't trapped. So I think you should fix this.
f) I believe that MPI uses binary_archive_base? as a basis. You could have used a higher-level class as a basis. I don't know whether that would have made things easier or harder, but it's worth looking into. The binary_archive is actually very small - only a few hundred lines of code. This could have been cloned and edited. This might or might not have made things more or less intertwined with the other archive classes. This isn't a suggestion - just an observation that it might be worth looking into.
Robert Ramey