Re: [boost] Re: Using Serialization for binary marshalling

Matthew Vogt wrote:
Robert Ramey <ramey <at> rrsd.com> writes:
Correct. That's exactly what the current implementations of STL classes do. So my question is why override vector for your xdr/cdr archives?
Primarily, because the implementation you have chosen for default serialization of STL containers is just an implementation detail from your point of view; you could quite reasonably change it in the next version, if someone with alternate requirements requested that, and it didn't conflict with your own purposes. OTOH, XDR and CDR mandate how sequences should appear in a stream, so it is necessary to ensure that the default implementation is not relied upon for conformance.
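To make "mandated" concrete: RFC 4506 (XDR) encodes a variable-length array as a 4-byte big-endian unsigned count followed by the elements, and each XDR int is 4 bytes, big-endian, two's complement. The following is a minimal sketch under that reading of the RFC. The function names are hypothetical and are not part of the serialization library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append a 32-bit unsigned value most significant byte first, as XDR requires.
inline void xdr_put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// XDR variable-length array of "int": a 4-byte unsigned element count,
// then each element as a 4-byte big-endian two's complement integer.
inline std::vector<std::uint8_t> xdr_encode_int_array(const std::vector<std::int32_t>& items) {
    std::vector<std::uint8_t> out;
    xdr_put_u32(out, static_cast<std::uint32_t>(items.size()));
    for (std::int32_t v : items)
        xdr_put_u32(out, static_cast<std::uint32_t>(v));  // same bit pattern
    return out;
}
```

Whatever the library's default vector serialization happens to emit today or in a later version, a conformant XDR archive has to produce exactly this layout, which is the argument for a dedicated override.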
OK, but then why not just delay the effort until the underlying implementation actually changes?
Another reason is that other formats will vary the implementation in minor ways, by using a preceding length field in 8-bit signed format, for example.
I believe this could be addressed without too much trouble - perhaps by making a special type (a strong type) to hold collection count and tweaking collection implementation to use it. So it could then be implemented to taste on each kind of archive.
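A rough sketch of the strong-type idea, assuming a hypothetical `collection_count` wrapper and two hypothetical archives (none of these names come from the library). The collection code is written once and emits the count through the strong type, while each archive decides how the count is encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical strong type for a collection's element count. Because it is
// distinct from a plain integer, every archive can give it its own encoding.
struct collection_count {
    std::size_t value;
};

// Hypothetical archive that writes counts as 32-bit big-endian integers.
struct wide_count_oarchive {
    std::vector<std::uint8_t> bytes;
    void save(collection_count c) {
        if (c.value > 0xFFFFFFFFu) throw std::length_error("count too large");
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes.push_back(static_cast<std::uint8_t>(c.value >> shift));
    }
    void save_element(std::uint8_t b) { bytes.push_back(b); }
};

// Hypothetical archive for a format with an 8-bit signed length field.
struct compact_count_oarchive {
    std::vector<std::uint8_t> bytes;
    void save(collection_count c) {
        if (c.value > 127) throw std::length_error("count exceeds int8_t range");
        bytes.push_back(static_cast<std::uint8_t>(c.value));
    }
    void save_element(std::uint8_t b) { bytes.push_back(b); }
};

// Collection code written once, independent of how the count is encoded.
// (Elements are plain octets here to keep the sketch short.)
template <class Archive>
void save_collection(Archive& ar, const std::vector<std::uint8_t>& items) {
    ar.save(collection_count{items.size()});
    for (std::uint8_t b : items) ar.save_element(b);
}
```

The same two-element vector comes out as a 4-byte count plus data in the first archive and a 1-byte count plus data in the second.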
I would like to see your xdr/cdr archives finished off at the same level of generality as the other archives. Any special purpose marshalling would leverage off that.
I don't think this is possible, since these formats do not specify sufficient types to do so.
The library maps an arbitrary C++ data structure to a sequence of C++ primitives and back again. Any format that can handle all C++ primitives can therefore be used to serialize any C++ data structure. I'm sure that XDR/CDR can represent any C++ primitive data type. Hence, XDR/CDR can be used to serialize any C++ data structures with this library.
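The decomposition argument can be illustrated with a toy archive. This is a sketch, not the library's archive interface: `primitive_log_oarchive` is hypothetical and only imitates the `ar & member` style of a `serialize()` member function (it needs C++17 for `if constexpr`). Nested classes are flattened until only primitives reach the archive.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical toy archive: it can only record primitives, as "size:value".
class primitive_log_oarchive {
public:
    std::vector<std::string> log;

    template <class T>
    primitive_log_oarchive& operator&(T& t) {
        if constexpr (std::is_arithmetic_v<T>) {
            log.push_back(std::to_string(sizeof(T)) + ":" + std::to_string(t));
        } else {
            t.serialize(*this, 0u);  // class type: decompose into its members
        }
        return *this;
    }
};

struct point {
    std::int16_t x;
    std::int16_t y;
    template <class Archive>
    void serialize(Archive& ar, unsigned /*version*/) { ar & x & y; }
};

struct segment {
    point from;
    point to;
    double weight;
    template <class Archive>
    void serialize(Archive& ar, unsigned /*version*/) { ar & from & to & weight; }
};
```

Saving a `segment` produces five primitive entries (four 2-byte integers and one double), so any archive able to write those primitives can write the whole structure.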
It would be possible to allow type promotion to cover some missing types - for example, in XDR, it would be possible to allow all C++ integral types less than 32-bits to be promoted to 32-bits during serialization, but I think this would be misleading.
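A minimal sketch of the promotion being described, assuming XDR's 4-byte big-endian integer encoding; both function names are hypothetical. It shows the two costs: a 2-byte C++ value occupies 4 bytes on the wire, and reading it back needs a range check before narrowing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical promotion rule: an integral type narrower than 32 bits is
// widened to an XDR int / unsigned int (4 bytes, big-endian) on output.
template <class T>
void xdr_save_promoted(std::vector<std::uint8_t>& out, T v) {
    static_assert(std::is_integral_v<T> && sizeof(T) < 4,
                  "sketch covers sub-32-bit integers only");
    using wide = std::conditional_t<std::is_signed_v<T>, std::int32_t, std::uint32_t>;
    std::uint32_t bits = static_cast<std::uint32_t>(static_cast<wide>(v));  // sign-extends
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(bits >> shift));
}

// On input the 4 bytes must be narrowed back to T, and a value that does not
// fit the destination type has to be rejected.
template <class T>
T xdr_load_promoted(const std::vector<std::uint8_t>& in, std::size_t pos) {
    std::uint32_t bits = 0;
    for (std::size_t i = 0; i < 4; ++i) bits = (bits << 8) | in.at(pos + i);
    using wide = std::conditional_t<std::is_signed_v<T>, std::int32_t, std::uint32_t>;
    wide w = static_cast<wide>(bits);
    if (w < std::numeric_limits<T>::min() || w > std::numeric_limits<T>::max())
        throw std::out_of_range("value does not fit destination type");
    return static_cast<T>(w);
}
```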
Misleading?
More problematically, there is no concept of objects in this situation.
Of course at the archive level you only see a string of numbers. It's the serialization that recovers C++ objects from all of this.
It wouldn't make sense to transmit objects via pointers, since 'the other side' will never reconstitute objects.
Perhaps this is a source of confusion. It never occurred to me that the other side wouldn't have the mirror serialization code. That is, if you're using xdr_oarchive to make the stream, I assumed you would use xdr_iarchive on the other side to recover the data. Even if you don't anticipate the need for doing this yourself, other users will, and they will want the universal coverage that the current system provides. I wager that if you tell users on this list, "Oh, the XDR archive is special, it's just for marshalling, you can't do objects with XDR", you'll get an earful. And it's not necessary to subject oneself to this kind of grief. It's much less work to provide for every C++ primitive data type and know that any serialization is always going to work than it is to try to anticipate ahead of time just those combinations of features users are going to actually use. If you want a "restricted" or special purpose xdr archive that prohibits pointers or something else in order to conform to some other program not built with this library, you can just derive from the simple general one and overload the prohibited operations with compile-time assertions. Robert Ramey

Robert Ramey <ramey <at> rrsd.com> writes:
OK, but then why not just delay the effort until the underlying implementation actually changes?
Because it's not a significant effort to do so now, and I won't need to worry about it again. And the collections will need specialised serialization for many uses of ordered_[io]archive, ignoring CDR and XDR.
Another reason is that other formats will vary the implementation in minor ways, by using a preceding length field in 8-bit signed format, for example.
I believe this could be addressed without too much trouble - perhaps by making a special type (a strong type) to hold collection count and tweaking collection implementation to use it. So it could then be implemented to taste on each kind of archive.
This would not necessarily work, since a format may contain (in the same message) a collection preceded by an 8-bit length, and another preceded by a 16-bit length; it's not necessarily a property of the archive. Also, my expectation is that any specialised format would only need to use the ordered_[io]archive, rather than have to derive new types from them.
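One way to make the prefix width a property of the field rather than of the archive is a small wrapper that carries the length type. The sketch below is hypothetical (`length_prefixed`, `with_length` and `byte_writer` are illustrative names, not part of ordered_[io]archive) and assumes big-endian integers on the wire.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical wrapper: the width of the length prefix is chosen per field.
template <class LengthT, class T>
struct length_prefixed {
    const std::vector<T>& items;
};

template <class LengthT, class T>
length_prefixed<LengthT, T> with_length(const std::vector<T>& v) { return {v}; }

// Minimal big-endian writer standing in for an ordered binary archive.
struct byte_writer {
    std::vector<std::uint8_t> bytes;

    template <class U>  // U: a fixed-width integer type
    void put(U v) {
        auto bits = static_cast<std::make_unsigned_t<U>>(v);
        for (int i = static_cast<int>(sizeof(U)) - 1; i >= 0; --i)
            bytes.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    }

    template <class LengthT, class T>
    byte_writer& operator<<(const length_prefixed<LengthT, T>& seq) {
        // Reject rather than silently truncate an over-long sequence.
        if (seq.items.size() > static_cast<std::size_t>(std::numeric_limits<LengthT>::max()))
            throw std::length_error("sequence too long for its length field");
        put(static_cast<LengthT>(seq.items.size()));
        for (const T& x : seq.items) put(x);
        return *this;
    }
};
```

A single message can then mix widths, e.g. `w << with_length<std::int8_t>(names) << with_length<std::uint16_t>(ids);` writes an 8-bit signed length for the first sequence and a 16-bit length for the second.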
I would like to see your xdr/cdr archives finished off at the same level of generality as the other archives. Any special purpose marshalling would leverage off that.
I don't think this is possible, since these formats do not specify sufficient types to do so.
The library maps an arbitrary C++ data structure to a sequence of C++ primitives and back again. Any format that can handle all C++ primitives can therefore be used to serialize any C++ data structure. I'm sure that XDR/CDR can represent any C++ primitive data type. Hence, XDR/CDR can be used to serialize any C++ data structures with this library.
This is not quite true; XDR is missing all sub-32-bit types, and there are obviously mismatches around wchar_t, long double, etc.
It would be possible to allow type promotion to cover some missing types - for example, in XDR, it would be possible to allow all C++ integral types less than 32-bits to be promoted to 32-bits during serialization, but I think this would be misleading.
Misleading?
Misleading because the size of an element in a serialized format is determined by the size of its matching C++ counterpart. The advantage of this is that the programmer does not have to specify the serialized size, apart from defining the type used to hold the value within a program. If XDR promotes C++ types to greater bit-widths, then this symmetry is broken. It is not difficult to understand, but restricting the possible types is simpler...
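The "restrict rather than promote" alternative can be sketched with a compile-time check. This is an illustration under assumptions: `strict_xdr_oarchive` and `has_exact_xdr_type` are hypothetical, only C++ types whose size matches an RFC 4506 type (int, unsigned int, hyper, unsigned hyper, float, double) are accepted, and float/double are assumed to be IEEE 754 with the same byte order as integers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical: true iff T has an XDR counterpart of the same size, so that
// sizeof(T) still determines the encoded size.
template <class T>
inline constexpr bool has_exact_xdr_type =
    std::is_same_v<T, std::int32_t> || std::is_same_v<T, std::uint32_t> ||
    std::is_same_v<T, std::int64_t> || std::is_same_v<T, std::uint64_t> ||
    std::is_same_v<T, float> || std::is_same_v<T, double>;

struct strict_xdr_oarchive {
    std::vector<std::uint8_t> bytes;

    template <class T>
    void save(T v) {
        static_assert(has_exact_xdr_type<T>,
                      "no XDR type of this size; widen the variable explicitly");
        // Copy the object representation into an unsigned integer of the
        // same size, then emit it most significant byte first.
        std::conditional_t<sizeof(T) == 4, std::uint32_t, std::uint64_t> bits;
        std::memcpy(&bits, &v, sizeof v);
        for (int i = static_cast<int>(sizeof(T)) - 1; i >= 0; --i)
            bytes.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    }
};
```

With this, `ar.save(std::int16_t{1});` fails to compile with the static_assert message, so the programmer chooses the wire width by declaring a 32-bit variable, and the size symmetry described above is preserved.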
More problematically, there is no concept of objects in this situation.
Of course at the archive level you only see a string of numbers. It's the serialization that recovers C++ objects from all of this.
It wouldn't make sense to transmit objects via pointers, since 'the other side' will never reconstitute objects.
Perhaps this is a source of confusion. It never occurred to me that the other side wouldn't have the mirror serialization code. That is, if you're using xdr_oarchive to make the stream, I assumed you would use xdr_iarchive on the other side to recover the data. Even if you don't anticipate the need for doing this yourself, other users will, and they will want the universal coverage that the current system provides.
Yes, this is the source of our confusion. My intention is to allow C++ programs to communicate with legacy applications, using legacy binary formats. The 'other side' in this case will never have mirror serialization code. Now, as you rightly point out, CDR does provide for efficient binary representation of C++ primitive types. I can see that some people might prefer to use CDR as an on-wire format for communication between equivalent systems, so this certainly is a candidate for complete C++ coverage. This is complicated, however. If you were to allow full C++ serialization into a CDR archive, the result would be portable but not conformant to any standard or protocol, since it would rely on the serialization library's object serialization method. Only another user of the serialization library could recover the serialized data. The use of CDR for true interoperability would involve the use of the CORBA IDL, and the CORBA object model. This is a long way away from simple marshalling...
I wager that if you tell users on this list, "Oh, the XDR archive is special, it's just for marshalling, you can't do objects with XDR", you'll get an earful.
Well, maybe - but that is what the XDR is for. You would only serialize to XDR to communicate with an existing system already using that encoding - for any other purpose, CDR would be superior, I think. Perhaps there is a use case that I'm missing here?
And it's not necessary to subject oneself to this kind of grief. It's much less work to provide for every C++ primitive data type and know that any serialization is always going to work than it is to try to anticipate ahead of time just those combinations of features users are going to actually use.
That depends on what you mean by 'work'.
If you want a "restricted" or special purpose xdr archive that prohibits pointers or something else in order to conform to some other program not built with this library, you can just derive from the simple general one and overload the prohibited operations with compile-time assertions.
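A sketch of the derive-and-restrict approach. Both classes are hypothetical stand-ins, not the library's archive classes: the base handles values and pointers (the pointer case is reduced here to a presence flag plus the pointee, rather than real object tracking), and the derived archive hides `save_pointer` behind a compile-time assertion.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

template <class>
inline constexpr bool always_false = false;  // defers static_assert to instantiation

// Hypothetical general archive: any arithmetic value, plus pointers.
class general_oarchive {
public:
    std::vector<long long> values;

    template <class T>
    void save(const T& v) {
        static_assert(std::is_arithmetic_v<T>, "sketch handles arithmetic types only");
        values.push_back(static_cast<long long>(v));
    }

    template <class T>
    void save_pointer(const T* p) {
        save(static_cast<std::int32_t>(p != nullptr));  // presence flag
        if (p) save(*p);
    }
};

// Restricted variant for a peer that never reconstitutes objects: everything
// is inherited, but save_pointer is hidden by a version that cannot compile.
class restricted_oarchive : public general_oarchive {
public:
    template <class T>
    void save_pointer(const T*) {
        static_assert(always_false<T>, "this format cannot carry pointers");
    }
};
```

Calling `save_pointer` on a `restricted_oarchive` is rejected at compile time, while `save` and the rest of the base behaviour are inherited unchanged.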
Yes, I suppose this should be the direction for CDR. Does anyone want this for XDR? Matt

On Wed, 21 Apr 2004 00:26:47 +0000 (UTC), Matthew Vogt wrote
Robert Ramey <ramey <at> rrsd.com> writes:
Perhaps this is a source of confusion. It never occurred to me that the other side wouldn't have the mirror serialization code. That is, if you're using xdr_oarchive to make the stream, I assumed you would use xdr_iarchive on the other side to recover the data. Even if you don't anticipate the need for doing this yourself, other users will, and they will want the universal coverage that the current system provides.
... snip ...
Now, as you rightly point out, CDR does provide for efficient binary representation of C++ primitive types. I can see that some people might prefer to use CDR as an on-wire format for communication between equivalent systems, so this certainly is a candidate for complete C++ coverage.
This is complicated, however. If you were to allow full C++ serialization into a CDR archive, the result would be portable but not conformant to any standard or protocol, since it would rely on the serialization library's object serialization method. Only another user of the serialization library could recover the serialized data.
Unless I'm mistaken, you will have achieved a cross-platform (eg: big-endian, little-endian compatible) efficient binary format for the case where you do indeed have mirror serialization code on both sides. To me that's a huge win. I can write to binary files or across a socket to any platform, and as long as I deserialize using the same code it should work. I don't believe the other binary (maybe the text) archives can claim that.
I think there is one more restriction on the C++ code, however. Types would have to be coded with 'portable' types (eg: boost::uint16_t, boost::uint32_t, etc). But anyone doing cross-platform development already does that, right ;-)
BTW, the CDR archive approach is a stark contrast to the idea of using highly bloated XML, where the overhead of the message far outstrips the actual data. Perhaps I'm just one of those silly old fools that remembers when 'every byte was precious', but it just seems crazy to me to use XML for network messaging...
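The portability argument rests on fixed-width types plus an explicit byte order. Below is a sketch loosely modelled on CDR's "receiver makes right" rule: the writer records its byte order in the first octet and pads each value to its natural alignment (measured from the start of the buffer), and the reader swaps only when needed. The class names are hypothetical; this is neither the ACE_CDR interface nor a complete CDR implementation. The byte order is a constructor argument here only so both paths can be exercised, whereas a real sender would normally just use its native order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Writer: byte 0 is a byte-order flag (0 = big-endian, 1 = little-endian).
class cdr_like_writer {
public:
    explicit cdr_like_writer(bool little_endian) : little_(little_endian) {
        buf_.push_back(little_ ? 1 : 0);
    }
    template <class U>  // U: fixed-width unsigned integer (uint8_t..uint64_t)
    void write(U v) {
        while (buf_.size() % sizeof(U) != 0) buf_.push_back(0);  // alignment padding
        for (std::size_t i = 0; i < sizeof(U); ++i) {
            std::size_t shift = 8 * (little_ ? i : sizeof(U) - 1 - i);
            buf_.push_back(static_cast<std::uint8_t>(v >> shift));
        }
    }
    const std::vector<std::uint8_t>& bytes() const { return buf_; }

private:
    bool little_;
    std::vector<std::uint8_t> buf_;
};

// Reader: honours whatever byte order the sender declared.
class cdr_like_reader {
public:
    explicit cdr_like_reader(std::vector<std::uint8_t> b)
        : buf_(std::move(b)), little_(buf_.at(0) == 1), pos_(1) {}
    template <class U>
    U read() {
        while (pos_ % sizeof(U) != 0) ++pos_;  // skip alignment padding
        U v = 0;
        for (std::size_t i = 0; i < sizeof(U); ++i) {
            std::size_t shift = 8 * (little_ ? i : sizeof(U) - 1 - i);
            v = static_cast<U>(v | (static_cast<U>(buf_.at(pos_ + i)) << shift));
        }
        pos_ += sizeof(U);
        return v;
    }

private:
    std::vector<std::uint8_t> buf_;
    bool little_;
    std::size_t pos_;
};
```

Both a big-endian and a little-endian stream decode to the same values on any host, provided both sides agree on the fixed-width types being read.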
The use of CDR for true interoperability would involve the use of the CORBA IDL, and the CORBA object model. This is a long way away from simple marshalling...
I don't think we need to go there at all. I note that ACE has a similar class to do this sort of marshalling. http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/classACE__CDR.html Now obviously this is used in TAO (the CORBA implementation built on top of ACE), but I've used it directly on a project to help serialize arbitrary C++ objects across a socket interface to heterogeneous platforms... Jeff
participants (3)
- Jeff Garland
- Matthew Vogt
- Robert Ramey