Re: [boost] Re: Using Serialization for binary marshalling

Matthew Vogt writes:
Actually, XDR and CDR are of secondary interest for me;
For me too, but these have been much requested, and finishing them off in a polished way will be a significant contribution to the library for many users.
I'm interested in being able to save to proprietary formats (which often means that the applications involved never got around to specifying a standard format...). There must be thousands of ad-hoc binary formats in use today.
Hmmm - I'm a little wary of this, though I'm not sure I know what you mean. Some people interested in the library have hoped it could be used to generate some specific format. But that format doesn't accommodate all the information required to rebuild an arbitrary C++ data structure, so in order to do so one ends up coupling the serialization of classes to the archive format - exactly what the serialization library is designed to avoid. Actually, my current thinking is to add a section to the documentation (I love my documentation! Many people have contributed to it by careful reading and criticism) which suggests a transition path from a proprietary format to usage of the serialization library. This transition would basically be: a) make a program which loads all your "old" data in the "old" way; b) serialize the structures. So I'm skeptical of trying to adjust to "old" proprietary formats with the serialization library.
1) I wonder why you derived ordered_oarchive from basic_binary_oprimitive<Archive, OStream>
I'm using the save_binary / load_binary functions from basic_binary_oprimitive and basic_binary_iprimitive.
Maybe we'll consider factoring this out into a standalone function. We'll keep an eye on this for now. In fact, if the salient features of XDR are endian awareness, alignment, etc., I'm wondering if some of the functionality of your classes shouldn't be moved into your own version of load_binary, thereby making inheriting the native one unnecessary.
2) there's some stuff in boost that addresses alignment in a guaranteed(?) portable manner that may be relevant here. See #include <boost/aligned_storage.hpp>. BTW - the best way to make your code portable without cluttering it up with #ifdef etc. is to use more boost stuff - let other people clutter up their code with #ifdef all over the place.
Yes, but the aligned_storage template helps with platform-specific alignment within the machine. I don't see how it helps with platform-independent alignment within the content of the archive...
OK - I would like to see that made a little more transparent and better explained with comments. It's an important part of the issues being addressed.
3) I'm curious about the override for the saving of vector. ...
I need to override the serialization of vector, because the vector must be serialized with a known policy to yield a required layout in the archive. The fact that this serialization happens (at this point in time) to be exactly the same as the default implementation is not relevant - that can be changed at any time, but the CDR, XDR and other binary formats must not change.
I'm still not convinced - it seems to me that it shouldn't need to be overridden for XDR and CDR, which is what these classes do. What about list, deque, set, etc.? [ Points 4-6 conceded ]
7) I'm just a little queasy about using the term marshalling. In fact any of the archive implementations could be used for marshalling - not just a binary one (XML is the favorite flavor of the month). So I think it's sort of misleading.
Yes, I see what you're saying. I'll have a think about this - but the term 'marshalling' is not particularly prevalent in the code. Even if the term does have broad application, I think I am using it in the traditional sense.
Your library does marshalling (as I understand the term is usually used). My complaint is that it's too modest - your library does more than that. When I started this library there was strong usage of the term "persistence", which led to the misconception that the library had nothing to do with "marshalling". I see serialization as of use in a number of things - persistence, marshalling, and who knows what else (e.g. generating a CRC on the whole data state of the program to detect changes)? That's why I went to much effort to avoid this characterization of the library. Your addition will gain strength from leveraging this, and by fitting in with the established pattern it will be found easier to use. This will make it more successful. Also, by following such a pattern it will almost entirely eliminate the need for special documentation.
8) in my native binary archive, I used a template parameter for the stream class. As it stands, I don't test or use the corresponding archive for wide character streams because the WEOF would need special treatment. I haven't spent any time on this. Perhaps you've addressed this in some way.
No, I was hoping you'd have it all sorted out - I've never used a stream specialised with anything but char. In fact, it doesn't really make sense to me, to produce binary output as anything but a byte stream - I was just following your lead, really :)
That's basically why I haven't spent any time on it. I made it templated on the stream class basically motivated by symmetry with the other archives. It should be finished by addressing the WEOF/EOF issue and using a codecvt facet which turns the wchar_t interface into a byte stream. This way it would be useful to programs which are built around wchar_t rather than char. I suspect more programs in the future will do this. It's not very hard, but not of urgent interest. BTW - this might be a better reason to derive from basic_binary_oprimitive. Robert Ramey

On Sun, 18 Apr 2004 09:38:59 -0700, Robert Ramey wrote
Matthew Vogt writes:
Actually, XDR and CDR are of secondary interest for me;
For me too, but these have been much requested, and finishing them off in a polished way will be a significant contribution to the library for many users.
I'm interested in being able to save to proprietary formats (which often means that the applications involved never got around to specifying a standard format...). There must be thousands of ad-hoc binary formats in use today.
Hmmm - I'm a little wary of this, though I'm not sure I know what you mean. Some people interested in the library have hoped it could be used to generate some specific format. But that format doesn't accommodate all the information required to rebuild an arbitrary C++ data structure, so in order to do so one ends up coupling the serialization of classes to the archive format - exactly what the serialization library is designed to avoid. Actually, my current thinking is to add a section to the documentation (I love my documentation! Many people have contributed to it by careful reading and criticism) which suggests a transition path from a proprietary format to usage of the serialization library. This transition would basically be:
a) make a program which loads all your "old" data in the "old" way. b) serialize the structures.
So I'm skeptical of trying to adjust to "old" proprietary formats with the serialization library.
Well, I believe you are taking too narrow a view here. These 'old' proprietary formats may not be old or replaceable. They are just part of the landscape on many projects.

For example, I recently worked on a project where there was a communications protocol between a server and remote computers on the other end of a radio link. The protocol between the server and the remote computers was a binary format that was rather limited and specific -- about 4 message types. The physical-level protocol isn't standard TCP or UDP because the hardware doesn't support that sort of protocol (for good reason). There are little details like a maximum message size, which means that continuation packets need to be generated. When messages come to the server from the remote computers, it has to modify the message slightly and then transmit the data to other servers and clients over TCP. Bottom line -- it was very handy to have a different message format on the TCP side as compared to the proprietary side.

Of course, in the servers and clients we had C++ objects to represent the messages. Trivially, we created classes that had data members for each field in the message protocol. We had a serialization framework and we wrote serialization code for each message type. We serialized the fields in the order that the protocol specified. Then we had 2 archive types -- one to read/write the binary protocol format and one to read/write the internal format.

This design works great. There is a clear mapping from the protocol specification to the message classes, and all the details of the 'physical' protocol are in a single archive class. Of course, if you sent some sort of message down the pipe that wasn't of the correct type to go to the remote computers, the serialization part of the framework wouldn't stop you. But we prevented that in other layers of the architecture. In this case, our server-to-client format could easily use one of the standard archives provided by the library.
But the external protocol could not -- hence the need for the ability to write custom archive types that read/write proprietary protocols. As for the design coupling issue, there really isn't a problem. The archive implementation is tied to the serialization library, as are the serialized classes. The fact that an arbitrary C++ type won't work with the specialized archive isn't important, because other elements of the design can easily prevent this from happening. I believe that Matthew has now demonstrated that archive extension is possible, which I believe is one of the major changes from the first review. That said, any documentation of this process would clearly be a plus. In typical proprietary archives, writing of the version and type numbers will need to be shut off. So I can imagine documenting these sorts of issues. Jeff

Jeff Garland <jeff <at> crystalclearsoftware.com> writes:
Well, I believe you are taking too narrow a view here. These 'old' proprietary formats may not be old or replaceable. They are just part of the landscape on many projects. For example, [ snip]
This is exactly the type of problem I'm familiar with. Of course, the protocol can be arbitrarily complex - often proportionate to the longevity of the system, with increasing gnarliness the older its inception.
This design works great. There is a clear mapping from the protocol specification to the message classes and all the details of the 'physical' protocol are in a single archive class. Of course if you sent some sort of message down pipe that wasn't of the correct type to go to the remote computers, the serialization part of the framework won't stop you. But we prevented that in other layers of the architecture.
If you're lucky enough to work on a C-based system utilising such formats, you'll quickly realise how superior this design is to older methods!
I believe that Matthew has now demonstrated that archive extension is possible, which I believe is one of the major changes from the first review.
Archive extension is certainly possible - the only limitation is that no other information attends the serialization operation of a given object - I would love to see somebody create a non-serial archive!
That said, any documentation of this process would clearly be a plus. In a typical proprietary archives writing of the version and type numbers will need to be shut off. So I can imagine documenting these sorts of issues.
Yes, the documentation here won't need to be extensive. Just describe the initialisation phase of an archive (writing the preamble), and describe the difference between overriding the 'save', 'save_override' and 'operator<<' options. Possibly, the 'version_type', 'tracking_type' etc. elements should be described in more detail. Would there be any reason for an archive to modify the object registration process? Matt

On Mon, 19 Apr 2004 00:43:51 +0000 (UTC), Matthew Vogt wrote
Jeff Garland <jeff <at> crystalclearsoftware.com> writes:
Yes, the documentation here won't need to be extensive. Just describe the initialisation phase of an archive (writing the preamble), and describe the difference between overriding the 'save', 'save_override' and 'operator<<' options. Possibly, the 'version_type', 'tracking_type' etc. elements should be described in more detail.
Would there be any reason for an archive to modify the object registration process?
Yes, for 'random access' archives (think relational database buffer here) it's possible the archive process will need additional meta-data not provided in the base library setup (think mapping of classes to tables and attributes to columns). And yes, I've implemented this one before as well -- it's a very handy bit of infrastructure. I think in the past Robert has considered this "out of scope", but in fact I believe that given the current library it could now be done by writing an archive and perhaps some enhancements to the registration. BTW, I think the references should include a reference to this paper, which discusses many of these variations and design issues. http://www.ubilab.org/publications/print_versions/pdf/plop-96-serializer.pdf Of course, the modern C++ approach of this library wisely eschews inheritance in favor of templates, but the concepts are still the same. Jeff

Jeff Garland <jeff <at> crystalclearsoftware.com> writes:
BTW, I think the references should include a reference to this paper which discusses many of these variations and design issues.
http://www.ubilab.org/publications/print_versions/pdf/plop-96-serializer.pdf
I agree; it has a useful summary of the benefits, and an interesting discussion of implementation details. Matt

Robert Ramey <ramey <at> rrsd.com> writes:
I'm interested in being able to save to proprietary formats (which often means that the applications involved never got around to specifying a standard format...). There must be thousands of ad-hoc binary formats in use today.
Hmmm - I'm a little wary of this. Though I'm not sure I know what you mean.
Yes, I had better explain here. In the case that I'm addressing, the format is not a means to an end, but an end in itself. The goal of this effort is to use the serialization library framework not to produce reversible transformations between arbitrary C++ and a bytestream, but to take targeted C++ objects and produce known binary representations.
Some people interested in the library have hoped it could be used to generate some specific format. But that format doesn't accommodate all the information required to rebuild an arbitrary C++ data structure, so in order to do so one ends up coupling the serialization of classes to the archive format - exactly what the serialization library is designed to avoid.
I don't think it is a case of coupling, merely one of limitation. An object that can be serialized into XDR format must use only a limited range of C++ types in its composition - but having accepted that limitation, it can still be serialized into a less limiting archive type with the same code.
Actually, my current thinking is to add a section to the documentation (I love my documentation! Many people have contributed to it by careful reading and criticism) which suggests a transition path from a proprietary format to usage of the serialization library. This transition would basically be:
a) make a program which loads all your "old" data in the "old" way. b) serialize the structures.
So I'm skeptical of trying to adjust to "old" proprietary formats with the serialization library.
This is a transition from an old format to the brave new world, but in the context of persistence. It doesn't apply in the case of serialization for marshalling.
1) I wonder why you derived ordered_oarchive from basic_binary_oprimitive<Archive, OStream>
I'm using the save_binary / load_binary functions from basic_binary_oprimitive and basic_binary_iprimitive.
Maybe we'll consider factoring this out into a standalone function. We'll keep an eye on this for now. In fact, if the salient features of XDR are endian awareness, alignment, etc., I'm wondering if some of the functionality of your classes shouldn't be moved into your own version of load_binary, thereby making inheriting the native one unnecessary.
I don't really see any need for this at this point. Once the data is in the correct binary format, one save_binary function is as good as another. I don't think that inheriting extraneous 'save' members from the basic_binary_oprimitive class is a major concern. Also, as you point out later, I want to inherit any work regarding issues with streams, locales and whatever else is going on way down low.
2) there's some stuff in boost that addresses alignment in a guaranteed(?) portable manner that may be relevant here. See #include <boost/aligned_storage.hpp>. BTW - the best way to make your code portable without cluttering it up with #ifdef etc. is to use more boost stuff - let other people clutter up their code with #ifdef all over the place.
Yes, but the aligned_storage template helps with platform-specific alignment within the machine. I don't see how it helps with platform-independent alignment within the content of the archive...
OK - I would like to see that made a little more transparent and better explained with comments. It's an important part of the issues being addressed.
Ok, no problem.
3) I'm curious about the override for the saving of vector. ...
I need to override the serialization of vector, because the vector must be serialized with a known policy to yield a required layout in the archive. The fact that this serialization happens (at this point in time) to be exactly the same as the default implementation is not relevant - that can be changed at any time, but the CDR, XDR and other binary formats must not change.
I'm still not convinced - it seems to me that it shouldn't need to be overridden for XDR and CDR, which is what these classes do. What about list, deque, set, etc.?
Well, I never thought these were necessary, remembering that I am providing for classes, which are designed to be serialized into a particular binary format. In a binary format, all collections will boil down to a group of repetitions, which are either preceded by a length/count argument, or whose length is a defined property of the format itself. For me, vector has always sufficed, in either fixed_length<>, or variable_length<> guise. Perhaps others have differing experience.
Yes, I see what you're saying. I'll have a think about this - but the term 'marshalling' is not particularly prevalent in the code. Even if the term does have broad application, I think I am using it in the traditional sense.
Your library does marshalling (as I understand the term is usually used). My complaint is that it's too modest - your library does more than that. When I started this library there was strong usage of the term "persistence", which led to the misconception that the library had nothing to do with "marshalling". I see serialization as of use in a number of things - persistence, marshalling, and who knows what else (e.g. generating a CRC on the whole data state of the program to detect changes)? That's why I went to much effort to avoid this characterization of the library. Your addition will gain strength from leveraging this, and by fitting in with the established pattern it will be found easier to use. This will make it more successful. Also, by following such a pattern it will almost entirely eliminate the need for special documentation.
But there are inherent limitations in marshalling (per my usage). Pointers, which are adeptly handled by the general library, are not valid elements of a marshalled data set. Not all types can be represented in all formats. These are aspects that need documentation, because they render the archives fit only for marshalling, not for the more general 'serialization'. As is, the library certainly supports the most general concept, but my ambitions are more mundane. Of course, higher level constructs can be implemented on marshalling base archives. IIRC, 'IIOP' is the layer above CDR in CORBA, which provides for remote object references, etc. (I realise the need for documentation; this conversation would have been simplified had it existed earlier.) Matt
participants (3)
- Jeff Garland
- Matthew Vogt
- Robert Ramey