[boost] Re: Re: Using Serialization for binary marshalling

5 May 2004

      Matthias Troyer wrote:
...
Brian Braatz wrote:
...
Question- I was just looking for clarification. Does the boost
serialization library allow me to have objects on different platforms
stream to each other?
You will need two ingredients:
1.) a binary portable archive, e.g. the CDR or XDR archives discussed
2.) replacements for the serialization of the standard containers,
since the ones provided by Robert store the sizes as int (instead of a
portable integer type). One possibility would be to have a traits type
in the serialization for these containers, which defaults to int (or
better yet std::size_t) but which could be specialized for the portable
archives.
Why is it not possible to store an 'int' in a portable binary archive?

If I understand XDR properly, RFC 1014 defines an implicit mapping between C
datatypes and XDR datatypes such that int maps onto 'Signed Integer',
unsigned maps onto 'Unsigned Integer' and so on.  All of the XDR stream
functions in the RPC toolkits use this mapping too.  If the target platform
has an int type that does not correspond to an XDR 'Signed Integer' then too
bad, only the common subset of values will be available to the program.

If the boost serialization library chooses a different mapping -- AFAICT the
suggestions so far have been to use
int_32 -> Signed Integer
int_64 -> Hyper Integer
and so on -- then this violates the usual XDR mapping and there is no chance
that the resulting program would be wire-compatible with anything other
than itself (and possibly not even then, if it is compiled on a platform
that doesn't have int_64, for example).

Requiring that the only types that can be portably serialized are the
fixed-size typedefs is far too high a burden IMHO.  There is no way to do a
compile-time (or even a runtime) check that some user hasn't accidentally
serialized an 'int' - except that it will mysteriously break on some
platform.  And what if you already have a std::vector<int> that you want to
serialize?  Should the user really be required to copy it into a
std::vector<int_32> first?

It seems to me, there are two mutually exclusive (incompatible) choices for
a binary archive.  Either specify mappings for the builtin types int -> X,
long -> Y etc and use only those types (no serializing a size_t or int_32),
*OR* specify mappings between fixed-size types int_32 -> X, int_64 -> Y etc
and use only those types (no serializing a size_t or a plain builtin). 
Better make a choice now and be consistent, because changing it later will
be a nightmare.

My clear preference is the first option.  In my own serialization code
(which I hope one day to layer on top of boost serialization), I define
several formats [currently 3: XDR, LE_LP32 (little-endian, 32-bit pointers
& long, as in x86) and LE_LP64 (little-endian, 64-bit pointers & long, as in
alpha)], as in the following pseudo-code:

// uint_32, int_32, uint_64 and int_64 stand for fixed-size integer types.

struct XDR_tag {};
struct LE_LP32_tag {};
struct LE_LP64_tag {};

template <typename FormatTag> struct Mapping;

// XDR mapping

template <> struct Mapping<XDR_tag>
{
   typedef uint_32 size_type;

   template <typename T> struct BuiltinTypeToStreamType;
};

// Explicit specializations of the member template are written at
// namespace scope.

template <> struct Mapping<XDR_tag>::BuiltinTypeToStreamType<int>
{
   // int maps onto signed 32 bits in XDR
   typedef int_32 type;
};

template <> struct Mapping<XDR_tag>::BuiltinTypeToStreamType<long>
{
   // long maps onto signed 32 bits in XDR
   typedef int_32 type;
};
// ...

// LE_LP64 mapping

template <> struct Mapping<LE_LP64_tag>
{
   typedef uint_64 size_type;

   template <typename T> struct BuiltinTypeToStreamType;
};

template <> struct Mapping<LE_LP64_tag>::BuiltinTypeToStreamType<int>
{
   // int maps onto signed 32 bits
   typedef int_32 type;
};

template <> struct Mapping<LE_LP64_tag>::BuiltinTypeToStreamType<long>
{
   // long maps onto signed 64 bits
   typedef int_64 type;
};
// ...

The serialization functions map type T to
Mapping<Format>::BuiltinTypeToStreamType<T>::type; values of that type are
then converted to the appropriate endianness and sent to the stream.
Low-level stuff that cares about size_t can use Mapping<Format>::size_type
(the serialization of std containers does this).  The only place where the
fixed-size types appear is in the intermediate phase before hitting the
stream.  They don't need to be typedefs for builtin types; the only
requirement is that the buffer knows how to stream them.
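
To make the flow concrete, here is a rough, compilable sketch of the idea
(C++11 for <cstdint> and <type_traits>).  This is not my actual code: the
names put_bytes, write and the big_endian flag are made up for illustration,
and a std::vector<unsigned char> stands in for the real stream buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

struct XDR_tag {};      // big-endian; int and long are both 32 bits on the wire
struct LE_LP64_tag {};  // little-endian; long is 64 bits on the wire

template <typename FormatTag> struct Mapping;

template <> struct Mapping<XDR_tag>
{
   typedef std::uint32_t size_type;
   static const bool big_endian = true;   // hypothetical flag, for this sketch only
   template <typename T> struct BuiltinTypeToStreamType;
};
template <> struct Mapping<XDR_tag>::BuiltinTypeToStreamType<int>  { typedef std::int32_t type; };
template <> struct Mapping<XDR_tag>::BuiltinTypeToStreamType<long> { typedef std::int32_t type; };

template <> struct Mapping<LE_LP64_tag>
{
   typedef std::uint64_t size_type;
   static const bool big_endian = false;  // hypothetical flag, for this sketch only
   template <typename T> struct BuiltinTypeToStreamType;
};
template <> struct Mapping<LE_LP64_tag>::BuiltinTypeToStreamType<int>  { typedef std::int32_t type; };
template <> struct Mapping<LE_LP64_tag>::BuiltinTypeToStreamType<long> { typedef std::int64_t type; };

// Append the bytes of a fixed-size value in the requested byte order.
// Shifting the unsigned representation makes the host's endianness irrelevant.
template <typename StreamT>
void put_bytes(std::vector<unsigned char>& out, StreamT value, bool big_endian)
{
   typedef typename std::make_unsigned<StreamT>::type U;
   const U u = static_cast<U>(value);
   for (std::size_t i = 0; i < sizeof(U); ++i) {
      const std::size_t shift = 8 * (big_endian ? sizeof(U) - 1 - i : i);
      out.push_back(static_cast<unsigned char>((u >> shift) & 0xFF));
   }
}

// Map the builtin type T to the format's stream type, then emit it.
// A builtin with no mapping for the format fails to compile.
template <typename Format, typename T>
void write(std::vector<unsigned char>& out, T value)
{
   typedef typename Mapping<Format>::template BuiltinTypeToStreamType<T>::type stream_type;
   put_bytes(out, static_cast<stream_type>(value), Mapping<Format>::big_endian);
}
```

With this, write<XDR_tag>(buf, 1L) appends four bytes and
write<LE_LP64_tag>(buf, 1L) appends eight, whatever sizeof(long) happens to
be on the host.  Values that don't fit the stream type are simply narrowed,
which is the "common subset" behaviour described above.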

I also have some types (not typedefs!) to represent fixed-size integers if
that is required - they just get mapped into the corresponding size type of
the binary format.  I've found that I haven't used them much though.  It
would be possible to do something similar for size_t (which appears a lot,
of course).  This would be simpler than looking into the Mapping<Format>
traits, but I haven't implemented that yet.
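
A rough sketch of what such a fixed-size type could look like (C++11; the
name fixed_int32 is made up for illustration, it is not what my code uses).
The point of a separate struct rather than a typedef is that it can never
collide with whichever builtin happens to be 32 bits wide, so it can have
its own entry in the mapping:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical fixed-size integer: a distinct type, not a typedef.
struct fixed_int32
{
   std::int32_t value;
   explicit fixed_int32(std::int32_t v = 0) : value(v) {}
};

// Unlike a typedef, the wrapper is never the same type as a builtin.
static_assert(!std::is_same<fixed_int32, int>::value,
              "fixed_int32 is distinct from int");

struct XDR_tag {};

template <typename FormatTag> struct Mapping;

template <> struct Mapping<XDR_tag>
{
   template <typename T> struct BuiltinTypeToStreamType;
};

// The builtin int and the fixed-size wrapper get separate entries, even
// though both end up as signed 32 bits in this format.
template <> struct Mapping<XDR_tag>::BuiltinTypeToStreamType<int>
{
   typedef std::int32_t type;
};
template <> struct Mapping<XDR_tag>::BuiltinTypeToStreamType<fixed_int32>
{
   typedef std::int32_t type;
};
```

Had fixed_int32 been a typedef for int, the second specialization would be
a redefinition of the first on most platforms.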

Does something like this structure make sense for the boost serialization
library?

Cheers,
Ian McCulloch