
Matthias Troyer wrote:
On May 5, 2004, at 4:11 PM, Ian McCulloch wrote:
Dave Harris wrote:
troyer@itp.phys.ethz.ch (Matthias Troyer) wrote (abridged):
As I see it, the current serialization library allows both options, depending on your preferences. Any archive may choose which types it views as fundamental, but both choices have their disadvantages:
I would use variable length integers. Use as many bytes as you need for the integer's actual value. That way the archive format is independent of whether short, int, long or some other type was used by the outputting program. It can also give you byte order independence for free.
Specifically, I'd use a byte-oriented scheme where the low 7 bits of each byte contribute to the current number, and the high bit says whether there are more bytes to come. [...] Thus integers less than 128 take 1 byte, integers less than 16,384 take 2 bytes, etc. This gives a compact representation while still supporting 64-bit ints and beyond. You can use boost::numeric_cast<> or similar to bring the uintmax_t down to a smaller size.
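[A minimal sketch of one way such a 7-bit variable-length encoding could be written. The names save_varint/load_varint and the byte-buffer interface are purely illustrative, not part of Boost.Serialization or any proposed API.]

#include <cstdint>
#include <vector>

// Encode an unsigned integer: low 7 bits per byte, high bit set while
// more bytes follow.
inline void save_varint(std::vector<unsigned char>& out, std::uintmax_t v)
{
    while (v >= 0x80) {
        out.push_back(static_cast<unsigned char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<unsigned char>(v));
}

// Decode the same representation back into a uintmax_t; the caller would
// then use boost::numeric_cast<> (or equivalent) to narrow to the
// destination type.
inline std::uintmax_t load_varint(const unsigned char*& p)
{
    std::uintmax_t v = 0;
    int shift = 0;
    unsigned char byte;
    do {
        byte = *p++;
        v |= static_cast<std::uintmax_t>(byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    return v;
}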
Nice idea. I was thinking along the lines of how to achieve high performance, with shared memory and fast network message passing in mind. I think now, though, that this is probably too specialized an application for boost::serialization.
I disagree. The above ideas would be for archive formats geared towards persistence, where speed is not the main issue. All we need to achieve high performance is an archive designed for this purpose, with specialized serialization functions for the standard containers. Portability issues can be ignored here, since the ultra-fast networks (such as InfiniBand, Cray Red Storm or similar) connect homogeneous nodes with the same hardware. The same serialization code could then be used both for fast network communication and for serialization to a portable archive for persistence purposes, just by using different archive types.
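[To make this concrete, here is a rough modern-C++ sketch of the kind of non-portable, raw-memory archive being described; it is not actual Boost.Serialization code, and the class name fast_oarchive and its save() interface are invented for illustration. The point is simply that fundamental types and contiguous containers of trivially copyable elements can be dumped as raw bytes on homogeneous hardware.]

#include <vector>
#include <type_traits>

class fast_oarchive {            // hypothetical archive type
public:
    // Fundamental / trivially copyable types: copy the raw bytes,
    // no representation conversion at all.
    template <class T>
    typename std::enable_if<std::is_trivially_copyable<T>::value>::type
    save(const T& t)
    {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&t);
        buffer_.insert(buffer_.end(), p, p + sizeof(T));
    }

    // Specialized overload for std::vector of trivially copyable T:
    // one length field followed by a single block copy of the elements.
    template <class T>
    typename std::enable_if<std::is_trivially_copyable<T>::value>::type
    save(const std::vector<T>& v)
    {
        save(v.size());
        const unsigned char* p = reinterpret_cast<const unsigned char*>(v.data());
        buffer_.insert(buffer_.end(), p, p + v.size() * sizeof(T));
    }

    const std::vector<unsigned char>& data() const { return buffer_; }

private:
    std::vector<unsigned char> buffer_;
};

[A portable text or XML archive type would provide the slower, conversion-heavy implementations of the same operations, so the user-level serialization code stays identical.]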
Sure. I realized shortly after I wrote that, that I had failed to mention an important feature that I need - namely, the message-passing format needs to be the same as (or at least readable as) the on-disk format. This is because I use a pair of smart pointers (pref_handle<T> and pref_ptr<T>), the first of which doesn't care whether the object is in memory or on disk (or conceivably, on some other node). The second pointer loads the object into memory (if it wasn't already) and gives access to it. But in general an object that is not in memory has no type identification with it (polymorphic types do, but value types don't - that would be really excessive overhead). Sending a pref_handle<T> that exists only on disk (cache) across MPI then just requires sending the byte stream itself, with no extra steps of deserializing from the disk format and re-serializing into the network format. Conversely, it is possible that on the receiving end an object will be loaded into a pref_handle<T> and not deserialized immediately, and ending up with an object serialized in the network format on the disk would be a disaster.

Perhaps a better design would be to keep close track of streams that are serialized in the network format and just make sure that they never hit the disk. I'm not sure whether this is even possible with my current scheme, but I'll look into it. That would solve all(?) portability issues with respect to serializing typedefs etc. OTOH, with the current scheme it is possible to be somewhat lazy about caring whether the objects are in memory or not, since the serialization is so fast (and there is of course a rather large cache before anything actually hits the disk, not to mention the OS buffer cache).

Cheers,
Ian
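[Purely as illustration, a hypothetical sketch of the two-level handle/pointer idea described above. The names pref_handle and pref_ptr come from the post, but the bodies below are invented and are not Ian's implementation; the deserialize() hook stands in for whatever archive code actually reconstructs the object.]

#include <memory>
#include <vector>

template <class T>
class pref_handle {              // may refer to a live object or to its bytes
public:
    std::shared_ptr<T>         object;       // non-null if already in memory
    std::vector<unsigned char> serialized;   // on-disk / on-wire representation

    bool in_memory() const { return static_cast<bool>(object); }
};

template <class T>
class pref_ptr {
public:
    explicit pref_ptr(pref_handle<T>& h) : handle_(h)
    {
        if (!handle_.in_memory())
            handle_.object = deserialize(handle_.serialized);   // load on demand
    }

    T& operator*()  const { return *handle_.object; }
    T* operator->() const { return handle_.object.get(); }

private:
    // Placeholder: the real design would run the archive's load code
    // over the byte stream here.
    static std::shared_ptr<T> deserialize(const std::vector<unsigned char>&)
    {
        return std::make_shared<T>();
    }

    pref_handle<T>& handle_;
};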