Boost.Serialization archive constructor performance

I've been experimenting with the Boost.Serialization library, to decide whether to use it in a project I'm working on. If it weren't for some performance issues, the decision of whether to use this library would be a no-brainer. Because it seemed otherwise so promising, I've spent some time investigating these performance issues, and have a couple of suggestions for improvements that would address them. (Obviously these suggestions are for a post-1.33 release.)

The use case being considered is marshalling values for transport to some other process. This is among the use cases discussed in the overview section of the library's documentation. However, it turns out that with relatively fine-grained marshalling operations, constructor / destructor time for the archive can amount to a significant fraction of the time spent performing such an operation. The thing that seems to be using up most of the time is the initialization of various standard containers (std::set, std::vector, std::list), most of which are used for pointer tracking.

For reference, a snippet to show the sort of thing I'm doing. (Please ignore the fact that this code snippet is obviously not thread-safe and similar issues; I've simplified things to show just the parts relevant to this issue.)

const int binary_archive_flags =
    boost::archive::no_header | boost::archive::no_codecvt;
const std::size_t size = 64;
static char buffer[size];

// allocate stream once, and reset for reuse
static boost::iostreams::stream<boost::iostreams::array_sink> out(buffer, size);
out.seekp(0);

#if REUSE_ARCHIVES
// allocate archive once, and reset for reuse
static boost::archive::binary_oarchive oa(out, binary_archive_flags);
oa.reset();
#else
boost::archive::binary_oarchive oa(out, binary_archive_flags);
#endif

save_ss(ss, oa);  // write ss into archive oa

// arrange for destination process to receive the contents of
// buffer containing serialized ss

A change which I think should not be too difficult to make, and which would make a large difference, would be to provide a mechanism for resetting and reusing an archive. As part of my experiments I added a public void reset() operation to boost::archive::detail::basic_iarchive and basic_oarchive, which clears the various tracking-related containers in the associated pimpl objects. I then modified my tests to allocate their archives once and reset before each use. The result was a significant speedup. Rather than spending roughly 1/3 of the time in archive construction / destruction, the archive reset time is now close to the noise level in my measurements. (As you can see in the above code snippet, I'd already started reusing the streams for similar reasons.)

I'm not necessarily proposing exactly the interface I used here for experimentation. A better interface might be to reset the archive when the associated stream is changed (which involves adding an interface for changing the stream, of course). But I'm not sure that's right either, since there is some separation between the stream-handling part and the pointer-tracking part of things. Perhaps there is an intent to support something other than streams, by using something other than boost::archive::basic_binary_[i,o]primitive? I think that once the details of this interface are determined, the implementation is pretty straightforward.

A second change, which may or may not be worthwhile, would be to avoid allocating the various tracking-related containers at all when the archive is created with no_tracking specified.
With archive reuse as described above, this probably wouldn't make any directly noticeable performance difference for my usage. It might make a difference in space usage though, as the design I'm investigating this library for might end up with a significant number of archives devoted to various specific things, with pointer-less data being common in this design. So avoiding the allocation of a bunch of always-empty containers might be worthwhile, depending on the space they use. This seems like it might be significantly more difficult to implement than the reuse support, though. And whether it is actually worth doing depends on the internal implementation of the standard containers being used. A sketch of what such an untracked archive looks like from the user's side follows below.
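[Editorial note: the following is only an illustrative sketch of creating an archive with the no_tracking flag in addition to the flags used in the snippet above; it is not part of the proposal or the library's examples. T is a placeholder for any serializable type, and send_bytes() is a hypothetical transport hook declared only to keep the example self-contained. Whether the library could then skip allocating the tracking containers for such an archive is exactly the open question discussed above.]

// Sketch only: combines no_tracking with the flags used above.
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <cstddef>

void send_bytes(const char* data, std::size_t n);  // hypothetical transport hook

template <class T>
void marshal_untracked(const T& value)
{
    const int flags = boost::archive::no_header
                    | boost::archive::no_codecvt
                    | boost::archive::no_tracking;  // pointer tracking suppressed
    const std::size_t size = 64;
    static char buffer[size];
    boost::iostreams::stream<boost::iostreams::array_sink> out(buffer, size);
    boost::archive::binary_oarchive oa(out, flags);
    oa << value;                 // no pointer tracking is recorded for this archive
    send_bytes(buffer, size);    // hand the serialized bytes to the transport
}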

Truth is that I never considered the overhead in archive creation, because the usage in marshalling is less common than for persistence. I wouldn't mess with it without real data that it would make a significant difference. Now that you have provided such data, I'll consider it. It doesn't seem that it would be all that hard to do. Feel free to experiment with this a little more. I presume, with your changes, you've found the serialization library suitable for your application?

Robert Ramey

At 2:12 PM -0700 7/25/05, Robert Ramey wrote:
Truth is that I never considered the overhead in archive creation, because the usage in marshalling is less common than for persistence. I wouldn't mess with it without real data that it would make a significant difference.
Entirely reasonable.
Now that you have provided such data, I'll consider it. It doesn't seem that it would be all that hard to do. Feel free to experiment with this a little more.
Most of the work in my experimental reset function involved studying the class hierarchy; the actual coding time was pretty minimal. Since the changes are small, I've appended them (made against a CVS version of boost from a couple of weeks ago) to the end of this message. As I said previously, I'm not convinced this is quite the right interface, so feel free not to lock in on exactly this. I'll be on vacation starting the end of this week; when I return I'll be upgrading us to Boost 1.33 (presumably :) and then going back to work on serialization-related stuff. I'll be happy to work with you on this in whatever way will prove useful.
I presume, with your changes, you've found the serialization library suitable for your application?
I think so. I still have some things to look at, including code size, which I haven't really looked at yet. It might be that this is either just not a problem (some of our requirements haven't been nailed down yet), or addressable by coding patterns that ensure we aren't picking up multiple copies of (inlined) things unnecessarily. And I sure hope this works out, as it will save me a lot of work! We have an existing marshalling facility that we long ago decided needed some additional features. Using Boost.Serialization would give us every feature we've ever discussed, plus some bonus features that we hadn't realized we wanted or needed.

----- boost/archive/detail/basic_iarchive.hpp -----

// in the definition of class basic_iarchive, add
void reset();

----- boost/archive/detail/basic_oarchive.hpp -----

// in the definition of class basic_oarchive, add
void reset();

----- libs/serialization/src/basic_iarchive.cpp -----

// in the definition of class basic_iarchive_impl, add
void reset();

// add this definition
inline void
basic_iarchive_impl::reset() {
    object_id_vector.clear();
    moveable_object_stack.clear();
    moveable_object_position = 0;
    cobject_info_set.clear();
    cobject_id_vector.clear();
    created_pointers.clear();
    pending_object = NULL;
    pending_bis = NULL;
    pending_version = 0;
}

// add this definition
BOOST_ARCHIVE_DECL(void)
basic_iarchive::reset() {
    pimpl->reset();
}

----- libs/serialization/src/basic_oarchive.cpp -----

// in the definition of class basic_oarchive_impl, add
void reset();

// add this definition
inline void
basic_oarchive_impl::reset() {
    object_set.clear();
    cobject_info_set.clear();
    stored_pointers.clear();
    pending_object = NULL;
    pending_bos = NULL;
}

// add this definition
BOOST_ARCHIVE_DECL(void)
basic_oarchive::reset() {
    pimpl->reset();
}
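[Editorial note: the following usage sketch is added for illustration and is not part of the patch above. It assumes the reset() additions have been applied; Message stands in for any serializable type, and send_bytes() is a hypothetical transport hook declared only to keep the example self-contained.]

// Sketch only: one archive and stream, constructed once and reused per message.
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <cstddef>

void send_bytes(const char* data, std::size_t n);  // hypothetical transport hook

template <class Message>
void marshal(const Message& msg)
{
    static const int flags = boost::archive::no_header | boost::archive::no_codecvt;
    static const std::size_t size = 64;
    static char buffer[size];

    // stream and archive are constructed once and reused on every call
    static boost::iostreams::stream<boost::iostreams::array_sink> out(buffer, size);
    static boost::archive::binary_oarchive oa(out, flags);

    out.seekp(0);   // rewind the fixed-size buffer
    oa.reset();     // clear the tracking state (the patched operation)
    oa << msg;
    send_bytes(buffer, size);   // hand the serialized bytes to the transport
}

As in the original snippet, the sketch sends the whole fixed-size buffer rather than only the bytes actually written; a real transport would track the write position.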

Thinking about this, it's very odd to me that just initializing some STL containers takes up so much time. For someone who has nothing else to do, I would love to see the following: a comprehensive analysis of where the time goes in serialization. Most of the really deep and time-consuming stuff is in templated inline functions - most of which are reducible to nothing - in theory. I suspect that in practice, performance is all over the map in this area. In fact, I would be curious to see something like the above for STL in general, and for other boost libraries.

Microsoft used to have - up to VC6 - a very nice interrupt-driven profiler. It was very, very simple to use and didn't require instrumenting the code. I found it very useful. Since I've "upgraded" to VC 7.1 I don't have it now. Of course they've given me something else - but I haven't figured out what it is or how it works yet. It's right up there on my list with figuring out what's involved in making a 64 bit program which runs on ?...

Robert Ramey

"Robert Ramey" <ramey@rrsd.com> writes:
Microsoft used to have - up to VC6 - a very nice interrupt-driven profiler. It was very, very simple to use and didn't require instrumenting the code. I found it very useful. Since I've "upgraded" to VC 7.1 I don't have it now. Of course they've given me something else - but I haven't figured out what it is or how it works yet. It's right up there on my list with figuring out what's involved in making a 64 bit program which runs on ?...
The VC8 betas are shipping with a comprehensive suite of performance and static analysis tools, or so the head of the Visual Studio team tells me. I haven't looked at them, but I suggest you try them.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

At 9:41 AM -0700 7/26/05, Robert Ramey wrote:
Thinking about this, it's very odd to me that just initializing some STL containers takes up so much time.
It does seem odd, so I've spent some more time looking at this. If I haven't messed up somewhere, it looks like the empty container initializations might only be accounting for about 10% of the total constructor time. But there is another container which is being affected by the reset trick and which might not be empty: the m_helpers member of boost::archive::detail::basic_archive_impl. I'm not clearing that std::set (it seemed like something that shouldn't be cleared in order to reuse the archive, though I haven't fully figured out how it is used). I haven't yet figured out a way to measure its impact in isolation (maybe I'll have to try making lots of archives with and without some part of its implementation commented out, which probably involves a rebuild of boost, so probably not today). So I'm not certain exactly why the reset-and-reuse approach helps as much as it does, just that it does.

By the way, the times here are not individually large; archive constructor time on the 2GHz Pentium I'm using for these tests is 6-8 usec. But it adds up when one is doing *lots* of them, especially if they aren't actually necessary. Also, some of the intended target platforms for what I'm working on aren't nearly as fast as this test machine.
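[Editorial note: the following micro-benchmark is an illustrative sketch added here, not the harness actually used for the 6-8 usec figure quoted above. It only shows how a per-construction time in microseconds can be obtained.]

// Sketch only: times archive construction and destruction in a tight loop.
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <cstddef>
#include <ctime>
#include <iostream>

int main()
{
    const int flags = boost::archive::no_header | boost::archive::no_codecvt;
    const std::size_t size = 64;
    char buffer[size];
    boost::iostreams::stream<boost::iostreams::array_sink> out(buffer, size);

    const int iterations = 100000;
    std::clock_t start = std::clock();
    for (int i = 0; i < iterations; ++i) {
        out.seekp(0);
        boost::archive::binary_oarchive oa(out, flags);  // construct and destruct only
    }
    std::clock_t stop = std::clock();

    double usec_per_archive =
        1e6 * double(stop - start) / CLOCKS_PER_SEC / iterations;
    std::cout << "archive ctor/dtor: " << usec_per_archive << " usec\n";
    return 0;
}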

Kim Barrett <kab <at> irobot.com> writes:
I've been experimenting with the Boost.Serialization library, to decide whether to use it in a project I'm working on. If it weren't for some performance issues, the decision of whether to use this library would be a no-brainer. Because it seemed otherwise so promising, I've spent some time investigating these performance issues, and have a couple of suggestions for improvements that would address them. (Obviously these suggestions are for post-1.33 release.)
I can second Kim's experience; I've written a marshalling framework myself, and found that the frequent creation and deletion of Boost.Serialization archives became a bottleneck in the whole system.

Regards,
Jarl.

Jarl Lindrud wrote:
Kim Barrett <kab <at> irobot.com> writes:
I've been experimenting with the Boost.Serialization library, to decide whether to use it in a project I'm working on. If it weren't for some performance issues, the decision of whether to use this library would be a no-brainer. Because it seemed otherwise so promising, I've spent some time investigating these performance issues, and have a couple of suggestions for improvements that would address them. (Obviously these suggestions are for post-1.33 release.)
I can second Kim's experience; I've written a marshalling framework myself, and found that the frequent creation and deletion of Boost.Serialization archives became a bottleneck in the whole system.
By the way, I'm still interested in integrating your code into my interfaces library. I haven't had time to work on it for about three months, but I'm looking forward to starting to work on it in a few weeks.

Jonathan

Jonathan Turkanis <technews <at> kangaroologic.com> writes:
By the way, I'm still interested in integrating your code into my interfaces library. I haven't had time to work on it for about three months, but I'm looking forward to starting to work on it in a few weeks.
Jonathan
Sounds good, I'm all for it. I'll be glad to answer any questions, too, so feel free to ask :)

Jarl.