

I've been experimenting with the Boost.Serialization library to decide whether to use it in a project I'm working on. If it weren't for some performance issues, the decision would be a no-brainer. Because the library otherwise seemed so promising, I've spent some time investigating these performance issues, and I have a couple of suggestions for improvements that would address them. (Obviously these suggestions are for post-1.33 release.)

The use case under consideration is marshalling values for transport to another process. This is among the use cases discussed in the overview section of the library's documentation. However, it turns out that with relatively fine-grained marshalling operations, the constructor / destructor time for the archive can amount to a significant fraction of the time spent performing such an operation. Most of that time appears to go into initializing various standard containers (std::set, std::vector, std::list), most of which are used for pointer tracking.

For reference, here is a snippet showing the sort of thing I'm doing. (Please ignore the fact that this code is obviously not thread-safe, and similar issues; I've simplified things to show just the parts relevant to this discussion.)
    const int binary_archive_flags =
        boost::archive::no_header | boost::archive::no_codecvt;
    const std::size_t size = 64;
    static char buffer[size];

    // allocate stream once, and reset for reuse
    static boost::iostreams::stream<boost::iostreams::array_sink>
        out(buffer, size);
    out.seekp(0);

    #if REUSE_ARCHIVES
    // allocate archive once, and reset for reuse
    static boost::archive::binary_oarchive oa(out, binary_archive_flags);
    oa.reset();
    #else
    boost::archive::binary_oarchive oa(out, binary_archive_flags);
    #endif

    save_ss(ss, oa);  // write ss into archive oa

    // arrange for destination process to receive the contents of
    // buffer containing serialized ss

A change which I think should not be too difficult to make, and which would make a large difference, would be to provide a mechanism for resetting and reusing an archive. As part of my experiments I added a public void reset() operation to boost::archive::detail::basic_iarchive and basic_oarchive, which clears the various tracking-related containers in the associated pimpl objects. I then modified my tests to allocate their archives once and reset them before each use. The result was a significant speedup: rather than spending roughly 1/3 of the time in archive construction / destruction, the archive reset time is now close to the noise level in my measurements. (As you can see in the above code snippet, I had already started reusing the streams for similar reasons.)

I'm not necessarily proposing exactly the interface I used here for experimentation. A better interface might be to reset the archive when the associated stream is changed (which involves adding an interface for changing the stream, of course). But I'm not sure that's right either, since there is some separation between the stream-handling part and the pointer-tracking part of things. Perhaps there is an intent to support something other than streams, by using something other than boost::archive::basic_binary_[i,o]primitive?
I think that once the details of this interface are determined, the implementation is pretty straightforward.

A second change, which may or may not be worthwhile, would be to avoid allocating the various tracking-related containers at all when the archive is created with no_tracking specified. With archive reuse as described above, this probably wouldn't make any directly noticeable performance difference for my usage. It might make a difference in space usage, though: the design I'm investigating this library for might end up with a significant number of archives devoted to various specific purposes, and pointer-free data would be common in that design. So avoiding the allocation of a bunch of always-empty containers might be worthwhile, depending on the space they use. This seems significantly more difficult to implement than the reuse support, though, and whether it is actually worth doing depends on the internal implementation of the standard containers being used.