Boost.Serialization archive constructor performance

I've been experimenting with the Boost.Serialization library, to decide whether to use it in a project I'm working on. If it weren't for some performance issues, the decision of whether to use this library would be a no-brainer. Because it seemed otherwise so promising, I've spent some time investigating these performance issues, and have a couple of suggestions for improvements that would address them. (Obviously these suggestions are for a post-1.33 release.)

The use case being considered is marshalling values for transport to some other process. This is among the use cases discussed in the overview section of the library's documentation. However, it turns out that with relatively fine-grained marshalling operations, constructor / destructor time for the archive can amount to a significant fraction of the time spent performing such an operation. The thing that seems to be using up most of the time is the initialization of various standard containers (std::set, std::vector, std::list), most of which are used for pointer tracking.

For reference, a snippet to show the sort of thing I'm doing. (Please ignore the fact that this code snippet is obviously not thread-safe and similar issues; I've simplified things to show just the parts relevant to this issue.)

const int binary_archive_flags =
    boost::archive::no_header | boost::archive::no_codecvt;
const std::size_t size = 64;
static char buffer[size];

// allocate stream once, and reset for reuse
static boost::iostreams::stream<boost::iostreams::array_sink> out(buffer, size);
out.seekp(0);

#if REUSE_ARCHIVES
// allocate archive once, and reset for reuse
static boost::archive::binary_oarchive oa(out, binary_archive_flags);
oa.reset();
#else
boost::archive::binary_oarchive oa(out, binary_archive_flags);
#endif

save_ss(ss, oa);  // write ss into archive oa

// arrange for destination process to receive the contents of
// buffer containing serialized ss

A change which I think should not be too difficult to make, and which would make a large difference, would be to provide a mechanism for resetting and reusing an archive. As part of my experiments I added a public void reset() operation to boost::archive::detail::basic_iarchive and basic_oarchive, which clears the various tracking-related containers in the associated pimpl objects. I then modified my tests to allocate their archives once and reset before each use. The result was a significant speedup. Rather than spending roughly 1/3 of the time in archive construction / destruction, the archive reset time is now close to the noise level in my measurements. (As you can see in the above code snippet, I'd already started reusing the streams for similar reasons.)

I'm not necessarily proposing exactly the interface I used here for experimentation. A better interface might be to reset the archive when the associated stream is changed (which involves adding an interface for changing the stream, of course). But I'm not sure that's right either, since there is some separation between the stream-handling part and the pointer-tracking part of things. Perhaps there is an intent to support something other than streams, by using something other than boost::archive::basic_binary_[i,o]primitive? I think that once the details of this interface are determined, the implementation is pretty straightforward.

A second change, which may or may not be worthwhile, would be to avoid allocating the various tracking-related containers at all when the archive is created with no_tracking specified.
With archive reuse as described above, this probably wouldn't make any directly noticeable performance difference for my usage. It might make a difference in space usage though, as the design I'm investigating this library for might end up with a significant number of archives devoted to various specific things, with pointer-less data being common in this design. So avoiding the allocation of a bunch of always-empty containers might be worthwhile, depending on the space they use. This seems like it might be significantly more difficult to implement than the reuse support, though. And whether it is actually worth doing depends on the internal implementation of the standard containers being used. A sketch of what such an untracked archive looks like from the user's side follows below.
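[Editorial note: the following is only an illustrative sketch of creating an archive with the no_tracking flag in addition to the flags used in the snippet above; it is not part of the proposal or the library's examples. T is a placeholder for any serializable type, and send_bytes() is a hypothetical transport hook declared only to keep the example self-contained. Whether the library could then skip allocating the tracking containers for such an archive is exactly the open question discussed above.]

// Sketch only: combines no_tracking with the flags used above.
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <cstddef>

void send_bytes(const char* data, std::size_t n);  // hypothetical transport hook

template <class T>
void marshal_untracked(const T& value)
{
    const int flags = boost::archive::no_header
                    | boost::archive::no_codecvt
                    | boost::archive::no_tracking;  // pointer tracking suppressed
    const std::size_t size = 64;
    static char buffer[size];
    boost::iostreams::stream<boost::iostreams::array_sink> out(buffer, size);
    boost::archive::binary_oarchive oa(out, flags);
    oa << value;                 // no pointer tracking is recorded for this archive
    send_bytes(buffer, size);    // hand the serialized bytes to the transport
}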

Truth is that I never considered the overhead in archive creation, because the usage in marshalling is less common than for persistence. I wouldn't mess with it without real data that it would make a significant difference. Now that you have provided such data, I'll consider it. It doesn't seem that it would be all that hard to do. Feel free to experiment with this a little more. I presume, with your changes, you've found the serialization library suitable for your application?

Robert Ramey

At 2:12 PM -0700 7/25/05, Robert Ramey wrote:
Truth is that I never considered the overhead in archive creation, because the usage in marshalling is less common than for persistence. I wouldn't mess with it without real data that it would make a significant difference.
Entirely reasonable.
Now that you have provided such data, I'll consider it. It doesn't seem that it would be all that hard to do. Feel free to experiment with this a little more.
Most of the work in my experimental reset function involved studying the class hierarchy; the actual coding time was pretty minimal. Since the changes are small, I've appended them (made against a CVS version of boost from a couple of weeks ago) to the end of this message. As I said previously, I'm not convinced this is quite the right interface, so feel free not to lock in on exactly this. I'll be on vacation starting the end of this week; when I return I'll be upgrading us to Boost 1.33 (presumably :) and then going back to work on serialization-related stuff. I'll be happy to work with you on this in whatever way will prove useful.
I presume, with your changes, you've found the serialization library suitable for your application?
I think so. I still have some things to look at, including code size, which I haven't really looked at yet. It might be that this is either just not a problem (some of our requirements haven't been nailed down yet), or addressable by coding patterns that ensure we aren't picking up multiple copies of (inlined) things unnecessarily. And I sure hope this works out, as it will save me a lot of work! We have an existing marshalling facility that we long ago decided needed some additional features. Using Boost.Serialization would give us every feature we've ever discussed, plus some bonus features that we hadn't realized we wanted or needed.

----- boost/archive/detail/basic_iarchive.hpp -----

// in the definition of class basic_iarchive, add
void reset();

----- boost/archive/detail/basic_oarchive.hpp -----

// in the definition of class basic_oarchive, add
void reset();

----- libs/serialization/src/basic_iarchive.cpp -----

// in the definition of class basic_iarchive_impl, add
void reset();

// add this definition
inline void
basic_iarchive_impl::reset() {
    object_id_vector.clear();
    moveable_object_stack.clear();
    moveable_object_position = 0;
    cobject_info_set.clear();
    cobject_id_vector.clear();
    created_pointers.clear();
    pending_object = NULL;
    pending_bis = NULL;
    pending_version = 0;
}

// add this definition
BOOST_ARCHIVE_DECL(void)
basic_iarchive::reset() {
    pimpl->reset();
}

----- libs/serialization/src/basic_oarchive.cpp -----

// in the definition of class basic_oarchive_impl, add
void reset();

// add this definition
inline void
basic_oarchive_impl::reset() {
    object_set.clear();
    cobject_info_set.clear();
    stored_pointers.clear();
    pending_object = NULL;
    pending_bos = NULL;
}

// add this definition
BOOST_ARCHIVE_DECL(void)
basic_oarchive::reset() {
    pimpl->reset();
}
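[Editorial note: the following usage sketch is added for illustration and is not part of the patch above. It assumes the reset() additions have been applied; Message stands in for any serializable type, and send_bytes() is a hypothetical transport hook declared only to keep the example self-contained.]

// Sketch only: one archive and stream, constructed once and reused per message.
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <cstddef>

void send_bytes(const char* data, std::size_t n);  // hypothetical transport hook

template <class Message>
void marshal(const Message& msg)
{
    static const int flags = boost::archive::no_header | boost::archive::no_codecvt;
    static const std::size_t size = 64;
    static char buffer[size];

    // stream and archive are constructed once and reused on every call
    static boost::iostreams::stream<boost::iostreams::array_sink> out(buffer, size);
    static boost::archive::binary_oarchive oa(out, flags);

    out.seekp(0);   // rewind the fixed-size buffer
    oa.reset();     // clear the tracking state (the patched operation)
    oa << msg;
    send_bytes(buffer, size);   // hand the serialized bytes to the transport
}

As in the original snippet, the sketch sends the whole fixed-size buffer rather than only the bytes actually written; a real transport would track the write position.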

Thinking about this, it's very odd to me that just initializing some STL containers takes up so much time. For someone who has nothing else to do, I would love to see the following: a comprehensive analysis of where the time goes in serialization. Most of the really deep and time-consuming stuff is in templated inline functions - most of which are reducible to nothing - in theory. I suspect that in practice, performance is all over the map in this area. In fact, I would be curious to see something like the above for STL in general, and for other boost libraries.

Microsoft used to have - up to VC6 - a very nice interrupt-driven profiler. It was very, very simple to use and didn't require instrumenting the code. I found it very useful. Since I've "upgraded" to VC 7.1 I don't have it now. Of course they've given me something else - but I haven't figured out what it is or how it works yet. It's right up there on my list with figuring out what's involved in making a 64 bit program which runs on ?...

Robert Ramey

"Robert Ramey" <ramey@rrsd.com> writes:
Microsoft used to have - up to VC6 - a very nice interrupt-driven profiler. It was very, very simple to use and didn't require instrumenting the code. I found it very useful. Since I've "upgraded" to VC 7.1 I don't have it now. Of course they've given me something else - but I haven't figured out what it is or how it works yet. It's right up there on my list with figuring out what's involved in making a 64 bit program which runs on ?...
The VC8 betas are shipping with a comprehensive suite of performance and static analysis tools, or so the head of the Visual Studio team tells me. I haven't looked at them, but I suggest you try them.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

At 9:41 AM -0700 7/26/05, Robert Ramey wrote:
Thinking about this, it's very odd to me that just initializing some STL containers takes up so much time.
It does seem odd, so I've spent some more time looking at this. If I haven't messed up somewhere, it looks like the empty container initializations might only be accounting for about 10% of the total constructor time. But there is another container which is being affected by the reset trick and which might not be empty: the m_helpers member of boost::archive::detail::basic_archive_impl. I'm not clearing that std::set (it seemed like something that shouldn't be cleared in order to reuse the archive, though I haven't fully figured out how it is used). I haven't yet figured out a way to measure its impact in isolation (maybe I'll have to try making lots of archives with and without some part of its implementation commented out, which probably involves a rebuild of boost, so probably not today). So I'm not certain exactly why the reset-and-reuse approach helps as much as it does, just that it does.

By the way, the times here are not individually large; archive constructor time on the 2GHz Pentium I'm using for these tests is 6-8 usec. But it adds up when one is doing *lots* of them, especially if they aren't actually necessary. Also, some of the intended target platforms for what I'm working on aren't nearly as fast as this test machine.
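[Editorial note: the following micro-benchmark is an illustrative sketch added here, not the harness actually used for the 6-8 usec figure quoted above. It only shows how a per-construction time in microseconds can be obtained.]

// Sketch only: times archive construction and destruction in a tight loop.
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <cstddef>
#include <ctime>
#include <iostream>

int main()
{
    const int flags = boost::archive::no_header | boost::archive::no_codecvt;
    const std::size_t size = 64;
    char buffer[size];
    boost::iostreams::stream<boost::iostreams::array_sink> out(buffer, size);

    const int iterations = 100000;
    std::clock_t start = std::clock();
    for (int i = 0; i < iterations; ++i) {
        out.seekp(0);
        boost::archive::binary_oarchive oa(out, flags);  // construct and destruct only
    }
    std::clock_t stop = std::clock();

    double usec_per_archive =
        1e6 * double(stop - start) / CLOCKS_PER_SEC / iterations;
    std::cout << "archive ctor/dtor: " << usec_per_archive << " usec\n";
    return 0;
}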

Kim Barrett <kab <at> irobot.com> writes:
I've been experimenting with the Boost.Serialization library, to decide whether to use it in a project I'm working on. If it weren't for some performance issues, the decision of whether to use this library would be a no-brainer. Because it seemed otherwise so promising, I've spent some time investigating these performance issues, and have a couple of suggestions for improvements that would address them. (Obviously these suggestions are for post-1.33 release.)
I can second Kim's experience; I've written a marshalling framework myself, and found that the frequent creation and deletion of Boost.Serialization archives became a bottleneck in the whole system.

Regards,
Jarl.

Jarl Lindrud wrote:
Kim Barrett <kab <at> irobot.com> writes:
I've been experimenting with the Boost.Serialization library, to decide whether to use it in a project I'm working on. If it weren't for some performance issues, the decision of whether to use this library would be a no-brainer. Because it seemed otherwise so promising, I've spent some time investigating these performance issues, and have a couple of suggestions for improvements that would address them. (Obviously these suggestions are for post-1.33 release.)
I can second Kim's experience; I've written a marshalling framework myself, and found that the frequent creation and deletion of Boost.Serialization archives became a bottleneck in the whole system.
By the way, I'm still interested in integrating your code into my interfaces library. I haven't had time to work on it for about three months, but I'm looking forward to starting to work on it in a few weeks.

Jonathan

Jonathan Turkanis <technews <at> kangaroologic.com> writes:
By the way, I'm still interested in integrating your code into my interfaces library. I haven't had time to work on it for about three months, but I'm looking forward to starting to work on it in a few weeks.
Jonathan
Sounds good, I'm all for it. I'll be glad to answer any questions, too, so feel free to ask :)

Jarl.