[serialization] Performance issues serializing lots of objects

Hi! We've been successfully using the Boost serialization library on various industrial projects. On one occasion, though, we had to modify the source code in order to improve the performance of the process.

The library uses a set (in basic_oarchive.cpp) to keep track of the serialized objects. In our project, a very large number of relatively small objects are serialized through pointers and references to each other, and the number of elements in the set could rocket to several million. This led to memory fragmentation and very high performance hits after the first serialization (on a Windows XP platform, at least).

We simply added a boost::fast_pool_allocator on the set of serialized pointers. The set is cleared in the destructor of the archive, after which release_memory() is called on the pool. This simple patch could reduce the time to perform a whole save from several minutes to a few seconds.

Best regards,
--
Adrien Gervaise
MasaGroup SCI
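[Editor's note: the following is a minimal, hypothetical sketch of the technique described above, not the actual patch to basic_oarchive.cpp. It shows a std::set of tracked object addresses backed by boost::fast_pool_allocator, with the pool released after the set is cleared. The element type and pool key are illustrative assumptions; in particular, std::set rebinds the allocator to its internal node type, so the singleton_pool that really has to be released is the one keyed on that node's size, which is implementation-specific.]

    #include <set>
    #include <boost/pool/pool_alloc.hpp>
    #include <boost/pool/singleton_pool.hpp>

    // Pointer-tracking set with pooled node allocation, so that millions of
    // small node allocations come out of large pool chunks instead of the
    // general-purpose heap (which fragments on Windows XP, as reported above).
    typedef std::set<
        const void*,
        std::less<const void*>,
        boost::fast_pool_allocator<const void*>
    > tracked_ptr_set;

    int main()
    {
        tracked_ptr_set tracked;
        int dummy;
        tracked.insert(&dummy);

        // As in the described patch: clear the set when the archive goes away,
        // then hand the pool's memory back.  The sizeof() used as the pool key
        // here is only illustrative (see the note above the code).
        tracked.clear();
        boost::singleton_pool<
            boost::fast_pool_allocator_tag,
            sizeof(const void*)
        >::release_memory();
        return 0;
    }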

Adrien Gervaise wrote:
This simple patch could reduce the time to perform a whole save from several minutes to a few seconds.
One question - "could reduce..." how much time did it actually save? Robert Ramey

On Thu, 18 Aug 2005 19:05:13 +0200, Robert Ramey <ramey@rrsd.com> wrote:
One question - "could reduce..." how much time did it actually save?
Here are the results of some tests we performed when investigating the problem. The data set can be considered a random graph of objects; it is about 30MB and is serialized several times in a row.

Save number    Without the pool    With the pool
1              00:00:12            00:00:12
2              00:00:31            00:00:12
3              00:00:54            00:00:13
4              00:00:56            00:00:13
5              00:00:58            00:00:13
6              00:01:03            00:00:13
7              00:01:03            00:00:13
8              00:01:02            00:00:13

On the real system, the size of the saved data can go up to 100MB. On a side note, the stack size of the application had to be increased to 15MB in order to avoid stack overflows during the serialization process.

--
Adrien Gervaise
MasaGroup SCI

Adrien Gervaise wrote:
On Thu, 18 Aug 2005 19:05:13 +0200, Robert Ramey <ramey@rrsd.com> wrote:
One question - "could reduce..." how much time did it actually save?
Here are the results of some tests we performed when investigating the problem. The data set can be considered a random graph of objects; it is about 30MB and is serialized several times in a row.

Save number    Without the pool    With the pool
1              00:00:12            00:00:12
2              00:00:31            00:00:12
3              00:00:54            00:00:13
4              00:00:56            00:00:13
5              00:00:58            00:00:13
6              00:01:03            00:00:13
7              00:01:03            00:00:13
8              00:01:02            00:00:13
Can you try the same test with dlmalloc? http://gee.cs.oswego.edu/dl/html/malloc.html

The library uses a set (in basic_oarchive.cpp) to keep track of the serialized objects. In our project, a very large number of relatively small objects are serialized through pointers and references to each other, and the number of elements in the set could rocket to several million. This led to memory fragmentation and very high performance hits after the first serialization (on a Windows XP platform, at least).
I'm assuming you are using Visual Studio. In our app we call _set_sbh_threshold(1016) at the very beginning, which speeds up memory allocations noticeably. The non-standard function is declared in malloc.h.

--
/Johan
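[Editor's note: a minimal sketch of this suggestion, assuming MSVC on Windows, where _set_sbh_threshold is available; the threshold value 1016 is simply the one quoted above.]

    #include <malloc.h>   // non-standard MSVC header declaring _set_sbh_threshold
    #include <cstdio>

    int main()
    {
        // Enable the CRT small-block heap for allocations up to 1016 bytes,
        // as early in the program as possible.  Returns nonzero on success.
        if (_set_sbh_threshold(1016) == 0)
            std::printf("could not set small-block heap threshold\n");

        // ... serialization-heavy workload goes here ...
        return 0;
    }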
participants (4)
- Adrien Gervaise
- Johan Lindvall
- Peter Dimov
- Robert Ramey