[serialization] performance issue when deserializing long string in a xml file

Hello,

Since I discovered the Boost serialization library, I've been using it in every single project I work on. I can't figure out how I managed before; it makes life so much easier.

I usually serialize all my objects in XML format. On my current project, I have a couple of big 500x500 complex matrices (ublas) that need to be serialized in my files. Serializing the matrices with the default Boost functions results in quite big files, so I've implemented the following:

- I serialize the matrix in memory in binary format
- I compress the data with zlib
- I encode the resulting data in base64, then copy it into a string
- I serialize the resulting string in my file

It works great, but I've found that deserializing this long string takes a huge amount of time. It bothers me and I don't really understand why. I have a filter that reports the bytes read so I can watch the progress, and it seems that the time is spent after all the data has been read! It confuses me.

Has anybody experienced this, or does anyone have an idea how to solve my problem?

Thanks,
Jean-Charles
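
For readers following along, here is a minimal sketch of the base64 step only, written against the standard library. It is an assumption about what that step does in general, not the poster's actual code: the function name base64_encode is hypothetical, and the binary serialization and zlib steps are left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): encode raw bytes as
// base64 using the standard alphabet and '=' padding.
std::string base64_encode(const std::vector<unsigned char>& data) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((data.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Each full group of 3 input bytes becomes 4 output characters.
    for (; i + 2 < data.size(); i += 3) {
        const unsigned n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }
    // One or two leftover bytes are encoded and padded with '='.
    if (i < data.size()) {
        const bool two = (i + 1 < data.size());
        unsigned n = data[i] << 16;
        if (two) n |= data[i + 1] << 8;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += two ? table[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}
```

Note that the result is a single string with no line breaks, so a large compressed matrix ends up as one very long text node in the XML file.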

On 12/11/2012 12:12 PM, jean-charles.quillet@alyotech.fr wrote:
> Hello,
>
> Since I discovered the Boost serialization library, I've been using it in every single project I work on. I can't figure out how I managed before; it makes life so much easier.
>
> I usually serialize all my objects in XML format. On my current project, I have a couple of big 500x500 complex matrices (ublas) that need to be serialized in my files.
>
> Serializing the matrices with the default Boost functions results in quite big files, so I've implemented the following:
>
> - I serialize the matrix in memory in binary format
> - I compress the data with zlib
> - I encode the resulting data in base64, then copy it into a string
> - I serialize the resulting string in my file
>
> It works great, but I've found that deserializing this long string takes a huge amount of time. It bothers me and I don't really understand why. I have a filter that reports the bytes read so I can watch the progress, and it seems that the time is spent after all the data has been read! It confuses me.
>
> Has anybody experienced this, or does anyone have an idea how to solve my problem? Thanks,

You can avoid some of the multiple allocations, copies and traversals of data described above by composing the appropriate combination of boost::iostreams filters and sinks/sources.

Using the text archive would certainly reduce the overall archive size, avoid the need for base64 conversion and simplify parsing during deserialization.

Otherwise, profiling the operation should help.

Jeff

From: On behalf of Jeff Flinn
Sent: Tuesday, 11 December 2012 19:56

> You can avoid some of the above multiple allocations, copies and traversals of data by composing the appropriate combination of boost::iostreams filters and sinks/sources.

I'm not sure what you mean here. I have one file_source and one multichar_input_filter to watch the progress. But the slowdown doesn't seem to come from there: when I deactivate the filter, it takes as long as before.

> Using the text archive would certainly reduce the overall archive size, avoid the need for base64 conversion and simplify parsing during deserialization.

I like the XML archive because it makes the file easy to edit if needed. This is not true of the text archive. If I wanted something uneditable, I'd rather use the binary archive, which serializes my matrix very fast.

> Profiling the operation should help otherwise.

I'm not sure how to do that. On Linux I use gprof, but unfortunately I'm working on Windows. Is there any free profiling tool I can use on Windows?

Anyway, when I step through the serialization code, all the time is actually spent on this line:

    ar & BOOST_SERIALIZATION_NVP(matrix_str);

And my filter reports that almost all the data has been read from the file. If I break in the debugger, I end up in the middle of Boost Spirit functions. It seems to me that it has something to do with the parsing...
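
Short of a full profiler, one rough way to confirm where the time goes is to time the suspect call by hand. The following is only a sketch using std::chrono from the standard library; the helper name time_ms is hypothetical and not part of Boost or of the original code.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run `work` once and return the elapsed wall-clock
// time in milliseconds. It measures a single call, so it only tells you
// how long that call took, not where inside it the time was spent.
template <typename Work>
double time_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Possible use around the line in question (assuming `ar` and `matrix_str`
// are in scope):
//   double ms = time_ms([&] { ar & BOOST_SERIALIZATION_NVP(matrix_str); });
```

Timing the same call with a short string and with the full-length string would show whether the cost grows faster than the input size.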

I've just discovered the make_binary_object function, which does pretty much what I was doing (binary serialization and base64 encoding) but also splits the string across many lines, and it works fast enough now :-)

Problem solved.

Cheers,
Jean-Charles