[Serialization] Speeding up client-server communication
Hi there,

I am in the process of speeding up communication between a server and its clients. Communication involves serialized class data, and messages can be as large as 100 kilobytes. Measurements have shown that, in a cluster with Gigabit networking, most of the overhead of the parallelisation seems to come from the broker infrastructure and the process of (de-)serialization. Network latency and/or bandwidth seem to play only a minor role in this environment.

Hence, apart from optimizing my broker, I'm looking for ways to optimize the serialization process as used in my application. As messages are discarded as soon as they reach the recipient, versioning of serialized data does not play an important role.

Apart from decreasing the frequency and size of data exchanges and using binary transfers in a homogeneous environment, another way of speeding up the application seems to be the "Class Information" optimization discussed e.g. in http://www.boost.org/doc/libs/1_41_0/libs/serialization/doc/special.html . Also, BOOST_IS_BITWISE_SERIALIZABLE(my_class) might help. One question I have about this is whether a std::vector<POD> would be amenable to this optimization? I use quite a few of them.

Do you have further suggestions for ways of influencing the Boost.Serialization library?

Thanks and Best Regards,
Ruediger
Ruediger Berlich wrote:
Hi there,
I am in the process of speeding up communication between a server and its clients. Communication involves serialized class data. Messages can be as large as 100 kilobytes.
I have done some measurements which have shown that, in a cluster with Gigabit networking, most overhead of the parallelisation seems to come from the Broker infrastructure and the process of (de-)serialization. Network latency and/or bandwidth seems to play only a minor role in this environment.
Hence, apart from optimizing my broker, I'm looking for ways to optimize the serialization process, as used in my application. As messages are discarded as soon as they reach the recipient, versions of serialized data do not play an important role.
Apart from decreasing the frequency and size of data exchanges and using binary transfers in a homogeneous environment, another way of speeding up the application seems to be the "Class Information" section discussed e.g. in http://www.boost.org/doc/libs/1_41_0/libs/serialization/doc/special.html
If speed is paramount and you don't need versioning, you can mark your classes accordingly. If you're using this for passing data over the network and you have control over both ends, versioning should not be necessary, and turning it off can make a huge difference. Turning off tracking can make a big difference as well.
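For reference, marking a class this way is done with the library's serialization traits macros. A minimal sketch, where work_item is just a made-up stand-in for one of the message classes:

    #include <vector>
    #include <boost/serialization/level.hpp>     // BOOST_CLASS_IMPLEMENTATION
    #include <boost/serialization/tracking.hpp>  // BOOST_CLASS_TRACKING
    #include <boost/serialization/vector.hpp>

    // Hypothetical message type; any serializable class would do.
    struct work_item {
        std::vector<double> values;
        template <class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/) {
            ar & values;
        }
    };

    // No class information (and hence no version number) is written for this type ...
    BOOST_CLASS_IMPLEMENTATION(work_item, boost::serialization::object_serializable)
    // ... and objects of this type are never tracked, so no object ids are written either.
    BOOST_CLASS_TRACKING(work_item, boost::serialization::track_never)

Note that track_never is only safe if the type is never serialized through a pointer.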
Also, BOOST_IS_BITWISE_SERIALIZABLE(my_class) might help. One question I have about this is whether a std::vector<POD> would be amenable to this optimization? I use quite a few of them.
Collections of types marked with BOOST_IS_BITWISE_SERIALIZABLE(my_class) and serialized with the binary_?archive should be pretty much as fast as any other method. On my system, when I disassemble these types, one can verify that it boils down to the minimum code possible, at least with my MSVC compiler. One large bottleneck I've found is in the usage of the standard stream buffer. Replacing this with a custom implementation might help.
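To answer the std::vector<POD> question concretely, here is a minimal sketch of how the trait is typically used; the POD type and file name are made up for illustration:

    #include <fstream>
    #include <vector>
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/serialization/is_bitwise_serializable.hpp>
    #include <boost/serialization/vector.hpp>

    // Hypothetical POD payload.
    struct sample {
        double value;
        int index;
        template <class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/) {
            ar & value & index;
        }
    };

    // Tell the library the type can be copied as raw bytes. With binary
    // archives, the vector serialization can then write the whole buffer
    // in one block instead of element by element.
    BOOST_IS_BITWISE_SERIALIZABLE(sample)

    int main() {
        std::vector<sample> data(1000);
        std::ofstream ofs("payload.bin", std::ios::binary);
        boost::archive::binary_oarchive oa(ofs);
        oa << data;   // bulk write of the vector's contents
    }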
Do you have further suggestions for ways of influencing the Boost.Serialization library?
See all of the above. Carefully read the serialization traits section of the documentation. I should say that I'm very skeptical of attempts to make such improvements without subjecting one's own use cases to a profiler. The library has a "performance" section which contains a few tests that work with the bjam build/test system. You should add your own tests there and run them before doing too much.
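As a quick first check, before setting up the bjam performance tests, one can time a representative message round trip directly. A minimal sketch; the message type and sizes are placeholders only:

    #include <chrono>
    #include <iostream>
    #include <sstream>
    #include <vector>
    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/serialization/vector.hpp>

    int main() {
        std::vector<double> message(12 * 1024);   // roughly 100 KB of payload

        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < 1000; ++i) {
            // Serialize, including the cost of constructing stream and archive.
            std::ostringstream os;
            boost::archive::binary_oarchive oa(os);
            oa << message;

            // Deserialize the same bytes.
            std::istringstream is(os.str());
            boost::archive::binary_iarchive ia(is);
            std::vector<double> copy;
            ia >> copy;
        }
        auto elapsed = std::chrono::steady_clock::now() - start;
        std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()
                  << " ms for 1000 round trips\n";
    }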
Thanks and Best Regards, Ruediger
Robert Ramey
Ruediger Berlich wrote:
Do you have further suggestions for ways of influencing the Boost.Serialization library?
I have a very similar scenario where version tracking does not play an important role. In this case it is possible to create a single archive and stream per, say, connection, as opposed to creating them for every object (de)serialized. This was a huge performance boost in my case.

What I do is create a Serializer class for every connection. All the (de)serialization is done through it. Note that for this to work, all objects (de)serialized should have tracking turned off.
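A minimal sketch of this per-connection idea; the class name, interface and the use of the no_header flag are illustrative assumptions, not the exact code used here:

    #include <sstream>
    #include <string>
    #include <boost/archive/binary_oarchive.hpp>

    // One stream and one archive per connection, reused for every object sent.
    // All serialized types must have tracking turned off (track_never).
    class ConnectionSerializer {
    public:
        ConnectionSerializer()
            : archive_(buffer_, boost::archive::no_header) {}

        // Serialize one object and return only the bytes written for it,
        // ready to be handed to the connection's send routine.
        template <class T>
        std::string save(const T& object) {
            std::size_t before = buffer_.str().size();
            archive_ << object;
            return buffer_.str().substr(before);
        }

    private:
        std::ostringstream buffer_;                  // must be declared before archive_
        boost::archive::binary_oarchive archive_;
    };

The receiving side would mirror this with a single input archive per connection, along the lines of the example later in this thread.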
Juraj Ivancic wrote:
Ruediger Berlich wrote:
Do you have further suggestions for ways of influencing the Boost.Serialization library?
I have a very similar scenario where version tracking does not play an important role. In this case it is possible to create a single archive and stream per, say, connection as opposed to creating them for every object (de)serialized. This was a huge performance boost in my case.
What I do is I create a Serializer class for every connection. All the (de)serialization is done through it. Note that for this to work all objects (de)serialized should have tracking turned off.
In my view this is the correct approach for high performance considerations. Tracking is important to have for the most general case of saving state but it conflicts with using serialization for some applications. I've been considering ways to make serialization more useful in these types of scenarios.
This could be improved further:
1) By replacing stringstreams with something more lightweight.
In small experiments, I've found this to make a big difference. And it's not that hard, as one need only support a subset of the whole streambuf functionality.
2) Ideally these serialize methods should have some kind of compile-time assertion that the object has serialization tracking turned off.
Note that the default tracking attribute is "selective", which means that tracking is only turned on if an object is serialized through a pointer somewhere in the program. So if you never serialize through a pointer, the default should be just fine. Anyone who serializes through a pointer, especially one to a virtual base class, must realize that this requires a lot more processing to work properly and is fundamentally incompatible with performance optimization.

Note that the "implementation level" is also important. The default is that the class id is looked up in a table to check whether versioning must be supported. Lowering the "implementation level" to "object_serializable" (hmm, I don't remember the exact name; better double check) means that this class information is not checked. This speeds things up, but it will mean that trying to load old archives could be a problem. For MPI-type applications this shouldn't be a problem, so it should also be considered.
I'm not quite sure whether this approach is fully supported by the boost::serialization interface and whether it could be broken by future versions. OTOH it has been working well with the last 6-7 Boost releases (1.41 is the last one I tested).
It seems to me that you've been using the system as I intended it to be used. I'm considering enhancements of the library to address situations like this. This takes me quite a while, for a number of reasons: a) it's way too easy to make a change which ripples through in such a way that it complicates the library beyond usability; b) it's way too easy to make changes which break old archives; c) it's very helpful to get feedback from real users such as yourself, with real problems, to test ideas for extensions as "thought experiments" and see whether such ideas really would help without violating the considerations above.

By being "conservative" in this way the library has steadily improved to be thread-safe, to handle dynamic loading/unloading of classes and related serialization code, and to have more "concept checking" to help detect misuses of the library. This is all due to getting complaints about particular use cases. The improvements have mostly been introduced without breaking old code or archives.

Robert Ramey
HTH
Robert Ramey wrote:
Juraj Ivancic wrote:
This could be improved further:
1) By replacing stringstreams with something more lightweight.
In small experiments, I've found this to make a big difference. And it's not that hard, as one need only support a subset of the whole streambuf functionality.
I am not really a performance freak, but creating and destroying archives and streams really turned out to be a bottleneck, so I came up with this serialization technique. The std::stream overhead never really bothered me, so I left it there to this day (with a TODO on top of it for some sunny day when there will be nothing else to do :) ).
Note that "implementation level" is also important. The default is that the class id is looked up in a table to check to see if versioning must be supported. lowering the "implementation level" to "object serialization" (hmm I don't remember - better double check). Means that this class information is not checked. This speeds things up, but will mean that trying to load old archives could be a problem. For MPI type applications this shouldn't be a problem so it should also be considered.
Thanks for the reminder. I am a bit rusty about this particular piece of code. I wrote this about 2 years ago, and got into the habit of adding

    BOOST_CLASS_IMPLEMENTATION(X, boost::serialization::object_serializable)
    BOOST_CLASS_TRACKING(X, boost::serialization::track_never)

for every class used with this code, gradually forgetting its original purpose. I do remember, however, that producing the above two lines caused me quite a bit of headache and debugging sessions at the time. The thing that made me think I was doing something I was not supposed to was this:

    stringstream is;
    IArchive ia( is );
    Object object1;
    Object object2;
    ia << object1;
    sendData( is );    // sends version + object1 data
    ia << object2;
    sendData( is );    // sends only object2 data

On the other side of the network there were two deserializers:

    stringstream os1;
    stringstream os2;
    OArchive o1( os1 );
    OArchive o2( os2 );
    Object object1;
    Object object2;
    os1 << receivedData( 1 );   // contains version + object1 data
    o1 >> object1;
    os2 << receivedData( 2 );   // contains only object2 data
    o2 >> object2;

The problem was that version data was written only once per archive and class, and this caused problems when deserializing: an exception was thrown when object2 was deserialized, as OArchive o2 expected version information on the stream. So I started using the object_serializable class implementation level, which avoids versioning entirely, but I was left with a bitter taste in my mouth that I was doing something inappropriate. I'm glad to hear this is not the case, although I do think it's a bit awkward. I'd really prefer that archives did not possess 'this is the first object of this class' knowledge, as that leaves one wondering if they were ever intended to be reused anyway.
Juraj Ivancic wrote:
Robert Ramey wrote:
Juraj Ivancic wrote:
This could be improved further:
1) By replacing stringstreams with something more lightweight.
In small experiments, I've found this to make a big difference. And it's not that hard, as one need only support a subset of the whole streambuf functionality.
I am not really a performance freak, but creating and destroying archives and streams really turned out to be a bottleneck, so I came up with this serialization technique. The std::stream overhead never really bothered me, so I left it there to this day (with a TODO on top of it for some sunny day when there will be nothing else to do :) ).
Hmm, I thought that performance was the concern raised here. If it is, you should definitely look into replacing this. It will likely have a bigger effect than the other changes you made. Personally, I don't believe that performance issues can ever be addressed without subjecting the code to profiling. The serialization library has methods for doing this.

Robert Ramey
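To make the stringstream replacement concrete, here is a minimal sketch of a custom streambuf that appends directly into a caller-owned byte vector; this is an illustrative example under those assumptions, not code from the library or from this thread:

    #include <ostream>
    #include <streambuf>
    #include <vector>
    #include <boost/archive/binary_oarchive.hpp>

    // Minimal output streambuf that appends everything into a std::vector<char>.
    // Only the output side is implemented, which is all a saving archive needs.
    class vector_streambuf : public std::streambuf {
    public:
        explicit vector_streambuf(std::vector<char>& buffer) : buffer_(buffer) {}

    protected:
        std::streamsize xsputn(const char* s, std::streamsize n) override {
            buffer_.insert(buffer_.end(), s, s + n);
            return n;
        }
        int_type overflow(int_type c) override {
            if (c != traits_type::eof())
                buffer_.push_back(static_cast<char>(c));
            return c;
        }

    private:
        std::vector<char>& buffer_;
    };

    int main() {
        std::vector<char> bytes;
        vector_streambuf sb(bytes);
        std::ostream os(&sb);                     // no stringstream involved
        boost::archive::binary_oarchive oa(os);
        double x = 3.14;
        int n = 42;
        oa << x << n;                             // bytes now holds the serialized data
    }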
participants (3)
- Juraj Ivančić
- Robert Ramey
- Ruediger Berlich