[Boost][Serialization] CPU bottleneck
Hi,

I am using the serialization library in my project and it has been functioning perfectly thus far. However, when I scaled up the usage requirements, I hit a very odd problem.

Let me explain the use case. The serialization library is used in a DFS framework, handling large data structures (in terms of size, not complexity) such as std::vector<char> of size 1MB and above. The framework consists of two major components: a Chunkserver (server) and a Client. Files are chunked, wrapped in a class, serialized and sent over the wire. The same thing is done on the server side, but in reverse order. The actual binary data is stored in a vector (previously, I tried using a string instead, but I had some issues with it and Robert suggested using an alternative container).

When I was depositing large files to the Chunkserver, disk utilization was almost non-existent, whereas the CPU was maxed out. It is important to realize that this problem occurred only on the server side, not on the client. Upon further investigation using gprof, I concluded that the bottleneck was in the serialization library (it may also be the case that I am misusing it). According to the profiler, above 97% of the CPU time was spent in a single function. Profiler results can be found at this address:

http://www.cs.washington.edu/homes/balkan/gprof.txt

For reference, the signature of that function is: boost::serialization::serialize_adl< boost::archive::text_iarchive, std::vector...> and I am using a text archive. As I mentioned before, this issue only occurs on the server side.

I would appreciate it if anybody could explain why this is happening and, more importantly, how to circumvent the issue.

Thank you!
Vjeko
I've reviewed the profile and found it interesting.

Have you tried binary_?archive? You would find it much, much faster in this case for a variety of reasons.

To maintain portability of text files, the library has to manipulate each character sent. This takes a lot of time and it adds up. You might experiment with creating a temporary array, wrapping it in a binary_object and sending it that way. But still, the very fastest will be to use binary_?archive.

Robert Ramey
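To make the per-character cost concrete, here is a rough stand-alone sketch using only the standard library. It is not Boost's implementation; the helper names (encode_text, decode_text, encode_binary, decode_binary) are made up for illustration. The text-style pair formats and parses every byte individually, while the binary-style pair writes a length header and copies the payload in bulk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only; not part of Boost.

// Text-style encoding: the element count, then every byte formatted as a
// decimal number preceded by a space. Formatting work happens once per element.
std::string encode_text(const std::vector<char>& data) {
    std::ostringstream out;
    out << data.size();
    for (char c : data) {
        out << ' ' << static_cast<int>(c);
    }
    return out.str();
}

// Text-style decoding: every element is parsed back from characters.
std::vector<char> decode_text(const std::string& encoded) {
    std::istringstream in(encoded);
    std::size_t n = 0;
    in >> n;
    std::vector<char> data(n);
    for (std::size_t i = 0; i < n; ++i) {
        int value = 0;
        in >> value;
        data[i] = static_cast<char>(value);
    }
    return data;
}

// Binary-style encoding: a fixed-size length header (native byte order, so
// not portable across machines) followed by the raw bytes.
std::string encode_binary(const std::vector<char>& data) {
    const std::size_t n = data.size();
    std::string out(reinterpret_cast<const char*>(&n), sizeof n);
    out.append(data.begin(), data.end());
    return out;
}

// Binary-style decoding: read the header, then copy the payload in one step.
std::vector<char> decode_binary(const std::string& encoded) {
    assert(encoded.size() >= sizeof(std::size_t));
    std::size_t n = 0;
    std::memcpy(&n, encoded.data(), sizeof n);
    assert(encoded.size() == sizeof n + n);
    const auto offset = static_cast<std::ptrdiff_t>(sizeof n);
    return std::vector<char>(encoded.begin() + offset, encoded.end());
}
```

On a 1MB chunk, decode_text runs about a million stream extractions, whereas decode_binary does a single range copy. That is the kind of difference that would show up as a CPU-bound receiver, although the real library does more than this sketch.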
On Sun, 14 Sep 2008, Robert Ramey wrote:

> I've reviewed the profile and found it interesting.
> Have you tried binary_?archive? You would find it much, much faster in this case for a variety of reasons.

After modifying the code to use binary_?archive, everything runs as expected. The reason I posted this email was to make sure that this is not a bug within the library.

> To maintain portability of text files, the library has to manipulate each character sent. This takes a lot of time and it adds up. You might experiment

I see. That explains everything.

> with creating a temporary array, wrapping it in a binary_object and sending it that way. But still, the very fastest will be to use binary_?archive.

I was not aware of binary_object. Thanks for pointing this out. I will consult the documentation.

> Robert Ramey

Thanks a bunch for helping out on such short notice. I really appreciate it. Best! ;)
-vjeko
participants (2)
- Robert Ramey
- Vjekoslav Brajkovic