
Hi, I am using the serialization library in my project and it has been functioning perfectly thus far. However, when I've scaled up the usage requirements, I had hit a very odd problem. Let me explain the use case. Serialization library is used in a DFS framework, handling large data structures (in terms of size, not complexity) such as std::vector<char> of size 1MB and above. The framework consists of two major components: Chunkserver (server) and a Client. Files are chunked, wrapped in a class, serialized and sent over the wire. Same things is done on the server side, but in a reverse order. The actual binary data is stored in a vector (previously, I've tried using string instead, but I had some issues with it and Robert suggested using some an alternative container). When I was depositing large files to Chunkserver, disk utilization was almost non-existent, whereas the CPU was maxed out. It is important to realize that this problem occurred only on the server side, not client. Upon further investigation using gprof I have concluded that the bottleneck was in the serialization library (it also may be the case that I am misusing it). According to the profiler, above 97% of the CPU time was spent in a singe function. Profiler results can be found at this address: http://www.cs.washington.edu/homes/balkan/gprof.txt For the reference, the signature of that function is: boost::serialization::serialize_adl< boost::archive::text_iarchive, std::vector...> and I am using text archive. I as mentioned before, this issue only occurs on a server side. I would appreciate if anybody could explain why this is happening and more importantly how to circumvent the issue. Thank you! Vjeko