
Robert Ramey wrote:
Matthias Troyer wrote:
Dave's design does not change anything in your archives or serialization functions, but only adds an additional binary archive using save_array and load_array.
Hmm - that's not the way I read it. I've touched on this in another post.
As already explained elsewhere, you misread it. The archive was in a sub-namespace. Perhaps it would have been clearer if Dave had used a completely different namespace name, such as boost::array_serialization_extensions or boost::not_the_serialization_namespace?
c) The premise that one will save a lot of coding (see d) above) compared to the current method of overloading based on the pair of archive/type is overly optimistic.
Actually, I have implemented two new archive classes (MPI and XDR) that can profit from it, and it does eliminate a lot of code duplication. All of the serialization functions for types that can make use of such an optimization can be shared between all these archive types. In addition, formats such as HDF5 and netCDF have been mentioned, which could reuse the *same* serialization functions to achieve optimal performance.
There is nothing "optimistic" here since we have the actual implementations, which show that code duplication can be avoided.
OK - I can really only comment on that which I've seen.
Are we talking at cross-purposes here? Matthias is talking about sharing *serialization* functions. That is, for each data type there is only *one* serialization function that calls load/save_array (or whatever the array hook function is called...).

You seem to be disputing the code duplication issue by saying that different *archives* will not typically(*) be able to share implementations of array processing. This I completely agree with. But that is completely separate from the number of *serialization* functions that need to be written.

Matthias, it might help if you show an example of a serialization function for some vector type, and the implementation of the array processing for the MPI and XDR archives, to demonstrate the orthogonality of the serialization and archive ideas. A rough sketch of the shape such an example might take follows below.

(*) Of course there are some counter-examples. That is the idea for deriving one archive from another, is it not?

[snip]
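For illustration only: a minimal sketch of what such a shared serialization function could look like, assuming a hypothetical has_fast_array_serialization trait and a save_array() archive hook. The names are placeholders, not the actual proposed interface.

    #include <cstddef>
    #include <vector>

    // Hypothetical trait marking archives that provide an optimized
    // save_array() hook; the proposal's real names may differ.
    template <class Archive>
    struct has_fast_array_serialization { static const bool value = false; };

    template <bool Fast> struct fast_tag {};

    // Bulk path: an archive such as the MPI or XDR one can implement
    // save_array() as a single bulk operation.
    template <class Archive, class T>
    void save_elements(Archive& ar, std::vector<T> const& v, fast_tag<true>)
    {
        ar.save_array(&v[0], v.size());
    }

    // Fallback path: element-wise saves for archives without the hook.
    template <class Archive, class T>
    void save_elements(Archive& ar, std::vector<T> const& v, fast_tag<false>)
    {
        for (std::size_t i = 0; i != v.size(); ++i)
            ar << v[i];
    }

    // The single serialization function, written once and shared by *all*
    // archive types; only the archives differ in how arrays are processed.
    template <class Archive, class T>
    void save(Archive& ar, std::vector<T> const& v)
    {
        std::size_t size = v.size();
        ar << size;
        if (size != 0)
            save_elements(ar, v,
                fast_tag<has_fast_array_serialization<Archive>::value>());
    }

Each new archive type then only has to supply (or omit) save_array(); none of the per-type serialization functions need to be duplicated.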
As you can see, the overhead of the serialization library (less than 2%) is insignificant compared to the cost of doing lots of individual insertion operations into the buffer instead of one big one. The bottleneck is thus clearly the many calls to save() instead of a single call to save_array().
Well, this is interesting data. The call to save() resolves inline to a std::vector element access plus insertion of the value into the buffer. I wonder how much of this is in std::vector and how much is in the save into the buffer?
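Roughly, the two code paths being compared look like this (an illustrative sketch, not the library's actual internals):

    #include <cstddef>
    #include <vector>

    typedef std::vector<char> buffer_type;

    // Element-wise path (what many inlined save() calls amount to):
    // one element access and one small buffer insertion per element.
    void save_elementwise(buffer_type& buf, std::vector<double> const& v)
    {
        for (std::size_t i = 0; i != v.size(); ++i) {
            char const* p = reinterpret_cast<char const*>(&v[i]);
            buf.insert(buf.end(), p, p + sizeof(double));
        }
    }

    // Bulk path (what a single save_array() call amounts to):
    // one big insertion for the whole block.
    void save_array(buffer_type& buf, std::vector<double> const& v)
    {
        if (v.empty()) return;
        char const* p = reinterpret_cast<char const*>(&v[0]);
        buf.insert(buf.end(), p, p + v.size() * sizeof(double));
    }

Timing the two halves separately (element access versus buffer insertion) would answer the question of where the time goes.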
As described here, http://lists.boost.org/Archives/boost/2005/11/97156.php , the effect of using a custom buffer versus a buffer based around vector::push_back is exactly a factor of 2, irrespective of cache effects.

Matthias' benchmark showed that the time taken to serialize an array into a vector buffer is almost the same as the time taken to push_back the array in a loop (i.e. the serialization library itself introduces negligible overhead in this case). Thus, a serialization archive based on the same buffer I used in my benchmark should achieve the same factor-2 speedup.

Note that the speedup from using save_array was of the order of 30, so even with a factor-2 speedup from an optimized buffer, save_array would still be 15 times faster! (This uses the first set of data; the set for small arrays would give a more modest 3x improvement for save_array versus a custom buffer archive.)
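To make the buffer comparison concrete, here is a minimal sketch of the two append strategies (illustrative only, not the benchmark's actual code): a push_back-based buffer pays a capacity check, and possibly a reallocation, on every append, while a preallocated custom buffer appends with a memcpy and a pointer bump.

    #include <cassert>
    #include <cstddef>
    #include <cstring>
    #include <vector>

    // push_back-style buffer: every append goes through the growth machinery.
    void append_push_back(std::vector<char>& buf, void const* data, std::size_t n)
    {
        char const* p = static_cast<char const*>(data);
        buf.insert(buf.end(), p, p + n);  // capacity check, possible reallocation
    }

    // Preallocated "custom" buffer: append is a memcpy plus a pointer bump.
    class raw_buffer
    {
    public:
        raw_buffer(char* storage, std::size_t capacity)
          : pos_(storage), end_(storage + capacity) {}

        void append(void const* data, std::size_t n)
        {
            assert(pos_ + n <= end_);   // caller guarantees capacity
            std::memcpy(pos_, data, n);
            pos_ += n;
        }

    private:
        char* pos_;
        char* end_;
    };

Cheers,
Ian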