
To summarize how we arrived here
================================

a) Mattias augmented binary_?archive to replace element-by-element serialization of primitive types with save/load_binary for C++ arrays, std::vector and boost::valarray. This resulted in a 10x speed up of the serialization process.

b) From this it has been concluded that binary archives should be enhanced to provide this facility automatically and transparently to the user.

c) The structure of the library and the documentation suggest that the convenient way to do this is to specify an overload for each combination of archive/type which can benefit from special treatment.

d) The above (c) is deemed inconvenient because it has been supposed that many archive classes will share a common implementation of load/save_array. This suggests that using (c), though simple and straightforward, will result in code repetition.

e) So it has been proposed that binary_iarchive be re-implemented in the following way:

    iarchive        - contains the default implementation of load_array
    binary_iarchive - presumably contains an implementation of load_array
                      in terms of the currently defined load_binary

It's not clear whether all archives would be modified in this way or just binary_iarchive. The idea is that each type which can benefit from load_array calls it, and the version of load_array corresponding to that particular archive is invoked. This will require that:

i) the serialization function for each type which can benefit from some load_array function call it, and

ii) only a small number of load_array functions be written - one for each archive.

So the number of special functions to be written would be one for each type which might use load_array and "one" for each archive.

Problems with the Design
========================

a) It doesn't address the root cause of the "slow" performance of binary archives.

The main problem is that it doesn't address the cause of the 10x speed up. It's a classic case of premature optimization.
The 10x speed up was based on a test program. For a C++ array, the test boils down to replacing 10,000 invocations of stream write(...) with one invocation of a stream write 10,000 times longer. Which is, of course, faster. Unfortunately, the investigation stopped here, with the conclusion that the best way to improve performance is to reduce the number of stream write calls in a few specific cases.

As far as I know, the test was never profiled, so I can't know for sure, but past experience and common sense suggest that stream write is a costly operation for binary i/o. This design proposal (as well as the previous one) fails to address this, so it's hard to take it as a serious proposal to speed up native binary serialization.

The current binary archives are implemented in terms of stream i/o. This was convenient to do and has worked well. But basing the implementation on streams results in a slow implementation. The documentation explicitly states that archives do not have to be implemented in terms of streams. The binary archives don't use any of the stream interface other than read(...) and write(...), so it would be quite easy to make another binary archive which isn't based on stream i/o. It could be based on fread/fwrite.

Given that the authors' concern is to make the library faster for machine-to-machine communication, and the desired protocols (MPI) don't use file i/o, the fastest approach would be just a buffer - say a buffer_archive - which doesn't do any i/o at all. It would just fill up a user-specified buffer whose address is handed in at buffer_archive construction time. This would totally eliminate stream i/o from the equation.

Note that this would be easy to do: just clone binary_archive and modify it so it doesn't use a stream (you probably don't want to derive from basic_binary_archive). I would guess that would take a couple of hours at most. I would be surprised if the 10x speed up still exists with this "buffer_archive".
Note that for the intended application - MPI communication - some archive which doesn't use stream i/o will have to be created anyway.

b) Re-implementation of binary_archive in such a way as not to break existing archives would be an error-prone process.

The switch between the new and old methods "should" result in exactly the same byte sequence, but it could easily occur that a small, subtle change renders archives created under the previous binary_archive unreadable.

c) The premise that one will save a lot of coding (see d) above) compared to the current method of overloading based on the pair of archive/type is overly optimistic.

This is explained in Peter Dimov's post here:

http://lists.boost.org/Archives/boost/2005/11/97089.php

I'm aware this is speculative. I haven't investigated MPI, XDR and the others enough to know how much code sharing is possible. It does seem that there will be no sharing with the "fast binary archive" of the previous submission. From the short descriptions of MPI I've seen on this list, along with my cursory investigation of XDR, I'm doubtful that there is any sharing there either.

Conclusions
===========

a) The proposal suffers from "premature optimization". A large amount of design effort has been expended on areas which are likely not the source of the observed performance bottlenecks.

b) The proposal suffers from "over generalization". The attempt to generalize results in a much more complex system. Such a system will result in a net loss of conceptual integrity and implementation transparency. The claim that this generalization will actually result in a reduction of code is not convincing.

c) By re-implementing a currently existing and used archive, it risks creating a maintenance headache for no real benefit.

Suggestions
===========

a) Do more work in finding the speed bottlenecks. Run a profiler. Make a buffer-based, non-stream archive and re-run your tests.

b) Make your MPI, XDR and whatever other archives you need.
Determine how much opportunity for code sharing is really available.

c) If you still believe your proposal has merit, make your own "optimized binary archive". Don't derive from binary_archive, but rather from common_?archive or perhaps basic_binary_archive. In this way you will have a totally free hand, and you won't have to achieve consensus with the rest of us, which will save us all a huge amount of time.

Robert Ramey