
On Nov 12, 2005, at 9:33 PM, Robert Ramey wrote:
I've been perusing the files you checked in, your example, and this list.
Summary
=======

First of all, a little more complete narrative description of what the submission was intended to accomplish, and how it would change the way the user uses the library, would have been helpful. I'm going to summarize here what I think I understand about this. Please correct me if I get something wrong.
a) A new trait is created.
template <class Archive, class Type>
struct has_fast_array_serialization
  : public mpl::bool_<false>
{};
Yes, I wrote that in my original e-mail.
b) New functions save_array and load_array are implemented in those archives which have the above trait set to true. In this case the following is added to the binary_iarchive.hpp file. The effect is that this trait will return true when a fundamental type is to be saved/loaded to a binary_iarchive.
// specialize has_fast_array_serialization:
// the binary archive provides fast array serialization
// for all fundamental types
template <class Type>
struct has_fast_array_serialization<binary_iarchive, Type>
  : public is_fundamental<Type>
{};
This is just the example for binary archives. The set of types for which direct serialization of arrays is possible is different from archive to archive. E.g. MPI archives support array serialization for all PODs that are not pointers and do not contain pointer members.
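For illustration, such a specialization could look roughly like this (mpi_oarchive is a placeholder name here; note that a trait cannot detect pointer members automatically, so a user-defined type would have to be declared suitable explicitly):

    #include <boost/mpl/and.hpp>
    #include <boost/mpl/not.hpp>
    #include <boost/type_traits/is_pod.hpp>
    #include <boost/type_traits/is_pointer.hpp>

    class mpi_oarchive; // placeholder for an MPI archive type

    // fast array serialization for all non-pointer PODs
    template <class Type>
    struct has_fast_array_serialization<mpi_oarchive, Type>
      : public boost::mpl::and_<
            boost::is_pod<Type>,
            boost::mpl::not_<boost::is_pointer<Type> >
        >::type
    {};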
Some Observations
=================

Immediately, the following come to mind.
a) I'm not sure about the portability of enable_if. Would this not break the whole serialization system for those compilers which don't support it?
I mentioned this issue in my initial e-mail, and if there are compilers that are supported by the serialization library but do not support enable_if, we can replace it with tag dispatching, as sketched below.
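A minimal sketch of such tag dispatching (save_sequence is an illustrative name, not part of the submission): the trait selects one of two overloads at compile time, so no SFINAE support is needed, and the save_array overload is only instantiated for archives that actually provide it.

    #include <cstddef>
    #include <boost/mpl/bool.hpp>

    template <class Archive, class T>
    void save_sequence(Archive & ar, const T * t, std::size_t count,
                       boost::mpl::true_)
    {
        // the archive offers fast array serialization:
        // one call handles the whole block
        ar.save_array(t, count);
    }

    template <class Archive, class T>
    void save_sequence(Archive & ar, const T * t, std::size_t count,
                       boost::mpl::false_)
    {
        // fallback: serialize element by element as before
        for (std::size_t i = 0; i != count; ++i)
            ar << t[i];
    }

    template <class Archive, class T>
    void save_sequence(Archive & ar, const T * t, std::size_t count)
    {
        typedef typename
            has_fast_array_serialization<Archive, T>::type tag;
        save_sequence(ar, t, count, tag());
    }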
b) What is the point of save_array? Why not just invoke save_binary directly?
Because we might want to do different things than save_binary. Look back at the thread. I gave four different examples.
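To give one deliberately simplified illustration (hypothetical code, not the actual submission): an archive writing into a memory buffer can implement save_array as a single block append, with no per-element calls and no stream overhead, while an MPI archive would instead hand the address and count directly to the message-passing layer.

    #include <cstddef>
    #include <vector>

    class buffer_oarchive // hypothetical memory-buffer archive
    {
        std::vector<char> buffer_;
    public:
        template <class T>
        void save_array(const T * address, std::size_t count)
        {
            // append the whole array as one contiguous block
            const char * p = reinterpret_cast<const char *>(address);
            buffer_.insert(buffer_.end(), p, p + count * sizeof(T));
        }
    };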
c) The same could be said for built-in arrays - just invoke save_binary.
Same as above.
d) There is no provision for NVP in the non-binary version above while in the binary version there is NVP around count. Presumably, these are oversights.
The count is not saved by save_array but separately, where the same code as in your version is used. Hence, the count is also stored as an NVP.
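In other words, the vector serialization is split roughly like this (a sketch with an illustrative helper name, assuming the archive provides save_array):

    #include <vector>
    #include <boost/serialization/nvp.hpp>

    template <class Archive, class T>
    void save_vector(Archive & ar, const std::vector<T> & v)
    {
        // the count keeps its NVP wrapper, exactly as before
        const unsigned int count = static_cast<unsigned int>(v.size());
        ar << BOOST_SERIALIZATION_NVP(count);
        // only the element block goes through save_array
        if (count)
            ar.save_array(&v[0], count);
    }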
e) The whole thing isn't obvious and it's hard to follow. It couples the implementation code in i/o serializer.hpp to a specific kind of archive, adding another dimension to be considered while trying to understand this thing.
The real problem is that the serialization of arrays is currently implemented in i/o serializer.hpp; that's why I patched it there. The best solution would be to move array serialization to a separate header.
f) What about bitwise-serializable types which aren't fundamental? That is, structures which don't have things like pointers in them. They present the same opportunity but aren't addressed. If this is a good idea for fundamental types, someone is going to want to do them as well - which would open up some new problems.
I mentioned above that this is just what we do for MPI archives now. This mechanism can easily be extended to binary archives. First you introduce a new traits class

template <class Type>
struct is_bitwise_serializable
  : public is_fundamental<Type>
{};

and then use this trait in the definition of

template <class Type>
struct has_fast_array_serialization<binary_iarchive, Type>
  : public is_bitwise_serializable<Type>
{};
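A user with a pointer-free structure could then opt in with a one-line specialization (point is just an illustrative type):

    struct point { double x, y, z; }; // POD, no pointer members

    // declare point bitwise serializable, enabling fast array
    // serialization of point arrays in the binary archives
    template <>
    struct is_bitwise_serializable<point>
      : public mpl::bool_<true>
    {};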
g) I don't see endian-ness addressed anywhere. I believe that protocols such as XDR and MPI are designed to transmit binary data between heterogeneous machines. Suppose I save an array of ints as a sequence of raw bits on an Intel-type machine. Then I use load_binary to reload the same sequence of bits on a Sparc-based machine. I won't get back the same data values. So either the method will have to be limited to collections of bytes, or some extra machinery would have to be added to conditionally do the endian translation depending on the source/target machine match/mismatch.
That is EXACTLY the reason why I propose to call save_array instead of save_binary. In a portable binary archive, save_array and load_array will take care of the endianness issue. XDR, CDR, MPI, PVM, HDF and other libraries do it just like that.
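To make this concrete, a portable binary archive's load_array could do something like the following sketch (purely illustrative; a real implementation would swap only when the stored byte order differs from the native one):

    #include <algorithm>
    #include <cstddef>
    #include <istream>

    template <class T>
    void portable_load_array(std::istream & is, T * t, std::size_t count)
    {
        is.read(reinterpret_cast<char *>(t),
                static_cast<std::streamsize>(count * sizeof(T)));
        // convert each element from the archive's byte order to the
        // native one -- something a raw load_binary can never do
        for (std::size_t i = 0; i != count; ++i) {
            char * b = reinterpret_cast<char *>(&t[i]);
            std::reverse(b, b + sizeof(T));
        }
    }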
h) Similar issues confront bitwise serialization of floats and doubles. I believe the "canonical" format for floats/doubles is IEEE 80 bit. (I think that's what XDR uses - I could be wrong.) I believe that many machines store floats as 32-bit words and doubles as 64-bit words. I doubt they are all guaranteed to have the same format as far as exponent, sign and representation of value go. So that's something else to be addressed. Of course endian-ness plays into this as well.
Same answer as above. IEEE has 32- and 64-bit floating point types, and they are also used by XDR and CDR. As far as I know, the 80-bit type is an Intel extension. Again you see that save_binary and load_binary will not do the trick. That's why we need save_array and load_array.
i) I looked at the "benchmark" results. I notice that they are run with -O2 on the gcc compiler. Documentation for the gcc compiler command line specifies that this optimization level does not enable automatic inlining for small functions. This is a crucial optimization for the serialization library to be effective. The library is written with the view that compilers will collapse inline code when possible, but with the gcc compiler this happens only when the -O3 optimization switch is used. Furthermore, with this compiler, it might be necessary to also specify the max-inline-insns-recursive-auto switch to gain maximum performance on boost-style code. This latter point is still under investigation.
You can drop the double quotes around the "benchmark". I have been involved in benchmarking of high performance computers for 15 years, and know what I'm doing. I have also run the codes under -O3, with the same results. Regarding the inlining: -O2 inlines all the functions that are declared as inline; -O3 in addition attempts to inline small functions that are not declared inline. I surely hope that all such small functions in the library are declared inline, and the fact that there is no significant difference in performance between -O2 and -O3 indicates that missing automatic inlining is not the cause of the slowdown.
j) My own rudimentary benchmark (which was posted on this list) used 1000 instances of a structure which contained all C++ primitive data types plus a std::string made up of random characters. It was compiled as a boost test and built with bjam, so it used the standard boost options for release mode. It compared timings against using raw stream i/o. Timings for binary_archive and standard stream i/o were comparable. I'm still working on this test. The problem is that standard stream i/o uses text output/input. Of course, no one for whom performance is an issue would do this, so I have to alter my timing test to use binary i/o to the standard stream as a comparison. But for now, I'm comfortable in asserting that there is not a large performance penalty in using serialization as opposed to "rolling your own". As an aside, the test executable doing the same test for 3 different types of archives and all primitive data types only came to 238K, so there isn't a significant code bloat issue either.
Nobody who cares about performance would use text-based I/O. All your benchmark shows is that the overhead of the serialization library is comparable to that of text-based I/O onto a hard disk. For this purpose you are right: the overhead can be ignored. On the other hand, my benchmark used binary I/O into files and into memory buffers, and that's where the overhead of the serialization library really hurts. A 10x slowdown is horrible and makes the library unusable for high performance applications.
k) Somehow I doubt that this archive type has been tested with the full serialization test suite. Instructions for doing so are in the documentation, and the serialization/test directory includes batch files for doing this with one's own archives. Was this done? What were the results? With which compiler? It costs nothing to do this.
Just ask if you have a doubt. The short answer is "I have done this". After adding the fast array serialization to the binary and polymorphic archives, I ran all your regression tests without any problem (using gcc 4 under MacOS X).
end of observations
===================
Admittedly, this is only a cursory examination. But it's more than enough to make me skeptical of the whole idea. If you want, I could expand upon my reasons for this view, but I think they should be obvious.
I will stop this e-mail here since, as you can see, there is nothing to be skeptical about. Actually, I had already replied to all these issues before. I would appreciate it if you read my replies instead of making the same statements over and over again without considering my arguments. The endianness issue you raise above is, as you can see from my reply, not a problem in my approach, but instead a killer argument against your proposal to use save_binary instead. I will reply to your alternative proposal in a second e-mail.

Matthias