
I've been perusing the files you checked in, your example, and this list.

Summary
=======

First of all, a somewhat more complete narrative description of what the submission was intended to accomplish, and how it would change the way the user uses the library, would have been helpful. I'm going to summarize here what I think I understand about this. Please correct me if I get something wrong.

a) A new trait is created:

    template <class Archive, class Type>
    struct has_fast_array_serialization : public mpl::bool_<false> {};

b) New functions save_array and load_array are implemented in those archives which have the above trait set to true. In this case the following is added to the binary_iarchive.hpp file. The effect is that this trait will return true when a fundamental type is to be saved/loaded to a binary_iarchive.

    // specialize has_fast_array_serialization
    // the binary archive provides fast array serialization for all fundamental types
    template <class Type>
    struct has_fast_array_serialization<binary_iarchive, Type> :
        public is_fundamental<Type>
    {};

c) A user would have a class such as:

    class my_class {
        valarray<int> m_a;
        template<class Archive>
        void save(Archive & ar, const unsigned int){
            ar << m_a;
        }
        ...
    };

d) In order for this to work, the serialization of data types which want to exploit bitwise serialization has to look like the following (taken from the valarray serialization). This treatment would have to be applied to the serialization of every such type.

    template<class Archive, class U>
    inline typename boost::disable_if<
        boost::archive::has_fast_array_serialization<Archive, U> >::type
    save(Archive & ar, const std::valarray<U> & t, const unsigned int /* file_version */){
        const boost::archive::container_size_type count(t.size());
        ar << BOOST_SERIALIZATION_NVP(count);
        for(std::size_t i = 0; i < t.size(); ++i)
            ar << t[i];
    }

    // with fast array serialization
    template<class Archive, class U>
    inline typename boost::enable_if<
        boost::archive::has_fast_array_serialization<Archive, U> >::type
    save(Archive & ar, const std::valarray<U> & t, const unsigned int /* file_version */){
        const boost::archive::container_size_type count(t.size());
        ar << BOOST_SERIALIZATION_NVP(count);
        if(count)
            ar.save_array(boost::detail::get_data(t), t.size());
    }
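To make the dispatch in d) concrete, here is a minimal, self-contained reduction of it - my own toy stand-ins, not code from the submission - showing how the trait steers overload selection between the element-wise and the block version:

    #include <boost/utility/enable_if.hpp>
    #include <boost/type_traits/is_fundamental.hpp>
    #include <boost/mpl/bool.hpp>
    #include <iostream>
    #include <valarray>

    struct toy_binary_oarchive {};   // stand-in for binary_oarchive
    struct toy_text_oarchive {};     // stand-in for a text archive

    // default: no fast array serialization
    template<class Archive, class Type>
    struct has_fast_array_serialization : boost::mpl::false_ {};

    // the "binary" archive claims fast array serialization for fundamental types
    template<class Type>
    struct has_fast_array_serialization<toy_binary_oarchive, Type>
        : boost::is_fundamental<Type> {};

    // element-by-element version, chosen when the trait is false
    template<class Archive, class U>
    typename boost::disable_if<has_fast_array_serialization<Archive, U> >::type
    save(Archive &, const std::valarray<U> & t){
        std::cout << "element-wise save of " << t.size() << " items\n";
    }

    // block version, chosen when the trait is true
    template<class Archive, class U>
    typename boost::enable_if<has_fast_array_serialization<Archive, U> >::type
    save(Archive &, const std::valarray<U> & t){
        std::cout << "single save_array call for " << t.size() << " items\n";
    }

    int main(){
        std::valarray<int> a(10);
        toy_binary_oarchive ba;
        toy_text_oarchive ta;
        save(ba, a);   // picks the block version
        save(ta, a);   // picks the element-wise version
        return 0;
    }

The point is simply that for each (archive, element type) pair exactly one of the two overloads has a valid return type, so the choice is made silently at compile time.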
Presumably this treatment would be applied to:

    std::vector
    ublas::vector
    ublas::matrix
    mtl::vector
    blitz::array
    custom_lib::fast_matrix
    ...

as well as others. For built-in arrays, the core library is tweaked to include code similar to the above.

Some Observations
=================

Immediately, the following come to mind.

a) I'm not sure about the portability of enable_if. Would this not break the whole serialization system for those compilers which don't support it?

b) What is the point of save_array? Why not just invoke save_binary directly?

c) The same could be said for built-in arrays - just invoke save_binary.

d) There is no provision for NVP in the non-binary version above, while in the binary version there is NVP around count. Presumably these are oversights.

e) The whole thing isn't obvious and is hard to follow. It couples the implementation code in i/oserializer.hpp to a specific kind of archive, adding another dimension to be considered while understanding this thing.

f) What about bitwise-serializable types which aren't fundamental? That is, structures which don't have things like pointers in them. They present the same opportunity but aren't addressed. If this is a good idea for fundamental types, someone is going to want to do it for them as well - which would open up some new problems.

g) I don't see endian-ness addressed anywhere. I believe that protocols such as XDR and MPI are designed to transmit binary data between heterogeneous machines. Suppose I save an array of ints as a sequence of raw bits on an Intel-type machine. Then I use load_binary to reload the same sequence of bits on a Sparc-based machine. I won't get back the same data values. So either the method will have to be limited to collections of bytes, or some extra machinery would have to be added to apply the endian translation conditionally, depending on whether the source and target machines match. (A small illustration of this appears at the end of these observations.)

h) Similar issues confront bitwise serialization of floats and doubles. I believe the "canonical" format for floats/doubles is ieee 80 bit. (I think that's what XDR uses - I could be wrong.) I believe that many machines store floats as 32-bit words and doubles as 64-bit words. I doubt they are all guaranteed to have the same format as far as exponent, sign and representation of the value. So that's something else to be addressed. Of course endian-ness plays into this as well.

i) I looked at the "benchmark" results. I notice that they were run with -O2 on the gcc compiler. Documentation for the gcc compiler command line specifies that this optimization level does not enable automatic inlining of small functions. That is a crucial optimization for the serialization library to be effective. The library is written with the view that compilers will collapse inline code when possible, but with the gcc compiler this happens only when the -O3 optimization switch is used. Furthermore, with this compiler it might be necessary to also specify the max-inline-insns-recursive-auto switch to gain maximum performance on boost-type code. This latter point is still under investigation.

j) My own rudimentary benchmark (which was posted on this list) used 1000 instances of a structure which contained all C++ primitive data types plus a std::string made up of random characters. It was compiled as a boost test and built with bjam, so it used the standard boost options for release mode. It compared timings against raw stream i/o. Timings for binary_archive and standard stream i/o were comparable. I'm still working on this test. The problem is that standard stream i/o uses text output/input. Of course no one for whom performance is an issue would do this, so I have to alter my timing test to use binary i/o to the standard stream as a comparison. But for now, I'm comfortable in asserting that there is not a large performance penalty in using serialization as opposed to "rolling your own". As an aside, the test executable doing the same test for 3 different types of archives and all primitive data types only came to 238K, so there isn't a significant code bloat issue either.

k) Somehow I doubt that this archive type has been tested with the full serialization test suite. Instructions for doing so are in the documentation, and the serialization/test directory includes batch files for doing this with one's own archives. Was this done? What were the results? With which compiler? It costs nothing to do this.
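To illustrate the point in g), here is a small stand-alone program - purely illustrative, not taken from the submission - showing that the raw byte image of an int differs between little-endian and big-endian machines:

    #include <cstdio>
    #include <cstring>

    int main(){
        unsigned int x = 0x01020304;
        unsigned char bytes[sizeof(x)];
        std::memcpy(bytes, &x, sizeof(x));
        // On a little-endian (Intel) machine this prints: 04 03 02 01
        // On a big-endian (Sparc) machine it prints:      01 02 03 04
        // so a byte-for-byte copy between the two reloads different values.
        for(unsigned int i = 0; i < sizeof(x); ++i)
            std::printf("%02x ", bytes[i]);
        std::printf("\n");
        return 0;
    }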
end of observations
===================

Admittedly, this is only a cursory examination, but it's more than enough to make me skeptical of the whole idea. If you want, I could expand upon my reasons for this view, but I think they should be obvious.

Now, if someone feels differently and wants to implement such a thing, they have my full support. There is no need to modify the core library, and no benefit - performance or otherwise - in doing so. The following shows how to go about it. For purposes of this exposition, I am going to limit myself to how one would go about crafting a system similar to the one submitted. That is, I will not address concerns such as binary portability, as they are not addressed in the submission as I see it. I'm assuming that the only issue is how best to arrange things so that save_binary/load_binary are invoked for contiguous collections of fundamental types.

Suggestions
===========

I do see the utility and merit in what you're trying to do here - finally. Honestly, it just wasn't obvious from the initial package. So here is how I would have gone about it. I previously posted a demo "fast_oarchive.hpp"; I will expand upon it here.

The archive class tree would look like the following (setting aside polymorphic archives). I would envision:

    basic_archive
      basic_oarchive
        common_oarchive
          basic_binary_oarchive
            binary_oarchive
              fast_oarchive_impl
                MPI_oarchive
                XDR_oarchive
                ...

fast_oarchive_impl is an adaptor which can be applied to any legal archive. The current submission derives from binary_?archive. If that is all that is required, then it doesn't have to be a template; it could just as well be derived directly from binary_oarchive. It includes overloads for the following:

    template<class T>
    void save_array(T & t, collection_size_t count){
        // this version not certified for more complex types !!!
        BOOST_STATIC_ASSERT(boost::is_primitive<T>::value);
        // or pointers either !!!
        BOOST_STATIC_ASSERT(! boost::is_pointer<T>::value);
        ...
        // if T is NOT a fundamental type - some mpl magic required here
        // forward call to base class
        this->Base::save_override(t, 0);
        // else -
        *(this->This()) << make_nvp("count", t.size() * sizeof(T));
        *(this->This()) << make_nvp(make_binary_object(t.size() * sizeof(T), & t));
        // note - the nvp wrappers are probably not necessary if we're
        // only going to apply this to binary archives.
    }

    // here's a way to do it for all vectors in one shot
    template<class T>
    void save_override(const std::vector<T> & t, int){
        save_array(t, t.size());
    }

    ... boost::valarray? ...

    // for C++ built-in arrays
    template<typename T, int N>
    void save_override(const T (& t)[N], int){
        save_array(t, sizeof(t));
    }
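The "mpl magic" referred to in save_array above need be nothing more than compile-time tag dispatch. Here is one self-contained way to do it - a sketch only, using hypothetical toy archive classes rather than the real library interface:

    #include <boost/mpl/bool.hpp>
    #include <boost/type_traits/is_fundamental.hpp>
    #include <cstddef>
    #include <iostream>

    struct toy_base_oarchive {
        template<class T>
        void save_override(const T &, int){ std::cout << "element\n"; }
        void save_binary(const void *, std::size_t n){
            std::cout << "block of " << n << " bytes\n";
        }
    };

    struct toy_fast_oarchive : toy_base_oarchive {
        // fundamental element type: one save_binary call for the whole block
        template<class T>
        void save_array(const T * t, std::size_t count, boost::mpl::true_){
            save_binary(t, count * sizeof(T));
        }
        // anything else: fall back to element-by-element serialization
        template<class T>
        void save_array(const T * t, std::size_t count, boost::mpl::false_){
            while(count--)
                toy_base_oarchive::save_override(*t++, 0);
        }
        template<class T>
        void save_array(const T * t, std::size_t count){
            // the branch is resolved at compile time by the tag argument
            save_array(t, count, boost::mpl::bool_<boost::is_fundamental<T>::value>());
        }
    };

    int main(){
        int a[4] = {1, 2, 3, 4};
        toy_fast_oarchive ar;
        ar.save_array(a, 4);   // dispatches to the save_binary version
        return 0;
    }

Only the overload selected by the tag is instantiated for a given element type, so the branch costs nothing at runtime.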
It might or might not contain similar implementations for std::vector, ublas::vector, ublas::matrix, mtl::vector, blitz::array, custom_lib::fast_matrix, etc. For now assume that it does - later we'll relax that assumption.

Derived from fast_oarchive_impl are your archive classes for specific archive types - MPI_oarchive, etc. These would handle the specific features of each particular archive. If it turns out that MPI can only apply the save_binary optimization to types that are no more than a character wide (perhaps for endian issues), then it would include something like:

    template<class T>
    void save_override(const std::vector<T> & t, int){
        // suppose endian issues preclude save_binary
        if(sizeof(T) > 1){
            // skip over the save_binary optimization
            this->binary_oarchive::save_override(t, 0);
        }
        else{
            // just forward to the base class
            this->fast_oarchive_impl::save_override(t, 0);
        }
    }

So the net result is:

a) save_binary optimizations are invoked from the fast_oarchive_impl class. They only have to be specified once, even though they are used in more than one variation of the binary archive. That is, if there are N types to be subjected to the treatment by M archives, there are only N overrides - regardless of the size of M.

b) save_binary optimizations can be overridden for any particular archive type. (It's not clear to me how the current submission would address such a situation.)

c) There is no need to alter the current library.

d) It doesn't require that anything in the current library be conditioned on what kind of archive is being used. Insertion of such a coupling would be extremely unfortunate and would create a lot of future maintenance work, all for a special-purpose feature - especially since it is absolutely unnecessary.

e) The implementation above could easily be improved to be resolved totally at compile time. Built with a high-quality compiler (with the appropriate optimization switches set), this would result in the fastest possible code.

f) All code for save_binary is in one place - within fast_oarchive_impl. If fast_oarchive_impl is implemented as a template, it could be applied to any existing archive class - even text and xml. I don't know if there would be any interest in doing that, but it's not inconceivable. Note also that including the save_binary optimizations for all of std::vector, ublas::vector, ublas::matrix, mtl::vector, blitz::array, custom_lib::fast_matrix, etc. doesn't require that the headers for those libraries be included: the code in each override isn't required until the template is instantiated. So there wouldn't be any "header bloat"(tm).

g) Now, f above could also be seen as a disadvantage. That is, it might seem better to let everyone involved in the serialization of a particular collection keep his stuff separate. There are a couple of options here I will sketch out.

i) One could make a new trait is_bitwise_serializable whose default value is false. For each collection type one would specialize this like:

    template<class T>
    struct is_bitwise_serializable<vector<T> > {
        ... is_fundamental<T> ...
        get_size(){
            // override the default, which is sizeof(T)
            ..
        }
    };
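Filled in, such a trait might look something like the following - again just a sketch, and the get_size/get_data helper names are hypothetical stand-ins for whatever a real version would provide:

    #include <boost/mpl/bool.hpp>
    #include <boost/type_traits/is_fundamental.hpp>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // by default, nothing is bitwise serializable
    template<class T>
    struct is_bitwise_serializable : boost::mpl::false_ {};

    // a contiguous vector is bitwise serializable exactly when its element type
    // is fundamental; it also knows its raw size and starting address
    template<class T>
    struct is_bitwise_serializable<std::vector<T> > : boost::is_fundamental<T> {
        static std::size_t get_size(const std::vector<T> & t){
            return t.size() * sizeof(T);
        }
        static const void * get_data(const std::vector<T> & t){
            return t.empty() ? 0 : &t[0];
        }
    };

    int main(){
        std::vector<int> v(3, 7);
        std::cout << is_bitwise_serializable<std::vector<int> >::value << " "
                  << is_bitwise_serializable<std::vector<int> >::get_size(v) << "\n";
        // prints "1 12" on a machine with 4-byte ints
        return 0;
    }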
Now fast_oarchive_impl would contain something like:

    // here's a way to do it for all such types in one shot
    template<class T>
    void save_override(const T & t, int){
        // if T is NOT bitwise serializable - insert mpl magic required here
        // forward call to base class
        this->Base::save_override(t, 0);
        // else -
        *(this->This()) << make_nvp("count", t.size() * sizeof(T));
        *(this->This()) << make_nvp(make_binary_object(...get_size(), & t));
        // note - the nvp wrappers are probably not necessary if we're
        // only going to apply this to binary archives.
    }

which would implement the save_binary optimization for all types with the is_bitwise_serializable trait set. Of course, any class derived from fast_oarchive_impl could override this as before.

Note that this would address the situation whereby one has something like

    struct RGB {
        unsigned char red;
        unsigned char green;
        unsigned char blue;
    };

which certainly would be a candidate for the save_binary optimization - it would even be portable across otherwise binary-incompatible machines - but then there might be alignment issues. Also, if someone tracks an object of type RGB somewhere, then this would work differently on different archives - not a good thing. So this would require some careful documentation on how to use such a trait. This would move the information about the save_binary optimization out of the fast_oarchive_impl.hpp header file and into the header file for each particular type - arguably a better choice.

ii) Another option would be to implement differing serializations depending upon the archive type, so that we might have:

    template<class T>
    void save(fast_oarchive_impl & ar, const std::vector<T> & t, const unsigned int){
        // if T is a fundamental type or ....
        ar << t.size();
        ar.save_binary(t.size() * sizeof(T), t.data?());
    }

This would basically be a much simpler substitute for the "fast_archive_trait" proposed by the submission.

Wrap up
=======

I guess it should be pretty obvious that I believe that:

a) The applicability and utility of a save/load binary optimization are narrower than claimed.

b) The claims of performance improvement are suspect.

c) This implementation doesn't address all the issues that need to be addressed for something like this.

d) Such an idea could be implemented in a much simpler, more transparent, robust, and efficient manner.

e) Such "special" implementation features are easily accommodated by the current library by extension. There is no need to change any of the core library implementation or interface.

Finally, I have to comment on the way this has been proposed. Checking a diverse group of files into the CVS system on a separate branch is not convenient for most people. This would have been better done as a zip file uploaded to the vault. It should also have included some sort of document outlining what it's intended to do and how it does it. Had this been done, it's likely that many more people would have had a look at it and been able to comment. I'm sure the issues I've noted above would be apparent to a lot of people, and I wouldn't have had to spend a few hours preparing this critique.

If the above weren't bad enough, I've pretty much been forced to do what really amounts to a private consulting job, for free, for a specific application. I really can't do this anymore. It's one thing to point a user in the right direction (which often results in a tweak to the manual) or to incorporate a bug fix which some user has tracked down, fixed and submitted.
But to have to spend the time to critique something like this - whose problems should be obvious to most of us - is really unfair. I've been through it a couple of times; I'm not doing it again. If you feel that you need to split off the serialization library to do what you want, I've a better idea: how about you and Matthias taking it over yourselves? After all, I've achieved all I expect to get from it. Then you would be free to make any changes you want without wasting time on these discussions. Of course, if the problems noted above are only obvious to me, then that would probably indicate that I'm out of sync with the boost development community - in which case it really should be taken over by someone else.

Robert Ramey