
On Oct 9, 2005, at 11:15 PM, Robert Ramey wrote:
Attached is a sketch of what I have in mind. It does compile without error on VC 7.1
With this approach you would make one fast_oarchive adaptor class and one small and trivial *.hpp file for each archive it is adapted to.
---- SUMMARY ---------

If I may summarize this solution as follows:

template<class Base>
class fast_oarchive_impl : public Base {
public:
    ...
    // custom specializations
    void save_override(const std::vector<int> & t, int){
        save_binary(&(t[0]), sizeof(int) * t.size());
    }
    // here's a way to do it for all vectors in one shot
    template<class T>
    void save_override(const std::vector<T> & t, int){
        save_binary(&(t[0]), sizeof(T) * t.size());
        // this version not certified for more complex types !!!
        BOOST_STATIC_ASSERT(boost::is_primitive<T>::value);
        // or pointers either !!!
        BOOST_STATIC_ASSERT(!boost::is_pointer<T>::value);
    }
    ...
};

then I see several major disadvantages of this approach:

1.) it fixes the value types for which fast array serialization can be done

2.) for M types to be serialized and N archives there is an MxN problem in this approach

3.) it leads to a tight coupling between archives and all classes that can profit from fast array serialization (called "array-like classes" below), and makes the archive depend on implementation details of the array-like classes

4.) it is not easily extensible to new array-like classes

Let me elaborate on these points below and provide a possible solution to each of them. The simplest solution, as I see it, would be to

- provide an additional traits class has_fast_array_serialization

- have archives offering (the optional) fast array serialization provide a save_array member function in addition to save and save_binary

- make the dispatch to either save() or save_array() the responsibility of the serialization code of the class, and not the responsibility of the archive

These are minor extensions to the serialization library that do not break any existing code and do not make it harder to write a new archive or a new serialize function, but they allow new types of archives and can give huge speedups for large data sets.
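To make the proposed design concrete, here is a minimal self-contained sketch of the three extensions. It uses std::enable_if and std::is_arithmetic as stand-ins for boost::enable_if and the proposed has_fast_array_serialization trait; the toy_binary_oarchive class and its counters are purely illustrative, not part of any real archive:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Stand-in for the proposed trait; defaults to false so that
// archives must opt in per value type.
template <class Archive, class T>
struct has_fast_array_serialization : std::false_type {};

// A toy archive that opts in to the fast path for arithmetic types.
struct toy_binary_oarchive {
    std::size_t bytes_fast = 0;  // bytes written through save_array
    std::size_t items_slow = 0;  // elements written one by one

    template <class T>
    void save(const T&) { ++items_slow; }       // element-wise fallback

    template <class T>
    void save_array(const T*, std::size_t n) {  // optional fast path
        bytes_fast += n * sizeof(T);
    }
};

template <class T>
struct has_fast_array_serialization<toy_binary_oarchive, T>
    : std::is_arithmetic<T> {};

// The class's serialization code -- not the archive -- chooses the path.
template <class Archive, class T>
typename std::enable_if<has_fast_array_serialization<Archive, T>::value>::type
serialize(Archive& ar, const std::vector<T>& v) {
    ar.save_array(v.data(), v.size());
}

template <class Archive, class T>
typename std::enable_if<!has_fast_array_serialization<Archive, T>::value>::type
serialize(Archive& ar, const std::vector<T>& v) {
    for (const T& x : v) ar.save(x);
}
```

With this split, a vector<int> goes through save_array in one call while a vector<std::string> falls back to element-wise saves, and archives that never define the trait get the fallback automatically.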
------- DETAILS -----------

Now the details.

ad 1.: you need to fix the types for which the fast version is used, either by providing explicit overloads for K types (such as int above), which would give a KxMxN problem, or by using a template-based approach. You are probably aware that the BOOST_STATIC_ASSERT in your example will cause this archive to fail to compile with std::vectors of more complex types. One easy way to solve this is to restrict the applicability of the template using boost::enable_if:

template<class T>
void save_override(
    const std::vector<T> & t,
    int,
    typename boost::enable_if<has_fast_array_serialization<Base,T> >::type *=0
);

where the traits class has_fast_array_serialization<Base,T> specifies whether the fast version should be used for the type T with the archive Base. The reason to provide a traits class, instead of hard-coding the types and restricting them to primitive non-pointer types, is that the set of types that can use an optimized serialization depends on the archive type. A non-portable binary archive could support all POD types that contain no pointer members (e.g. the gps_position class in your example), while an MPI archive can support fast serialization for all default-constructible types. Hence a traits class depending on both the archive type and the value_type of the vector.

ad 2.: there is still an MxN problem: you propose to dispatch to save_binary:

void save_override(const std::vector<T> & t, int) {
    save_binary(&(t[0]), sizeof(T) * t.size());
}

where the signature of save_binary is

void save_binary(void const *, std::size_t);

This is an acceptable solution for binary archives, and maybe a few others, but is NOT a general solution. To illustrate this, let me show how the fast saving is or might be implemented for some other archives.
a) a potential portable binary archive might need to do byte reordering:

void save_override(const std::vector<T> & t, int) {
    save_with_reordering(&(t[0]), t.size());
}

where the save_with_reordering function needs type information to do the byte reordering, and might have a signature

template <class T>
void save_with_reordering(T const *, std::size_t);

b) an XDR archive, using XDR streams, needs to make a call to an XDR function and pass type information as well, as in

class xdr_oarchive {
    ...
    void save_override(const std::vector<int> & t, int) {
        xdr_vector(stream, (char*)&(t[0]), t.size(), sizeof(int), (xdrproc_t)xdr_int);
    }
    XDR* stream;
};

and a templated version could also be provided easily. Note that again I need the address, size, and type information and cannot make this call from within save_binary. I have an archive implementation based on the UNIX xdr calls, so this is no hypothetical example.

c) let's next look at a packed MPI archive (of which I also have an implementation), where the override would be

// simplified version of MPI archive
class packed_mpi_oarchive {
    ...
    void save_override(const std::vector<int> & t, int) {
        MPI::Datatype datatype(MPI::INT);
        datatype.Pack(&(t[0]), t.size(), buffer, buffer_size, position, communicator);
    }
    char* buffer;
    int buffer_size;
    int position;
    MPI::Comm& communicator;
};

and again, I need type information and cannot just call save_binary.

d) as a fourth example I want to mention that MPI allows for serialization by message passing without the need to pack the data into a buffer first; only the addresses and types of all data members need to be stored, to create a custom MPI type. An incomplete implementation (I have a complete implementation, based on the original idea by Daniel Egloff) would be:

class mpi_oarchive {
    ...
    template <class T>
    void save_override(const std::vector<T> & t, int) {
        register_member(&(t[0]), t.size());
    }
    template <class T>
    void register_member(T const* t, std::size_t l) {
        addresses.push_back(MPI::Get_address(t));
        sizes.push_back(l);
        types.push_back(mpi_type<T>::value);
    }
    std::vector<MPI::Aint> addresses;
    std::vector<int> sizes;
    std::vector<MPI::Datatype> types;
};

Note that again save_binary does not do the trick, since we need type information. For this reason my proposed solution is to dispatch to a save_array function for those types and archives supporting it:

template<class Base>
class fast_oarchive_impl : public Base {
public:
    // here's a way to do it for all vectors in one shot
    template<class T>
    void save_override(
        const std::vector<T> & t,
        int,
        typename boost::enable_if<has_fast_array_serialization<Base,T> >::type *=0
    ) {
        save_array(&(t[0]), t.size());
    }
    ...
};

where all archive classes provide a function like

void Archive::save_array(Type const *, std::size_t);

for all types for which the trait has_fast_array_serialization<Archive,Type> is true. That way a single overload suffices for all N=5 archive types presented above, and the MxN problem is solved. Note also that archives not supporting this fast array serialization do not need to implement anything, as the default for has_fast_array_serialization<Archive,Type> is false.

ad 3.: your proposal leads to a tight coupling between archives and the classes to be serialized. Consider what I would need to do to add support for some future MTL matrix type. Again I present a simplified example showing the problem:

template<class T>
void save_override(const mtl_dense_matrix<T> & m, int) {
    T const * data = implementation_dependent_function_to_get_pointer(m);
    std::size_t length = implementation_dependent_function_to_get_size(m);
    save_binary(data, length);
}

This introduces implementation details of the mtl_dense_matrix class into the archive, breaks orthogonality, and leads to a tight coupling.
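The decoupled alternative can be sketched in a few lines. Everything here is hypothetical and self-contained: toy_oarchive, the simplified trait, and toy_dense_matrix stand in for a real archive, the proposed has_fast_array_serialization trait, and an MTL-style matrix. Only the matrix's own serialize code touches its storage layout; the archive knows nothing about it (real code would select the branch with enable_if rather than a runtime if, so that archives without save_array still compile):

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the proposed trait; false by default.
template <class Archive, class T>
struct has_fast_array_serialization { static const bool value = false; };

// A toy archive exposing the optional save_array fast path.
struct toy_oarchive {
    std::size_t saved = 0;  // total elements written, either way
    template <class T> void save(const T&) { ++saved; }
    template <class T> void save_array(const T*, std::size_t n) { saved += n; }
};

// The toy archive opts in for double only.
template <>
struct has_fast_array_serialization<toy_oarchive, double> {
    static const bool value = true;
};

// A hypothetical dense matrix. Only the matrix itself knows that its
// storage is a contiguous row-major block.
class toy_dense_matrix {
    std::size_t rows_, cols_;
    std::vector<double> data_;
public:
    toy_dense_matrix(std::size_t r, std::size_t c)
        : rows_(r), cols_(c), data_(r * c) {}

    // The matrix's own serialize member dispatches to save_array when
    // the archive advertises support, and falls back to element-wise saves.
    template <class Archive>
    void serialize(Archive& ar) const {
        ar.save(rows_);
        ar.save(cols_);
        if (has_fast_array_serialization<Archive, double>::value)
            ar.save_array(data_.data(), data_.size());
        else
            for (double x : data_) ar.save(x);
    }
};
```

If the matrix later changes its internal representation, only its own serialize member changes; no archive class needs to be touched.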
Changes in these implementation details of the mtl_dense_matrix might require changes to the archive classes. The solution is easy:

- some archives provide fast array serialization through the save_array member function

- let the MTL be responsible for serialization of its own classes, and use save_array where appropriate

ad 4.: in order to use fast array serialization with other classes such as

- std::vector
- std::valarray
- boost::multi_array
- uBlas vectors and matrices
- blitz::Array

save_override functions for ALL of these classes have to be added to the archive. This means that to support any new class, be it a new uBlas matrix, future MTL matrices, Blitz++ arrays, ..., the archive class needs to be modified. This is clearly not a scalable design.

To summarize: with three minor extensions to the serialization library, none of which breaks any existing code, we can get 10x speedups for serialization of large data sets, enable new types of archives such as MPI archives, and all of that without introducing any of the four problems discussed here.

Matthias