
On Nov 12, 2005, at 9:33 PM, Robert Ramey wrote:
I've been perusing the files you checked in, your example, and this list.
Summary
=======

First of all, a little more complete narrative description of what the submission was intended to accomplish, and how it would change the way the user uses the library, would have been helpful. I'm going to summarize here what I think I understand about this. Please correct me if I get something wrong.
a) A new trait is created.
template <class Archive, class Type>
struct has_fast_array_serialization
  : public mpl::bool_<false>
{};
Yes, I wrote that in my original e-mail.
b) New functions save_array and load_array are implemented in those archives which have the above trait set to true. In this case the following is added to the binary_iarchive.hpp file. The effect is that this trait will return true when a fundamental type is to be saved/loaded to a binary_iarchive.
// specialize has_fast_array_serialization:
// the binary archive provides fast array serialization
// for all fundamental types
template <class Type>
struct has_fast_array_serialization<binary_iarchive, Type>
  : public is_fundamental<Type>
{};
This is just the example for binary archives. The set of types for which direct serialization of arrays is possible is different from archive to archive. E.g. MPI archives support array serialization for all PODs that are not pointers and do not contain pointer members.
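For illustration, such a specialization could look roughly like this (mpi_oarchive is a placeholder name here; note that a trait cannot detect pointer members automatically, so a user-defined type would have to be declared suitable explicitly):

    #include <boost/mpl/and.hpp>
    #include <boost/mpl/not.hpp>
    #include <boost/type_traits/is_pod.hpp>
    #include <boost/type_traits/is_pointer.hpp>

    class mpi_oarchive; // placeholder for an MPI archive type

    // fast array serialization for all non-pointer PODs
    template <class Type>
    struct has_fast_array_serialization<mpi_oarchive, Type>
      : public boost::mpl::and_<
            boost::is_pod<Type>,
            boost::mpl::not_<boost::is_pointer<Type> >
        >::type
    {};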
Some Observations
=================

Immediately, the following come to mind.
a) I'm not sure about the portability of enable_if. Would this not break the whole serialization system for those compilers which don't support it?
I mentioned this issue in my initial e-mail, and if there are compilers that are supported by the serialization library but do not support enable_if, we can replace it with tag dispatching, as sketched below.
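A minimal sketch of such tag dispatching (save_sequence is an illustrative name, not part of the submission): the trait selects one of two overloads at compile time, so no SFINAE support is needed, and the save_array overload is only instantiated for archives that actually provide it.

    #include <cstddef>
    #include <boost/mpl/bool.hpp>

    template <class Archive, class T>
    void save_sequence(Archive & ar, const T * t, std::size_t count,
                       boost::mpl::true_)
    {
        // the archive offers fast array serialization:
        // one call handles the whole block
        ar.save_array(t, count);
    }

    template <class Archive, class T>
    void save_sequence(Archive & ar, const T * t, std::size_t count,
                       boost::mpl::false_)
    {
        // fallback: serialize element by element as before
        for (std::size_t i = 0; i != count; ++i)
            ar << t[i];
    }

    template <class Archive, class T>
    void save_sequence(Archive & ar, const T * t, std::size_t count)
    {
        typedef typename
            has_fast_array_serialization<Archive, T>::type tag;
        save_sequence(ar, t, count, tag());
    }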
b) What is the point of save_array? Why not just invoke save_binary directly?
Because we might want to do different things than save_binary. Look back at the thread. I gave four different examples.
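To give one deliberately simplified illustration (hypothetical code, not the actual submission): an archive writing into a memory buffer can implement save_array as a single block append, with no per-element calls and no stream overhead, while an MPI archive would instead hand the address and count directly to the message-passing layer.

    #include <cstddef>
    #include <vector>

    class buffer_oarchive // hypothetical memory-buffer archive
    {
        std::vector<char> buffer_;
    public:
        template <class T>
        void save_array(const T * address, std::size_t count)
        {
            // append the whole array as one contiguous block
            const char * p = reinterpret_cast<const char *>(address);
            buffer_.insert(buffer_.end(), p, p + count * sizeof(T));
        }
    };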
c) The same could be said for built-in arrays - just invoke save_binary.
Same as above.
d) There is no provision for NVP in the non-binary version above while in the binary version there is NVP around count. Presumably, these are oversights.
The count is not saved by save_array but separately, where the same code as in your version is used. Hence, the count is also stored as an NVP.
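In other words, the vector serialization is split roughly like this (a sketch with an illustrative helper name, assuming the archive provides save_array):

    #include <vector>
    #include <boost/serialization/nvp.hpp>

    template <class Archive, class T>
    void save_vector(Archive & ar, const std::vector<T> & v)
    {
        // the count keeps its NVP wrapper, exactly as before
        const unsigned int count = static_cast<unsigned int>(v.size());
        ar << BOOST_SERIALIZATION_NVP(count);
        // only the element block goes through save_array
        if (count)
            ar.save_array(&v[0], count);
    }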
e) The whole thing isn't obvious and it's hard to follow. It couples the implementation code in i/o serializer.hpp to a specific kind of archive, adding another dimension to be considered while trying to understand this thing.
The real problem is that the serialization of arrays is currently implemented in i/o serializer.hpp; that's why I patched it there. The best solution would be to move array serialization to a separate header.
f) What about bitwise-serializable types which aren't fundamental? That is, structures which don't have things like pointers in them. They present the same opportunity but aren't addressed. If this is a good idea for fundamental types, someone is going to want to do them as well - which would open up some new problems.
I mentioned above that this is just what we do for MPI archives now. This mechanism can easily be extended to binary archives. First you introduce a new traits class

template <class Type>
struct is_bitwise_serializable
  : public is_fundamental<Type>
{};

and then use this trait in the definition of

template <class Type>
struct has_fast_array_serialization<binary_iarchive, Type>
  : public is_bitwise_serializable<Type>
{};
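A user with a pointer-free structure could then opt in with a one-line specialization (point is just an illustrative type):

    struct point { double x, y, z; }; // POD, no pointer members

    // declare point bitwise serializable, enabling fast array
    // serialization of point arrays in the binary archives
    template <>
    struct is_bitwise_serializable<point>
      : public mpl::bool_<true>
    {};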
g) I don't see endian-ness addressed anywhere. I believe that protocols such as XDR and MPI are designed to transmit binary data between heterogeneous machines. Suppose I save an array of ints as a sequence of raw bits on an Intel-type machine. Then I use load_binary to reload the same sequence of bits on a Sparc-based machine. I won't get back the same data values. So either the method will have to be limited to collections of bytes, or some extra machinery would have to be added to conditionally do the endian translation depending on the source/target machine match/mismatch.
That is EXACTLY the reason why I propose to call save_array instead of save_binary. In a portable binary archive, save_array and load_array will take care of the endianness issue. XDR, CDR, MPI, PVM, HDF and other libraries do it just like that.
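To make this concrete, a portable binary archive's load_array could do something like the following sketch (purely illustrative; a real implementation would swap only when the stored byte order differs from the native one):

    #include <algorithm>
    #include <cstddef>
    #include <istream>

    template <class T>
    void portable_load_array(std::istream & is, T * t, std::size_t count)
    {
        is.read(reinterpret_cast<char *>(t),
                static_cast<std::streamsize>(count * sizeof(T)));
        // convert each element from the archive's byte order to the
        // native one -- something a raw load_binary can never do
        for (std::size_t i = 0; i != count; ++i) {
            char * b = reinterpret_cast<char *>(&t[i]);
            std::reverse(b, b + sizeof(T));
        }
    }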
h) Similar issues confront bitwise serialization of floats and doubles. I believe the "canonical" format for floats/doubles is IEEE 80 bit. (I think that's what XDR uses - I could be wrong.) I believe that many machines store floats as 32-bit words and doubles as 64-bit words. I doubt they are all guaranteed to have the same format as far as exponent, sign and representation of value go. So that's something else to be addressed. Of course endian-ness plays into this as well.
Same answer as above. IEEE has 32- and 64-bit floating point types, and they are also used by XDR and CDR. As far as I know, the 80-bit type is an Intel extension. Again you see that save_binary and load_binary will not do the trick. That's why we need save_array and load_array.
i) I looked at the "benchmark" results. I notice that they are run with -O2 on the gcc compiler. Documentation for the gcc compiler command line specifies that this optimization level does not enable automatic inlining for small functions. This is a crucial optimization for the serialization library to be effective. The library is written with the view that compilers will collapse inline code when possible, but with the gcc compiler this happens only when the -O3 optimization switch is used. Furthermore, with this compiler, it might be necessary to also specify the max-inline-insns-recursive-auto switch to gain maximum performance on boost-style code. This latter point is still under investigation.
You can drop the double quotes around the "benchmark". I have been involved in benchmarking of high performance computers for 15 years, and know what I'm doing. I have also run the codes under -O3, with the same results. Regarding the inlining: -O2 inlines all the functions that are declared as inline; -O3 in addition attempts to inline small functions that are not declared inline. I surely hope that all such small functions in the library are declared inline, and the fact that there is no significant difference in performance between -O2 and -O3 indicates that missing automatic inlining is not the cause of the slowdown.
j) My own rudimentary benchmark (which was posted on this list) used 1000 instances of a structure which contained all C++ primitive data types plus a std::string made up of random characters. It was compiled as a boost test and built with bjam, so it used the standard boost options for release mode. It compared timings against using raw stream i/o. Timings for binary_archive and standard stream i/o were comparable. I'm still working on this test. The problem is that standard stream i/o uses text output/input. Of course, no one for whom performance is an issue would do this, so I have to alter my timing test to use binary i/o to the standard stream as a comparison. But for now, I'm comfortable in asserting that there is not a large performance penalty in using serialization as opposed to "rolling your own". As an aside, the test executable doing the same test for 3 different types of archives and all primitive data types only came to 238K, so there isn't a significant code bloat issue either.
Nobody who cares about performance would use text-based I/O. All your benchmark shows is that the overhead of the serialization library is comparable to that of text-based I/O onto a hard disk. For this purpose you are right: the overhead can be ignored. On the other hand, my benchmark used binary I/O into files and into memory buffers, and that's where the overhead of the serialization library really hurts. A 10x slowdown is horrible and makes the library unusable for high performance applications.
k) Somehow I doubt that this archive type has been tested with the full serialization test suite. Instructions for doing so are in the documentation, and the serialization/test directory includes batch files for doing this with one's own archives. Was this done? What were the results? With which compiler? It costs nothing to do this.
Just ask if you have a doubt. The short answer is "I have done this". After adding the fast array serialization to the binary and polymorphic archives, I ran all your regression tests without any problem (using gcc 4 under MacOS X).
end of observations
===================
Admittedly, this is only a cursory examination. But it's more than enough to make me skeptical of the whole idea. If you want, I could expand upon my reasons for this view, but I think they should be obvious.
I will stop this e-mail here since, as you can see, there is nothing to be skeptical about. Actually, I had already replied to all these issues before. I would appreciate it if you read my replies instead of making the same statements over and over again without considering my arguments. The endianness issue you raise above is, as you can see from my reply, not a problem in my approach, but instead a killer argument against your proposal to use save_binary instead. I will reply to your alternative proposal in a second e-mail.

Matthias