Re: Re: Re: [boost] Re: [serialization] + [boost.python] serialization of python object

Ralf W. Grosse-Kunstleve wrote:
Interesting. If I serialize and deserialize std::vector<double> as a text archive but on the same machine, will I always get back exactly the same bit patterns for the double values?
The text archive uses a stream manipulator to set the output precision high enough to capture all the precision in the double. It uses std::numeric_limits to determine this. So I can't say you will get back exactly the same bit pattern, but the value will be close to the original number. If you want to guarantee the exact representation, you can either use the included native binary archive or serialize the data element (double) as a (non-portable) binary object.

Robert Ramey
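To make the precision point concrete, here is a minimal sketch of a text round trip for a single double. It is not the archive's own code: the helper names `to_text`, `from_text` and `same_bits` are made up for illustration, and the choice of `max_digits10` (C++11) as the precision is an assumption rather than what the text archive necessarily uses. Whether the bits come back identical also depends on the standard library's decimal conversions being correctly rounded, and NaN or infinity are not handled.

```cpp
#include <cstring>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: write a double as decimal text with max_digits10
// significant digits (17 for IEEE 754 doubles), which is enough to
// distinguish any two distinct finite doubles.
std::string to_text(double value) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return out.str();
}

// Hypothetical helper: parse the decimal text back into a double.
double from_text(const std::string& text) {
    std::istringstream in(text);
    double value = 0.0;
    in >> value;
    return value;
}

// Compare object representations rather than values, so the check is
// about bit patterns and not just numeric equality.
bool same_bits(double a, double b) {
    return std::memcmp(&a, &b, sizeof(double)) == 0;
}
```

Calling `same_bits(from_text(to_text(x)), x)` for a finite `x` is one way to check the round trip on a given platform. With a lower precision such as `digits10`, the value read back can differ from the original in the last bits.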

FWIW: For the serialization of C++ arrays wrapped as Python objects (via Python's pickle) I implemented a small "library" (it really is just one header file) for converting integers and floating-point numbers to a pseudo text format. In principle it works just like the conversion to base-10 numbers, but uses base-256 instead. I.e. the result looks like a binary format, but it is as machine-independent as a text format. The serialized strings are smaller than regular text format, but larger than raw binary format.

Integers are serialized like this:

NXX...X

The first character N is the length of the encoded number to follow, i.e. the number of X above. The X characters encode the number in base-256 format.

Floating-point numbers are stored as two integers, one for the mantissa and one for the exponent. This can be done portably and without loss of precision because <cmath> provides std::frexp() and std::ldexp().

I chose the base-256 conversion because it is the most efficient in terms of memory required for storing the serialized objects. However, the same approach could be used for portable base-128 or base-64 conversions. The conversion would just be a little bit slower and the resulting string a little bit larger.

My current implementation can be found here:

http://cvs.sourceforge.net/viewcvs.py/cctbx/scitbx/include/scitbx/serializat...

I have not had to change the code in 16 months, from which I conclude that the approach is robust and mature, and it is known to work on a large number of platforms (http://cci.lbl.gov/cctbx_build/). The base_256.h file can be copied, modified and redistributed without any restrictions.

I'll comment some more if there is interest and need, but the approach is fundamentally so simple that the 300 lines (not counting the copyright notice) should be self-explanatory ;-)

Ralf
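The length-prefixed integer layout and the frexp/ldexp decomposition described in the message can be sketched roughly as follows. This is an independent illustration, not the contents of base_256.h: the function names are hypothetical, and the digit order (least significant first), the sign convention (high bit of the length byte) and the scaling of the mantissa to a 53-bit integer are all assumptions the message does not specify. Negative zero, infinities and NaN are not handled.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Encode a signed integer as "NXX...X": one length byte N followed by
// N base-256 digits, least significant digit first.
// ASSUMPTION: the sign lives in the high bit of N (not taken from base_256.h).
std::string encode_int(std::int64_t value) {
    const bool negative = value < 0;
    // Negate in unsigned arithmetic so the most negative value cannot overflow.
    std::uint64_t magnitude = negative
        ? ~static_cast<std::uint64_t>(value) + 1u
        : static_cast<std::uint64_t>(value);
    std::string digits;
    while (magnitude != 0) {
        digits.push_back(static_cast<char>(magnitude & 0xFFu));
        magnitude >>= 8;
    }
    const unsigned header =
        static_cast<unsigned>(digits.size()) | (negative ? 0x80u : 0u);
    return std::string(1, static_cast<char>(header)) + digits;
}

// Decode one integer starting at `pos` and advance `pos` past it.
std::int64_t decode_int(const std::string& text, std::size_t& pos) {
    const unsigned header = static_cast<unsigned char>(text.at(pos++));
    const bool negative = (header & 0x80u) != 0;
    const std::size_t count = header & 0x7Fu;
    if (count > 8) throw std::out_of_range("integer does not fit in 64 bits");
    std::uint64_t magnitude = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint64_t digit = static_cast<unsigned char>(text.at(pos++));
        magnitude |= digit << (8 * i);
    }
    return static_cast<std::int64_t>(negative ? ~magnitude + 1u : magnitude);
}

// Encode a finite double as two integers: an integral mantissa and a
// power-of-two exponent, so that value == mantissa * 2^exponent.
std::string encode_double(double value) {
    int exponent = 0;
    const double fraction = std::frexp(value, &exponent);  // |fraction| in [0.5, 1) or 0
    const int bits = std::numeric_limits<double>::digits;  // 53 for IEEE 754 doubles
    // Scaling by 2^bits is exact and yields an integer-valued double.
    const std::int64_t mantissa =
        static_cast<std::int64_t>(std::ldexp(fraction, bits));
    return encode_int(mantissa) + encode_int(exponent - bits);
}

// Rebuild the double from the two stored integers with ldexp.
double decode_double(const std::string& text, std::size_t& pos) {
    const std::int64_t mantissa = decode_int(text, pos);
    const std::int64_t exponent = decode_int(text, pos);
    return std::ldexp(static_cast<double>(mantissa), static_cast<int>(exponent));
}
```

Because both steps only scale by powers of two, no decimal rounding is involved, which is why a decomposition like this can reproduce a finite double exactly. Under these assumptions, `encode_int(258)` yields three bytes: the length 2 followed by the digits 2 and 1.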