[serialization] round-trip serialization of float

The Boost serialization of float doesn't appear to use enough digits to guarantee that deserializing a serialized value will produce the same value. The relevant code is in boost/archive/basic_text_oprimitive.hpp:
    void save(const float t) {
        os << std::setprecision(std::numeric_limits<float>::digits10 + 2);
        ...
    }
std::numeric_limits<float>::digits10 + 2 = 8, but float actually requires std::numeric_limits<float>::max_digits10 = 9 decimal digits to guarantee a unique representation. The code below (with boost/1.52) demonstrates the problem:
    #include <cstdio>
    #include <sstream>
    #include "boost/archive/text_iarchive.hpp"
    #include "boost/archive/text_oarchive.hpp"
    #include <iostream>
    int main(int argc, char **argv){
        float oldf = -1.20786635e-05;
        std::ostringstream os;
        boost::archive::text_oarchive oa(os);
        oa & BOOST_SERIALIZATION_NVP(oldf);
        float newf;
        std::istringstream is(os.str());
        boost::archive::text_iarchive ia(is);
        ia & BOOST_SERIALIZATION_NVP(newf);
        printf("before serialization: %.12g\nafter serialization: %.12g\n", oldf, newf);
    }
    // Output:
    // before serialization: -1.20786635307e-05
    // after serialization: -1.20786644402e-05
Note: There's some discussion of this topic in a slightly different context here: http://boost.2283326.n4.nabble.com/serialization-Serialisation-deserialisati...

Adam Lerer wrote:
Note: There's some discussion of this topic in a slightly different context here: http://boost.2283326.n4.nabble.com/serialization-Serialisation-deserialisati...
Remember that the serialization library uses standard I/O for input/output, so the discussion above is actually the same issue. In general, libraries won't guarantee bit-for-bit equality. This is due to a number of reasons:
a) Not every binary floating-point number has an exact representation when rendered as decimal.
b) Different libraries can't all be expected to handle this in the same way.
c) Text-type archives are meant to be portable between architectures. So, for example, one can create floating-point numbers in an environment which uses an 80-bit IEEE 754 representation, write them to a file, read them on another architecture (say, one with a 64-bit IEEE 754 representation), and expect things to work. That is, stream I/O attempts to preserve values (rather than bit representations) to the extent that it makes sense to do so.
d) It gets much worse with floating-point values such as NaN, since libraries differ on how to handle them.
e) Remember, this is not a serialization library issue, but rather one related to floating-point numbers and standard I/O.
So, if you feel you need to preserve the exact set of bits - don't use floating point - use something else.
Robert Ramey
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey Sent: Thursday, January 10, 2013 4:28 PM To: boost@lists.boost.org Subject: Re: [boost] [serialization] round-trip serialization of float
Adam Lerer wrote:
Note: There's some discussion of this topic in a slightly different context here: http://boost.2283326.n4.nabble.com/serialization-Serialisation-deserialisation-of-floating-point-values-td2604169i20.html
As the originator of the thread above, I just want to add that I agree with this summary. You can only find out whether serialization followed by deserialization will work by both examining the bit layouts and testing carefully. Your best bet is to use the max_digits10 precision (or, sadly for now, better to use the formula given, because on the previous VS version std::numeric_limits<float>::max_digits10 was wrong :-) and to use scientific format (to avoid a previous VS buglet). NaN and infinity now have a fair chance of working, after work by Johan Rode, but testing is still essential.
Good luck!
Paul
---
Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com

On Thu, Jan 10, 2013 at 5:28 PM, Robert Ramey <ramey@rrsd.com> wrote:
So, if you feel you need to preserve the exact set of bits - don't use floating point - use something else.
Is it really unreasonable to expect a round trip to result in the same value on the same platform with the same code/binary? -- Olaf

Olaf van der Spek wrote:
Is it really unreasonable to expect a round trip to result in the same value on the same platform with the same code/binary?
Yes. But don't take my word for it - ask an author of a standard floating-point library.
By the way, if it's the same code/binary you can use binary_archive, which saves and loads the bits rather than the value. In that case the issue never arises. Maybe that's what you really want.
Even with a text-based archive, you would have the option of saving your data as a binary object. This would load bit for bit the same. But the archive wouldn't be portable across platforms, thus defeating the motivating purpose of a text-based archive in the first place.
You can't have it both ways.
Robert Ramey

Robert Ramey wrote:
a) not every binary floating number has an exact representation when rendered as decimal.
True, but if max_digits10 digits are used, you can at least guarantee that the binary representation will deserialize as the same float. This is very helpful for common use cases of the serialization library.
I understand that float serialization cannot be guaranteed to work correctly in all cases. That said, the precision used for float serialization is a choice, and some choices are better than others. It seems to me that the choice of max_digits10 (or even better, Paul's alternative formula to work around the VS bug) is superior to digits10+2.
It might also be helpful to note that lexical_cast uses exactly the formula that Paul recommends. In lcast_precision.hpp:
    BOOST_STATIC_CONSTANT(unsigned long, precision_bin = 2UL + limits::digits * 30103UL / 100000UL );

Adam Lerer wrote:
It seems to me that the choice of max_digits10 (or even better, Paul's alternative formula to work around the VS bug) is superior to digits10+2.
Feel free to add a Trac item for this.
Robert Ramey

On Fri, Jan 11, 2013 at 3:45 PM, Robert Ramey <ramey@rrsd.com> wrote:
Feel free to add a trac item for this
FWIW, here's a patch that calls boost::detail::lcast_get_precision instead of digits10+2. It appears that the lexical_cast authors have thought through several portability/compatibility issues. Serialization might as well benefit from their work.
The patch also has an overload of save(long double) which follows the same pattern as save(float) and save(double).
John Salmon
diff -Naur boost_1_52_0.orig/boost/archive/basic_text_oprimitive.hpp boost_1_52_0/boost/archive/basic_text_oprimitive.hpp
--- boost_1_52_0.orig/boost/archive/basic_text_oprimitive.hpp  2011-01-19 12:33:55.000000000 -0500
+++ boost_1_52_0/boost/archive/basic_text_oprimitive.hpp  2013-01-14 10:21:26.000000000 -0500
@@ -33,6 +33,7 @@
 #include <boost/config.hpp>
 #include <boost/static_assert.hpp>
 #include <boost/detail/workaround.hpp>
+#include <boost/detail/lcast_precision.hpp>
 #if BOOST_WORKAROUND(BOOST_DINKUMWARE_STDLIB, == 1)
 #include <boost/archive/dinkumware.hpp>
 #endif
@@ -130,7 +131,7 @@
                 boost::serialization::throw_exception(
                     archive_exception(archive_exception::output_stream_error)
                 );
-            os << std::setprecision(std::numeric_limits<float>::digits10 + 2);
+            os << std::setprecision(boost::detail::lcast_get_precision<float>());
             os << t;
         }
         void save(const double t)
@@ -140,7 +141,17 @@
                 boost::serialization::throw_exception(
                     archive_exception(archive_exception::output_stream_error)
                 );
-            os << std::setprecision(std::numeric_limits<double>::digits10 + 2);
+            os << std::setprecision(boost::detail::lcast_get_precision<double>());
+            os << t;
+        }
+        void save(const long double t)
+        {
+            // must be a user mistake - can't serialize un-initialized data
+            if(os.fail())
+                boost::serialization::throw_exception(
+                    archive_exception(archive_exception::output_stream_error)
+                );
+            os << std::setprecision(boost::detail::lcast_get_precision<long double>());
             os << t;
         }
         BOOST_ARCHIVE_OR_WARCHIVE_DECL(BOOST_PP_EMPTY())

John Salmon wrote:
FWIW, here's a patch that calls boost::detail::lcast_get_precision instead of digits10+2.
How about adding this as a comment/alternative/comparison to https://svn.boost.org/trac/boost/ticket/7886?
Robert Ramey
participants (5)
- Adam Lerer
- John Salmon
- Olaf van der Spek
- Paul A. Bristow
- Robert Ramey