[serialization] Serialisation/deserialisation of floating-point values

I'm having problems with deserialising floating-point (double) values that are written to an XML file. I'm reading the values back in and comparing them to what I saved to ensure that my file has been written correctly. However, some of the values differ in about the seventeenth significant figure (or thereabouts).

I thought Boost serialization used some numerical limit to make sure that values are serialised exactly to full precision, so what is happening here?

Example:
Value in original object, written to file: 0.0019075645054089487
Value actually stored in file (by examination of XML file): 0.0019075645054089487 [identical to value written to file]
Value after deserialisation: 0.0019075645054089489

It looks like there is a difference in the least-significant bit, as examining the memory for these two values gives:

Original value: b4 83 9b ca e7 40 5f 3f
Deserialised value: b5 83 9b ca e7 40 5f 3f

(where the least-significant byte is on the left)

Note the difference in the first bytes.

I'm using Boost 1.33.1 with Visual Studio 7.1.3088 in debug mode.

Paul
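
The byte comparison above can be reproduced with a few lines. This is a minimal sketch, not code from the original post: the helper names `bits_of`, `from_bits`, `ulp_distance` and `is_next_double_up` are made up for illustration, it assumes `double` is a 64-bit IEEE type, and it uses C++11 facilities that postdate this thread. With the least-significant byte on the left, the two dumps correspond to the bit patterns 0x3f5f40e7ca9b83b4 and 0x3f5f40e7ca9b83b5.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Hypothetical helper: copy the object representation of a double into an integer.
std::uint64_t bits_of(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Hypothetical helper: rebuild a double from a 64-bit pattern.
double from_bits(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// Number of representable doubles between a and b.
// Only meaningful for finite values of the same sign.
std::uint64_t ulp_distance(double a, double b) {
    const std::uint64_t x = bits_of(a);
    const std::uint64_t y = bits_of(b);
    return x > y ? x - y : y - x;
}

// True when b is the very next double above a.
bool is_next_double_up(double a, double b) {
    return std::nextafter(a, std::numeric_limits<double>::max()) == b;
}
```

Calling `ulp_distance(from_bits(0x3f5f40e7ca9b83b4ULL), from_bits(0x3f5f40e7ca9b83b5ULL))` compares the two dumps directly, which avoids any dependence on how the values are printed.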

Paul Giaccone wrote:
This is a common cause of errors when using floating point values. Writing a floating point value to a string representation, as are XML values, and attempting to read that string representation back, does not guarantee that the floating point value will remain exactly the same since there are a number of floating point values which have no exact representation in the C++ floating point formats. That is simply because of the nature of floating point representation used in C++ and most modern languages. After all, the number of floating point values within any range of numbers is infinite while the C++ floating point representation cannot be. The only way to guarantee what you want for floating point values is to write and read back to a binary representation of the value.

On Tue, Mar 14, 2006 at 01:11:07PM -0500, Edward Diener wrote:
It should still be possible to write out a value that can be uniquely mapped back to the exact bits. For example, you could write out enough digits that the exact value is the closest representable number at the given resolution. In my experience, having the i/o system treat floating point values exactly is extremely important to debugging, especially when dealing with chaotic systems (which includes most scientific applications). If you need your program to run for three hours to find a corner case that crashes it, and restarting the program from an output dump "fixes" the problem due to a change in the last bit, debugging becomes nearly impossible. Binary output solves this, but disallowing text output for debugging purposes would be unfortunate. Geoffrey

On 14 March 2006 18:11, Edward Diener wrote:
I think that this is unduly pessimistic. In practice, provided the archive is written and read where floating-point values have the same IEEE format, usually 64-bit doubles, it should work OK.

You could check that this is so in your program by testing

    if (std::numeric_limits<long double>::digits != 53) giveup!

and warn if it is not.

If the binary representation is different, it won't work anyway! So you would not be any better off!

Paul
--
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
Phone and SMS text +44 1539 561830, Mobile and SMS text +44 7714 330204
mailto: pbristow@hetp.u-net.com
http://www.hetp.u-net.com/index.html
http://www.hetp.u-net.com/Paul%20A%20Bristow%20info.html
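
The format check suggested above could be packaged as a small compile-time predicate. This is only a sketch: the function name `double_is_ieee64` is invented here, it tests `double` rather than the `long double` used in the message (an assumption about what was intended, since the text talks about 64-bit doubles), and it relies on C++11 `constexpr`.

```cpp
#include <limits>

// Hypothetical helper: true when double looks like the usual 64-bit IEEE
// format (IEC 559 conformance claimed, binary radix, 53-bit significand).
constexpr bool double_is_ieee64() {
    return std::numeric_limits<double>::is_iec559
        && std::numeric_limits<double>::radix == 2
        && std::numeric_limits<double>::digits == 53;
}
```

A program could test `double_is_ieee64()` before opening a text archive and warn the user when it returns false.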

While it's true that some decimal values have no exact binary representation and vice-versa, I believe you *should* be able to write as a decimal string and read back in, and get the same value, provided:

* You write enough digits to the file; numeric_limits<T>::digits + 2 seems to be enough, but I wouldn't want to guarantee that.
* Your std lib is bug free: there certainly have been cases of std libs that don't round-trip numbers in this way (I know because I've reported these as bugs!); getting round-trip binary-decimal-binary conversion right is actually pretty hard.

The classic "What Every Computer Scientist Should Know About Floating-Point Arithmetic" at http://docs.sun.com/source/806-3568/ncg_goldberg.html fills in the details: 9 decimal digits are required for single precision reals, and 17 for double precision; a formula is also given that allows you to check that you have enough decimal digits for some p-digit binary number.

It's also apparent that reading in a decimal number correctly requires extended precision arithmetic, so I suspect most problems are likely to occur when serialising the widest floating point type on the system. Even Knuth says "leave it to the experts" when discussing binary-decimal conversion BTW :-)

HTH, John.
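
The digit counts quoted above (9 and 17) can be checked numerically. The sketch below assumes the condition is the one usually stated for this result, namely that N decimal digits suffice for a p-bit binary significand when 10^(N-1) > 2^p; the message does not spell the formula out, so that condition and the function name `digits_needed` are assumptions to verify against the paper. It is intended for p up to 64, where every quantity involved is exactly representable.

```cpp
#include <cmath>

// Smallest N such that 10^(N-1) > 2^p (assumed round-trip condition).
// Intended for p <= 64 so that all intermediate values are exact.
int digits_needed(int p) {
    const long double two_to_p = std::ldexp(1.0L, p);  // 2^p, exact
    long double power_of_ten = 1.0L;                   // 10^(n-1)
    int n = 1;
    while (power_of_ten <= two_to_p) {
        power_of_ten *= 10.0L;
        ++n;
    }
    return n;
}
```

For example, `digits_needed(24)` covers single precision and `digits_needed(53)` covers double precision.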

On 14 March 2006 19:23, John Maddock wrote:
If you want to test round-tripping on your platform and std lib without actually using serialization, may I suggest a loop including something like:

    double a = 1.0;  // some start value (1.0 is only a placeholder)
    double aa;       // to hold the value read back

    std::stringstream s;  // use a fresh stream for each value tested
    s.precision(2 + std::numeric_limits<double>::digits * 3010/10000);
    // cout << "output " << a;
    s << a;   // output to string s
    // cout << ", s.str() is " << s.str();
    s >> aa;  // read back in
    // cout << ", read back " << aa << endl;
    if (a != aa)
    {
        std::cout << "error " << a << '\t' << aa << std::endl;
    }
    a = nextafter(a, std::numeric_limits<double>::max());  // make one bit bigger
    // a *= 10.;  // and times 10 too, to make the test run in reasonable time
    // Ran OK 8.0 16 sep 04

Of course this may take too long for the full range of possible doubles (some years ;-)). It took overnight for all possible floats on my aging system.

This should give you a feel for the risk of failure.

Paul

Paul A Bristow wrote:
[...]
This should give you a feel for the risk of failure.
Paul
Funnily enough, I've just written a program to test the value I originally posted about (below, followed by its output):

    #include <string>
    #include <sstream>
    #include <iostream>
    #include <iomanip>
    #include <limits>

    int main(void)
    {
        const double orig_value = 0.0019075645054089487;
        std::stringstream stream;

        double num;
        stream << std::setprecision(2 + std::numeric_limits<double>::digits * 3030/10000);
        stream << orig_value;
        stream >> num;

        if (num == orig_value)
        {
            std::cout << "Match" << std::endl;
        }
        else
        {
            std::cout << "Deserialisation error" << std::endl;
            std::cout << std::setprecision(2 + std::numeric_limits<double>::digits * 3030/10000);
            std::cout << "Original numerical value: " << orig_value << std::endl;
            std::cout << "Contents of stream: " << stream.str() << std::endl;
            std::cout << "Deserialised value: " << num << std::endl;
        }

        return 0;
    }

Output:

    Deserialisation error
    Original numerical value: 0.0019075645054089487
    Contents of stream: 0.0019075645054089487
    Deserialised value: 0.0019075645054089489

This is the same result as in my original thread, so it does indeed look like a Microsoft issue with redirection (>>), not with serialisation itself.

Paul

On 15 March 2006 11:06, Paul Giaccone wrote:
I have also re-tested this on VS 2005 release :-((( My test and/or memory was obviously faulty.

Using 100 tests with nextafter, starting with the value you found faulty, I find 38 failures (all one bit wrong on input), about the one third I found in previous tests.

So I confirm my view that this is a Microsoft 'Lack Of Quality Feature'.

To be fair, this is a rather hard problem, though there are proven solutions which have been proposed, but not widely implemented, as I asked about and got some info on in a recent post:

lists.boost.org/Archives/boost/2006/02/date.php

But I fear that this does NOT help you just now. Sorry.

I am having a crack at using the Burger and Dybvig method but have yet to fully understand it.

Paul
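
The 100-value test with nextafter described above can be written as a self-contained function. This is a sketch under assumptions, not the program that was actually run: the names `roundtrip` and `count_roundtrip_failures` are invented, and `std::nextafter` is the C++11 spelling. A fresh stringstream is used for every value so that stream state from one iteration cannot affect the next.

```cpp
#include <cmath>
#include <limits>
#include <sstream>

// Write x as decimal text with the given number of significant digits,
// then read the text back through operator>>.
double roundtrip(double x, int precision) {
    std::stringstream s;
    s.precision(precision);
    s << x;
    double back = 0.0;
    s >> back;
    return back;
}

// Count how many of `count` successive doubles, starting at `start`,
// fail to come back bit-for-bit identical.
int count_roundtrip_failures(double start, int count) {
    // 2 + 53 * 3010 / 10000 evaluates to 17 for a 53-bit significand.
    const int precision = 2 + std::numeric_limits<double>::digits * 3010 / 10000;
    int failures = 0;
    double a = start;
    for (int i = 0; i < count; ++i) {
        if (roundtrip(a, precision) != a) {
            ++failures;
        }
        a = std::nextafter(a, std::numeric_limits<double>::max());
    }
    return failures;
}
```

`count_roundtrip_failures(0.0019075645054089487, 100)` mirrors the test reported above; a non-zero result points at the library's decimal input conversion rather than at the number of digits written.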

Oh shucks: I wonder would this be solved by using the solution I often see in the literature when an author wants to give a binary floating point value exactly: which is to represent it as an integer + a base 2 exponent. Values in that form can be serialised/deserialised exactly using ldexp/frexp, but unfortunately aren't very human readable (or rather are open to mis-interpretation because the exponent is a power of 2 not 10). This format is similar to the "A" format specifier in the C99 version of printf, and I believe the code is quite simple as well BTW, John.
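
The integer-plus-base-2-exponent idea above could look roughly like this. It is a sketch only: `encode_exact` and `decode_exact` are hypothetical names, the text layout ("significand exponent" separated by a space) is an arbitrary choice, and infinities and NaNs are not handled. It assumes a binary `double` whose significand fits in 63 bits, so the scaled significand is an exact integer.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>

// Encode a finite double as "<integer significand> <base-2 exponent>".
std::string encode_exact(double x) {
    const int digits = std::numeric_limits<double>::digits;  // 53 for IEEE double
    int exp2 = 0;
    const double frac = std::frexp(x, &exp2);  // x == frac * 2^exp2, 0.5 <= |frac| < 1
    // frac * 2^digits is an exact integer with magnitude below 2^digits.
    const std::int64_t mant = static_cast<std::int64_t>(std::ldexp(frac, digits));
    std::ostringstream os;
    os << mant << ' ' << (exp2 - digits);
    return os.str();
}

// Rebuild the double: only integer parsing and an exact scaling by 2^exp2.
double decode_exact(const std::string& text) {
    std::istringstream is(text);
    std::int64_t mant = 0;
    int exp2 = 0;
    is >> mant >> exp2;
    return std::ldexp(static_cast<double>(mant), exp2);
}
```

Because only integers are converted to and from decimal text, no binary-to-decimal rounding of a fraction is involved; the cost is that the output is hard for a human to read, as noted above.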

John Maddock wrote:
As has been noted, the serialization library inherits the behavior of the basic_stream library used in its implementation. One could try another library, e.g. STLPort. If you need a solution for some specific datum in a specific application you could apply the binary_object wrapper and serialize the binary representation of the double. Of course you're back to the problem of portable representation of floats, another problem discussed on this list which has never arrived at a successful resolution. Robert Ramey

Edward Diener wrote:
Thanks for the feedback from everyone. I should point out that this value (and all others) are not constants but rather computed by my program. They therefore have exact representations in binary, because they are the values of variables. If I were trying to serialise and deserialise pi to 100 decimal places, that would be a different matter, but these are plain old doubles. If I were to serialise them using ordinary redirection to and from a text file, I would imagine (I have not tried) that this would work. Maybe I should check that as it would indicate whether the error is with the assumption that the serialisation and deserialisation should be symmetric rather than there being something wrong with the Boost library. Paul

On 14 March 2006 17:39, Paul Giaccone wrote:
You are right to expect this, but apparently the Standard does not REQUIRE it.

The formula for the number of decimal digits required is given in

http://www2.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1822.pdf

which is derived from Kahan's paper:

http://http.cs.berkley.edu/~wkahan/ieee754status/ieee754.ps

    max_decimal_digits = 2 + significand_digits * 3010/10000

For example:

    #define FLT_MAXDIG10 (2 + (FLT_MANT_DIG * 3010)/10000)
    #define DBL_MAXDIG10 (2 + (DBL_MANT_DIG * 3010)/10000)
    #define LDBL_MAXDIG10 (2 + (LDBL_MANT_DIG * 3010)/10000)

which yield the following values on typical implementations:

    FLT_DIG 6, FLT_MAXDIG10 9
    DBL_DIG 15, DBL_MAXDIG10 17
    LDBL_DIG 19, LDBL_MAXDIG10 21

For C++, using numeric_limits, it is convenient instead to use the following formula, which can be calculated at compile time:

    2 + std::numeric_limits<double>::digits * 3010/10000;

HOWEVER, during my tests of VS 2005 BETA, float did not read back in correctly (for 1/3 of values, off by 1 bit!), and when I queried this it was claimed by Microsoft to be 'by design'. Mysteriously, in the VS 2005 final _release_, float and double (and thus long double == double) all work as expected for a 'quality product'.

As far as I recollect, VS 7.1 also worked correctly for all FP types, so I would surmise that the number of digits used for serialisation is insufficient? For double, it should be 17. There do appear to be 17 digits, so I am slightly puzzled.

But:

I:\boost-06-01-13-0500\boost\archive\basic_text_oprimitive.hpp(124)

uses

    os << std::setprecision(std::numeric_limits<double>::digits10 + 2);

and I suggest that this should be:

    os << std::setprecision(2 + std::numeric_limits<double>::digits * 3010/10000);

HTH

Paul
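
The compile-time formula above can be compared with `std::numeric_limits<T>::max_digits10`, which was added in C++11 and so was not available when this thread was written. The sketch below is illustrative only; `max_decimal_digits` is an invented name, and the static assertions simply record that the two agree for `float` and `double` on an implementation with IEEE formats.

```cpp
#include <limits>

// The integer formula from the message above, evaluated at compile time.
// 3010/10000 approximates log10(2); the multiplication is done first so
// that integer division does not truncate the factor to zero.
template <typename T>
constexpr int max_decimal_digits() {
    return 2 + std::numeric_limits<T>::digits * 3010 / 10000;
}

static_assert(max_decimal_digits<float>() == std::numeric_limits<float>::max_digits10,
              "formula disagrees with max_digits10 for float");
static_assert(max_decimal_digits<double>() == std::numeric_limits<double>::max_digits10,
              "formula disagrees with max_digits10 for double");
```

Code written today could pass `std::numeric_limits<double>::max_digits10` to `std::setprecision` directly instead of spelling out the formula.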

Paul A Bristow wrote:
Yes, fair enough, but that wouldn't make any difference in this case. The problem is clearly with the *de*serialisation.

Tracing back through the functions, this seems to be the function (in basic_text_iprimitive.hpp) that reads doubles from XML files:

    void load(T & t)
    {
        if(is.fail())
            boost::throw_exception(archive_exception(archive_exception::stream_error));
        is >> t;
    }

This suggests that it is a broken feature of Microsoft's operator>> for doubles read from filestreams.

Paul

On 15 March 2006 10:46, Paul Giaccone wrote:
I agree with you and suggest some simple tests, as in my previous mail, to confirm this.

But perhaps stringstreams are different from filestreams? My recollection is that MS VC8 worked correctly for both double and float with stringstreams. But do check.

Paul

On 15 March 2006 12:56, Janek Kozicki wrote:
> why not just read the string, and use boost::lexical_cast<double>?

I fear that lexical_cast uses the same method of reading from a stringstream :-((

And that our testing of lexical_cast is insufficient to catch it.

Some very rough tests of lexical_cast suggest that several percent of not very random double values fail to loop back (1 bit wrong), but I don't have time to work on this at present.

IMO this seems a Microsoft problem/feature.

Paul

Paul A Bristow said: (by the date of Wed, 15 Mar 2006 15:23:50 -0000)
but if during saving lexical_cast<string>(some_double_value) is used, IMHO no data should be lost at all. operator<< for string cannot make a mistake. Also operator>> for string shouldn't make a mistake, otherwise copying around textual data with << and >> would lead to data corruption (think of copying a text file using Microsoft's operators << and >>).

Bear in mind that lexical_cast<string>(foo) converts double to string without any data loss (that's the exact purpose of lexical_cast). Then later, if this data is treated as a string, it's not possible to lose any information (otherwise we have a plain corruption of textual data).

So during save you convert double to string, then save. When you load it, you load a string, then convert it to double. operator>> is not converting anything, it just works with strings.

-- Janek Kozicki

On 15 March 2006 18:47, Janek Kozicki wrote:
> so during save you convert double to string, then save.
> when you load it - you load a string, then convert it to double.

    std::stringstream stream;
    double num;
    stream << std::setprecision(3 + std::numeric_limits<double>::digits * 3030/10000);
    stream << orig_value;
    stream >> num;  // <<<<<<<<<< This is where I believe it goes wrong, sometimes, by 1 bit :-((

so orig_value != num.

Or am I misunderstanding your suggestion?

Paul

Paul A Bristow said: (by the date of Wed, 15 Mar 2006 20:15:10 -0000)
try following modifications in above code. If you still get a mistake by one bit, then .... well, I'd be very surprised.

    #include <boost/lexical_cast.hpp>

    std::stringstream stream;
    double num;
    stream << boost::lexical_cast<std::string>(orig_value);
    std::string tmp;
    stream >> tmp;
    num = boost::lexical_cast<double>(tmp);

-- Janek Kozicki

On 3/15/06, Janek Kozicki <janek_listy@wp.pl> wrote:
try following modifications in above code. If you still get a mistake by one bit, then .... well, I'd be very surprised.
Prepare to be surprised. Here's the exact code I compiled:

    #include <iostream>
    #include <sstream>
    #include <cassert>
    #include <boost/lexical_cast.hpp>

    int main ()
    {
        std::stringstream stream;
        double orig_value = 0.0019075645054089487;
        stream << boost::lexical_cast<std::string> (orig_value);
        double num = boost::lexical_cast<double> (stream.str());
        assert (num == orig_value);
    }

On gcc + Linux this fails:

    lc: lc.cpp:12: int main(): Assertion `num == orig_value' failed.

    Breakpoint 1, main () at lc.cpp:12
    12        assert (num == orig_value);
    (gdb) print num
    $3 = 0.0019075645054089489
    (gdb) print orig_value
    $4 = 0.0019075645054089487

On MSVC 8, the program also asserts and the values are similarly mismatched:

    orig_value  0.0019075645054089487  double
    num         0.0019075645054089489  double

Note that boost::lexical_cast uses a precision of std::numeric_limits<T>::digits10 + 1 in its T-to-string conversions. For double, this is 16, which would probably explain the mismatch on the 17th significant digit on two separate platforms.

--
Caleb Epstein
caleb dot epstein at gmail dot com
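
The effect of the precision setting can be isolated without lexical_cast. The following sketch uses plain stringstreams and an invented helper name, `roundtrip_with_precision`; it assumes a standard library whose decimal conversions are correctly rounded. With 16 significant digits (digits10 + 1) the value from this thread is written as text that is closer to the neighbouring double, so it cannot come back unchanged; with 17 digits it should.

```cpp
#include <limits>
#include <sstream>

// Write x with `precision` significant digits and parse the text back.
double roundtrip_with_precision(double x, int precision) {
    std::stringstream s;
    s.precision(precision);
    s << x;
    double back = 0.0;
    s >> back;
    return back;
}

// The two precisions under discussion for double.
const int sixteen_digits   = std::numeric_limits<double>::digits10 + 1;  // what lexical_cast used
const int seventeen_digits = std::numeric_limits<double>::digits10 + 2;  // enough to distinguish every double
```

Comparing `roundtrip_with_precision(0.0019075645054089487, sixteen_digits)` and `roundtrip_with_precision(0.0019075645054089487, seventeen_digits)` against the original value shows which of the two settings loses the last bit.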

Caleb Epstein said: (by the date of Wed, 15 Mar 2006 18:19:45 -0500)
ok, so I am surprised :) Therefore I back off ;)

but honestly - there must be a bug somewhere (I believe it's in lexical_cast), right? Maybe the authors of lexical_cast should add one significant place to the string? If not, then why won't this work? Is it really impossible to avoid losing data in conversions?

Best if one of the boost::lexical_cast authors would answer....

-- Janek Kozicki

No, as already discussed the problem is that many iostreams libraries do not round trip the binary floating point representation to decimal and back again. This is technically possible to do (albeit with quite heroic efforts), but apparently std lib vendors don't consider it crucial :-( That was why I suggested using the C99 style hex format for floats in the serialisation lib: that format is both portable and readily round-trippable since there's no binary-to-decimal conversion involved (just binary to hex). John.
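
The C99-style hex format mentioned above is reachable from C++11 through `snprintf` with `%a` and `strtod`. This is a sketch of the idea rather than a proposal for the serialisation library itself; the function names `to_hex_text` and `from_hex_text` are invented, and it assumes a C library that implements the C99 hexadecimal floating-point conversions.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Format a double as hexadecimal floating-point text, for example
// something of the form 0x1.8p+1 (the exact spelling may vary by library).
std::string to_hex_text(double x) {
    char buf[64];  // ample for sign, "0x", 14 hex digits, 'p' and exponent
    std::snprintf(buf, sizeof buf, "%a", x);
    return std::string(buf);
}

// Parse hexadecimal floating-point text back into a double.
double from_hex_text(const std::string& text) {
    return std::strtod(text.c_str(), nullptr);
}
```

Each hex digit maps to exactly four bits of the significand, so writing and reading involve no rounding as long as the reader has at least as many significand bits as the writer.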

On 16 March 2006 10:28, John Maddock wrote:
Anything that guarantees a round trip MUST be a good thing.

(Getting output and input right would be even better! There are papers which present methods for doing it which claim to be proven correct, but these are not the methods used for popular implementations.)

Is hex fully portable? It surely just promises to be as close as the FP representation will allow? Or should we store the FP representation in the serialization and only deserialize if it matches exactly? This sounds a prudent move to me.

Paul

Um, what do you mean by fully portable? It is in the sense that:

* If you do a write-then-read cycle on the same machine you get back exactly the same result.
* If you do a write-then-read cycle on different machines you only get the same result back if the machine reading the value has at least as many bits in its mantissa as the machine used for writing. But that goes without saying really.

I guess I really should put my money where my mouth is and present some sample code, I'll see what I can do later....

John.

| -----Original Message----- | From: boost-bounces@lists.boost.org | [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock | Sent: 16 March 2006 13:43 | To: boost@lists.boost.org | Subject: Re: | [boost][serialization]Serialisation/deserialisationoffloating- | point values | | > Anything that guarantees a round trip MUST be a good. | > | > (Getting output and input right would be even better! There are | > papers which present methods for doing it which claim to be proven | > correct - but these are not the methods used for popular | > implmentations.) | > | > Is a hex fully portable? It surely just promises to be as close as | > the FP representation will allow? Or should we store the FP | > representation in the serialization and only deserialize if it | > matches exactly? This sounds a prudent move to me. | | Um, what do you mean by fully portable? It is in the sense that: | | * If you do a write-then-read cycle on the same machine you | get back exactly | the same result. | * If you do a write-then-read cycle on different machines you | only get the | same result back if the machine reading the value has at | least as many bits | in it's mantissa as the machine used for writing. But that | goes without | saying really. | | I guess I really should put my money where my mouth is and | present some | sample code, I'll see what I can do later.... | | John. That would be excellent. My suggestion is to store the FP format somehow and somewhere in the serialization. http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html#tables lists half a dozen IEEE formats, so a single byte would suffice, but it might be better to cater for User Defined Types by storing the number of significand and exponent bit counts separately? Some 128-bit types like doubledouble Darwin and 265-bit reals are in use, as well as arbitrary precision like NTL ZZ. 
Some users might also want to use 'exact reals', for example http://keithbriggs.info/xrc.html (note that C and C++ implementations exist) so it might be useful to cater for this as well. Paul -- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB Phone and SMS text +44 1539 561830, Mobile and SMS text +44 7714 330204 mailto: pbristow@hetp.u-net.com http://www.hetp.u-net.com/index.html http://www.hetp.u-net.com/Paul%20A%20Bristow%20info.html

Hmm - I don't remember that suggestion. How does using hex address the placement of the decimal point? Assuming it managed the issues of lost precision, I would think that those using a text type serialization format would expect an intuitively readable floating point representation. Robert Ramey John Maddock wrote:

Caleb Epstein wrote:
So it looks like lexical_cast has the bug (repeated) as well, then; see Paul Bristow's comment earlier. Here is the code in lexical_cast.hpp:

    lexical_stream()
    {
        stream.unsetf(std::ios::skipws);
        if(std::numeric_limits<Target>::is_specialized)
            stream.precision(std::numeric_limits<Target>::digits10 + 1);
        else if(std::numeric_limits<Source>::is_specialized)
            stream.precision(std::numeric_limits<Source>::digits10 + 1);
    }

Should both be tweaked to add the extra digits:

    stream.precision(2 + std::numeric_limits<Target>::digits * 3010/10000);

and

    stream.precision(2 + std::numeric_limits<Source>::digits * 3010/10000);

or am I missing the boat on my quick inspection?

Kevin -- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this | | Senior Technology | My employer for certain | | And Network Systems Architect | Not even myself |
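
To make the proposed tweak concrete, here is a small sketch of a to-string conversion that sets the stream precision from the formula above. It is not the lexical_cast implementation; round_trip_digits and to_text are hypothetical names, and the numbers it produces assume binary IEEE 754 float and double.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: decimal digits needed so that text written from a
// binary T can be read back to the same T. Integer-only approximation of
// digits * log10(2), as suggested in the thread.
template <typename T>
int round_trip_digits()
{
    return 2 + std::numeric_limits<T>::digits * 3010 / 10000;
}

// Hypothetical stand-in for the T-to-string half of a lexical conversion.
template <typename T>
std::string to_text(T value)
{
    std::ostringstream stream;
    stream.precision(round_trip_digits<T>());
    stream << value;
    return stream.str();
}
```

With IEEE 754 types this asks for 9 digits for float and 17 for double, rather than the 7 and 16 that digits10 + 1 gives.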

Actually, looking at this further, the following (from a quick grep of the code, BTW) may also want to be looked at, in terms of 'recommending to the users by example' the correct thing to do:

    boost_1_33_1/libs/numeric/conversion/test/bounds_test.cpp: cout << setprecision( std::numeric_limits<long double>::digits10 ) ;
    boost_1_33_1/libs/numeric/conversion/test/traits_test.cpp: std::cout << std::setprecision( std::numeric_limits<long double>::digits10 ) ;
    boost_1_33_1/libs/numeric/conversion/test/converter_test.cpp: std::cout << std::setprecision( std::numeric_limits<long double>::digits10 ) ;
    boost_1_33_1/libs/numeric/interval/examples/io.cpp: boost::io::ios_precision_saver state(stream, std::numeric_limits<T>::digits10);
    boost_1_33_1/libs/numeric/interval/examples/io.cpp: boost::io::ios_precision_saver state(stream, std::numeric_limits<T>::digits10);

Kevin

Caleb Epstein wrote:
| On 3/15/06, Janek Kozicki <janek_listy@wp.pl> wrote:
| > try following modifications in above code. If you still get a mistake by one bit, then .... well, I'd be very surprised.
|
| Prepare to be surprised.
|
| Here's the exact code I compiled:
|
|     #include <iostream>
|     #include <sstream>
|     #include <cassert>
|     #include <boost/lexical_cast.hpp>
|
|     int main ()
|     {
|         std::stringstream stream;
|         double orig_value = 0.0019075645054089487;
|         stream << boost::lexical_cast<std::string> (orig_value);
|         double num = boost::lexical_cast<double> (stream.str());
|         assert (num == orig_value);
|     }
|
| On gcc + Linux this fails:
|
|     lc: lc.cpp:12: int main(): Assertion `num == orig_value' failed.
|
|     Breakpoint 1, main () at lc.cpp:12
|     12 assert (num == orig_value);
|     (gdb) print num
|     $3 = 0.0019075645054089489
|     (gdb) print orig_value
|     $4 = 0.0019075645054089487
|
| On MSVC 8, the program also asserts and the values are similarly mismatched:
|
|     orig_value 0.0019075645054089487 double
|     num 0.0019075645054089489 double
|
| Note that boost::lexical_cast uses a precision of std::numeric_limits<T>::digits10 + 1 in its T-to-string conversions. For double, this is 16, which would probably explain the mismatch on the 17th significant digit on two separate platforms.

Well, I pointed this mistake out over two years ago, but it still hasn't been changed. I have to say I think that this is a bit poor. Our testing of this very widely used utility is also not up to Boost standards. Sadly though, I fear this is not the only problem. I think that there is also a problem in the Microsoft conversion of input strings to double, even with enough decimal digits, for a small proportion of decimal digit strings. Their testing / quality aspiration is obviously not brilliant either. I may get round to checking this out more fully later. Paul
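
The 16-versus-17 digit effect Caleb describes can be isolated without Boost. The sketch below is only an illustration, not code from the thread: survives_round_trip is a made-up name, and it goes through snprintf and strtod so that the only variable is the number of significant digits. It assumes the C library converts in both directions with correct rounding.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Format `value` with the given number of significant decimal digits, parse
// the text back, and report whether the result is exactly the same double.
bool survives_round_trip(double value, int significant_digits)
{
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.*g", significant_digits, value);
    return std::strtod(buffer, nullptr) == value;
}
```

For the value in Caleb's program, survives_round_trip(0.0019075645054089487, 16) is the case that goes wrong in his gcc run, while 17 digits are enough whenever the conversions themselves are correctly rounded.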

Note the difference between the "definition" formula using 3010/10000 and the suggested formula using 3030/10000. Perhaps this is on purpose; if not, it may explain why the tests done later in this thread, which use the 3030/10000 version, had troubles? ;;peter Paul A Bristow wrote:

Ooops - this is a typo. It should of course be 3010/10000. (All this is because floating-point calculations, especially log10(2) = 0.3010..., can't be done at compile time - a shame, because they could be - and this has received some consideration for the next C++0x.) Sorry. Paul

| Paul A Bristow wrote:
| > [...]
| > For C++, using numeric limits,
| >
| > So it is convenient instead to use the following formula which can be calculated at compile time:
| > 2 + std::numeric_limits<double>::digits * 3010/10000;
| >
| > [...]
| > and I suggest that this should be:
| >
| > os << std::setprecision(2 + std::numeric_limits<double>::digits * 3030/10000);

On 3/17/06, Paul A Bristow <pbristow@hetp.u-net.com> wrote:
Ooops - this is a typo.
It should of course be 3010/10000.
Which is one of the reasons magic numbers in code are best avoided. It sure would be nice if one could just use numeric_limits<T>::digits10 + 2 instead of 2 + numeric_limits<T>::digits * 3010 / 10000, but the former gives a different result for float (8 instead of 9). Perhaps this cryptic calculation might be best addressed by a boost::numeric_limits<T> which could extend std::numeric_limits<T> and include Paul Bristow's proposed max_digits10 (see http://www2.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1822.pdf)? -- Caleb Epstein caleb dot epstein at gmail dot com
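
A side-by-side of the candidates may help here. The sketch below uses made-up function names and assumes IEEE 754 float and double. The third function relies on std::numeric_limits<T>::max_digits10, which was only a proposal when this was written but is part of the standard library from C++11 onwards.

```cpp
#include <cassert>
#include <limits>

// The tempting shortcut: digits10 + 2.
template <typename T>
constexpr int shortcut_digits()
{
    return std::numeric_limits<T>::digits10 + 2;
}

// The formula from the thread: 2 + digits * 3010 / 10000.
template <typename T>
constexpr int formula_digits()
{
    return 2 + std::numeric_limits<T>::digits * 3010 / 10000;
}

// The named constant as later standardised (C++11).
template <typename T>
constexpr int standard_digits()
{
    return std::numeric_limits<T>::max_digits10;
}

// For float the shortcut falls one digit short; for double all three agree.
static_assert(shortcut_digits<float>() == 8, "assumes IEEE 754 float");
static_assert(formula_digits<float>() == 9, "assumes IEEE 754 float");
static_assert(formula_digits<double>() == 17, "assumes IEEE 754 double");
```

Using the named constant where it is available removes the magic 3010/10000 from user code altogether.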

Caleb Epstein wrote:
| Which is one of the reasons magic numbers in code are best avoided.

Touche! Case proven!

| It sure would be nice if one could just use numeric_limits<T>::digits10 + 2 instead of numeric_limits<T>::digits * 3010 / 10000, but the former gives a different result for float (8 instead of 9).
|
| Perhaps this cryptic calculation might be best addressed by a boost::numeric_limits<T> which could extend std::numeric_limits<T> and include Paul Bristow's proposed max_digits10 (see http://www2.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1822.pdf)?

Well, this would be faster than the glacial speed of simple no-brain changes like this to Standards. And while we are at it, the macros I proposed to WG14 for C could also be added, in case these are more convenient for (C++ AND C) users. http://www2.open-std.org/JTC1/SC22/WG14/www/docs/n1151.pdf Date: 2005-11-30, version 1. For example:

    #define FLT_MAXDIG10 (2 + (FLT_MANT_DIG * 3010)/10000)
    #define DBL_MAXDIG10 (2 + (DBL_MANT_DIG * 3010)/10000)
    #define LDBL_MAXDIG10 (2 + (LDBL_MANT_DIG * 3010)/10000)

which yield the following values on typical implementations:

    FLT_DIG 6, FLT_MAXDIG10 9
    DBL_DIG 15, DBL_MAXDIG10 17
    LDBL_DIG 19, LDBL_MAXDIG10 21

Should it go into boost/detail/limits.hpp? I don't feel qualified to do this, but it is long overdue to sort out this trivial problem (and get on with the much more important problem of failure of round-tripping in all uses of stringstream, crucially lexical_cast and serialization).
Paul

If you are using >> to convert decimal digit strings to floating-point and expect to get **exactly** the right result, read on. There was some discussion in this thread some weeks ago, and agreement that there was a problem with serialization of floating point (and with lexical_cast). Although the change is only 1 bit, if you repeatedly read back and re-serialized floating-point values, the values would drift 1 bit each time. I've now found a (some - quite a few) moments to look into this. The basic problem is failure to 'round-trip/loopback':

    float f = ?; // should work for ALL values, both float and double.
    std::stringstream s; // or files.
    s.precision(max_digits10); // 9 decimal digits for 32-bit float, 17 for 64-bit double.
    s.str(std::string()); // empty the buffer (s.str().erase() would only erase a temporary copy); see note below on why.
    s.clear(); // reset the eof state left by a previous read.
    s << f; // Output to string.
    float rf;
    s >> rf; // Read back into float.
    assert(f == rf); // Check get back **exactly** the same.

With MSVC, the problem is with s >> rf; for some values, the input is a single least significant bit wrong (greater). The ***Good News*** is that, unlike what I found for VS 7.1, where 1/3 of float values are read in 1 bit wrong, VS 8.0 works correctly in release mode for ALL 32-bit float values. (Digression - because of the memory leak in stringstream in VS 8.0 (it is disgraceful that we haven't had an SP1 for this), the naïve test runs out of real and virtual memory after half an hour if you try a brute force loop re-creating stringstream for each value. So it is necessary (and quicker) to create the stringstream just once and erase the string contents before each test. I used my own nextafterf to test all 2130706431 float values and it took 70:53 (must get my new dual-core machine going ;-).) The ***Bad News*** is that, as shown by John Maddock, for double there is a bizarre small range of values where every third significand value is read in one bit wrong. Murphy's law applies - it is a fairly popular area.
Of course, testing all the double values would take longer than some of us are likely to be above ground to be interested in the result ;-) So I created vaguely random double values using 5 15-bit rand() calls to fill all the bits, and then excluded NaNs and infs. (Unlike the more expertly random John Maddock, I decided it was best to keep it simple to submit as a bug report to MS rather than use any of the Boost fancy randoms - which in any case seem to have bits which never get twiddled - not my idea of random - but then I am not a statistician or mathematician.) For example:

    Written  : 0.00019879711946838022 == 3f2a0e8640d90401
    Readback : 0.00019879711946838024 == 3f2a0e8640d90402 << note 1 bit greater.

The test failed for 77 out of 100000 double values, a fraction of 0.00077. The range of 'wrong' reads is roughly shown by:

    wrong min 0.00013372562138477771 == 3f2187165749cbef
    wrong max 0.0038160481887855135 == 3f6f42d545772497

I suspect the 'bad' range is more like 0.0001 to 0.005 from some runs. All have an exponent in the range 3f2 to 3f6. And if you use nextafter to test successive double values in this range, each 3rd value is read in 'wrong'. I think we really can claim this is 'a bug not a feature' (the MS response to my complaint about 7.1 floats) and I will submit this soon. With the info above, it should be possible to find the obscure mistake. I suspect this problem exists in many previous MS versions. I doubt even Dinkumware would apply an extensive random double value test like this - it takes some time to run. If anyone wants to test other compilers, please mail me and I will dump my crude test in the vault.
Paul
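
For anyone who wants to try the same kind of experiment on another compiler, here is a rough sketch of a random loopback test. It is not the test program from this post: it uses std::mt19937_64 instead of rand(), creates a fresh stringstream per value for simplicity, and skips zero and subnormals as well as NaNs and infinities. All function names are made up, and a 64-bit IEEE 754 double is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <random>
#include <sstream>

// Reinterpret a 64-bit pattern as a double (assumes a 64-bit IEEE 754 double).
double double_from_bits(std::uint64_t bits)
{
    double value;
    static_assert(sizeof value == sizeof bits, "expects a 64-bit double");
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Write with 17 significant digits, read back with >>, compare exactly.
bool stream_round_trips(double value)
{
    std::stringstream stream;
    stream << std::setprecision(17) << value;
    double read_back = 0.0;
    stream >> read_back;
    return !stream.fail() && read_back == value;
}

// Count loopback failures over `samples` random normal doubles.
int count_round_trip_failures(int samples, std::uint64_t seed)
{
    std::mt19937_64 generator(seed);
    int failures = 0;
    int tested = 0;
    while (tested < samples) {
        const double value = double_from_bits(generator());
        if (!std::isnormal(value)) continue; // skip NaN, inf, zero, subnormals
        ++tested;
        if (!stream_round_trips(value)) ++failures;
    }
    return failures;
}
```

A library whose decimal input is correctly rounded should report no failures, whereas the figures above suggest something on the order of 77 per 100000 for the affected MSVC library. To probe the suspect range directly, one could instead walk successive values with std::nextafter starting from about 0.0001.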

Paul A Bristow wrote:
(Digression - because of the memory leak in stringstream in VS 8.0 (it is disgraceful that we haven't had an SP1 for this),
Another digression. VS2002 had a single SP1 issued in 2005, long after most people switched to the essentially free ( shipping cost from MS of DVDs ) VS2003 upgrade. VS2003 has never had an SP issued. VC 8.0 is in VS2005 and, at the rate which MS has released SPs for the previous releases... <g>. MS did issue workarounds for problems in VS200(2,3) if one reported bugs to them verbally. Whether they still do I do not know. They have an online bug reporting system at http://lab.msdn.microsoft.com/productfeedback/default.aspx. So far they appear to be very responsive to any bugs reported.

Edward Diener wrote:
| They have an online bug reporting system at http://lab.msdn.microsoft.com/productfeedback/default.aspx. So far they appear to be very responsive to any bugs reported.

A workaround for the problem above has been issued - but it involves re-compiling to produce a new .dll, a significant hassle. My view is that this is not good enough - the very least would be to issue a new .dll. But an SP1 would be better - there seem to be a number of things they have fixed. I have expressed this view to their forum (and I am not alone!). The response to the original report on 7.1 loopback was http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=7bf2f26d-171f-41fe-be05-4169a54eef9e (http://tinyurl.com/mpk72): essentially "it's a feature" - but then it was fixed in 8.0! So don't hold your breath on this - VS 2008?? I suspect that the Fair Dinkumware version may be corrected already - would anyone like to test? Paul

Paul A Bristow wrote: ...
I noticed only a single vote on the importance of this bug (which is now at 2 with my vote :-) ). Perhaps if all of those interested added their voices something would get done. Does Herb Sutter have a louder voice in that regard? Also the bug (it's still listed as a bug, I think) only addresses float and not double. Did you ever enter one for double? I too would like to see a vc7.1 SP. I just finally moved the last of our products from vc6.5, and don't have the desire to move to vc8 just yet. :) Jeff

Jeff Flinn wrote:
| Also the bug (it's still listed as bug I think), only addresses float and not double. Did you ever enter one for double?

No, because we have only just clearly identified it. A problem with numbers in the range 0.005 to 0.0001 is pretty bizarre! I will now.

| I too would like to see a vc7.1 SP. I just finally moved the last of our products from vc6.5, and don't have the desire to move to vc8 just yet.

I can imagine you are a bit tired, but my impression is that 8.0 is at last std, and I found few things that stopped working on upgrade to 8.0 from 7.1. But your mileage may vary ;-) Paul

Paul A Bristow wrote:
MS has been in no-SP mode since VS .NET has come out.
It is demoralizing, but you have to argue with MS once they give one of their inane responses justifying bugs as "as designed".
So don't hold your breath on this - VS 2008??
Possibly, but of course there is no guarantee it will be fixed even in a future new release. But I think once you force MS to admit that something is truly a bug, they will fix it. This is much better than Borland who, even when they knew a bug existed, simply decided to ignore both the bug report and any future fix.
I suspect that the Fair Dinkumware version may be corrected already - would anyone like to test?
What is Fair Dinkumware ?

Edward Diener wrote:
| > I suspect that the Fair Dinkumware version may be corrected already - would anyone like to test?
|
| What is Fair Dinkumware ?

See www.dinkumware.com for the origin of their company name. "Fair dinkum" is Australian for "genuine", "kosher", "authentic" ... Their pay-extra version of the library shipped by MS usually contains later corrections. So those who fear trouble with serialization of doubles might try throwing money at the problem. Perhaps some Booster is using the Dinkumware library and would like to test? A quick test shows the problem:

    #include <iostream>
    #include <sstream>
    #include <cassert>

    int main ()
    {
        std::stringstream stream;
        double orig_value = 0.0019075645054089487;
        stream.precision(17); // max_digits10 for double
        stream << orig_value; // write out.
        double num;
        stream >> num; // read back
        assert (num == orig_value); // should be the same!
    }

and my test program explores it more fully - mail me for a copy. Paul

Apologies - this test does NOT fail for me - that will teach me to actually run things before I post. I get failures with random doubles, but I can't reproduce the fault simply like this. I am looking into this further, but I smell wild geese. Sorry. Paul

Damien Fisher wrote:
| From your previous comments it sounds like there is a real problem in MSVC. But from this comment it sounds like maybe not? Or maybe I'm misunderstanding your comment (i.e., does it just apply to the example you gave?).

It helps to use debug mode if you expect an assert to fire ;-) Doh! Attached is a test that DOES fail for me, and I will post a fuller test to the vault when I can. Sorry for noise. Paul

PS A possible workaround:

    std::stringstream ss;
    ss << std::scientific << output_value;

works for all cases tried (100000000). This will increase the size of the XML (for example) archive file, of course, but...

PPS I note that if you repeatedly serialize and restore then you may get a 1 bit creep up each time - it could build up to significant differences.
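
Spelled out a little further, the scientific-format workaround might look like the sketch below. The helper names are hypothetical and this is not the attached test. One assumption to note: in scientific format the stream precision counts digits after the decimal point, so max_digits10 - 1 (16 for an IEEE 754 double) yields 17 significant digits, and std::numeric_limits<double>::max_digits10 itself needs C++11.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Write a double in scientific format with enough digits to round-trip.
std::string write_scientific(double value)
{
    std::ostringstream stream;
    // Precision here means digits after the point: 1 + 16 = 17 in total.
    stream << std::scientific
           << std::setprecision(std::numeric_limits<double>::max_digits10 - 1)
           << value;
    return stream.str();
}

// Read the text back with the ordinary stream extractor.
double read_scientific(const std::string& text)
{
    std::istringstream stream(text);
    double value = 0.0;
    stream >> value;
    return value;
}
```

Every value then carries an exponent field, which is where the extra archive size mentioned above comes from.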

loopback.zip with two tests is now in the Boost vault: http://www.boost-consulting.com/vault/index.php?action=downloadfile&filename=loopback.zip&directory=& (http://tinyurl.com/fbqz6). Comments and reports for other compilers/versions welcome. Paul

I have reported this as a bug to Microsoft, http://lab.msdn.microsoft.com/ProductFeedback/viewfeedback.aspx?feedbackid=c1f1ea71-2f7b-4ac1-b75b-68370c367aae aka http://tinyurl.com/rvp4j, with the following reply:

"Resolved as By Design by Microsoft on 2006-04-19 at 13:50:29 Thanks for the report. We don't agree with the premise of your bug. Because of the imprecise nature of floating point, exact comparisions are never appropriate. Round-tripping through all the machinery of input and output passes through various representations, and cannot be guaranteed to be identical to the original. Martyn Lovell Development Lead Visual C++ Libraries"

The Standard is imprecise on this issue, but I feel it is a very poor do that such a bizarre small range of values should be wrong. Feels like an off-by-one rounding bug to me. Does anyone have the Pukker Dinkumware library and would like to carry out the same test on their version? I have posted a fuller test to the vault. Paul

Paul A Bristow wrote:
I tend to agree with the MS engineers here. I've found out only yesterday that the FPU/math library is not entirely deterministic in some calculations (including square roots and trigonometry, typical 3d stuff), so I think worrying about serialization/deserialization is useless. Sebastian Redl

IMO the real issue here is not whether round-tripping should work, but that input from a long enough decimal digit string should always give you the nearest floating-point representation. For float this is true, and for double it is almost (but not quite) true. Paul

Could we consider removing the word "serialization" from this subject here? This is really about rendering floating point numbers between binary and decimal representations. And for better or worse, serialization on this list has come to mean something else. Also, a real contribution could be made to address the variety of NaNs and how they should be rendered in different systems. This is currently undefined and will create problems porting data between programs compiled with different compilers and/or libraries. Robert Ramey Paul A Bristow wrote:

Robert Ramey wrote:
| Could we consider removing the word "serialization" from this subject here? This is really about rendering floating point numbers between binary and decimal representations.

OK - we are veering OT - but it started with a serialization problem and I thought a new item would confuse the thread. I think we have reached the end of the road and await the next Microsoft release. I will start a new thread if I have anything more to report. Meanwhile, if you want to serialize doubles on MSVC, I suggest considering using << setprecision(17) << scientific.

| Also, a real contribution could be made to address the variety of NaN's and how they should be rendered in different systems. This is currently undefined and will create problems porting data between programs compiled with diferent compilers and/or libraries.

Jeff Garland made a suggestion: "The types in date_time have the ability to serialize and deserialize NADT (not a date time), -infinity and +infinity. Why couldn't there be a simple extension to the numpunct<charT> facet to define an appropriate output string? Basically something like:

    //see Langer and Kreft p 414...
    template<class charT>
    class numpunct : public locale::facet {
        //new functions for nan and infinity
        string_type not_a_number_name() const;
        string_type infinity_name() const;

And you'd have to fix num_get as well." but it's not something I have time or skill or inclination to tackle. SoC? Or switch at least NaNs and Infs to a hex format as sketched by John Maddock? Or get WG21 to agree a Standard representation for NaNs and Infs instead of everyone doing their own thing?
Paul
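
Short of extending numpunct and num_get, an archive could pick its own spellings for the special values and handle them before the stream ever sees them. The following is only a sketch of that idea: the tokens "nan", "inf" and "-inf" and the function names are made up for illustration, nothing here is an agreed format, and the sign and payload of a NaN are deliberately not preserved.

```cpp
#include <cassert>
#include <cmath>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Write finite values with 17 significant digits and special values as
// fixed tokens (hypothetical spellings, chosen for this example only).
std::string write_value(double value)
{
    if (std::isnan(value)) return "nan";
    if (std::isinf(value)) return value > 0 ? "inf" : "-inf";
    std::ostringstream stream;
    stream << std::setprecision(17) << value;
    return stream.str();
}

// Recognise the tokens first, then fall back to ordinary stream input.
double read_value(const std::string& text)
{
    if (text == "nan") return std::numeric_limits<double>::quiet_NaN();
    if (text == "inf") return std::numeric_limits<double>::infinity();
    if (text == "-inf") return -std::numeric_limits<double>::infinity();
    std::istringstream stream(text);
    double value = 0.0;
    stream >> value;
    return value;
}
```

Because the tokens are chosen by the archive rather than by the compiler's library, files written this way would at least read back the same way across platforms, which is the portability concern Robert raises.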

On 5/2/06, Paul A Bristow <pbristow@hetp.u-net.com> wrote:
[i've only read the abstract below, not the whole article, but if you want to pursue your goal (always getting the nearest floating point representation), you should check it first -- my impression from the abstract is that it is not generally possible (using fixed precision arithmetic). of course it might still be possible to find an algorithm for any given precision, but most likely that isn't trivial either: at least the author seems to be happy with his result of getting the closest floating point number about 99% of the time]

William D. Clinger, How to read floating point numbers accurately, Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation

Abstract: Consider the problem of converting decimal scientific notation for a number into the best binary floating point approximation to that number, for some fixed precision. This problem cannot be solved using arithmetic of any fixed precision. Hence the IEEE Standard for Binary Floating-Point Arithmetic does not require the result of such a conversion to be the best approximation. This paper presents an efficient algorithm that always finds the best approximation. The algorithm uses a few extra bits of precision to compute an IEEE-conforming approximation while testing an intermediate result to determine whether the approximation could be other than the best. If the approximation might not be the best, then the best approximation is determined by a few simple operations on multiple-precision integers, where the precision is determined by the input. When using 64 bits of precision to compute IEEE double precision results, the algorithm avoids higher-precision arithmetic over 99% of the time.
The input problem considered by this paper is the inverse of an output problem considered by Steele and White: Given a binary floating point number, print a correctly rounded decimal representation of it using the smallest number of digits that will allow the number to be read without loss of accuracy. The Steele and White algorithm assumes that the input problem is solved; an imperfect solution to the input problem, as allowed by the IEEE standard and ubiquitous in current practice, defeats the purpose of their algorithm. br, andras
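The output problem described in the abstract (print the fewest digits that still read back to the same number) can be illustrated with a brute-force search. This is not the Steele and White algorithm, just a sketch that tries increasing precisions, and it relies on strtod solving the input problem correctly, which is the very assumption the abstract warns about. The name shortest_round_trip is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: return the shortest "%.<p>g" rendering (p = 1..17)
// that strtod converts back to exactly the same double.
// Assumes a finite input; for NaN the comparison never succeeds and the
// 17-digit rendering is returned.
std::string shortest_round_trip(double value) {
    char buffer[64];
    for (int precision = 1; precision <= 17; ++precision) {
        std::snprintf(buffer, sizeof buffer, "%.*g", precision, value);
        if (std::strtod(buffer, nullptr) == value) {
            return buffer;  // first precision that survives the round trip
        }
    }
    return buffer;  // fall back to the 17-digit rendering
}
```

With a library whose strtod is not correctly rounded, the search may stop at the wrong precision or never succeed, which is the sense in which an imperfect input routine defeats the output algorithm.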

| -----Original Message----- | From: boost-bounces@lists.boost.org | [mailto:boost-bounces@lists.boost.org] On Behalf Of Andras Erdei | Sent: 03 May 2006 21:24 | To: boost@lists.boost.org | Subject: Re: | [boost][serialization]Serialisation/deserialisationoffloating | -point values | | On 5/2/06, Paul A Bristow <pbristow@hetp.u-net.com> wrote: | > | > IMO The real issue here is not whether round-tripping | should work, but that | > input from a long enough decimal digit string should | always give you the nearest floating-point representation. | > | > For float this is true, and for double is almost (but not quite) true. | | [i've only read the abstract below, not the whole article, | | William D. Clinger | How to read floating point numbers accurately Proceedings | of the ACM SIGPLAN 1990 conference on Programming language | design and implementation I've actually read the article! - though I must confess I didn't understand it all ;-( However 'correct' input seems to be achieved by gcc, and by MSVC for float, and almost for double. So I stick to my guess that it is a bug. It shouldn't be a feature IMO. But I don't think there is anything that we should do (except be aware of the potential problem). It is up to Microsoft and/or Dinkumware to do nothing or resolve. Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS

Sebastian Redl wrote:
But we're not talking about calculations here - you always get rounding error in floating-point calculations, and we handle it accordingly. But if I write out 1.2345 and it becomes 1.2346 when I read it back, that's cause for concern. Paul

| -----Original Message----- | From: boost-bounces@lists.boost.org | [mailto:boost-bounces@lists.boost.org] On Behalf Of Paul Giaccone | Sent: 02 May 2006 17:46 | To: boost@lists.boost.org | Subject: Re: [boost] | [serialization]Serialisation/deserialisationoffloating-point values | > | >>The Standard is imprecise on this issue, but I feel it is | a very poor | >>do that such a bizarre small range of values should be wrong. | >> | >>Feels like an off-by-one rounding bug to me. | But we're not talking about calculations here - you always | get rounding error in floating-point calculations, and we | handle it accordingly. But if I write out 1.2345 and it | becomes 1.2346 when I read it back, that's cause for concern. That is, in effect, what is happening - but at the 17th decimal digit. I suspect a rounding in the operator>> code - but I am only wildly guessing ;-) Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
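A difference "at the 17th decimal digit" is easier to judge as a distance in units in the last place. The sketch below rebuilds the two values from the byte dumps given in the original report (least-significant byte first, so b4 83 9b ca e7 40 5f 3f becomes 0x3F5F40E7CA9B83B4) and counts how many representable doubles apart they are. The helper names are hypothetical, and the code assumes 64-bit IEEE doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit doubles");

// Hypothetical helper: reinterpret a 64-bit pattern as a double.
double from_bits(std::uint64_t bits) {
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Hypothetical helper: extract the 64-bit pattern of a double.
std::uint64_t to_bits(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Number of representable doubles between a and b.
// Only valid for finite values of the same sign, where the bit patterns
// are ordered the same way as the values themselves.
std::uint64_t ulp_distance(double a, double b) {
    const std::uint64_t x = to_bits(a);
    const std::uint64_t y = to_bits(b);
    return x > y ? x - y : y - x;
}
```

For the two patterns from the report, ulp_distance(from_bits(0x3F5F40E7CA9B83B4), from_bits(0x3F5F40E7CA9B83B5)) is 1: the deserialised value is the immediate neighbour of the original.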

On Tue, May 02, 2006 at 04:54:40PM +0200, Sebastian Redl wrote:
Do you have example code / pointers to documentation for that? I've always been under the impression that basic math is deterministic regardless of IEEE compliance, and would really like to know if/where there are cases where that doesn't hold. Thanks, Geoffrey

I disagree, it is certainly possible to serialise/deserialise exactly: glibc manages it OK, so I see no reason why MS can't. Square roots are exactly-rounded under IEEE arithmetic BTW, as are the usual + - * / operators. It's the functions that may return transcendental values (cos, sin, exp, pow) which can never give exact answers purely as a matter of principle, although in practice most implementations are last-bit-correct for the vast majority of inputs.
I assume you've read "What Every Computer Scientist Should Know About Floating-Point Arithmetic" at http://docs.sun.com/source/806-3568/ncg_goldberg.html ? John.

On Tue, May 02, 2006 at 06:33:30PM +0100, John Maddock wrote:
In practice, nothing can be assumed to be exactly rounded if it goes through a decent set of optimizations, since the compiler gets to choose when values go in and out of 80 bit registers.
Actually I have not read it in its entirety, though I will do that shortly. I have skimmed it and read similar things in the past. However, I couldn't find any mention of determinism in that document. There are plenty of discussions of non-portability, but the determinism question is separate. Specifically, I've been assuming the following: If I have a function that accesses no global source of nondeterminism (e.g., other global variables, threads, etc.), and I compile it once into a separate translation unit from whatever calls it (to avoid inlining or other interprocedural weirdness), and call it twice on the same machine at different times with exactly the same bits as input, I will get the same result. I also usually assume that the compiler is deterministic given the same set of optimization flags on the same machine with the same environment. If this assumption is false, it would be great to understand why. If the answer is contained in that document, I apologize in advance for doubting it. Thanks, Geoffrey

Ah, that's a whole other issue: having an extended 80-bit double really screws things up because you can get double-rounding of a result pushing it off by one bit. Machines that don't have that data type, or if you force the Intel FPU into 64-bit mode, don't have that problem I believe. I'm not entirely sure, but I believe that the AMD64 model effectively deprecates the old x87 80-bit registers in favour of 64-bit SIMD registers, so again the problem goes away there.
Actually I have not read it in its entirety, though I will do that shortly.
Good luck, it's a tough read in places, but very useful.
Correct.
Yep, but the IEEE standard is much stronger than that: you will get exactly the same result from the same input on different machines and/or architectures. In practice certain optimisations can mess things up, as can the 80-bit double rounding problem, but we're remarkably close to that result even now. Of course this assumes you don't make any std lib calls, since the quality of implementation of exp/pow etc can vary quite a bit. John.
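Whether intermediate results are kept in a wider format, which is what makes the 80-bit double rounding problem possible, can be queried through the standard FLT_EVAL_METHOD macro from <cfloat> (C++11). The sketch below only reports what the implementation declares; describe_eval_method is a hypothetical name, and compiler flags or later optimisations may still affect actual behaviour.

```cpp
#include <cassert>
#include <cfloat>
#include <string>

// Hypothetical helper: translate FLT_EVAL_METHOD into a readable description.
std::string describe_eval_method() {
    switch (FLT_EVAL_METHOD) {
        case 0:
            return "operations are evaluated in the range and precision of their own type";
        case 1:
            return "float and double operations are evaluated as double";
        case 2:
            return "all operations are evaluated as long double (e.g. x87 80-bit registers)";
        default:
            return "indeterminate or implementation-defined evaluation";
    }
}
```

A value of 2 is the setting under which the double rounding mentioned above can occur; a value of 0 is what one would expect on targets that compute doubles in 64-bit registers.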

On Wed, May 03, 2006 at 10:37:04AM +0100, John Maddock wrote:
A whole other issue? The addendum to Goldberg seems to think it's pretty important...
Yes. That's nice, but not portable.
Yes, but "remarkably close as long as you never call exp" is irrelevant. The initial point was that exact serialization was unimportant because math operations are nondeterministic. This is false because in practice, floating point computations, even those that call exp, on all current implementations, are deterministic. If you do a trillion operations, dump the results to disk, and do a trillion more, you can repeat the last trillion by reading them back from disk...if you have exact serialization (or you wrote binary). Sorry if I confused the issue by seeming to not understand floating point. I was really just trying to humbly correct the nondeterminism post. I'll correct with statements from now on. Thanks, Geoffrey
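The "or you wrote binary" option can be sketched as a raw dump of the doubles to a stream, followed by a bit-for-bit comparison after reading them back. The function names are hypothetical, and the format is deliberately naive: it assumes the reader has the same double layout and byte order as the writer, so it is a checkpoint format for one machine rather than a portable archive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <vector>

// Hypothetical helper: dump the raw bytes of each double to the stream.
void write_doubles(std::ostream& out, const std::vector<double>& values) {
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(double)));
}

// Hypothetical helper: read back `count` doubles written by write_doubles.
std::vector<double> read_doubles(std::istream& in, std::size_t count) {
    std::vector<double> values(count);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(count * sizeof(double)));
    return values;
}

// Compare bit patterns rather than values, so that NaNs and signed zeros
// are also checked exactly.
bool bit_identical(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.size() != b.size()) return false;
    if (a.empty()) return true;
    return std::memcmp(a.data(), b.data(), a.size() * sizeof(double)) == 0;
}
```

In a restart scenario like the one described, the state written after the first batch of operations and the state read back before the second batch would be expected to satisfy bit_identical, with no dependence on decimal conversion at all.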
participants (14)
- Andras Erdei
- Caleb Epstein
- Damien Fisher
- Edward Diener
- Geoffrey Irving
- Janek Kozicki
- Jeff Flinn
- John Maddock
- Kevin Wheatley
- Paul A Bristow
- Paul Giaccone
- Peter Broadwell
- Robert Ramey
- Sebastian Redl