A question regarding serialization lib

Dear experts, I am using the Boost serialization library to serialize a huge database (it has a lot of STL containers, shared pointers, etc.). I have a general question about the library. Suppose I have serialized a database in text format (say, text1.doc), cleared the existing database, imported the exported one back, and re-exported it (say, text2.doc). Should I see any difference between text1.doc and text2.doc? I am actually seeing some differences. If I shouldn't, can someone point me to the common mistakes that could cause such differences? Thanks in advance, Arunava.

Hmmmm - a very interesting question I've never considered. All the serialization tests do the following: create a structure, serialize it to a file, load the file into a new structure, and check for equality. I would expect that in your test text1.doc would be identical to text2.doc, UNLESS you have floats/doubles in your classes. In general, it cannot be guaranteed that there is a one-to-one correspondence between floating-point values as represented in binary in RAM and numbers represented as decimal text. So in this case I would expect text1.doc to not be identical to text2.doc. There may be other cases, but that's all that occurs to me right now. Robert Ramey
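The export / import / re-export check discussed above can be sketched without Boost at all. What follows is only a minimal illustration: `Record`, `save`, `load` and `reexport_is_identical` are hypothetical names, and the hand-rolled text format is a stand-in for a text archive, not the Boost.Serialization format.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a serializable class; not a Boost type.
struct Record {
    int id = 0;
    std::vector<int> values;
    bool operator==(const Record& o) const {
        return id == o.id && values == o.values;
    }
};

// Hand-rolled text format: id, element count, then the elements.
void save(std::ostream& out, const Record& r) {
    out << r.id << ' ' << r.values.size();
    for (int v : r.values) out << ' ' << v;
    out << '\n';
}

// Read back what save() wrote into a fresh object.
Record load(std::istream& in) {
    Record r;
    std::size_t n = 0;
    in >> r.id >> n;
    r.values.resize(n);
    for (int& v : r.values) in >> v;
    return r;
}

// Export (text1), import into a new object, re-export (text2), then
// compare both the objects and the two texts.
bool reexport_is_identical(const Record& original) {
    std::ostringstream text1;
    save(text1, original);
    std::istringstream in(text1.str());
    Record copy = load(in);
    std::ostringstream text2;
    save(text2, copy);
    return copy == original && text1.str() == text2.str();
}
```

With only integers and an ordered container, the two texts should match. Floating-point members and containers without a fixed traversal order are where that expectation can break.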
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

----- Original Message ----- From: Robert Ramey <ramey@rrsd.com> Date: Sunday, December 30, 2007 6:20 pm Subject: Re: [boost] A question regarding serialization lib To: boost@lists.boost.org
There may be other cases, but that's all that occurs to me right now.
Off the top of my head, I think there's another case where text1 is not identical to text2, namely when using hashed containers: if a rehash occurs during container population, the resulting traversal order won't be the same as the original. I don't know if this applies to the OP's scenario. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
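The hashed-container case can be illustrated with a rough sketch. It assumes `std::unordered_set` as a stand-in for whatever hashed container the database uses. The helper names `traversal`, `reload` and `canonical` are made up for this example. Whether the two traversal orders actually differ depends on the implementation and on when rehashes happen, so the sketch only shows that the contents survive the round trip, and that an order-independent (sorted) view is stable.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Elements in the order the hashed container currently yields them.
// This is the order a straightforward "save" would write them in.
std::vector<int> traversal(const std::unordered_set<int>& s) {
    return std::vector<int>(s.begin(), s.end());
}

// Rebuild a container by inserting elements in the saved order.
// Rehashes during population may leave the new container with a
// traversal order different from the one that was saved.
std::unordered_set<int> reload(const std::vector<int>& saved) {
    std::unordered_set<int> s;
    for (int v : saved) s.insert(v);
    return s;
}

// Order-independent view: the same set of elements always gives the
// same sequence, whatever the bucket layout is.
std::vector<int> canonical(const std::unordered_set<int>& s) {
    std::vector<int> v(s.begin(), s.end());
    std::sort(v.begin(), v.end());
    return v;
}
```

If byte-identical output matters, writing elements in a canonical order like this is one possible workaround. That is a suggestion of this sketch, not something the thread says the library does.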

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey Sent: 30 December 2007 16:53 To: boost@lists.boost.org Subject: Re: [boost] A question regarding serialization lib
I would expect that in your test text1.doc would be identical to text2.doc.
UNLESS you have floats/doubles in your classes. In general, it cannot be guaranteed that there is a one-to-one correspondence between floating-point values as represented in binary in RAM and numbers represented as decimal text. So in this case I would expect text1.doc to not be identical to text2.doc.
The chances of this are very small, 1 in 3 values of a very narrow range, and only MS is known to have this problem, as the link below shows.
The following feedback item you submitted at Microsoft Connect has been updated:
Product/Technology - Visual Studio and .NET Framework Feedback ID - 98770 Feedback Title - Decimal digit string input to double may be 1 bit wrong.
The following fields or values changed:
Field Status changed from [Resolved] to [Closed]
To view these changes, click the following link, or paste the link into your web browser:
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?Feedbac... (requires sign-in)
Amazingly (to me at least) this is a 'feature' and may not be fixed - despite it looking very much like a simple out-by-1 error.
Paul
---
Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

Paul A Bristow wrote:
The chances of this are very small, 1 in 3 values of a very narrow range -
I disagree with this. I don't think the chances are very small.
only MS is known to have this problem
I don't think it's isolated to Microsoft. I think it's an inherent limitation of trying to represent some binary fractions exactly as decimal fractions.
as the link below shows.
I don't think the link shows that.
Amazingly (to me at least) this is a 'feature' and may not be fixed - despite it looking very much like a simple out-by-1 error.
I think Microsoft's response in this case is exactly correct. Robert Ramey

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey Sent: 30 December 2007 18:59 To: boost@lists.boost.org Subject: Re: [boost] A question regarding serialization lib
Paul A Bristow wrote:
The chances of this are very small, 1 in 3 values of a very narrow range -
I disagree with this. I don't think the chances are very small.
This is not conjecture - it was an observed behaviour as my original report shows. It only emerged by accident - and was only confirmed by a random sampling of round-tripping values. My understanding was that other compilers did NOT show this problem.
only MS is known to have this problem
I don't think it's isolated to Microsoft. I think it's an inherent limitation of trying to represent some binary fractions exactly as decimal fractions.
I am confident from my investigation that it is a failure to provide the *nearest* representable floating-point value from a decimal digit string - but, bizarrely, only in a very small region (IIRC from 0.0001 to 0.0005).
With this (as is provided by the C++ *compiler* standard when 'reading' decimal digit strings into floating-point types like float, double & long double), you can round-trip to decimal digit strings and back - provided of course that you use enough decimal digits. But there is no similar requirement on reading with std::iostreams, perhaps surprisingly (I think the authors assumed it but didn't think to specify it). The same applies to lexical_cast. (Of course, you can only expect 'round-tripping' to work if the floating-point format is the same.)
So an answer to the first question: probably - but if the floating-point types differ, then all 'round-tripped' floating-point values will be a few bits wrong, and you might get a tiny proportion wrong for the reasons in my original (rejected) report to Microsoft.
Paul
---
Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com
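The "enough decimal digits" condition can be made concrete with a small sketch. It assumes IEEE doubles and a standard library whose stream conversions round to the nearest value, which, as noted above, std::iostreams does not strictly promise. `to_text`, `from_text` and `kEnough` are names invented for the example, and `max_digits10` requires C++11.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Write a double as decimal text with the given number of significant digits.
std::string to_text(double x, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << x;
    return out.str();
}

// Read a double back from decimal text.
double from_text(const std::string& s) {
    std::istringstream in(s);
    double x = 0.0;
    in >> x;
    return x;
}

// Number of significant decimal digits that is sufficient for
// double -> text -> double to be lossless when both conversions
// round correctly (17 for an IEEE 754 double).
const int kEnough = std::numeric_limits<double>::max_digits10;
```

Under those assumptions, writing a value with `kEnough` digits, reading it back, and writing it again should give the same text both times, while a value written with only 6 digits generally does not read back as the original double. An input routine that is one bit off for some strings, as in the Microsoft report, would break the first property for the affected values.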

Paul A Bristow wrote:
Amazingly (to me at least) this is a 'feature' and may not be fixed - despite it looking very much like a simple out-by-1 error.
What is most amusing is that under .NET there is actually a format specifier ('r') for converting a .NET String to and from a double which guarantees that the textual representation will stay exactly the same. Yet at the same time you are told in response to your report that "Round-tripping through all the machinery of input and output passes through various representations, and cannot be guaranteed to be identical to the original." However, I did not see a way of guaranteeing this in native C++ under VC++.
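One native option that sidesteps decimal conversion altogether is hexadecimal floating-point text (the C99 "%a" conversion, also available in C++11). This is only a sketch of that idea: it is not something the thread proposes and not what Boost text archives write, and it assumes a C library whose `strtod` accepts hexadecimal input. `to_hex_text` and `from_hex_text` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Format a double as hexadecimal floating-point text, e.g. "0x1.8p+1".
// Each hex digit maps to exactly four mantissa bits, so no decimal
// rounding is involved.
std::string to_hex_text(double x) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%a", x);
    return std::string(buf, n > 0 ? static_cast<std::size_t>(n) : 0);
}

// Parse hexadecimal floating-point text back into a double.
double from_hex_text(const std::string& s) {
    return std::strtod(s.c_str(), nullptr);
}
```

The trade-off is readability: the text is exact but no longer looks like an ordinary decimal number, which may matter if the files are meant to be inspected by hand.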
participants (5)
- "JOAQUIN LOPEZ MUÑOZ"
- Arunava Saha
- Edward Diener
- Paul A Bristow
- Robert Ramey