A question regarding serialization lib

Arunava Saha

30 Dec 2007

4:17 p.m.

Dear experts,

I am using the Boost serialization library to serialize a huge database (it has a lot of STL containers, shared pointers, etc.). I have a general question about the library. Suppose I serialize a database in text format (say, text1.doc), clear the existing database, import the exported file back, and re-export it (say, text2.doc). Should I see any difference between text1.doc and text2.doc? I am actually seeing some differences. If I shouldn't, can someone point me to the common mistakes that can cause such differences?

Thanks in advance,

Arunava.

Robert Ramey

30 Dec 2007

4:52 p.m.

Hmmmm - a very interesting question I've never considered.

All the serialization tests do the following: create a structure, serialize it to a file, load the file into a new structure, and check for equality.

I would expect that in your test text1.doc would be identical to text2.doc.

UNLESS you have floats/doubles in your classes. In general, it cannot be guaranteed that there is a one-to-one correspondence between floating-point values as represented in binary in RAM and numbers represented as decimal text. So in this case I would expect text1.doc to not be identical to text2.doc.

There may be other cases, but that's all that occurs to me right now.

Robert Ramey

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Joaquín M López Muñoz

5:36 p.m.

----- Original message -----
From: Robert Ramey <ramey@rrsd.com>
Date: Sunday, December 30, 2007 6:20 pm
Subject: Re: [boost] A question regarding serialization lib
To: boost@lists.boost.org

Hmmmm - a very interesting question I've never considered.

All the serialization tests do the following: create a structure, serialize to a file, load the file to a new structure and check for equality.

I would expect that in your test text1.doc would be identical to text2.doc.

UNLESS you have floats/doubles in your classes. In general, it cannot be guaranteed that there is a one-to-one correspondence between floating-point values as represented in binary in RAM and numbers represented as decimal text. So in this case I would expect text1.doc to not be identical to text2.doc.

There may be other cases, but that's all that occurs to me right now.

Off the top of my head, I think there's another case where text1 is not identical to text2, namely when using hashed containers: if a rehash occurs during container population, the resulting traversal order won't be the same as the original. I don't know if this applies to the OP's scenario.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
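One way to keep exported text independent of hash traversal order is to write the entries in sorted order. The following is only a minimal sketch of that idea, not something proposed in the thread: it uses std::unordered_map as a stand-in for the hashed containers mentioned above (an assumption), and canonical_dump is a hypothetical helper that is not part of Boost.Serialization.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Boost.Serialization): writes the entries
// of a hashed container in sorted key order, so the text does not depend on
// bucket layout or insertion history.
std::string canonical_dump(const std::unordered_map<std::string, int>& table) {
    // Copy the entries out of the hash table; their order here is arbitrary.
    std::vector<std::pair<std::string, int>> entries(table.begin(), table.end());
    // Sorting fixes a single order for any given set of entries.
    std::sort(entries.begin(), entries.end());

    std::ostringstream out;
    for (const auto& entry : entries) {
        out << entry.first << ' ' << entry.second << '\n';
    }
    assert(out.good());  // writing to an in-memory stream should not fail
    return out.str();
}
```

Two tables holding the same entries then produce the same text even if they were populated in a different order or with a different bucket count, at the cost of a copy and a sort on every export.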

Paul A Bristow

5:39 p.m.

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey
Sent: 30 December 2007 16:53
To: boost@lists.boost.org
Subject: Re: [boost] A question regarding serialization lib

I would expect that in your test text1.doc would be identical to text2.doc.

UNLESS you have floats/doubles in your classes. In general, it cannot be guaranteed that there is a one-to-one correspondence between floating-point values as represented in binary in RAM and numbers represented as decimal text. So in this case I would expect text1.doc to not be identical to text2.doc.

The chances of this are very small: 1 in 3 values of a very narrow range. And only MS is known to have this problem, as the link below shows.

The following feedback item you submitted at Microsoft Connect has been updated:

Product/Technology - Visual Studio and .NET Framework
Feedback ID - 98770
Feedback Title - Decimal digit string input to double may be 1 bit wrong.

The following fields or values changed:

Field Status changed from [Resolved] to [Closed]

To view these changes, click the following link, or paste the link into your web browser:

http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?Feedbac... (requires sign-in)

Amazingly (to me at least) this is a 'feature' and may not be fixed - despite it looking very much like a simple out-by-1 error.

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

Robert Ramey

6:59 p.m.

Paul A Bristow wrote:

The chances of this are very small: 1 in 3 values of a very narrow range

I disagree with this. I don't think the chances are very small.

only MS is known to have this problem

I don't think it's isolated to Microsoft. I think it's an inherent limitation of trying to represent some binary fractions exactly as decimal fractions.

as the link below shows.

I don't think the link shows that.

Amazingly (to me at least) this is a 'feature' and may not be fixed - despite it looking very much like a simple out-by-1 error.

I think Microsoft's response in this case is exactly correct.

Robert Ramey

Paul A Bristow

10:58 p.m.

-----Original Message-----
From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey
Sent: 30 December 2007 18:59
To: boost@lists.boost.org
Subject: Re: [boost] A question regarding serialization lib

Paul A Bristow wrote:

The chances of this are very small: 1 in 3 values of a very narrow range

I disagree with this. I don't think the chances are very small.

This is not conjecture - it was an observed behaviour, as my original report shows. It only emerged by accident, and was only confirmed by a random sampling of round-tripping values. My understanding was that other compilers did NOT show this problem.

only MS is known to have this problem

I don't think it's isolated to Microsoft. I think it's an inherent limitation of trying to represent some binary fractions exactly as decimal fractions.

I am confident from my investigation that it is a failure to provide the *nearest* representable floating-point value from a decimal digit string - but, bizarrely, only in a very small region (IIRC from 0.0001 to 0.0005).

With nearest-value conversion (as is provided by the C++ standard for the *compiler* when 'reading' decimal digit strings into floating-point types like float, double & long double), you can round-trip to decimal digit strings and back - provided of course that you use enough decimal digits. But there is no similar requirement on reading with std::iostreams, perhaps surprisingly (I think the authors assumed it but didn't think to specify it). (Of course, you can only expect 'round-tripping' to work if the floating-point format is the same.) The same applies to lexical_cast.

So an answer to the first question: probably. But if the floating-point types differ, then all floating-point values 'round-tripped' will be a few bits wrong, and you might get a tiny proportion wrong for the reasons in my original (rejected) report to Microsoft.

Paul
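The "enough decimal digits" condition can be illustrated with a small sketch. It assumes a standard library whose stream input returns the nearest double (which, as noted above, iostreams are not required to do) and an IEEE 754 double, for which std::numeric_limits<double>::max_digits10 is 17. The names to_text, from_text and kRoundTripDigits are hypothetical and are not taken from Boost.Serialization.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helpers for illustration; not taken from Boost.Serialization.

// Writes a double as decimal text with the given number of significant digits.
std::string to_text(double value, int digits) {
    assert(digits > 0);
    std::ostringstream out;
    out << std::setprecision(digits) << value;
    return out.str();
}

// Reads a double back from decimal text using a standard input stream.
double from_text(const std::string& text) {
    std::istringstream in(text);
    double value = 0.0;
    in >> value;
    return value;
}

// Number of significant decimal digits needed to tell any two doubles apart
// (17 for an IEEE 754 double).
const int kRoundTripDigits = std::numeric_limits<double>::max_digits10;
```

Under those assumptions, from_text(to_text(x, kRoundTripDigits)) should compare equal to x, and writing the reloaded value again should give the same text. With fewer digits, for example 6, a value such as 0.30000000000000004 is written as 0.3 and comes back as a different double.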

Edward Diener

9:14 p.m.

Paul A Bristow wrote:

Amazingly (to me at least) this is a 'feature' and may not be fixed - despite it looking very much like a simple out-by-1 error.

What is most amusing is that under .NET there is actually a format specifier ('r') for converting a .NET String to and from a double which guarantees that the textual representation will stay exactly the same. Yet at the same time you are told, in response to your report, that "Round-tripping through all the machinery of input and output passes through various representations, and cannot be guaranteed to be identical to the original."

However, I did not see a way of guaranteeing this in native C++ under VC++.
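One portable option in C99 and C++11 is to avoid decimal conversion altogether and write the value in hexadecimal floating-point notation with the "%a" conversion, which spells out the binary significand and exponent directly. This is only a sketch: whether the VC++ release discussed in this thread supported "%a" is not something the thread states, and the helper names to_hex_text and from_hex_text are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: formats a double in hexadecimal floating-point
// notation, which encodes the binary value without decimal rounding.
std::string to_hex_text(double value) {
    char buffer[64];
    const int written = std::snprintf(buffer, sizeof buffer, "%a", value);
    // The buffer is sized generously for a double; fail loudly otherwise.
    assert(written > 0 && written < static_cast<int>(sizeof buffer));
    return std::string(buffer, static_cast<std::size_t>(written));
}

// Hypothetical helper: parses the hexadecimal notation back into a double.
// std::strtod accepts this form under C99 and C++11 rules.
double from_hex_text(const std::string& text) {
    return std::strtod(text.c_str(), nullptr);
}
```

The trade-off is readability: the text is no longer a familiar decimal number, so this suits archives meant for machines more than files meant to be inspected by hand.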
