Serialization with NaN and infinity

Boost General Interest mailing list


Austin Bingham

5 Jan 2005

10:43 p.m.

I saw a thread on this topic from a few months ago, and I was wondering if there had been any progress or new thought on the topic. Essentially, it appears that the serialization library can't reliably deserialize the serialized version of NaN and infinity for doubles and floats. This seems to be a result of relying on the, AFAIK, undefined behavior of writing NaN/infinity to a stream; it will work correctly with some standard library implementations, but not all.

I think the problem can be addressed with these changes:

basic_text_oprimitive::save(float or double): on NaN or Infinity, write out some known, stable string (e.g. "nan" or "inf"); don't rely on the std implementation.

basic_text_iprimitive::load(float or double): look for the "known values" printed by save(), generating the correct values when they're seen.

That's a lot of hand waving, to be sure, but something like this would really help out. Of course, a better solution would be fine with me, but there are definitely cases where it's necessary to serialize these values. Currently, there doesn't seem to be a way to reliably do it.

Austin Bingham
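The save/load change proposed above can be sketched with plain standard streams. This is only an illustration of the idea, not code from the serialization library: the helper names save_double and load_double, the "-inf" token, and the space separator are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write a double as text, using fixed tokens for
// non-finite values instead of whatever the standard library would print.
inline void save_double(std::ostream& os, double v) {
    if (std::isnan(v)) {
        os << "nan";
    } else if (std::isinf(v)) {
        os << (v < 0 ? "-inf" : "inf");
    } else {
        // max_digits10 digits are enough for a finite double to round-trip.
        std::ostringstream tmp;
        tmp.precision(std::numeric_limits<double>::max_digits10);
        tmp << v;
        os << tmp.str();
    }
    os << ' ';  // token separator (an assumption of this sketch)
}

// Hypothetical helper: read one whitespace-delimited token and map the fixed
// tokens back to the matching values. Returns false on malformed input.
inline bool load_double(std::istream& is, double& v) {
    std::string tok;
    if (!(is >> tok)) return false;
    if (tok == "nan")  { v = std::numeric_limits<double>::quiet_NaN();  return true; }
    if (tok == "inf")  { v = std::numeric_limits<double>::infinity();   return true; }
    if (tok == "-inf") { v = -std::numeric_limits<double>::infinity();  return true; }
    std::istringstream num(tok);
    // Accept the token only if the whole of it parsed as a number.
    return static_cast<bool>(num >> v) && num.eof();
}
```

Because the tokens are chosen by the writer rather than by the standard library, the reader does not depend on how a given implementation formats non-finite values.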


Robert Ramey

6 Jan

1:17 a.m.

The simple truth is I never considered this. When it came up the last time I didn't really think about it very much, as I was involved in other things and I hoped interested parties might come to a consensus without my having to bend my over-stretched brain.

Austin Bingham wrote:

...

I saw a thread on this topic from a few months ago, and I was wondering if there had been any progress or new thought on the topic. Essentially, it appears that the serialization library can't reliably deserialize the serialized version of NaN and infinity for doubles and floats. This seems to be a result of relying on the, AFAIK, undefined behavior of writing NaN/infinity to a stream; it will work correctly with some standard library implementations, but not all.

I think the problem can be addressed with these changes:

basic_text_oprimitive::save(float or double): on NaN or Infinity, write out some known, stable string (i.e. "nan" or "inf"); don't rely on std implementation.

basic_text_iprimitive::load(float or double): look for the "known values" printed by save(), generating the correct values when they're seen.

I'm not convinced this would work very well. When loading a data value, you have to know ahead of time what type it's going to be - float or string. Here are a couple of random thoughts on this issue:

a) I believe that native binary archives will handle this without change, as they just copy the bits to the archive and back. As long as you read the archive on the same compiler/OS/machine, there should be no issue.

b) Define a special type for NaN:

    class NanType {};

Use variant serialization:

    ar << boost::variant<NanType, float>(value);
    boost::variant<NanType, float> x;
    ar >> x;

Of course this presumes that one has implemented serialization for boost::variant. I haven't done this but I did receive code from someone who did. I wanted to upload it to the boost file section but the file section was full. I noted this on this list but so far no one has responded.

c) A simpler approximation of the above could easily be made:

    class NanOrNot {
        friend class boost::serialization::access;
        bool Nan;
        float & value; // used only if it's not a NaN
        template<class Archive>
        void save(Archive &ar, const unsigned int version) const {
            ar << Nan;
            if(! Nan) ar << value;
        }
        template<class Archive>
        void load(Archive &ar, const unsigned int version){
            ar >> Nan;
            if(! Nan) ar >> value;
        }
        BOOST_SERIALIZATION_SPLIT_MEMBER()
    public:
        NanOrNot(float & t) : value(t) {
            // initialize Nan
        }
    };
    ...
    float x;
    ...
    ar << NanOrNot(x)

etc., which I believe is more or less what you have in mind. This would be a serialization wrapper, which is explained in the documentation. An example of a complete serialization wrapper is NameValuePair. Once you had a wrapper you could just ...

This approach would have a couple of valuable features:

1) Its usage is optional. This would keep machine cycle misers happy.
2) It wouldn't require changing any archive class implementation - this would keep me happy.

Just random ideas - I'm not going to start defending them.

Robert Ramey
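To make the flag-then-value idea in (c) concrete without depending on Boost, here is a minimal sketch that writes to ordinary standard streams. The class name nan_or_not and its save/load signatures are hypothetical stand-ins, not the serialization library's wrapper interface, and, like the original suggestion, it covers NaN only (not infinity) and ignores output precision.

```cpp
#include <cassert>
#include <cmath>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for the NanOrNot wrapper: holds a reference to the
// caller's float, writes a flag first, then the value only if it is not NaN.
class nan_or_not {
public:
    explicit nan_or_not(float& v) : value_(v) {}

    void save(std::ostream& os) const {
        const bool is_nan = std::isnan(value_);
        os << is_nan << ' ';              // flag is written as 0 or 1
        if (!is_nan) os << value_ << ' ';
    }

    void load(std::istream& is) {
        bool is_nan = false;
        is >> is_nan;                     // the flag tells us what follows
        if (is_nan) {
            value_ = std::numeric_limits<float>::quiet_NaN();
        } else {
            is >> value_;
        }
    }

private:
    float& value_;
};
```

The reader never has to guess the type of the next token, because the flag is always present; the cost is the extra flag stored with every wrapped value.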

Austin Bingham

3:54 p.m.

...

I'm not convinced this would work very well. when loading a data value, you have to know ahead of time what type its going to be - float or string.

As I understand things, istream::peek() is always going to work, meaning that you could check to see if the next char is, for example, an 'n'. If so, this would indicate that nan was written; otherwise, a normal float could be read. At least in the toy code I've written, this works. This approach (assuming it works completely) has the benefit of not requiring extra information pertaining to the nan-ness of the value. It has the downside, as you point out, of taking some extra cycles. I'll address this in a bit.
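A peek-based reader along these lines might look like the sketch below. It is not the toy code mentioned in the message; the function name load_float_peek and the assumption that the writer emitted exactly "nan", "inf" or "-inf" for non-finite values are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical loader: assumes the writer emitted exactly "nan", "inf" or
// "-inf" for non-finite values and ordinary decimal text otherwise.
inline bool load_float_peek(std::istream& is, float& v) {
    is >> std::ws;                        // skip leading whitespace
    bool negative = false;
    if (is.peek() == '-') {               // consume the sign ourselves
        is.get();
        negative = true;
    }
    const int next = is.peek();           // look ahead without consuming
    if (next == 'n' || next == 'i') {
        std::string tok;
        is >> tok;
        if (tok == "nan" && !negative) {
            v = std::numeric_limits<float>::quiet_NaN();
        } else if (tok == "inf") {
            v = std::numeric_limits<float>::infinity();
        } else {
            is.setstate(std::ios::failbit);
            return false;
        }
    } else if (!(is >> v)) {              // ordinary number
        return false;
    }
    if (negative) v = -v;
    return true;
}
```

No extra flag is stored in the archive; the price is one look-ahead and a branch on every float that is read.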

...

b) define a special type for Nan:
c) a simpler approximation of the above could easily be made.
...

The problems I have with these approaches deal, essentially, with the cognitive load on the programmer. Now a serialization lib user has to remember to use the wrappers if dealing with NaN, or face the wrath of a compiler that is not going to tell you what broke when you try to read a NaN. Maybe this is not as big a deal as I suppose, but I can envision scenarios where this would be a problem.

These approaches (although I haven't seen the variant serialization solution) would incur extra storage for each float/double.

So, taking all of your comments into account (I hope), I have another idea. Would it be possible to make the text-primitive functionality of xml and text archives a programmer-modifiable property?

The most obvious approach, I think, would be to give the archive_impls a template parameter of TextPrimitive. Rather than having a hard-coded inheritance from basic_text_i/oprimitive, this TextPrimitive would be the base class. By default, of course, the basic_text_primitives would be used, but alternatives could be supplied by anyone.

This has, I think, the great benefit of keeping the primitive representation and overall file structure orthogonal. Again, I'm waving my hands a lot here, but I don't see any reasons in the code why this couldn't be done, but neither do I have intimate knowledge of the code.

Austin Bingham
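The TextPrimitive idea is a form of policy-based design, and a rough sketch may help show its shape. Every name below (default_text_oprimitive, nan_aware_text_oprimitive, text_oarchive_sketch) is hypothetical; the real archive classes have a different and much larger interface, so this only illustrates swapping the primitive layer through a template parameter.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <ostream>
#include <sstream>

// Hypothetical default primitive: forwards the value to the stream unchanged.
class default_text_oprimitive {
protected:
    explicit default_text_oprimitive(std::ostream& os) : os_(os) {}
    void save(double v) { os_ << v << ' '; }
    std::ostream& os_;
};

// Hypothetical alternative primitive: fixed tokens for non-finite values.
class nan_aware_text_oprimitive {
protected:
    explicit nan_aware_text_oprimitive(std::ostream& os) : os_(os) {}
    void save(double v) {
        if (std::isnan(v))      os_ << "nan ";
        else if (std::isinf(v)) os_ << (v < 0 ? "-inf " : "inf ");
        else                    os_ << v << ' ';
    }
    std::ostream& os_;
};

// The archive inherits from whichever primitive it is given, so the file
// structure stays independent of how primitive values are represented.
template <class TextPrimitive = default_text_oprimitive>
class text_oarchive_sketch : public TextPrimitive {
public:
    explicit text_oarchive_sketch(std::ostream& os) : TextPrimitive(os) {}
    text_oarchive_sketch& operator<<(double v) {
        this->save(v);  // dispatches to the chosen primitive at compile time
        return *this;
    }
};
```

Users who do not care about NaN keep the default and pay nothing; users who do can instantiate the archive with the NaN-aware primitive.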

Robert Ramey

5:39 p.m.

Austin Bingham wrote:

...

...
b)define a special type for Nan: c) a simpler approximation of the above could easily be made. ... The problems I have with these approaches deal, essentially, with the cognitive load on the programmer. Now a serialization lib user has to remember to use the wrappers if dealing with NaN, or face the wrath of a compiler that is not going to tell you what broke when you try to read a NaN. Maybe this is not as big a deal as I suppose, but I can envision scenarios where this would be a problem.

The problem is that someone is going to say "I don't need this and I don't want to slow down my application" or something like that. My method permits one to choose whether or not NaN is going to get special attention on an item-by-item basis.

...

These approaches (although I haven't seen the variant serialization solution) would incur extra storage for each float/double.

The variant serialization is mentioned as an incentive to get someone interested in implementing this. This wouldn't be that hard, but could be a little bit subject to controversy depending on the implementation. Note that the serialization wrapper I proposed could be implemented differently for native binary files, which don't need anything special. This would give each platform what it needs.

...

So, taking all of your comments into account (I hope), I have another idea. Would it be possible to make the text-primitive functionality of xml and text archives a programmer modifiable property?

This is pretty much what the wrapper functionality above does.

...

The most obvious approach, I think, would be to give the archive_impls a template parameter of TextPrimitive. Rather than having a hard-coded inheritance from basic_text_i/oprimitive, this TextPrimitive would be the base class. By default, of course, the basic_text_primitives would be used, but alternatives could be supplied by anyone.

On the other hand, one could modify the code so the default is to flag Nan on text primitives and require usage of the wrapper to override it.

...

This has, I think, the great benefit of keeping the primitive representation and overall file structure orthogonal. Again, I'm waving my hands a lot here, but I don't see any reasons in the code why this couldn't be done, but neither do I have intimate knowledge of the code.

Well, everything is doable; the problem is coming to agreement on what to do. Actually, my main reluctance is really just inertia. If I had thought about this point long ago, I probably would have included it in the text primitives. If one is using a text archive, the extra overhead of using a NaN flag is not going to be noticeable. If efficiency at this level is a concern, one is going to be using a native binary archive anyway.

Adding this to the text primitives would require that I go investigate NaN and what it means in different environments (e.g. IEEE 80 bit) and to what extent there are portable functions for checking whether or not a float, double, (complex?) is a NaN. I'm also a little concerned at this point about invalidating portable text archives created by previous versions. So it really is just inertia. (I'm also bogged down in other stuff now.)

Robert Ramey
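On the question of portable checks: C++11, which postdates this thread, added std::isnan, std::isinf, std::fpclassify and std::signbit to <cmath> for float, double and long double. A small sketch of how they could be combined follows; the fp_kind enum and the classify function are hypothetical names for this example, and complex values are not covered.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical classification used only for this illustration.
enum class fp_kind { finite, nan, pos_inf, neg_inf };

// Works for float, double and long double via the <cmath> overloads.
template <class T>
fp_kind classify(T v) {
    switch (std::fpclassify(v)) {
        case FP_NAN:
            return fp_kind::nan;
        case FP_INFINITE:
            return std::signbit(v) ? fp_kind::neg_inf : fp_kind::pos_inf;
        default:
            // FP_NORMAL, FP_SUBNORMAL and FP_ZERO are all finite values.
            return fp_kind::finite;
    }
}
```

A text primitive could branch on such a classification to decide whether to emit a fixed token or the ordinary numeric text.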

Austin Bingham

5:48 p.m.

...

a) I believe that native binary archives will handle this without out change as they just copy the bits to the archive and back. As long as you read the archive on the same compiler/os/machine, there should be no issue.

I think I've been misinterpreting this bit here. I originally took this to mean that the binary archives are not portable across platforms. Is this the case, or is it just that NaN specifically isn't portable (to non-IEEE 754 machines, I guess)? In the end, we're hoping to use the binary format anyway, and I can work around the text format limitations since I'll only be using it for debugging.

Austin
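The "just copy the bits" behaviour described in (a) can be illustrated with a byte buffer standing in for a native binary archive. The helpers save_raw and load_raw are hypothetical and are not the library's binary archive code; the sketch assumes the buffer is read back on the same compiler, OS and machine that wrote it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <limits>
#include <vector>

// Append the raw object representation of v to the buffer.
inline void save_raw(std::vector<unsigned char>& buf, double v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    buf.insert(buf.end(), bytes, bytes + sizeof v);
}

// Read a double back from the buffer at the given byte offset.
inline double load_raw(const std::vector<unsigned char>& buf, std::size_t offset) {
    assert(offset + sizeof(double) <= buf.size());
    double v;
    std::memcpy(&v, buf.data() + offset, sizeof v);
    return v;
}
```

Since no text formatting or parsing is involved, NaN and infinity survive the round trip unchanged; the same property is why such a buffer is tied to the byte order and floating-point format of the machine that produced it.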

Robert Ramey

6:53 p.m.

Native binary archives are generally NOT portable across platforms. This is highlighted in the documentation and is the motivating factor in preparing the demo demo_portable_archive. Text-based archives are meant to be portable, but of course at the cost of speed and archive size.

I did a tiny bit of googling and found that NaN isn't the only issue. There is also +/- INF. So the whole subject would need a more thorough treatment.

Robert Ramey

Austin Bingham wrote:

...

...
a) I believe that native binary archives will handle this without out change as they just copy the bits to the archive and back. As long as you read the archive on the same compiler/os/machine, there should be no issue.

I think I've been misinterpreting this bit here. I originally took this to mean that the binary archives are not portable across platforms. Is this the case, or is it just that NaN specifically isn't portable (to non-IEEE 754 machines, I guess)? In the end, we're hoping to use the binary format anyway, and i can work around the text format limitations since I'll only be using it for debugging.

