serialization - NaN, +/Inf and others

Troy S. and I have been looking at the question of implementing serialization of NaN and +/-Inf for floats and doubles in portable text or binary archives.

Turns out there's a fly in the ointment.

For determining whether a given double contains a "special" value, float.h contains the following handy function

    int _fpclass( double x );

which returns one of the following values:

    _FPCLASS_SNAN   Signaling NaN
    _FPCLASS_QNAN   Quiet NaN
    _FPCLASS_NINF   Negative infinity (-INF)
    _FPCLASS_NN     Negative normalized non-zero
    _FPCLASS_ND     Negative denormalized
    _FPCLASS_NZ     Negative zero (-0)
    _FPCLASS_PZ     Positive 0 (+0)
    _FPCLASS_PD     Positive denormalized
    _FPCLASS_PN     Positive normalized non-zero
    _FPCLASS_PINF   Positive infinity (+INF)

So we could write a flag to the archive indicating whether it's a special value.

So far, so good.

When the archive is read back, we can read the flag and initialize the variable with the appropriate value.

BUT - I can't find any "official" way to initialize a float/double to any of these values. They seem to be the result of operations, and it's certainly not obvious that all compilers would be on the same page here.

Note that this same problem arises whenever a float/double is written/read to/from a stream in a way designed to be portable. So it must have come up before. What's the solution here?

Robert Ramey

| -----Original Message-----
| From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey
| Sent: 17 November 2005 18:00
| To: boost@lists.boost.org
| Subject: [boost] serialization - NaN, +/Inf and others
|
| [...]

Dinkumware says that C99 math.h (to be added to C++ by TR1) provides fpclassify:

    #define fpclassify(x) <int rvalue> [added with C99, int functions in C++]

The generic-function macro accepts an rvalue argument x of some real floating-point type and evaluates to:

* FP_INFINITE for an argument that is positive or negative infinity
* FP_NAN for an argument that is not-a-number (NaN)
* FP_NORMAL for an argument that is finite and normalized
* FP_SUBNORMAL for an argument that is finite and denormalized
* FP_ZERO for an argument that is positive or negative zero

or possibly some other implementation-defined value.

But of course their VALUES are not specified in the standard.

C macros INFINITY and NAN are provided (in C99 if not already) and will probably be used to implement the C++ functions std::numeric_limits<FloatingPointType>::infinity() and ::quiet_NaN(). But you need to know the floating-point type, of course.

Denormalised values probably don't need to be treated any differently. Hardware that doesn't deal with denormalised values (VAX/ALPHA?) can't be portable anyway?

The positive and negative infinities and zeros really are different in representation in MS world at least, but I fear you may just have to ignore that.

http://www2.open-std.org/JTC1/SC22/WG14/www/C99RationaleV5.10.pdf

does not provide any rationale for the lack of sign on infinity and zero.

Getting -zero is unlikely to be a problem, but getting +infinity instead of -infinity could be most confusing - as different as you can get ;-)

I suspect C99 put it in the 'too difficult' box?

HTH

Paul

--
Paul A Bristow
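As an illustration of the classification described above, here is a minimal sketch that assumes a C++11 (or later) <cmath> providing std::fpclassify and std::signbit. The function name classify and the tag strings are hypothetical and not part of any archive format; the sign is taken from std::signbit because fpclassify by itself does not report it.

```cpp
#include <cmath>   // std::fpclassify, std::signbit (C++11)
#include <limits>
#include <string>

// Hypothetical helper: returns a short tag describing the class of x.
// fpclassify gives the category; signbit supplies the sign that the
// FP_* categories do not carry.
std::string classify(double x)
{
    const bool neg = std::signbit(x);
    switch (std::fpclassify(x)) {
    case FP_NAN:       return "nan";
    case FP_INFINITE:  return neg ? "-inf" : "+inf";
    case FP_ZERO:      return neg ? "-0" : "+0";
    case FP_SUBNORMAL: return neg ? "-subnormal" : "+subnormal";
    default:           return neg ? "-normal" : "+normal";
    }
}
```

A writer could store such a tag (or a small integer) as the "flag" discussed at the start of the thread; whether that is worthwhile for every value is a separate question.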

At 6:57 PM +0000 11/17/05, Paul A Bristow wrote:
> fpclassify
> #define fpclassify(x) <int rvalue> [added with C99, int functions in C++]
> [...]
> The positive and negative infinities and zeros really are different in representation in MS world at least, but I fear you may just have to ignore that.
> http://www2.open-std.org/JTC1/SC22/WG14/www/C99RationaleV5.10.pdf
> does not provide any rationale for the lack of sign on infinity and zero.
> Getting -zero is unlikely to be a problem, but getting +infinity instead of -infinity could be most confusing - as different as you can get ;-)
> I suspect C99 put it in the 'too difficult' box?
To test for negative infinity and negative zero, use fpclassify(x) and signbit(x). To build a negative infinity, I think copysign of a positive infinity and a negative sign supplier ought to work, and likely will work in practice; C99 specifies that it works for NaNs, but is actually silent about what it does for infinities (sigh!). copysign can also be used to obtain a negative zero, for those platforms which actually support negative zero.
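A minimal sketch of that suggestion, assuming a C++11 <cmath> with std::copysign, std::signbit and std::fpclassify, and a platform that supports signed infinities and zeros. The helper names are hypothetical.

```cpp
#include <cmath>   // std::copysign, std::signbit, std::fpclassify (C++11)
#include <limits>

// Build the negative values by copying the sign of -1.0 onto the
// positive ones.
double negative_infinity()
{
    return std::copysign(std::numeric_limits<double>::infinity(), -1.0);
}

double negative_zero()
{
    return std::copysign(0.0, -1.0);
}

// Detect them with fpclassify (category) plus signbit (sign).
bool is_negative_infinity(double x)
{
    return std::fpclassify(x) == FP_INFINITE && std::signbit(x);
}

bool is_negative_zero(double x)
{
    return std::fpclassify(x) == FP_ZERO && std::signbit(x);
}
```

Note that negative_zero() == 0.0 still compares true, which is why the sign has to be tested with signbit rather than with a comparison.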

"Robert Ramey" wrote: [ nan, infinite etc ]
> BUT - I can't find any "official" way to initialize a float/double to any of these values. They seem to be the result of operations and it's certainly not obvious that all compilers would be on the same page here.
> Note that this same problem arises whenever a float/double is written/read to/from a stream in a way designed to be portable. So it must have come up before. What's the solution here?

    unsigned char nan[sizeof(double)];
    nan[0] = ...
    ....
    double d;
    memcpy(&d, nan, sizeof(double));

I assume the doubles/floats follow the IEEE standard.

/Pavel
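One way to make the memcpy idea concrete is sketched below. It assumes double is a 64-bit IEEE 754 (IEC 559) type and that floating-point and integer values share the same byte order, which is checked or assumed as noted in the comments. The helper names are hypothetical; going through a 64-bit integer rather than a byte array avoids spelling out the byte order by hand.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumptions made explicit: IEEE 754 doubles that are exactly 64 bits wide.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes 64-bit doubles");

// Reinterpret a 64-bit pattern as a double (memcpy avoids aliasing problems).
double double_from_bits(std::uint64_t bits)
{
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Inverse operation: obtain the bit pattern of a double.
std::uint64_t bits_from_double(double d)
{
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

For example, double_from_bits(0x7ff0000000000000ULL) yields positive infinity and double_from_bits(0x7ff8000000000000ULL) yields a quiet NaN on such a platform.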

On Thu, Nov 17, 2005 at 10:00:01AM -0800, Robert Ramey wrote:
[...]
So. The real-world use cases that brought this up are "I overload the meaning of NaN to mean uninitialized" and/or "pos/neg inf are valid values for my floats", followed by "and I want to serialize them". There is a whole spectrum of bit patterns that constitute NaN. In fact, there are a whole bunch of bit patterns that can represent lots of numbers if you take into account denormalization and so forth (from some website):
The 32-bit IEEE 754 representations of these values are:

    Positive infinity: 0x7f800000
    Negative infinity: 0xff800000
    Signaling NaN:     any bit pattern between 0x7f800001 and 0x7fbfffff
                       or any bit pattern between 0xff800001 and 0xffbfffff
    Quiet NaN:         any bit pattern between 0x7fc00000 and 0x7fffffff
                       or any bit pattern between 0xffc00000 and 0xffffffff
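The ranges above can be checked mechanically. The following sketch decodes a raw 32-bit pattern using the layout the list implies (exponent bits all ones, with the top mantissa bit separating quiet from signaling NaNs). The enum and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical category names for the special 32-bit patterns.
enum class Special { None, PosInf, NegInf, SignalingNaN, QuietNaN };

Special classify_bits(std::uint32_t bits)
{
    const std::uint32_t exponent = bits & 0x7f800000u;  // 8 exponent bits
    const std::uint32_t mantissa = bits & 0x007fffffu;  // 23 mantissa bits

    if (exponent != 0x7f800000u)
        return Special::None;                // finite: zero, normal, denormal
    if (mantissa == 0)                       // all-ones exponent, zero mantissa
        return (bits & 0x80000000u) ? Special::NegInf : Special::PosInf;
    // Non-zero mantissa is a NaN; the top mantissa bit marks it as quiet.
    return (mantissa & 0x00400000u) ? Special::QuietNaN : Special::SignalingNaN;
}
```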
I don't think XML/text archives should attempt to guarantee that floating point types that are denormalized or inf or nan, in any form, are to-the-bit identical after a trip through one of the serialization library's text archives (and it should guarantee only that zero is still zero). It's a text archive, you get a text representation, and there is no standard for text representations of wacky floating-point types. Full stop.

One reason you would want to write an XML archive is to be able to play with it with tools independent of boost::serialization. This means you will need to be able to understand the text representations, and since no standard exists, the representation must not be too complicated, as that would be a hassle maintenance-wise.

If a user does want a bit-faithful round-trip, for handling of nans, infs, denormalization or what-have-you, they could, say, wrap their floats/doubles in something that serializes them as 4/8 chars, as Pavel suggests, and if they wanted to be portable w.r.t. endianness they'd have to take that into account themselves in the conversion. But this would be a real fringe case. If you want bit-faithful, just use a binary archive. If you want portable bit-faithful, use a portable binary archive.
Looks like John Maddock figured out the isinf()/isnan() problem in a general way:

    #include <math.h>  // isnan where available
    #include <cmath>
    #include <limits>  // std::numeric_limits

    namespace boost{ namespace math{ namespace detail{

    template <class T>
    inline bool test_is_nan(T t)
    {
       // Comparisons with NaNs always fail:
       return !(t <= std::numeric_limits<T>::infinity())
          || !(t >= -std::numeric_limits<T>::infinity());
    }
    #ifdef isnan
    template<> inline bool test_is_nan<float>(float t) { return isnan(t); }
    template<> inline bool test_is_nan<double>(double t) { return isnan(t); }
    template<> inline bool test_is_nan<long double>(long double t) { return isnan(t); }
    #endif

    }}} // namespace boost::math::detail

So I'd suggest the following: for some D of type T, the following expressions will evaluate the same both before and after a round-trip through a text archive:

    test_is_nan(D)   // (also isnan(D) if available)

    D <= -std::numeric_limits<T>::infinity()   // also (isinf(D) && (D < 0)), etc

    D >= std::numeric_limits<T>::infinity()

No more is guaranteed. This addresses the real-world use cases without getting too intimate with ieee754.

So one could use that kind of thing to detect them, then just set them to nan/inf/-inf given by

    std::numeric_limits<T>::infinity()
    -std::numeric_limits<T>::infinity()
    std::numeric_limits<T>::quiet_NaN()

Sound reasonable?

-t
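The detect-then-restore scheme proposed here could look roughly like the sketch below. The comparison-based NaN test is the one quoted in the message; the enum FloatKind and the functions kind_of and restore are hypothetical names introduced only for illustration, and this is not the serialization library's actual implementation.

```cpp
#include <limits>

// NaN test from the thread: comparisons involving NaN are always false.
template <class T>
bool test_is_nan(T t)
{
    return !(t <= std::numeric_limits<T>::infinity())
        || !(t >= -std::numeric_limits<T>::infinity());
}

// Hypothetical categories: the only distinctions that would survive
// a round-trip through a text archive under this proposal.
enum FloatKind { kFinite, kPosInf, kNegInf, kNaN };

template <class T>
FloatKind kind_of(T d)
{
    if (test_is_nan(d)) return kNaN;
    if (d >= std::numeric_limits<T>::infinity()) return kPosInf;
    if (d <= -std::numeric_limits<T>::infinity()) return kNegInf;
    return kFinite;
}

// On load, special kinds are rebuilt from numeric_limits; every NaN
// collapses to quiet_NaN(), and finite values pass through unchanged.
template <class T>
T restore(FloatKind kind, T finite_value)
{
    switch (kind) {
    case kNaN:    return std::numeric_limits<T>::quiet_NaN();
    case kPosInf: return std::numeric_limits<T>::infinity();
    case kNegInf: return -std::numeric_limits<T>::infinity();
    default:      return finite_value;
    }
}
```

With these definitions, kind_of(restore(kind_of(d), d)) == kind_of(d) holds for any d, which is the guarantee stated above and nothing more.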

> I don't think XML/text archives should attempt to guarantee that floating point types that are denormalized or inf or nan, in any form, are to-the-bit identical after a trip through one of the serialization library's text archives (and it should guarantee only that zero is still zero). It's a text archive, you get a text representation, and there is no standard for text representations of wacky floating-point types. Fullstop.
> One reason you would want to write an XML archive is to be able to play with it with tools independent of boost::serialization. This means you will need to be able to understand the text representations, and since no standard exists, the representation may not be too complicated, as that would be a hassle maintenance wise.

Actually, there is a standard for floats in XML. See http://www.w3.org/TR/xmlschema-2/#float. Of course this doesn't mean that any particular C++ implementation is able to support these values, but it does suggest the way to represent them in XML files when the implementation does support them.

Incidentally, I recommend the Clinger paper referenced there to anyone interested in external representation of floating point types. I also recommend the Steele et al. paper referenced at the end of the Clinger paper.
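For reference, XML Schema Part 2 spells the special values of float and double as INF, -INF and NaN. A minimal sketch of writing and reading doubles with those spellings follows; the function names to_xsd and from_xsd are hypothetical, the 17-digit precision is an assumption chosen so that IEEE 754 doubles survive the trip, and error handling for malformed input is left out.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical writer: special values use the XML Schema lexical forms,
// everything else is printed with 17 significant digits.
std::string to_xsd(double d)
{
    typedef std::numeric_limits<double> lim;
    if (d != d) return "NaN";                    // only NaN compares unequal to itself
    if (d == lim::infinity()) return "INF";
    if (d == -lim::infinity()) return "-INF";
    std::ostringstream os;
    os.precision(17);
    os << d;
    return os.str();
}

// Hypothetical reader: the inverse mapping. Any NaN comes back as quiet_NaN().
double from_xsd(const std::string& s)
{
    typedef std::numeric_limits<double> lim;
    if (s == "NaN") return lim::quiet_NaN();
    if (s == "INF") return lim::infinity();
    if (s == "-INF") return -lim::infinity();
    std::istringstream is(s);
    double d = 0.0;
    is >> d;                                     // no validation in this sketch
    return d;
}
```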

| -----Original Message-----
| From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of troy d. straszheim
| Sent: 18 November 2005 15:19
| To: boost@lists.boost.org
| Subject: Re: [boost] serialization - NaN, +/Inf and others
|
| On Thu, Nov 17, 2005 at 10:00:01AM -0800, Robert Ramey wrote:
| > The 32-bit IEEE 754 representations of these values are:

For the record, I found this by far the most useful site:

http://babbage.cs.qc.edu/IEEE-754/IEEE-754references.html

| I don't think archives should attempt to guarantee that
| floating point types that are denormalized or inf or nan, in any form,
| are to-the-bit identical after a trip through one of the
| serialization library's text archives (and it should guarantee only
| that zero is still zero). It's a text archive, you get a text
| representation, and there is no standard for text representations of
| wacky floating-point types. Fullstop.

Agree.

| Looks like John Maddock figured the general isinf()/isnan() problem
| out in a general way:
|
| [...]

In principle, yes, but isnan() and isinf() ideally should be provided by C99/TR1.

TR1 8.16.1 Synopsis

    // C99 macros defined as C++ templates.
    template<class T> bool signbit(T x);
    template<class T> int fpclassify(T x);
    template<class T> bool isfinite(T x);
    template<class T> bool isinf(T x);
    template<class T> bool isnan(T x);

(For Microsoft, and I believe several other environments, they could be _isnan() etc. lightly wrapped for C++?), something like:

    #include <cfloat> // for MSVC <float.h> for _isnan, _finite, _fpclass & values.

    /* IEEE recommended functions */
    #ifndef _SIGN_DEFINED
    _CRTIMP __checkReturn double __cdecl _copysign (__in double _Number, __in double _Sign);
    _CRTIMP __checkReturn double __cdecl _chgsign (__in double _X);
    #define _SIGN_DEFINED
    #endif
    _CRTIMP __checkReturn double __cdecl _scalb(__in double _X, __in long _Y);
    _CRTIMP __checkReturn double __cdecl _logb(__in double _X);
    _CRTIMP __checkReturn double __cdecl _nextafter(__in double _X, __in double _Y);
    _CRTIMP __checkReturn int __cdecl _finite(__in double _X);
    _CRTIMP __checkReturn int __cdecl _isnan(__in double _X);
    _CRTIMP __checkReturn int __cdecl _fpclass(__in double _X);

    #define _FPCLASS_SNAN 0x0001 /* signaling NaN */
    #define _FPCLASS_QNAN 0x0002 /* quiet NaN */
    #define _FPCLASS_NINF 0x0004 /* negative infinity */
    #define _FPCLASS_NN   0x0008 /* negative normal */
    #define _FPCLASS_ND   0x0010 /* negative denormal */
    #define _FPCLASS_NZ   0x0020 /* -0 */
    #define _FPCLASS_PZ   0x0040 /* +0 */
    #define _FPCLASS_PD   0x0080 /* positive denormal */
    #define _FPCLASS_PN   0x0100 /* positive normal */
    #define _FPCLASS_PINF 0x0200 /* positive infinity */

    namespace std { namespace tr1 {

    bool isnan(double x)
    {
    #if defined(BOOST_MSVC) // || ...
      return _isnan(x) != 0;
    // #elif ...
    #else
    #  error "isnan not available for this environment!"
    #endif
    }

    bool isinf(double x)
    {
    #if defined(BOOST_MSVC) // || ...
      return !_finite(x) && !_isnan(x); // _finite() alone is also false for NaN
    #endif
    }

    bool signbit(double x)
    {
    #if defined(BOOST_MSVC) // || ...
      return _copysign(1.0, x) < 0; // or use the result from _fpclass:
      // if -inf || -normal || -zero then negative ...
    #endif
    }

    } // namespace tr1
    } // namespace std

Or this could be done using the result from _fpclass (fpclassify in TR1), also widely available? (Feedback on any environment for which it is NOT available?)

Sadly John Maddock hasn't yet completed even these few IEEE recommended functions from the C99 additions? Has anyone else? Surely we should try to get these into Boost asap?

| So I'd suggest the following: for some D of type T, the following
| expressions will evaluate the same both before and after a round-trip
| through a text archive:
|
| test_is_nan(D) // (also isnan(D) if available)
|
| D <= -std::numeric_limits<T>::infinity() // also (isinf(D) && (D < 0)), etc
|
| D >= std::numeric_limits<T>::infinity()
|
| No more is guaranteed. This addresses the real-world use cases
| without getting too intimate with ieee754.
|
| So one could use that kind of thing to detect them, then just set them
| to nan/inf/-inf given by
|
| std::numeric_limits<T>::infinity()
| -std::numeric_limits<T>::infinity()
| std::numeric_limits<T>::quiet_NaN()

So ALL the many possible NaN values will be represented as quiet_NaN(). (This would be useful if NaN is used to represent 'missing' values.)

Yes, although one could use fpclassify(), another TR1 function, or signbit() to detect negative zero (as well as negative infinity). But the fact that the XML schema, http://www.w3.org/TR/xmlschema-2/#float , ignores negative zero suggests that it is not worth doing here. ('Real Mathematicians' who want this feature, or true NaN values, will have to use a binary archive, or some other workaround.)

I trust that this does not imply that EVERY floating point value has to include in the archive a byte signalling whether it is a 'funny' number or not? I trust I misunderstood the serialization proposed.

Paul

PS If you want to deal ONLY with IEEE formats, the code below works - but a macro would be better, to make an error ***at compile time***. Anyone know how to do this? I can't find any corresponding MACRO in C99. Do we need one?

    // Check is IEEE 754 - but really want a macro.
    if (!numeric_limits<double>::is_iec559)
    { // Code below is not portable.
      cout << "is NOT IEC559!" << endl;
      return -1;
    }

--
Paul A Bristow
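On the compile-time question in the PS: since std::numeric_limits<double>::is_iec559 is a compile-time constant, one possible approach is a static assertion instead of a macro. The sketch below assumes a C++11 compiler for static_assert and constexpr (with older compilers BOOST_STATIC_ASSERT could play the same role); the function name is hypothetical. For C, C99 Annex F describes the macro __STDC_IEC_559__, though compilers differ in whether they define it.

```cpp
#include <limits>

// Compilation fails here, with the given message, on a platform whose
// float or double is not IEC 559 (IEEE 754).
static_assert(std::numeric_limits<float>::is_iec559,
              "float is not IEC 559 (IEEE 754)");
static_assert(std::numeric_limits<double>::is_iec559,
              "double is not IEC 559 (IEEE 754)");

// Hypothetical helper exposing the same fact as a constant expression,
// e.g. for use in template parameters or further static assertions.
constexpr bool doubles_are_iec559()
{
    return std::numeric_limits<double>::is_iec559;
}
```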
participants (6)
- Jerry Schwarz
- Kim Barrett
- Paul A Bristow
- Pavel Vozenilek
- Robert Ramey
- troy d. straszheim