[math] floating point classification - testing help wanted

For a while now I've been meaning to clean up my floating-point classification code to provide a more reliable way of testing for NaNs and infinities. The new TR1 complex number code could really use this to improve its reliability. I believe Robert Ramey was looking for this functionality for serialisation, and frankly it's hard to write any serious floating point code without it.

So... attached is a first version of an fpclassify implementation for Boost (if there's interest). As long as this one works (not a given, unfortunately) then it's obviously trivial to sit isnan, isinf, etc. on top of it.

So... I'm looking for testers to run the attached program on as many different platforms/compilers as possible; basically I want to see if there's any mileage in this or if I'm onto a loser with this code! Please build the test program with optimisations turned on: many compilers make non-IEEE arithmetic assumptions once you tell them to optimise the code.

Tested so far:

Win32: VC6: can't detect signaling NaNs (_isnan is broken). VC7, 7.1, 8: OK. Intel 7, 8, 9: OK provided you're not using the VC6 runtime. GCC (Cygwin): OK. Borland 5.6.4: so broken I don't know where to begin :-(
HP-UX: gcc-3.4.2: OK. aCC: OK (native fpclassify broken for long doubles, but the portable version works).
Tru64: cxx: OK (but can't test infinities/NaNs as they're not supported?)
SuSE Linux: gcc-3.3.3: OK.

Many thanks, John.
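(A minimal illustration of the layering described above - a comparison-based sketch, not the actual Boost implementation; it assumes the C99 FP_* macros are available from <cmath>, and its self-comparison NaN test is exactly the kind of thing that breaks under -ffast-math style optimisation, as the gcc results later in the thread show.)

#include <cmath>    // FP_NAN etc.; assumed available, which pre-C99 runtimes may not provide
#include <limits>

// Crude comparison-based classifier, in the spirit of the portable fallback.
template <class T>
int my_fpclassify(T t)
{
   if(t != t)
      return FP_NAN;            // only a NaN compares unequal to itself
   if(t == 0)
      return FP_ZERO;
   T a = (t < 0) ? -t : t;
   if(std::numeric_limits<T>::has_infinity && (a == std::numeric_limits<T>::infinity()))
      return FP_INFINITE;
   if(a < (std::numeric_limits<T>::min)())
      return FP_SUBNORMAL;      // non-zero but smaller than the smallest normal
   return FP_NORMAL;
}

// Sitting the C99-style queries on top is then a one-liner each:
template <class T> bool my_isnan(T t) { return my_fpclassify(t) == FP_NAN; }
template <class T> bool my_isinf(T t) { return my_fpclassify(t) == FP_INFINITE; }
template <class T> bool my_isfinite(T t)
{
   int c = my_fpclassify(t);
   return (c != FP_NAN) && (c != FP_INFINITE);
}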

This will be helpful in serializing non-number floats and doubles. But there is an even bigger problem: we haven't found a portable way to create all the various flavors of these things. For native binary archives this isn't a problem, as the original bits are recovered. But I can't see how to handle these kinds of floats in portable binary or text-based archives.

Robert Ramey

John Maddock wrote:
For a while now I've been meaning to clean up my floating-point classification code to provide a more reliable way of testing for NaNs and infinities.
The new TR1 complex number code could really use this to improve its reliability. I believe Robert Ramey was looking for this functionality for serialisation, and frankly it's hard to write any serious floating point code without it.
So... attached is a first version of an fpclassify implementation for Boost (if there's interest). As long as this one works (not a given, unfortunately) then it's obviously trivial to sit isnan, isinf, etc. on top of it.
So... I'm looking for testers to run the attached program on as many different platforms/compilers as possible; basically I want to see if there's any mileage in this or if I'm onto a loser with this code!
Please build the test program with optimisations turned on: many compilers make non-IEEE arithmetic assumptions once you tell them to optimise the code.
Tested so far:
Win32: VC6: can't detect signaling NaNs (_isnan is broken). VC7, 7.1, 8: OK. Intel 7, 8, 9: OK provided you're not using the VC6 runtime. GCC (Cygwin): OK. Borland 5.6.4: so broken I don't know where to begin :-(
HP-UX: gcc-3.4.2: OK. aCC: OK (native fpclassify broken for long doubles, but portable version works).
Tru64: cxx: OK (but can't test infinities/NaNs as they're not supported?)
Suse Linux: gcc-3.3.3 : OK.
Many thanks, John.

This will be helpful in serializing non-number floats and doubles.
But there is an even bigger problem: we haven't found a portable way to create all the various flavors of these things.
For native binary archives this isn't a problem, as the original bits are recovered. But I can't see how to handle these kinds of floats in portable binary or text-based archives.
Depends what you want to recover; I'm assuming that norms, denorms and zeros are handled OK already? Infinities should be OK too, and on most platforms std::numeric_limits<T>::infinity() will get you what you want (you can always negate it as required).

NaNs are a whole other issue: you should be able to get a NaN from numeric_limits<T>::quiet_NaN(), if they're supported, but it won't be the same NaN you had originally. However, an interesting property of NaNs is that if x is a NaN then:

x == x

is *false* under IEEE arithmetic rules, so you can't actually tell whether you've got the same NaN back or not anyway!

The only thing you can tell is the sign of the NaN: according to C99 you can get the sign of a NaN from signbit, and set the sign of a NaN with copysign, but there's no way to portably implement those two :-( In any case I'm not sure why anyone would want to care about the sign of a NaN.

John.
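(To make the "no portable implementation" point concrete, here is a bit-inspection sketch of signbit and copysign; it assumes double is a 64-bit IEEE-754 value whose byte order matches that of a 64-bit unsigned integer - precisely the sort of assumption that cannot be made portably. The names are illustrative only.)

#include <cstring>   // std::memcpy

typedef unsigned long long bits64;   // assumes a 64-bit unsigned type exists

bool my_signbit(double x)
{
   bits64 b = 0;
   std::memcpy(&b, &x, sizeof x);    // bit-copy; works for NaNs and never traps
   return (b >> 63) != 0;            // the top bit is the IEEE sign bit
}

double my_copysign(double x, double y)
{
   bits64 bx, by;
   std::memcpy(&bx, &x, sizeof x);
   std::memcpy(&by, &y, sizeof y);
   bx = (bx & ~(bits64(1) << 63)) | (by & (bits64(1) << 63));   // splice y's sign onto x
   std::memcpy(&x, &bx, sizeof x);
   return x;
}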

On Thursday 01 December 2005 at 16:49 +0000, John Maddock wrote:
However, an interesting property of NaN's is that:
if x is a NaN then:
x == x
is *false* under IEEE arithmetic rules, so you can't actually tell whether you've got the same NaN back or not anyway!
The only thing you can tell is the sign of the NaN: according to C99 you can get the sign of a NaN from signbit, and set the sign of a NaN with copysign, but there's no way to portably implement those two :-(
The sign is actually the one piece of information about a NaN you don't want to get back. The IEEE-754 standard explicitly mandates that NaNs do not have a sign (or, more precisely, that the sign of a NaN is not part of its payload).

All the other bits of a NaN value are important, however. They are its payload and are supposed to be stable through arithmetic operations (except when an operation involves two NaNs). As a matter of fact, NaNs were conceived to carry information (debug information mainly).

At one time, there were even discussions on revising the IEEE-754 standard to describe a syntax like NaN0x3ff800. This proposal was withdrawn because it was so high-level, but it should give an idea of what NaNs were about.

Best regards, Guillaume

PS: because of poor implementations of this part of the standard, NaN payloads were never widely used. So my mail is only here to clear up some misconceptions about NaNs, not to influence in any way how they should be handled.

The sign is actually the one piece of information about a NaN you don't want to get back. The IEEE-754 standard explicitly mandates that NaNs do not have a sign (or, more precisely, that the sign of a NaN is not part of its payload).
All the other bits of a NaN value are important, however. They are its payload and are supposed to be stable through arithmetic operations (except when an operation involves two NaNs). As a matter of fact, NaNs were conceived to carry information (debug information mainly).
At one time, there were even discussions on revising the IEEE-754 standard to describe a syntax like NaN0x3ff800. This proposal was withdrawn because it was so high-level, but it should give an idea of what NaNs were about.
Understood. Thanks for the information. The basic problem here is that there is no portable way to get at that information; frexp yields undefined behaviour if you pass it a NaN, for example :-( Likewise there is no portable way to create a NaN with a specific payload (that I'm aware of), not even in C99.

John.

John Maddock wrote:
The sign is actually the one piece of information about a NaN you don't want to get back. The IEEE-754 standard explicitly mandates that NaNs do not have a sign (or, more precisely, that the sign of a NaN is not part of its payload).
All the other bits of a NaN value are important, however. They are its payload and are supposed to be stable through arithmetic operations (except when an operation involves two NaNs). As a matter of fact, NaNs were conceived to carry information (debug information mainly).
At one time, there were even discussions on revising the IEEE-754 standard to describe a syntax like NaN0x3ff800. This proposal was withdrawn because it was so high-level, but it should give an idea of what NaNs were about.
Understood. Thanks for the information. The basic problem here is that there is no portable way to get at that information; frexp yields undefined behaviour if you pass it a NaN, for example :-(
Likewise there is no portable way to create a NaN with a specific payload (that I'm aware of), not even in C99.
I guess you are referring to the C99 function double nan(const char *tagp) here. C99 doesn't specify any particular semantics for tagp, however; but at least among IEEE-754 platforms, is there much variation? I just tried playing around with this on g++ 3.4, and nan("str") appears to return 0x7FF8000000000000 + strtol("str") (as long as it is in range).

But I can't seem to find any sensible way of reading off what the NaN payload actually is. printf() with %f or %F appears to allow an output of "NAN*" where the "*" is unspecified. But g++ doesn't actually do it; it just prints "nan" or "NAN", irrespective of the payload. I would have expected it to be symmetric with strtod, which allows the notation strtod("NAN(str)") and is equivalent to nan("str") (that, at least, is C99). Ugh!

Are there any floating point formats in use that allow NaNs, other than IEEE-754? It might be easier to just ignore the existing functions that have inconsistent behaviour and write our own.

Cheers, Ian

I guess you are referring to the C99 function double nan(const char *tagp) here. C99 doesn't specify any particular semantics for tagp however, but at least among IEEE-754 platforms, is there much variation? I just tried playing around with this on g++ 3.4, and nan("str") appears to return 0x7FF8000000000000 + strtol("str") (as long as it is in range).
Sensible, but the meaning of the "str" part is implementation defined according to C99, so there's still not much you could rely on in portable code.
But I can't seem to find any sensible way of reading off what the NaN payload actually is. printf() with %f or %F appears to allow an output of "NAN*" where the "*" is unspecified. But g++ doesn't actually do it; it just prints "nan" or "NAN", irrespective of the payload. I would have expected it to be symmetric with strtod, which allows the notation strtod("NAN(str)") and is equivalent to nan("str") (that, at least, is C99). Ugh!
Are there any floating point formats in use that allow NaNs, other than IEEE-754? It might be easier to just ignore the existing functions that have inconsistent behaviour and write our own.
Sigh, it depends on how many formats there are and how much configuration you're prepared to engage in. float and double are probably manageable; the problems generally start with long double (Darwin's strange long double is a "pair of doubles", for example). I guess the obvious interfaces would be something like:

Real nan(IntType payload);
IntType payload(Real nan);

John.
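(A sketch of what such a pair of functions might look like, for the 64-bit IEEE-754 double format only; make_nan and nan_payload are hypothetical names, the constants are the usual quiet-NaN encoding, and none of it is portable - which is rather the point.)

#include <cstring>

typedef unsigned long long bits64;   // assumes a 64-bit unsigned type exists

// Quiet NaN: exponent all ones (0x7FF), top fraction bit set, payload in
// the remaining 51 fraction bits.
double make_nan(bits64 payload)
{
   bits64 b = 0x7FF8000000000000ULL | (payload & 0x0007FFFFFFFFFFFFULL);
   double result;
   std::memcpy(&result, &b, sizeof result);
   return result;
}

bits64 nan_payload(double nan)
{
   bits64 b;
   std::memcpy(&b, &nan, sizeof nan);
   return b & 0x0007FFFFFFFFFFFFULL;   // strip sign, exponent and quiet bit
}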

John Maddock wrote:
This will be helpful in serializing non-number floats and doubles.
But there is an even bigger problem: we haven't found a portable way to create all the various flavors of these things.
For native binary archives this isn't a problem, as the original bits are recovered. But I can't see how to handle these kinds of floats in portable binary or text-based archives.
Depends what you want to recover; I'm assuming that norms, denorms and zeros are handled OK already?
Actually, I don't know. The issue currently occurs with text/xml archives: stream output is used to write floats/doubles to the stream. This was based on the assumption that an input stream could read anything an output stream from the same library implementation might write. This turns out not to be the case for at least one standard library implementation; it was discovered with one library (VC 7.1). I'm not sure whether a pair of stream implementations is required to be able to do this or not.
Infinities should be OK too, and on most platforms std::numeric_limits<T>::infinity() will get you what you want (you can always negate it as required).
NaNs are a whole other issue: you should be able to get a NaN from numeric_limits<T>::quiet_NaN(), if they're supported, but it won't be the same NaN you had originally. However, an interesting property of NaNs is that:
if x is a NaN then:
x == x
is *false* under IEEE arithmetic rules, so you can't actually tell whether you've got the same NaN back or not anyway!
The only thing you can tell is the sign of the NaN: according to C99 you can get the sign of a NaN from signbit, and set the sign of a NaN with copysign, but there's no way to portably implement those two :-(
In any case I'm not sure why anyone would want to care about the sign of a NaN.
Me neither. Robert Ramey

Actually, I don't know. The issue currently occurs with text/xml archives: stream output is used to write floats/doubles to the stream. This was based on the assumption that an input stream could read anything an output stream from the same library implementation might write. This turns out not to be the case for at least one standard library implementation; it was discovered with one library (VC 7.1). I'm not sure whether a pair of stream implementations is required to be able to do this or not.
I believe they should round-trip correctly; it's certainly possible to do lossless base-2 to base-10 to base-2 round-tripping. However, I've reported a bug to one vendor (Intel/Linux, now fixed) where this didn't happen correctly, so these kinds of problems certainly can occur :-( I don't know what you can do about it though, except point at the std lib vendor responsible and say "blame them" :-(

John.
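(The round trip relies on printing enough decimal digits; digits10 + 2 - what later standards call max_digits10 - is sufficient to identify every double uniquely, assuming the library's extraction really can read back what its insertion wrote. A minimal check:)

#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>

int main()
{
   double original = 1.0 / 3.0;
   std::stringstream ss;
   // 15 + 2 = 17 significant decimal digits for an IEEE double.
   ss << std::setprecision(std::numeric_limits<double>::digits10 + 2) << original;
   double restored = 0;
   ss >> restored;
   std::cout << (original == restored ? "round-trip OK" : "round-trip FAILED") << '\n';
   return 0;
}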

John Maddock wrote:
Tru64: cxx: OK (but can't test infinities/NaNs as they're not supported?)
Did you specify the -ieee option? From the cxx manpage:

-ieee
  Support all portable features of the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Standard 754-1985), including the treatment of denormalized numbers, NaNs, infinities, and the handling of error cases. This flag also sets the _IEEE_FP macro. To flush floating point underflows to zero, add the -underflow_to_zero option.

Boris

Boris Gubenko wrote:
John Maddock wrote:
Tru64: cxx: OK (but can't test infinities/NaNs as they're not supported?)
Did you specify the -ieee option? From cxx manpage:
-ieee Support all portable features of the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Standard 754-1985), including the treatment of denormalized numbers, NaNs, infinities, and the handling of error cases. This flag also sets the _IEEE_FP macro. To flush floating point underflows to zero, add the -underflow_to_zero option.
I was just going to ask the same... I tried the tests in question and it works as expected when specifying -ieee on the command line. John, maybe it would be best to give a compile error on Tru64/CXX for these tests when _IEEE_FP is not defined. Markus

I tried the tests in question and it works as expected when specifying -ieee on the command line.
John, maybe it would be best to give a compile error on Tru64/CXX for these tests when _IEEE_FP is not defined.
There's no trickery to make it pass without -ieee: infinities don't get tested because numeric_limits<>::has_infinity is false. I've just checked with -ieee, and then infinities and NaNs do get tested, and everything works as expected. Thanks for the heads-up. John.
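(In other words, the specials are only exercised when numeric_limits says they exist, along these lines - the function name is illustrative, not the actual test code:)

#include <limits>

template <class T>
void test_specials()
{
   if(std::numeric_limits<T>::has_infinity)
   {
      T inf = std::numeric_limits<T>::infinity();
      // ... classify inf and -inf here ...
      (void)inf;
   }
   if(std::numeric_limits<T>::has_quiet_NaN)
   {
      T qnan = std::numeric_limits<T>::quiet_NaN();
      // ... classify qnan and -qnan here ...
      (void)qnan;
   }
}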

John Maddock wrote:
So... I'm looking for testers to run the attached program on as many different platforms/compilers as possible; basically I want to see if there's any mileage in this or if I'm onto a loser with this code!
Please build the test program with optimisations turned on: many compilers make non-IEEE arithmetic assumptions once you tell them to optimise the code.
Some results:

Linux-x86_64, glibc-2.3.5:
gcc 3.3.6 without optimizations: ok
gcc 3.3.6 -ffast-math: breaks (1)
gcc 3.3.6 with a lot of optimizations, without -ffast-math: ok
gcc 3.4.4 without optimizations: ok
gcc 3.4.4 -ffast-math: breaks (1)
gcc 3.4.4 with a lot of optimizations, without -ffast-math: ok
gcc 4.0.0 without optimizations: ok
gcc 4.0.0 with a lot of optimizations, without -ffast-math: ok
gcc 4.0.0 with a lot of optimizations, including -ffast-math: ok

Linux-x86, glibc-2.3.5:
gcc 3.2.3 without optimizations: ok
gcc 3.2.3 with a lot of optimizations, including -ffast-math: ok
gcc 3.3.6 without optimizations: ok
gcc 3.3.6 -ffast-math: breaks (1)
gcc 3.3.6 with a lot of optimizations, without -ffast-math: ok
gcc 3.4.4 without optimizations: ok
gcc 3.4.4 -ffast-math: breaks (1)
gcc 3.4.4 with a lot of optimizations, without -ffast-math: ok
gcc 4.0.2 without optimizations: ok
gcc 4.0.2 with a lot of optimizations, without -ffast-math: ok
gcc 4.0.2 with a lot of optimizations, including -ffast-math: ok
icc 8.1 without optimizations, using gcc 3.3.6 libs: breaks (1)
icc 8.1 with -O3, using gcc 3.3.6 libs: breaks (1)
icc 8.1 without optimizations, using icc libs: ok
icc 8.1 with -O3, using icc libs: ok
icc 9.0 without optimizations, using gcc 3.3.6 libs: breaks (1)
icc 9.0 with -O3, using gcc 3.3.6 libs: breaks (1)
icc 9.0 without optimizations, using icc libs: ok
icc 9.0 with -O3, using icc libs: ok

I didn't use the -ffinite-math-only or -funsafe-math-optimizations flags.

HTH, m

1:
Running 1 test case...
Platform has FP_NORMAL macro.
FP_ZERO: 2
FP_NORMAL: 4
FP_INFINITE: 1
FP_NAN: 0
FP_SUBNORMAL: 3
Testing type float
classify.cpp(167): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)FP_NAN failed [1 != 0]
classify.cpp(168): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)FP_NAN failed [1 != 0]
classify.cpp(173): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)FP_NAN failed [1 != 0]
classify.cpp(174): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)FP_NAN failed [1 != 0]
Testing type double
classify.cpp(167): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)FP_NAN failed [1 != 0]
classify.cpp(168): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)FP_NAN failed [1 != 0]
classify.cpp(173): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)FP_NAN failed [1 != 0]
classify.cpp(174): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)FP_NAN failed [1 != 0]
Testing type long double
classify.cpp(167): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)FP_NAN failed [2 != 0]
classify.cpp(168): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)FP_NAN failed [2 != 0]
classify.cpp(173): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)FP_NAN failed [2 != 0]
classify.cpp(174): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)FP_NAN failed [2 != 0]

John Maddock wrote:
For a while now I've been meaning to clean up my floating-point classification code to provide a more reliable way of testing for NaNs and infinities. The new TR1 complex number code could really use this to improve its reliability. I believe Robert Ramey was looking for this functionality for serialisation, and frankly it's hard to write any serious floating point code without it.

Absolutely!

Please build the test program with optimisations turned on: many compilers make non-IEEE arithmetic assumptions once you tell them to optimise the code.

Ran thus on MSVC 8.0 (MSVC version 140050215), in release mode. Had to enable C++ exceptions, thus: "Yes With SEH Exceptions (/EHa)".

To get a clean compile at level 4 with no MS extensions, I had to patch ios_state.ipp:

#if defined(_MSC_VER)
# pragma warning(push)
# pragma warning(disable : 4512) // assignment operator could not be generated.
#endif

and #pragma warning(pop) at the end of the module.

In test_tools.ipp I added a static_cast<std::streamsize>( to avoid a warning:

m_pimpl->m_pattern.ignore( static_cast<std::streamsize>(m_pimpl->m_synced_string.length() - i - suffix_size));

I added

#if defined(_MSC_VER)
# pragma warning(disable : 4723) // Deliberate potential divide by 0!
#endif

at the top of classify.cpp - looks like you really do mean to do this ;-)

and

#elif defined(_MSC_VER) || defined(__BORLANDC__)
   if(::_isnan(static_cast<double>(t)))
      // was if(::_isnan(t))
      // warning C4244: 'argument' : conversion from 'long double' to 'double', possible loss of data
      // Global isnan only REALLY works for double?
      // _CRTIMP __checkReturn int __cdecl _isnan(__in double _X);
      return FP_NAN;
#endif

Is this really right? I would expect to need some other code to deal with the long double case, not just a cast? Or are we relying on MS never bothering to provide a true long double, >= 80 bits? If so, a comment would be helpful. From http://babbage.cs.qc.edu/IEEE-754/IEEE-754references.html#kevin_chart the range of NaNs will be quite different for longer long doubles?

And what about _isnan(float)? It seems to pass the test, but... only _isnan(double) is defined - as for _nextafter :-(( (Sloth at MS!)

But I note that you have only tested with two of the considerable range of possible NaNs:

FFFFFFFF to FF800001 and 7F800001 to 7FFFFFFF for 32-bit.
FFFFFFFFFFFFFFFF to FFF0000000000001 and 7FF0000000000001 to 7FFFFFFFFFFFFFFF for 64-bit.

Exhaustive testing would be exhausting, even for 32-bit, and I am not sure how to portably create more test values. It's really a bit-twiddling job for vendors. If (or when) copysign is implemented, we can check whether -zero really is negative.

I get:

Running 1 test case...
.\classify.cpp Thu Dec 1 18:02:20 2005
MSVC version 140050727
FP_ZERO: 0
FP_NORMAL: 1
FP_INFINITE: 2
FP_NAN: 3
FP_SUBNORMAL: 4
Testing type float
Testing type double
Testing type long double
*** No errors detected

Which looks fine to me.

HTH Paul

To get a clean compile at level 4 with no MS extensions, I had to patch
ios_state.ipp
#if defined(_MSC_VER)
# pragma warning(push)
# pragma warning(disable : 4512) // assignment operator could not be generated.
#endif

and #pragma warning(pop) at the end of the module.
in test_tools.ipp added static_cast<std::streamsize>( - to avoid a warning)
m_pimpl->m_pattern.ignore( static_cast<std::streamsize>(m_pimpl->m_synced_string.length() - i - suffix_size));
That's a Boost.Test issue.
#if defined(_MSC_VER)
# pragma warning(disable : 4723) // Deliberate potential divide by 0!
#endif
at the top of classify.cpp - looks like you really do mean to do this ;-)
Yes :-)
and
#elif defined(_MSC_VER) || defined(__BORLANDC__)
   if(::_isnan(static_cast<double>(t)))
      // was if(::_isnan(t))
      // warning C4244: 'argument' : conversion from 'long double' to 'double', possible loss of data
      // Global isnan only REALLY works for double?
      // _CRTIMP __checkReturn int __cdecl _isnan(__in double _X);
      return FP_NAN;
#endif
Is this really right? I would expect to need some other code to deal with the long double case, not just a cast? Or are we relying on MS never bothering to provide a true long double, >= 80 bits? If so, a comment would be helpful.
I'm relying on the fact that a cast to a narrower or wider type will never create or destroy a NaN: we may get underflow to zero, or overflow to infinity, but otherwise once a NaN always a NaN :-)
The range of NaNs will be quite different for longer long doubles?
The bits are different, but a cast won't suddenly turn a NaN into something else. BTW the only reason that bit of code is in there is to guard against optimising compilers making non-IEEE assumptions in the rest of the code (and yes, such compilers exist).
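(A quick way to convince yourself, assuming quiet NaNs are supported at all: narrow a long double NaN to double and check it is still a NaN via the self-comparison trick.)

#include <iostream>
#include <limits>

int main()
{
   if(std::numeric_limits<long double>::has_quiet_NaN)
   {
      long double ld_nan = std::numeric_limits<long double>::quiet_NaN();
      double d = static_cast<double>(ld_nan);   // the payload may change, the NaN-ness does not
      std::cout << (d != d ? "still a NaN" : "lost the NaN") << '\n';
   }
   return 0;
}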
And what about _isnan(float)? It seems to pass the test but ...
as only _isnan(double) is defined - as for _nextafter :-((. (Sloth at MS!)
Indeed, that's why _fpclass can only be used for type double.
But I note that you have only tested with two of the considerable range of possible NaNs.
FFFFFFFF to FF800001 and 7F800001 to 7FFFFFFF for 32 bit.
FFFFFFFFFFFFFFFF to FFF0000000000001 and 7FF0000000000001 to 7FFFFFFFFFFFFFFF for 64-bit
Exhaustive testing would be exhausting, even for 32-bit, and I am not sure how to portably create more test values. It's really a bit twiddling job for vendors.
That's the problem: there is no portable way to create NaNs with other payloads, not even with ldexp.
I get:
Running 1 test case...
.\classify.cpp Thu Dec 1 18:02:20 2005
MSVC version 140050727
FP_ZERO: 0
FP_NORMAL: 1
FP_INFINITE: 2
FP_NAN: 3
FP_SUBNORMAL: 4
Testing type float
Testing type double
Testing type long double
*** No errors detected Press any key to continue . . .
Which looks fine to me.
Good, thanks. John.

John Maddock wrote:
That's a Boost.Test issue.

Already reported to Gennadiy.

I'm relying on the fact that a cast to a narrower or wider type will never create or destroy a NaN: we may get underflow to zero, or overflow to infinity, but otherwise once a NaN always a NaN :-)

Agreed - but worth a comment?

That's the problem: there is no portable way to create NaNs with other payloads, not even with ldexp.

That is true. And it leaves me most unsatisfied, feeling frustrated that we are not getting what the IEEE 754 designers intended. We don't seem to have made ANY progress since Kahan's note in 1997:

http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
http://www.cs.berkeley.edu/~wkahan/ieee754status/why-ieee.pdf

I feel that there __ought__ to be some debug use in getting the NaN values, as Guillaume Melquiond noted, but we do need to be able to get/put the bits.

Considering the amount of MACRO bodging that Boost is doing to create portability, I wonder if we should not try to produce some way of getting/putting the 'internal' IEEE representation at least. There are surely not THAT many combinations? We should know how many exponent and significand bits there are from numeric_limits (so we also know if it's VAX/Alpha without denorms), and whether the format is big- or little-endian. What others are there (apart from decimal, a different ball game)?

Stephen Moshier has already done a LOT of this bit twiddling for Cephes for IBMPC (x86), DEC (VAX) and M68000, for 32, 64, 80 and 128 bit sizes: http://www.moshier.net/#Cephes in the files isnan*.c for example. (This might also help serialisation, though I am inclined to the view that getting quiet_NaN back for all NaNs (preferably signed, though I'm also not fully sure why) is a reasonable expectation for a non-binary serialisation.)

However, perhaps there are other priorities?

Paul

but otherwise once a NaN always a NaN :-) Agreed - but worth a comment?
Agreed.
That's the problem: there is no portable way to create NaNs with other payloads, not even with ldexp.
That is true.
And it leaves me most unsatisfied, feeling frustrated that we are not getting what the IEEE754 designers intended.
We don't seem to have made ANY progress since Kahan's note in 1997
http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
http://www.cs.berkeley.edu/~wkahan/ieee754status/why-ieee.pdf
Very depressing reading.
I feel that there __ought__ to be some debug use in getting the NaN values, as Guillaume Melquiond noted, but we do need to be able to get/put the bits.
Considering the amount of MACRO bodging that Boost is doing to create portability, I wonder if we should not try to produce some way of getting/putting the 'internal' IEEE representation at least. There are surely not THAT many combinations? We should know how many exponent and significand bits there are from numeric_limits (so we also know if it's VAX/Alpha without denorms), and whether the format is big- or little-endian. What others are there (apart from decimal, a different ball game)?
Stephen Moshier has already done a LOT of this bit twiddling for Cephes for IBMPC (X86), DEC (VAX) and M68000 for 32, 64, 80 and 128 bit sizes.
Yep: the formats are mostly well known and the bit twiddling isn't that hard once you know what the format is, but there's at least one problem case: long doubles on Darwin. I can't find any information on these after a quick Google search, but I believe they're effectively a sum of two doubles. If so this is exceptionally bad form, because it violates the most fundamental principle of IEEE arithmetic: that a number is n*2^k. (Background: epsilon is very small for this type; it's possible to add a number "delta" to 1.0L and still get a distinct value when "delta" is much smaller than the number of bits in a long double would otherwise indicate. This makes a *lot* of IEEE-based reasoning invalid. If anyone has more information on this type I'd appreciate a reference.)

John.

John Maddock wrote:
...but there's at least one problem case: long doubles on Darwin. I can't find any information on these after a quick Google search, but I believe they're effectively a sum of two doubles.

Is this the same as doubledouble by Keith Briggs http://keithbriggs.info/xrc.html and NTL http://www.shoup.net/ntl/ ?

Paul

On Fri, Dec 02, 2005 at 12:22:22PM -0000, John Maddock wrote:
Yep: the formats are mostly well known and the bit twiddling isn't that hard once you know what the format is, but there's at least one problem case: long doubles on Darwin. I can't find any information on these after a quick Google search, but I believe they're effectively a sum of two doubles. If so this is exceptionally bad form, because it violates the most fundamental principle of IEEE arithmetic: that a number is n*2^k. (Background: epsilon is very small for this type; it's possible to add a number "delta" to 1.0L and still get a distinct value when "delta" is much smaller than the number of bits in a long double would otherwise indicate. This makes a *lot* of IEEE-based reasoning invalid. If anyone has more information on this type I'd appreciate a reference.)
I don't have any first-hand experience with Darwin's long double, but what you wrote above seems correct according to <url:http://tinyurl.com/9thr3> (<url:http://gcc.gnu.org/viewcvs/trunk/gcc/config/rs6000/darwin-ldouble-format?rev=85371&view=markup>).

Regards, Christoph

I don't have any first hand experience with Darwin's long double, but what you wrote above seems correct according to
Actually if that's correct then things are a lot better than I feared: the text implies that the type behaves as a real type with 106 bits in the significand. It also implies that numeric_limits<>::epsilon must be buggy for that platform. So I'm more confused than ever :-( Thanks for the reference though, that's useful. John.

John Maddock wrote:
Actually if that's correct then things are a lot better than I feared: the text implies that the type behaves as a real type with 106 bits in the significand. It also implies that numeric_limits<>::epsilon must be buggy for that platform. So I'm more confused than ever :-(

If you want to be confused even further, attached is my inexpert stab at numeric_limits for NTL quad_float, which I think may be the same. But I have asked the package author for his advice and help on this, and so far had no reply (though there are regular package updates).

I'm not sure that epsilon has an exactly defined value, but I concluded

static quad_float epsilon() throw() { return 2.4651903288156618919116517665087069677288E-32; }; // 2^(1-106) == 2^-105

was my best guess. But without confirmation of this, in 'testing' NTL math functions I cheated, using the number of decimal digits:

quad_float tolpc = to_quad_float("1e-30");
// 31 guaranteed decimal digits, so percent tolerance is about 1e-29
BOOST_CHECK_CLOSE(x, one, tolpc);

But I'm ready (anxious even) to have someone agree, or correct me.

Paul

If you want to be confused even further, attached is my inexpert stab at numeric_limits for NTL quad_float which I think may be the same.
But I have asked the package author for his advice and help on this, but so far no reply (though there are regular package updates).
I'm not sure that epsilon has an exactly defined value, but I concluded
static quad_float epsilon() throw() { return 2.4651903288156618919116517665087069677288E-32; }; // 2^(1-106) == 2^-105
was my best guess.
Normally it should be std::pow(two, 1 - std::numeric_limits<T>::digits), so your value sounds about right (you can also use ldexp(1.0, exponent) to achieve the same thing a little more efficiently).

On Darwin, numeric_limits<>::epsilon returns something like 10^-303, which is the same as the value of numeric_limits<>::min() I believe. It's just plain crazy, all IMO of course :-)

John.
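(For a built-in type the two expressions agree with the library value - ldexp just builds the power of two directly; for a UDT like quad_float the equivalent would have to be computed in that type. A quick check for double:)

#include <cmath>
#include <iostream>
#include <limits>

int main()
{
   // 2^(1 - digits) = 2^-52 for an IEEE double.
   double eps = std::ldexp(1.0, 1 - std::numeric_limits<double>::digits);
   std::cout << (eps == std::numeric_limits<double>::epsilon() ? "matches" : "differs") << '\n';
   return 0;
}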

John Maddock wrote:
Normally it should be std::pow(two, 1 - std::numeric_limits<T>::digits), so your value sounds about right (you can also use ldexp(1.0, exponent) to achieve the same thing a little more efficiently).

Yes - but I was hoping that this value would be computed at compile time, whereas ldexp would only be used at runtime?

On Darwin, numeric_limits<>::epsilon returns something like 10^-303, which is the same as the value of numeric_limits<>::min() I believe. It's just plain crazy, all IMO of course :-)

Sounds wrong to me too. But it wouldn't be the first time <limits> values have been dodgy ;-)

Paul

On Fri, Dec 02, 2005 at 07:45:02PM -0000, John Maddock wrote:
On Darwin numeric_limits<>::epsilon returns something like 10^-303 which is the same as the value for numeric_limts<>::min() I believe. It's just plain crazy, all IMO of course :-)
I tend to agree that Darwin's choice of numeric_limits<long double>::epsilon is wrong; it is certainly likely to cause problems for numerical code. I think I see where this value comes from, though. According to section 18.2.1.2/20 of the C++ standard, numeric_limits<T>::epsilon() is "Machine epsilon: the difference between 1 and the least value greater than 1 that is representable."

The value of a long double ld on Darwin is, according to Geoffrey Keating's explanation in the GCC tree <url:http://tinyurl.com/9thr3>, the sum of two doubles; let's call them hi and low. If hi == 1.0 and low == numeric_limits<double>::min(), then ld == hi + low > 1.0. If you don't look any further, then you have to conclude epsilon == low.

However, Geoffrey Keating lists under "Classification" all valid bit patterns for a long double. He explicitly states: "Many possible long double bit patterns are not valid long doubles. These do not represent any value." ld = hi + low as above is not among the bit patterns he lists as valid. If you apply the definition of epsilon to the "normal" long doubles only, then you get a value like the one Paul Bristow computed for NTL's quad_float.

Christoph
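(The same reasoning in code form, using an illustrative struct rather than Darwin's actual type: with hi == 1.0, any tiny lo yields a representable value only just above 1, so a literal reading of the standard's definition drives epsilon down towards min()/denorm_min() instead of 2^(1-106).)

#include <limits>

struct double_double { double hi, lo; };   // illustrative stand-in, not Darwin's long double

// A representable value only just above 1: hi carries the 1, lo carries a
// value far smaller than 2^-105, which is why a literal reading of the
// epsilon definition produces such a tiny number.
double_double just_above_one()
{
   double_double r;
   r.hi = 1.0;
   r.lo = std::numeric_limits<double>::min();   // or denorm_min(), smaller still
   return r;
}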

ld = hi + low as above is not among the bit patterns he lists as valid. If you apply the definition of epsilon to the "normal" long doubles only, then you get a value like the one Paul Bristow computed for NTL's quad_float.
I agree: that explanation is quite explicit when it says it follows Kahan's "double double" and the IEEE spec. So the low part must be normalised so that its bits "follow on" from those in the high part. As it stands, 1 + numeric_limits<>::epsilon() should evaluate to 1, but we really need to check this out. Does anyone of a numerical inclination want to run some tests on Darwin? John.

John Maddock wrote:
I agree: that explanation is quite explicit when it says it follows Kahan's "double double" and the IEEE spec. So the low part must be normalised so that its bits "follow on" from those in the high part. As it stands, 1 + numeric_limits<>::epsilon() should evaluate to 1, but we really need to check this out. Does anyone of a numerical inclination want to run some tests on Darwin?

Am I right in concluding that Darwin is the same as Keith Briggs' doubledouble and NTL? http://www.cs.cornell.edu/Info/People/vavasis/qmg2.0/patch2_files/doubledouble2.h and www.shoup.net/ntl/ give references to doubledouble and quad_float.

For NTL quad_float, using my 'guess' at epsilon:

using NTL::quad_float;
static quad_float epsilon() throw() { return 2.4651903288156618919116517665087069677288E-32; }; // 2^(1-106) == 2^-105

0.246519032881566189191165176650871e-31
2.4651903288156619e-032 == 0x3960000000000000
0 == 0x0000000000000000

1 + epsilon is

1.00000000000000000000000000000002
1 == 0x3ff0000000000000
2.4651903288156619e-032 == 0x3960000000000000

So the least significant digit is 2 - twice what I would expect. What I get from my nextafter(1) function is

1.00000000000000000000000000000001
1 == 0x3ff0000000000000
1.2325951644078309e-032 == 0x3950000000000000

Note we now get a 1 as the least significant digit. So, perhaps naively, I suspect my epsilon is twice as big as it should be.

quad_float one;
one = to_quad_float("1.");
BOOST_CHECK(one + numeric_limits<quad_float>::epsilon() == one);

./unitTestFunc2.cpp(556): error in "log10_test_function": check one + numeric_limits<quad_float>::epsilon() == one failed

as I would expect. I'd also like to have nextafter (and the other IEEE recommended functions) for quad_float. My tentative stab at nextafter for quad_float is:

quad_float nextafter(const quad_float& x, double y = numeric_limits<double>::max())
{ // Change by one unit in last place (least significant bit); default increases.
  // Might check that x (and y) are finite?
  // What about the sign of x and y?
  double direction = (x < y) ? +numeric_limits<double>::max() : -numeric_limits<double>::max();
  // If x < y then increment by one ulp, else decrement.
  quad_float temp = x;
  if (temp.lo == numeric_limits<double>::max())
  { // nextafter would overflow.
    temp.lo = 0.;
    temp.hi = std::tr1::nextafter(temp.hi, direction);
  }
  else
  { // Normal case.
    temp.lo = std::tr1::nextafter(temp.lo, direction);
  }
  return temp;
} // quad_float nextafter(const quad_float& x)

Paul

On Dec 3, 2005, at 8:09 PM, John Maddock wrote:
ld = hi + low as above is not among the bit patterns he lists as valid. If you apply the definition of epsilon to the "normal" long doubles only, then you get a value like the one Paul Bristow computed for NTL's quad_float.
I agree: that explanation is quite explicit when it says it follows Kahan's "double double" and the IEEE spec. So the low part must be normalised so that its bits "follow on" from those in the high part. As it stands, 1 + numeric_limits<>::epsilon() should evaluate to 1, but we really need to check this out. Does anyone of a numerical inclination want to run some tests on Darwin?
Since I asked the students in my class to write a program calculating the machine epsilon as homework, I just ran our solution (posted at http://www.itp.phys.ethz.ch/lectures/PT/Exercises/Uebung1/Solution/eps.cpp). Please overlook that the code is ugly and does not use any templates - they just had one week of C++ training. Anyway, this code:

long double epsilon_long_double() {
  long double eps = 1.;
  long double onepluseps;

  do {
    eps /= 2;
    onepluseps = 1. + eps;
  } while ( onepluseps != 1.);

  return 2*eps;
}

returns 0, hence there is really the problem that numeric_limits<long double>::epsilon() == numeric_limits<long double>::min()

Matthias

Exercise for the students - get it to work with a UDT ;-)

Darwin long double is really a doubledouble - I think - so really a UDT, for which getting the right type is critical. NTL quad_float is the same - I think.

First try gave 0 for epsilon; after changing to use ALL quad_float I've got as far as:

quad_float epsilon_quad_float()
{
  using NTL::to_quad_float;
  quad_float one;
  one = to_quad_float("1.0");
  quad_float two;
  one = to_quad_float("2.0");
  quad_float eps = one;
  quad_float onepluseps;
  do {
    eps /= two;
    onepluseps = one + eps;
  } while (onepluseps != one);
  return eps + eps;
} // quad_float epsilon_quad_float()

but it hangs! Will look at this further....

Paul

Doh! Of course the previous version, with its deliberate mistake, hangs... But this looks more sensible, though the final eps shown is still zero.

quad_float epsilon_quad_float()
{
  using NTL::to_quad_float;
  quad_float one;
  one = to_quad_float("1.0");
  quad_float two;
  two = to_quad_float("2.0");
  quad_float eps = one;
  quad_float onepluseps;
  do {
    eps /= two;
    onepluseps = one + eps;
  } while (onepluseps != one);
  return eps + eps;
} // quad_float epsilon_quad_float()

gives

float:
sizeof(float) = 4
epsilon(float) = 1.19209e-007
std::numeric_limits<float>::epsilon() = 1.19209e-007

double:
sizeof(double) = 8
epsilon(double) = 2.22045e-016
std::numeric_limits<double>::epsilon() = 2.22045e-016

long double: (MSVC == double)
sizeof(long double) = 8
epsilon(long double) = 2.22045e-016
std::numeric_limits<long double>::epsilon() = 2.22045e-016

quad_float:
sizeof(quad_float) = 16
epsilon(quad_float) = 0 <<<<<<<<<<<< also zero
std::numeric_limits<quad_float>::epsilon() = 0.2465190329e-31 << my 'guess'

A little debugging shows

eps {hi=4.9303806576313238e-032 lo=0.00000000000000000 oprec=10 } NTL::quad_float   count 104
eps {hi=1.2325951644078309e-032 lo=0.00000000000000000 oprec=10 } NTL::quad_float   count 106
eps {hi=9.881312916825e-324#DEN lo=0.00000000000000000 oprec=10 } NTL::quad_float   at loop count 1075, eps == zero

and

onepluseps {hi=1.0000000000000000 lo=4.940656458412e-324#DEN oprec=10 } NTL::quad_float

Note the lo double has become denormalised. So what is the right value of epsilon - or rather the 'useful' value of eps?

Paul

But this looks more sensible, but final eps shown is still zero.
Note lo double has become denormalised.
So what is the right value of epsilon - or rather the 'useful' value of eps?
I haven't looked at NTL, but I have been playing with the Darwin long double source, and have confirmed that you can create numbers that have a non-contiguous series of digits. I.e. rather than

R = x*2^y

we have

R = x1*2^y1 + x2*2^y2

where y2 < y1 - std::numeric_limits<double>::digits. If the digits were contiguous then the low part would be normalised so that y2 == y1 - std::numeric_limits<double>::digits in all cases, and the type would behave as a normal floating point number. But as it is, you get the behaviour that Paul's observed: there's no way to calculate epsilon, and maybe any value you calculate is meaningless anyway.

I guess if you take epsilon as a measure of error, then the worst case is also the usual definition:

epsilon = 2^(1-D)

if there are D digits in the significand. But the best case is the same as the formal definition (the difference between one and the next representable number), and gives:

epsilon = numeric_limits<double>::denorm_min()

So take your pick :-)

BTW, I think I can see why they did things this way: if long double is used to temporarily increase the precision of some arithmetic operation, then potentially the format used preserves additional information: particularly for additions and subtractions, it *may* avoid catastrophic cancellation errors. The trouble is you can't easily reason about when it will do its magic and when it won't. You can't even treat it as a number with twice the precision of a double and ignore the occasional extra precision: I suspect (but can't prove) that it leads to the same problems you get with 10-byte long doubles, double-rounding and other issues. It's counter-intuitive, but it's well known that there are some operations for which a little bit of extra precision can actually give you the wrong answer (see http://docs.sun.com/source/806-3568/ncg_goldberg.html#3098).

Just thought I'd confuse you all some more :-)

John.
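(The two candidates in numbers, for a 106-digit double-double: the 2^-105 figure is the value Paul computed for quad_float, the other is the double's smallest denormal.)

#include <cmath>
#include <iostream>
#include <limits>

int main()
{
   const int D = 106;                                              // digits in the combined significand
   double worst_case = std::ldexp(1.0, 1 - D);                     // 2^-105, approx 2.465e-32
   double best_case  = std::numeric_limits<double>::denorm_min();  // approx 4.94e-324
   std::cout << worst_case << ' ' << best_case << '\n';
   return 0;
}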

John Maddock wrote:
I guess if you take epsilon as a measure of error, then the worst case is also the usual definition: epsilon = 2^(1-D), if there are D digits in the significand. But the best case is the same as the formal definition (the difference between one and the next representable number).

Then for epsilon, 2^-105 and 2^-106 are below:

outAsHex(one + numeric_limits<quad_float>::epsilon());
// 1.00000000000000000000000000000002 1 == 0x3ff0000000000000 2.4651903288156619e-032 == 0x3960000000000000    2^-105
// 1.00000000000000000000000000000001 1 == 0x3ff0000000000000 1.2325951644078309e-032 == 0x3950000000000000    2^-106

I note that the 2^-106 value is the one I would naively expect - a single least significant bit different. I also feel it is absurd for a value of epsilon to be so much smaller than for a proper IEEE quad (128-bit) as used by Sparc, which would surely obey the 2^(1-digits) formula.

So take your pick :-)

So I don't pick min or denorm_min ;-)

But I'm getting further, perhaps terminally, confused by using the Boost unit test and getting messages like this:

BOOST_CHECK_SMALL((one + numeric_limits<quad_float>::epsilon()) - one, epsilon); // Check epsilon with value 'by definition'
// absolute value of (one + numeric_limits<quad_float>::epsilon()) - one{0.123259516440783094595582588325435e-31} exceeds 0.123259516440783094595582588325435e-31

Note the values appear to be the same, but one is said to exceed the other! If displayed in hex as well, they seem to be identical bit patterns.

outAsHex(one + numeric_limits<quad_float>::epsilon() - one);
// 0.246519032881566189191165176650871e-31 2.4651903288156619e-032 == 0x3960000000000000 0 == 0x0000000000000000
outAsHex(numeric_limits<quad_float>::epsilon());
// 0.246519032881566189191165176650871e-31 2.4651903288156619e-032 == 0x3960000000000000 0 == 0x0000000000000000

But both these tests check OK:

BOOST_CHECK((one + numeric_limits<quad_float>::epsilon()) != one); // Check epsilon with value 'by definition'
BOOST_CHECK((one + numeric_limits<quad_float>::epsilon()) >= one); // Check epsilon with value 'by definition'

So I am puzzled whether this is a result of the bizarre properties of doubledouble, or a problem in BOOST_CHECK_SMALL, or my understanding of it.

Note that in std::numeric_limits I have added

static const int max_digits10 = 33; // maximum significant decimal digits.

to ensure enough decimal digits are shown, but in any case they are shown in hex too.

Paul

Then for epsilon, 2^-105 and 2^-106 are below:
outAsHex(one + numeric_limits<quad_float>::epsilon());
// 1.00000000000000000000000000000002 1 == 0x3ff0000000000000 2.4651903288156619e-032 == 0x3960000000000000    2^-105
// 1.00000000000000000000000000000001 1 == 0x3ff0000000000000 1.2325951644078309e-032 == 0x3950000000000000    2^-106
I note that the 2 ^ -106 value is the one I would naively expect - a single least significant bit different.
Strange!

The docs do make it clear that this type is rather strange:

"This is a rather wierd representation; although it gives one essentially twice the precision of an ordinary double, it is not really the equivalent of quadratic precision (despite the name). For example, the number 1 + 2^{-200} can be represented exactly as a quad_float. Also, there is no real notion of "machine precision". Note that overflow/underflow for quad_floats does not follow any particularly useful rules, even if the underlying floating point arithmetic is IEEE compliant. Generally, when an overflow/underflow occurs, the resulting value is unpredicatble, although typically when overflow occurs in computing a value x, the result is non-finite (i.e., IsFinite(&x) == 0). Note, however, that some care is taken to ensure that the ZZ to quad_float conversion routine produces a non-finite value upon overflow."

Ah, now hold on: when you print out your results, even if there is a 1 in the last binary digit *that doesn't mean there will be a 1 in the last hexadecimal or decimal digit*. Think about it: unless the number of binary bits fits into a whole number of bytes, the last binary 1 may be in a 1, 2, 4 or 8 position in the last byte.

I actually tried to write a short program to calculate epsilon with this type, but quad_float exhibits some very strange behaviour: for example, if you go on adding a single digit to one, each time one position to the right (by dividing the digit you're adding by 2 each time), then the high part goes up to 2 and the low part becomes negative! It's actually correct if you think of the number as the sum of its parts (which is less than 2), but it means that it does not behave as an N-bit number in any way, shape or form. As the number increases in magnitude, the low part actually becomes less negative (smaller in magnitude) as the number gets ever closer to its high part in value. Double weird.

I'm not sure how you can meaningfully reason about such a beast, to be honest.

John.

| -----Original Message-----
| From: boost-bounces@lists.boost.org
| [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock
| Sent: 06 December 2005 18:40
| To: boost@lists.boost.org
| Subject: Re: [boost] [math] floating point classification - testing help wanted
|
| > Then for epsilon 2^-105 and 2^-106 are below
| > outAsHex(one + numeric_limits<quad_float>::epsilon());
| > // 1.00000000000000000000000000000002  1 == 0x3ff0000000000000
| > 2.4651903288156619e-032 == 0x3960000000000000  (2^-105)
| > // 1.00000000000000000000000000000001  1 == 0x3ff0000000000000
| > 1.2325951644078309e-032 == 0x3950000000000000  (2^-106)
| >
| > I note that the 2^-106 value is the one I would naively expect - a
| > single least significant bit different.
|
| Strange!
|
| The docs do make it clear that this type is rather strange:

OK, at least there is no confusion about that!

| Ah, now hold on: when you print out your results, even if there is a 1 in
| the last binary digit *that doesn't mean there will be a 1 in the last
| hexadecimal or decimal digit*. Think about it, unless the number of binary
| bits fit into a whole number of bytes, the last binary 1 may
| be in a 1, 2, 4 or 8 position in the last byte.

But the two values printed are the same in all three representations - quad_float, doubles hi and lo, and doubles as hex - so how are they failing the comparison? I've glanced at the quad_float operator> < and == and they look fine.

My primitive routine, which growed like Topsy, prints the hi and lo doubles in hex separately, so I don't quite follow how the hex display can fail to be a true representation. (Actually it would be much simpler to union the double to a _int64?)

As you call nextafter repeatedly, on double or quad_float, the bits roll up from the back end as I would expect. (Nasty things happen when x.lo hits numeric_limits<double>::max(), but that's surely not the problem here?)

union dhex
{ // Used by void outAsHex(double)
   double d;
   unsigned short usa[4]; // Suits Intel 8087 FP, P J Plauger C Standard p 67.
};

void outAsHex(double d)
{ // displayAsHex
   dhex dd = {d}; // Initialise double dd.d = 0.0;
   // Assume Intel 8087 FP.
   // Standard C Library, P J Plauger, p 67.
   char fill = cout.fill('0'); // Save.
   int fmtflags = cout.flags();
   int precision = cout.precision(2 + numeric_limits<double>::digits * 3010/10000); // max_digits10()
   cout << dec << dd.d << "==0x" << hex << right
        << setw(4) << dd.usa[3] // SCCC CCCC CCCC FFFF sign & exponent.
        << setw(4) << dd.usa[2] // CCCC
        << setw(4) << dd.usa[1] // CCCC
        << setw(4) << dd.usa[0] // FFFF
        ;
   cout.flags(fmtflags); // Restore.
   cout.fill(fill);
   cout.precision(precision);
} // void outAsHex(double d)

void outAsHex(quad_float x)
{ // displayAsHex
   char fill = cout.fill('0');
   int fmtflags = cout.flags();
   int precision = cout.precision(2 + numeric_limits<quad_float>::digits * 3010/10000); // Save.
   cout << right << x << ' '; // 33 decimal digits, autofloat format.
   outAsHex(x.hi);
   cout << ' ';
   outAsHex(x.lo);
   cout << endl; // max_digits10 & hex formats.
   cout.flags(fmtflags); // Restore.
   cout.fill(fill);
   cout.precision(precision);
} // outAsHex(quad_float x)

I have checked a bit more, but am still baffled - and getting bored ;-)

BOOST_CHECK_SMALL((one + numeric_limits<quad_float>::epsilon()) - one, epsilon); // Check epsilon with value 'by definition'
// absolute value of (one + numeric_limits<quad_float>::epsilon()) - one{0.123259516440783094595582588325435e-31} exceeds 0.123259516440783094595582588325435e-31

Which is still bizarre.
outAsHex(one + numeric_limits<quad_float>::epsilon() - one);
// 0.246519032881566189191165176650871e-31  2.4651903288156619e-032 == 0x3960000000000000  0 == 0x0000000000000000
outAsHex(numeric_limits<quad_float>::epsilon());
// 0.246519032881566189191165176650871e-31  2.4651903288156619e-032 == 0x3960000000000000  0 == 0x0000000000000000

BOOST_CHECK((one + numeric_limits<quad_float>::epsilon()) != one); // Check epsilon with value 'by definition' - pass
BOOST_CHECK((one + numeric_limits<quad_float>::epsilon()) >= one); // Check epsilon with value 'by definition' - pass
BOOST_CHECK(one <= (one + numeric_limits<quad_float>::epsilon())); // Check epsilon with value 'by definition' - pass

With the tolerance doubled:

BOOST_CHECK_SMALL((one + numeric_limits<quad_float>::epsilon()) - one, epsilon + epsilon); // - pass

I conclude so far that this epsilon is a reasonable approximation to an ill-defined value and will _probably_ prove useful. I'm going to give up and try it out.

| I actually tried to write a short program to calculate epsilon with this
| type, but quad_float exhibits some very strange behaviour:
<snip>
| Double weird.

Quadruply weird! A proper 128-bit FP would be MUCH nicer.

(This sort of ill-behaviour is probably why Keith Briggs, 'whose fault all this is' ;-), now recommends Exact Real (http://keithbriggs.info/xrc.html) despite its lack of speed. Or NTL RR, where you can have enough decimal places not to worry about a few dodgy ones. But then epsilon is really meaningless?)

Paul

-- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
Phone and SMS text +44 1539 561830, Mobile and SMS text +44 7714 330204
mailto: pbristow@hetp.u-net.com http://www.hetp.u-net.com/index.html
http://www.hetp.u-net.com/Paul%20A%20Bristow%20info.html
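[On the aside about unioning the double to a _int64: a sketch of one way to do that with memcpy into a 64-bit integer (boost::uint64_t from <boost/cstdint.hpp>), which sidesteps both the per-short byte-order bookkeeping and the type-punning question. The function name outAsHex64 is made up for illustration, and it assumes double and boost::uint64_t are both 8 bytes, as they are on the platforms discussed here:]

#include <boost/cstdint.hpp>
#include <cstring>
#include <iomanip>
#include <iostream>

// Print a double's bit pattern as a single 64-bit hex value.  Reading the
// whole 8 bytes as one integer puts the sign/exponent in the top bits on
// both little- and big-endian machines, so no short-by-short juggling.
void outAsHex64(double d)
{
   boost::uint64_t bits = 0;
   std::memcpy(&bits, &d, sizeof(d));
   std::ios_base::fmtflags flags = std::cout.flags(); // save
   char fill = std::cout.fill('0');
   std::cout << d << " == 0x" << std::hex << std::setw(16) << bits << '\n';
   std::cout.fill(fill);                              // restore
   std::cout.flags(flags);
}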

On 12/7/05 9:45 AM, "Paul A Bristow" <pbristow@hetp.u-net.com> wrote:

Just a side note, but...
void outAsHex(quad_float x)
{
   char fill = cout.fill('0');
   int fmtflags = cout.flags();
   int precision = cout.precision(2 + numeric_limits<quad_float>::digits * 3010/10000); // Save.
   [SNIP the actual printing]
   cout.flags(fmtflags); // Restore.
   cout.fill(fill);
   cout.precision(precision);
}
...we do have RAII classes for I/O state saving & restoring. (Check out the "ROOT/boost/io/ios_state.hpp" header.)

-- Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
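[A sketch of what that looks like: boost::io::ios_all_saver from that header restores the stream's formatting state when it goes out of scope, so the three manual save/restore lines in outAsHex can simply disappear. The function print_padded_hex is just an illustrative stand-in, not code from the thread:]

#include <boost/io/ios_state.hpp>
#include <iomanip>
#include <iostream>

void print_padded_hex(unsigned long v)
{
   boost::io::ios_all_saver saver(std::cout); // captures flags, fill, precision, ...
   std::cout << std::hex << std::setfill('0') << std::setw(8) << v << '\n';
}  // ...and restores them all here, even if an exception is thrown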

I awoke with a start at 4 am realising that I need a brain transplant, urgently! Better stick to gardening on Sunday, if not the rest of the week. Paul

John Maddock wrote:
The new TR1 complex number code could really use this to improved it's reliability. I believe Robert Ramey was looking for this functionality for serialisation, and frankly it's hard to write any serious floating point code without it.
I'm using it for hashing floating point numbers so I'm very interested. I'll look at it in detail this weekend.
Tested so far:
Win32: VC6: Can't detect signaling NaN's (_isnan is broken).
I think it isn't too hard to implement this for x86 by looking at the binary representation - would you be interested? I was experimenting with doing something like that, but decided not to, as it doesn't really matter what hash value you generate for a NaN and I was worried about missing some special case. Whenever I think I fully understand floating point, something new always crops up.
Suse Linux: gcc-3.3.3 : OK.
IIRC, gcc-2.95 doesn't have fpclassify, but it might be possible to implement it with other functions.
#elif defined(_MSC_VER)
// This only works for type double, for both float
// and long double it gives misleading answers.
int fpclassify BOOST_NO_MACRO_EXPAND(double t)
{
   switch(::_fpclass(t))
Oh, I've been using _fpclass for hash (not checked into CVS yet) - which suggests that my tests are lacking something. Is this something I should be worried about?

On Visual C++ a long double has the same representation as a double, so I think it should work for that case at least.

http://blogs.msdn.com/ericflee/archive/2004/06/10/152852.aspx
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm...

Incidentally, I'm not sure, but I think other x86 compilers have a different 80-bit long double to the one described in the msdn page.

Daniel
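[For anyone following along: MSVC's _fpclass() (from <float.h>) returns one of ten codes, and a plausible mapping onto the C99-style categories looks roughly like the sketch below. This is illustrative only - it is not necessarily the mapping in John's attached header - and, as the comment in the quoted code warns, it is only trustworthy for double, since float arguments get promoted and 80-bit long doubles get truncated. The my_FP_* enumerators are arbitrary stand-ins because the VC++ runtime of this era does not define FP_NAN and friends:]

#include <float.h> // ::_fpclass and the _FPCLASS_* codes (MSVC-specific)

// Arbitrary stand-ins for the C99 FP_* category macros.
enum { my_FP_NAN, my_FP_INFINITE, my_FP_ZERO, my_FP_SUBNORMAL, my_FP_NORMAL };

int fpclassify_via_fpclass(double t)
{
   switch(::_fpclass(t))
   {
   case _FPCLASS_SNAN:
   case _FPCLASS_QNAN: return my_FP_NAN;
   case _FPCLASS_NINF:
   case _FPCLASS_PINF: return my_FP_INFINITE;
   case _FPCLASS_NZ:
   case _FPCLASS_PZ:   return my_FP_ZERO;
   case _FPCLASS_ND:
   case _FPCLASS_PD:   return my_FP_SUBNORMAL;
   default:            return my_FP_NORMAL;   // _FPCLASS_NN / _FPCLASS_PN
   }
}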

I think it isn't too hard to implement this for x86 by looking at the binary representation - would you be interested? I was experimenting with doing something like that, but decided not to, as it doesn't really matter what hash value you generate for a NaN and I was worried about missing some special case. Whenever I think I fully understand floating point, something new always crops up.
Doesn't it just :-)

At present it looks like only VC6 signaling NaN's are broken. I was hoping to avoid non-portable binary extraction if possible, but it wouldn't be too much of a hardship to support x86.
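[For the record, the kind of binary extraction being talked about isn't much code for an IEEE binary64 double. A sketch - not John's implementation - assuming the value really is a 64-bit IEEE double and that unsigned long long is 64 bits wide; the enum is a local stand-in for the FP_* categories:]

#include <cstring>

enum fp_category { bits_nan, bits_infinite, bits_subnormal, bits_zero, bits_normal };

// Classify by inspecting the bits directly, bypassing the runtime's
// _isnan/fpclassify: exponent field all ones => NaN or infinity,
// all zeros => zero or subnormal, anything else => normal.
fp_category classify_bits(double x)
{
   unsigned long long bits = 0;
   std::memcpy(&bits, &x, sizeof(x));
   unsigned long long exponent = (bits >> 52) & 0x7FFull;  // 11 exponent bits
   unsigned long long mantissa = bits & 0xFFFFFFFFFFFFFull; // 52 mantissa bits
   if (exponent == 0x7FFull)
      return mantissa ? bits_nan : bits_infinite;
   if (exponent == 0)
      return mantissa ? bits_subnormal : bits_zero;
   return bits_normal;
}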
Suse Linux: gcc-3.3.3 : OK.
IIRC, gcc-2.95 doesn't have fpclassify, but it might be possible to implement it with other functions.
I'll test with that when I get a chance: the whole point of the test code is that it doesn't require fpclassify, but will use it if present.
Oh, I've been using _fpclass for hash (not checked into CVS yet) - which suggests that my tests are lacking something. Is this something I should be worried about?
It depends whether you want to detect non-NaN's correctly: a float converted to double may become normal when it was subnormal before.
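[A tiny illustration of that point, assuming a C99-style fpclassify is in scope (for example the portable one under test here): the smallest positive float is subnormal, but the same value promoted to double sits well inside the normal range, so a classifier that only takes double will call it normal.]

#include <cassert>
#include <cmath>   // assumed to provide a C99 fpclassify and the FP_* macros
#include <limits>

int main()
{
   float  f = std::numeric_limits<float>::denorm_min(); // ~1.4e-45: subnormal as float
   double d = f;                                        // same value: normal as double
   assert(fpclassify(f) == FP_SUBNORMAL);
   assert(fpclassify(d) == FP_NORMAL);
   return 0;
}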
On Visual C++ a long double has the same representation as a double so I think it should work for that case at least.
http://blogs.msdn.com/ericflee/archive/2004/06/10/152852.aspx http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm...
Incidentally, I'm not sure, but I think other x86 compilers have a different 80-bit long double to the one described in the msdn page.
Yes, they have a 10-byte long double; in that case long-double to double conversions may change the value's category:

normal -> infinity
sub-normal -> zero
normal -> sub-normal

It doesn't affect whether it's a NaN or not though, so you can use _isnan safely (except for the broken VC6 version, that is!).

John.

John Maddock wrote:

Tests run on IRIX, Mips Pro 7.4.1m, -N32 -mips3.

For

CC -o classify classify.cpp -lm -I$(BOOST)

and

CC -O1 -o classify classify.cpp -lm -I$(BOOST)

Running 1 test case...
FP_ZERO: 0
FP_NORMAL: 1
FP_INFINITE: 2
FP_NAN: 3
FP_SUBNORMAL: 4
Testing type float
classify.cpp(130): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)4 failed [0 != 4]
classify.cpp(131): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)4 failed [0 != 4]
Testing type double
classify.cpp(130): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)4 failed [0 != 4]
classify.cpp(131): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)4 failed [0 != 4]
Testing type long double
classify.cpp(130): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)4 failed [0 != 4]
classify.cpp(131): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)4 failed [0 != 4]
*** 6 failures detected in test suite "Test Program"

For

CC -O2 -o classify classify.cpp -lm -I$(BOOST)

and

CC -O3 -o classify classify.cpp -lm -I$(BOOST)

Running 1 test case...
FP_ZERO: 0
FP_NORMAL: 1
FP_INFINITE: 2
FP_NAN: 3
FP_SUBNORMAL: 4
Testing type float
Testing type double
Testing type long double
*** No errors detected

So go figure :-) Optimisation improves the compliance... bizarre. Anyway, there is a whole heap of optimisation options that probably change this behaviour. There is an entire man page for them, separate from the main compiler one, so I'll look through that to see which ones are important.

I believe you can't actually tell at compile time whether denormalised values are supported in the way the code is doing it; I think you need:

if (std::numeric_limits<T>::has_denorm == std::denorm_present)

But that's on a brief inspection; you may also want to test using denorm_min.

Kevin

-- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this |
   | Senior Technology                     | My employer for certain |
   | And Network Systems Architect         | Not even myself |

So go figure :-) Optimisation improves the compliance... bizarre. Anyway, there is a whole heap of optimisation options that probably change this behaviour. There is an entire man page for them, separate from the main compiler one, so I'll look through that to see which ones are important. I believe you can't actually tell at compile time whether denormalised values are supported in the way the code is doing it; I think you need:
if (std::numeric_limits<T>::has_denorm == std::denorm_present)
But that's on a brief inspection; you may also want to test using denorm_min.
Thanks for the results; it looks like I should check that the value isn't zero before assuming it's a denorm. And yes, I will test denorm_min as well. John.
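[Putting Kevin's suggestion and John's reply together, the denorm part of the test might end up looking something like this sketch - again assuming a C99-style fpclassify (or the portable replacement under test) is in scope; check_denorms is a hypothetical name:]

#include <cassert>
#include <cmath>   // assumed to provide fpclassify and FP_SUBNORMAL
#include <limits>

// Only expect FP_SUBNORMAL when the platform advertises denormal support
// and the candidate value hasn't been flushed to zero.
template <class T>
void check_denorms()
{
   if (std::numeric_limits<T>::has_denorm != std::denorm_present)
      return;                                    // nothing to test on this platform
   T d = std::numeric_limits<T>::denorm_min();
   if (d != static_cast<T>(0))
      assert(fpclassify(d) == FP_SUBNORMAL);
   d = std::numeric_limits<T>::min() / 2;
   if (d != static_cast<T>(0))
      assert(fpclassify(d) == FP_SUBNORMAL);
}

int main()
{
   check_denorms<float>();
   check_denorms<double>();
   check_denorms<long double>();
}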

John Maddock wrote: [SNIP]
Win32: VC6: Can't detect signaling NaN's (_isnan is broken). VC7, 7.1, 8: OK Intel 7, 8, 9: OK provided you're not using the VC6 runtime. GCC cygwin: OK. Borland 5.6.4: So broken I don't know where to begin :-(
HP-UX: gcc-3.4.2: OK. aCC: OK (native fpclassify broken for long doubles, but portable version works).
Tru64: cxx : OK (but can't test infinities/NaN's as they're not supported?
Suse Linux: gcc-3.3.3 : OK.
[pedro@localhost Projetos]$ g++ -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,java,f95,ada --enable-java-awt=gtk --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=i386-redhat-linux
Thread model: posix
gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)

[pedro@localhost Projetos]$ g++ -Wall classify.cpp -o classify
[pedro@localhost Projetos]$ ./classify
Running 1 test case...
Platform has FP_NORMAL macro.
FP_ZERO: 2
FP_NORMAL: 4
FP_INFINITE: 1
FP_NAN: 0
FP_SUBNORMAL: 3
Testing type float
Testing type double
Testing type long double
*** No errors detected

The output was identical (*** No errors detected) for each of:

[pedro@localhost Projetos]$ g++ -Wall -O classify.cpp -o classify
[pedro@localhost Projetos]$ g++ -Wall -O2 classify.cpp -o classify
[pedro@localhost Projetos]$ g++ -Wall -O3 classify.cpp -o classify

and, after a look at man gcc, also for

[pedro@localhost Projetos]$ g++ -Wall -O3 -ffast-math classify.cpp -o classify

-- Pedro Lamarão

John Maddock wrote:
Please build the test program with optimisations turned on: many compilers make non-IEEE arithmetic assumptions once you tell them to optimise the code.
Results for QNX 6.3.0 using the gcc 3.3.5 compiler and the Dinkum libraries. Optimisation does not affect the results.

Running 1 test case...
Platform has isnan macro.
Platform has fpclassify macro.
Platform has FP_NORMAL macro.
FP_ZERO: 0
FP_NORMAL: -1
FP_INFINITE: 1
FP_NAN: 2
FP_SUBNORMAL: -2
Testing type float
classify.cc(130): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)(-2) failed [-1 != -2]
classify.cc(131): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)(-2) failed [-1 != -2]
Testing type double
classify.cc(130): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)(-2) failed [-1 != -2]
classify.cc(131): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)(-2) failed [-1 != -2]
Testing type long double
classify.cc(130): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)(-2) failed [-1 != -2]
classify.cc(131): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)(-2) failed [-1 != -2]
*** 6 failures detected in test suite "Test Program"

Regards,
Jim Douglas

Results for QNX 6.3.0 using the gcc 3.3.5 compiler and the Dinkum libraries. Optimisation does not affect the results.
Thanks.
Testing type float
classify.cc(130): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)(-2) failed [-1 != -2]
classify.cc(131): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)(-2) failed [-1 != -2]
*** 6 failures detected in test suite "Test Program"
Worrying: the results indicate that the platform has native support for fpclassify, and that it doesn't work (doesn't detect denorms). John.

John Maddock wrote:
Testing type float
classify.cc(130): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)(-2) failed [-1 != -2]
classify.cc(131): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)(-2) failed [-1 != -2]
*** 6 failures detected in test suite "Test Program"
Worrying: the results indicate that the platform has native support for fpclassify, and that it doesn't work (doesn't detect denorms).
Just explain that in slightly more detail and I will raise a support ticket with QNX. Thanks Jim

John,

The attached header files may be of interest. A comment in ymath.h suggests that the value returned conforms to the C9X standard. Let me know if you still think it is erroneous.

Jim

Jim Douglas wrote:
John Maddock wrote:
Testing type float
classify.cc(130): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(t) == (int)(-2) failed [-1 != -2]
classify.cc(131): error in "test_main_caller( argc, argv )": check (::pending::fpclassify)(-t) == (int)(-2) failed [-1 != -2]
*** 6 failures detected in test suite "Test Program"
Worrying: the results indicate that the platform has native support for fpclassify, and that it doesn't work (doesn't detect denorms).
Just explain that in slightly more detail and I will raise a support ticket with QNX.
Thanks Jim
/* math.h standard header */ #ifndef _MATH #define _MATH #ifndef _YMATH #include <ymath.h> #endif /* _YMATH */ #include <_pack64.h> _C_STD_BEGIN /* * XOPEN/SVID */ #if defined(__EXT_XOPEN_EX) #define M_E 2.7182818284590452354 /* e */ #define M_LOG2E 1.4426950408889634074 /* log 2e */ #define M_LOG10E 0.43429448190325182765 /* log 10e */ #define M_LN2 0.69314718055994530942 /* log e2 */ #define M_LN10 2.30258509299404568402 /* log e10 */ #define M_PI 3.14159265358979323846 /* pi */ #define M_PI_2 1.57079632679489661923 /* pi/2 */ #define M_PI_4 0.78539816339744830962 /* pi/4 */ #define M_1_PI 0.31830988618379067154 /* 1/pi */ #define M_2_PI 0.63661977236758134308 /* 2/pi */ #define M_2_SQRTPI 1.12837916709551257390 /* 2/sqrt(pi) */ #define M_SQRT2 1.41421356237309504880 /* sqrt(2) */ #define M_SQRT1_2 0.70710678118654752440 /* 1/sqrt(2) */ #define MAXFLOAT ((float)3.40282346638528860e+38) #endif /* CODES FOR is* FUNCTIONS */ #define _FP_LT 1 #define _FP_EQ 2 #define _FP_GT 4 /* CODES FOR ilogb FUNCTIONS */ #if _ILONG #define _FP_ILOGB0 (-0x7fffffff - _C2) #define _FP_ILOGBNAN 0x7fffffff #else /* _ILONG */ #define _FP_ILOGB0 (-0x7fff - _C2) #define _FP_ILOGBNAN 0x7fff #endif /* _ILONG */ #if _HAS_C9X || (2 <= __GNUC__) /* TYPES */ #if _FEVAL == 1 typedef double float_t; typedef double double_t; #elif _FEVAL == 2 typedef long double float_t; typedef long double double_t; #else /* _FEVAL */ typedef float float_t; typedef double double_t; #endif /* _FEVAL */ /* MACROS */ #if 245 < __EDG_VERSION__ && !defined(__cplusplus) #define HUGE_VAL ((double)__INFINITY__) #define HUGE_VALF __INFINITY__ #define HUGE_VALL ((long double)__INFINITY__) #define INFINITY __INFINITY__ #define NAN __NAN__ #elif defined(__EDG__) || 0 < _MSC_VER #define HUGE_VAL _CSTD _Inf._Double #define HUGE_VALF _CSTD _FInf._Float #define HUGE_VALL _CSTD _LInf._Long_double #define INFINITY _CSTD _FInf._Float #define NAN _CSTD _FNan._Float #else /* !defined(__EDG__) */ #ifndef _HUGE_ENUF #define _HUGE_ENUF 1e+30F /* _HUGE_ENUF*_HUGE_ENUF must overflow */ #endif /* _HUGE_ENUF */ #define HUGE_VAL ((double)INFINITY) #define HUGE_VALF ((float)INFINITY) #define HUGE_VALL ((long double)INFINITY) #define INFINITY ((float)(_HUGE_ENUF * _HUGE_ENUF)) #define NAN ((float)(INFINITY * 0.0F)) #endif /* 245 < __EDG_VERSION__ */ #define FP_INFINITE _INFCODE #define FP_NAN _NANCODE #define FP_NORMAL _FINITE #define FP_SUBNORMAL _DENORM #define FP_ZERO 0 #if _HAS_C9X_FAST_FMA #define FP_FAST_FMA 1 #define FP_FAST_FMAF 1 #define FP_FAST_FMAL 1 #endif /* _HAS_C9X_FAST_FMA */ #define FP_ILOGB0 _FP_ILOGB0 #define FP_ILOGBNAN _FP_ILOGBNAN #define MATH_ERRNO 1 #define MATH_ERREXCEPT 2 #define math_errhandling (MATH_ERRNO | MATH_ERREXCEPT) /* do both */ _C_LIB_DECL int _FFpcomp(float, float); int _Fpcomp(double, double); int _LFpcomp(long double, long double); int _FDclass(float); int _Dclass(double); int _LDclass(long double); int _FDsign(float); int _Dsign(double); int _LDsign(long double); _END_C_LIB_DECL #if defined(__cplusplus) && !__embedded_cplusplus /* DON'T SIMPLIFY! 
*/ #define _FPCOMP_template 1 #else /* defined(__cplusplus) && !__embedded_cplusplus */ #define _FPCOMP_template 0 #endif /* defined(__cplusplus) && !__embedded_cplusplus */ #if _FPCOMP_template // TEMPLATE CLASS _Rc_type template<class _Ty> struct _Rc_type { // determine if type is real or complex typedef float _Type; // default is real }; // TEMPLATE CLASS _Rc_widened template<class _Ty, class _T2> struct _Rc_widened { // determine real/complex type typedef float _Type; // (real, real) is real }; // TEMPLATE CLASS _Real_type #if defined(__SUNPRO_CC) /* compiler test */ template<class _Ty> struct _Real_type; template<> struct _Real_type<int> { // determine equivalent real type typedef double _Type; }; template<> struct _Real_type<unsigned int> { // determine equivalent real type typedef double _Type; }; template<> struct _Real_type<long> { // determine equivalent real type typedef double _Type; }; template<> struct _Real_type<unsigned long> { // determine equivalent real type typedef double _Type; }; template<> struct _Real_type<double> { // determine equivalent real type typedef double _Type; }; #else /* defined(__SUNPRO_CC) */ template<class _Ty> struct _Real_type { // determine equivalent real type typedef double _Type; // default is double }; #endif /* defined(__SUNPRO_CC) */ template<> struct _Real_type<float> { // determine equivalent real type typedef float _Type; }; template<> struct _Real_type<long double> { // determine equivalent real type typedef long double _Type; }; // TEMPLATE CLASS _Real_widened template<class _Ty, class _T2> struct _Real_widened { // determine widened real type typedef long double _Type; // default is long double }; template<> struct _Real_widened<float, float> { // determine widened real type typedef float _Type; }; template<> struct _Real_widened<float, double> { // determine widened real type typedef double _Type; }; template<> struct _Real_widened<double, float> { // determine widened real type typedef double _Type; }; template<> struct _Real_widened<double, double> { // determine widened real type typedef double _Type; }; // TEMPLATE CLASS _Combined_type template<class _Trc, class _Tre> struct _Combined_type { // determine combined type typedef float _Type; // (real, float) is float }; template<> struct _Combined_type<float, double> { // determine combined type typedef double _Type; }; template<> struct _Combined_type<float, long double> { // determine combined type typedef long double _Type; }; // TEMPLATE FUNCTION _FPCOMP inline int _FPCOMP(float _Left, float _Right) { // compare _Left and _Right return (_FFpcomp(_Left, _Right)); } inline int _FPCOMP(double _Left, double _Right) { // compare _Left and _Right return (_Fpcomp(_Left, _Right)); } inline int _FPCOMP(long double _Left, long double _Right) { // compare _Left and _Right return (_LFpcomp(_Left, _Right)); } template<class _T1, class _T2> inline int _FPCOMP(_T1 _Left, _T2 _Right) { // compare _Left and _Right typedef typename _Combined_type<float, typename _Real_widened< typename _Real_type<_T1>::_Type, typename _Real_type<_T2>::_Type>::_Type>::_Type _Tw; return (_FPCOMP((_Tw)_Left, (_Tw)_Right)); } // FUNCTION fpclassify inline int fpclassify(float _Left) { // classify argument return (_FDtest(&_Left)); } inline int fpclassify(double _Left) { // classify argument return (_Dtest(&_Left)); } inline int fpclassify(long double _Left) { // classify argument return (_LDtest(&_Left)); } // FUNCTION signbit inline bool signbit(float _Left) { // test sign bit return (_FDsign(_Left) != 0); } inline 
bool signbit(double _Left) { // test sign bit return (_Dsign(_Left) != 0); } inline bool signbit(long double _Left) { // test sign bit return (_LDsign(_Left) != 0); } template<class _Ty> inline bool isfinite(_Ty _Left) { // test for finite return (fpclassify(_Left) <= 0); } template<class _Ty> inline bool isinf(_Ty _Left) { // test for infinite return (fpclassify(_Left) == FP_INFINITE); } template<class _Ty> inline bool isnan(_Ty _Left) { // test for NaN return (fpclassify(_Left) == FP_NAN); } template<class _Ty> inline bool isnormal(_Ty _Left) { // test for normal return (fpclassify(_Left) == FP_NORMAL); } template<class _Ty1, class _Ty2> inline bool isgreater(_Ty1 _Left, _Ty2 _Right) { // test for _Left > _Right return ((_FPCOMP(_Left, _Right) & _FP_GT) != 0); } template<class _Ty1, class _Ty2> inline bool isgreaterequal(_Ty1 _Left, _Ty2 _Right) { // test for _Left >= _Right return ((_FPCOMP(_Left, _Right) & (_FP_EQ | _FP_GT)) != 0); } template<class _Ty1, class _Ty2> inline bool isless(_Ty1 _Left, _Ty2 _Right) { // test for _Left < _Right return ((_FPCOMP(_Left, _Right) & _FP_LT) != 0); } template<class _Ty1, class _Ty2> inline bool islessequal(_Ty1 _Left, _Ty2 _Right) { // test for _Left <= _Right return ((_FPCOMP(_Left, _Right) & (_FP_LT | _FP_EQ)) != 0); } template<class _Ty1, class _Ty2> inline bool islessgreater(_Ty1 _Left, _Ty2 _Right) { // test for _Left != _Right return ((_FPCOMP(_Left, _Right) & (_FP_LT | _FP_GT)) != 0); } template<class _Ty1, class _Ty2> inline bool isunordered(_Ty1 _Left, _Ty2 _Right) { // test for _Left unorderd w.r.t. _Right return (_FPCOMP(_Left, _Right) == 0); } #define fpclassify(x) (_CSTD fpclassify(x)) #define signbit(x) (_CSTD signbit(x)) #define isfinite(x) (_CSTD isfinite(x)) #define isinf(x) (_CSTD isinf(x)) #define isnan(x) (_CSTD isnan(x)) #define isnormal(x) (_CSTD isnormal(x)) #define isgreater(x, y) (_CSTD isgreater(x, y)) #define isgreaterequal(x, y) (_CSTD isgreaterequal(x, y)) #define isless(x, y) (_CSTD isless(x, y)) #define islessequal(x, y) (_CSTD islessequal(x, y)) #define islessgreater(x, y) (_CSTD islessgreater(x, y)) #define isunordered(x, y) (_CSTD isunordered(x, y)) #else /* _FPCOMP_template */ #if __EDG__ #define _CARGI(x, fd, ff, fl) \ __generic(x,,, fd, ff, fl,,,)(x) #define _CARG2I(x, y, fd, ff, fl) \ __generic(x, y,, fd, ff, fl,,,)(x, y) #elif 2 <= __GNUC__ #define _FLT_TYPE(expression) _FLT_OR_DBL(__typeof__ (expression)) #define _FLT_OR_DBL(mytype) __typeof__ (*(0 \ ? (__typeof__ (0 ? (mytype *)0 : (void *)((mytype)0.5 == 0)))0 \ : (__typeof__ (0 ? (double *)0 : (void *)((mytype)0.5 != 0)))0)) #define _CARGI(x, fd, ff, fl) \ (__extension__({ \ int _Ans; \ if (sizeof (_FLT_TYPE(x)) == sizeof (double)) \ _Ans = fd(x); \ else if (sizeof (_FLT_TYPE(x)) == sizeof (float)) \ _Ans = ff(x); \ else \ _Ans = fl(x); \ _Ans; \ })) #define _CARG2I(x, y, fd, ff, fl) \ (__extension__({ \ int _Ans; \ if (sizeof (_FLT_TYPE((x) + (y))) == sizeof (double)) \ _Ans = fd(x, y); \ else if (sizeof (_FLT_TYPE((x) + (y))) == sizeof (float)) \ _Ans = ff(x, y); \ else \ _Ans = fl(x, y); \ _Ans; \ })) #else /* compiler type */ #define _ARG(x) (sizeof ((x) + (float)0) == sizeof (float) ? 'f' \ : sizeof ((x) + (double)0) == sizeof (double) ? 'd' \ : 'l') #define _CARGI(x, fd, ff, fl) \ (_ARG(x) == 'f' ? ff((float)(x)) \ : _ARG(x) == 'd' ? fd((double)(x)) \ : fl((long double)(x))) #define _CARG2I(x, y, fd, ff, fl) \ (_ARG((x) + (y)) == 'f' ? ff((float)(x), (float)(y)) \ : _ARG((x) + (y)) == 'd' ? 
fd((double)(x), (double)(y)) \ : fl((long double)(x), (long double)(y))) #endif /* compiler type */ #define _FPCOMP(x, y) \ _CARG2I(x, y, _Fpcomp, _FFpcomp, _LFpcomp) #define fpclassify(x) \ _CARGI(x, _Dclass, _FDclass, _LDclass) #define signbit(x) \ _CARGI(x, _Dsign, _FDsign, _LDsign) #define isfinite(x) (fpclassify(x) <= 0) #define isinf(x) (fpclassify(x) == FP_INFINITE) #define isnan(x) (fpclassify(x) == FP_NAN) #define isnormal(x) (fpclassify(x) == FP_NORMAL) #define isgreater(x, y) ((_FPCOMP(x, y) & _FP_GT) != 0) #define isgreaterequal(x, y) \ ((_FPCOMP(x, y) & (_FP_EQ | _FP_GT)) != 0) #define isless(x, y) ((_FPCOMP(x, y) & _FP_LT) != 0) #define islessequal(x, y) ((_FPCOMP(x, y) & (_FP_LT | _FP_EQ)) != 0) #define islessgreater(x, y) \ ((_FPCOMP(x, y) & (_FP_LT | _FP_GT)) != 0) #define isunordered(x, y) (_FPCOMP(x, y) == 0) #endif /* _FPCOMP_template */ #else /* _IS_C9X */ /* MACROS */ #define HUGE_VAL _CSTD _Hugeval._Double #endif /* _IS_C9X */ _C_LIB_DECL /* double declarations */ double acos(double); double asin(double); double atan(double); double atan2(double, double); double ceil(double); double exp(double); double fabs(double); double floor(double); double fmod(double, double); double frexp(double, int *); double ldexp(double, int); double modf(double, double *); double pow(double, double); double sqrt(double); double tan(double); double tanh(double); #if _HAS_C9X double acosh(double); double asinh(double); double atanh(double); double cbrt(double); double copysign(double, double); double erf(double); double erfc(double); double exp2(double); double expm1(double); double fdim(double, double); double fma(double, double, double); double fmax(double, double); double fmin(double, double); double hypot(double, double); int ilogb(double); double lgamma(double); _Longlong llrint(double); _Longlong llround(double); double log1p(double); double logb(double); long lrint(double); long lround(double); double nan(const char *); double nearbyint(double); double nextafter(double, double); double nexttoward(double, long double); double remainder(double, double); double remquo(double, double, int *); double rint(double); double round(double); double scalbn(double, int); double scalbln(double, long); double tgamma(double); double trunc(double); #endif /* _IS_C9X */ double lgamma_r(double,int*); float lgammaf_r(float,int*); long double lgammal_r(long double,int*); /* Deprecated functions, use C99 lgamma*() functions instead */ float gammaf(float) __attribute__((__const__,__deprecated__)); float gammaf_r(float,int*) __attribute__((__const__,__deprecated__)); double gamma(double) __attribute__((__const__,__deprecated__)); double gamma_r(double,int*) __attribute__((__const__,__deprecated__)); /* Compatibility functions */ int finite(double) __attribute__((__const__,__deprecated__)); /* use isfinite macro */ int finitef(float) __attribute__((__const__,__deprecated__)); /* use isfinite macro */ int isinff(float) __attribute__((__const__,__deprecated__)); /* use isinf macro */ int isnanf(float) __attribute__((__const__,__deprecated__)); /* use isnan macro */ double drem(double, double) __attribute__((__const__,__deprecated__)); /* use remainder */ float dremf(float, float) __attribute__((__const__,__deprecated__)); /* use remainder */ float significandf(float) __attribute__((__const__,__deprecated__)); /* use scalbn(x, -ilogb(x) */ double significand(double) __attribute__((__const__,__deprecated__)); /* use scalbnf(x, -ilogbf(x) */ float scalbf(float, float) __attribute__((__const__,__deprecated__)); /* 
use scalbn */ /* XSI Obsolete functions */ double scalb(double, double) __attribute__((__const__)); /* float declarations */ float acosf(float); float asinf(float); float atanf(float); float atan2f(float, float); float ceilf(float); float expf(float); float fabsf(float); float floorf(float); float fmodf(float, float); float frexpf(float, int *); float ldexpf(float, int); float modff(float, float *); float powf(float, float); float sqrtf(float); float tanf(float); float tanhf(float); #if _HAS_C9X float acoshf(float); float asinhf(float); float atanhf(float); float cbrtf(float); float copysignf(float, float); float erff(float); float erfcf(float); float expm1f(float); float exp2f(float); float fdimf(float, float); float fmaf(float, float, float); float fmaxf(float, float); float fminf(float, float); float hypotf(float, float); int ilogbf(float); float lgammaf(float); _Longlong llrintf(float); _Longlong llroundf(float); float log1pf(float); float logbf(float); long lrintf(float); long lroundf(float); float nanf(const char *); float nearbyintf(float); float nextafterf(float, float); float nexttowardf(float, long double); float remainderf(float, float); float remquof(float, float, int *); float rintf(float); float roundf(float); float scalbnf(float, int); float scalblnf(float, long); float tgammaf(float); float truncf(float); #endif /* _IS_C9X */ /* long double declarations */ long double acosl(long double); long double asinl(long double); long double atanl(long double); long double atan2l(long double, long double); long double ceill(long double); long double expl(long double); long double fabsl(long double); long double floorl(long double); long double fmodl(long double, long double); long double frexpl(long double, int *); long double ldexpl(long double, int); long double modfl(long double, long double *); long double powl(long double, long double); long double sqrtl(long double); long double tanl(long double); long double tanhl(long double); #if _HAS_C9X long double acoshl(long double); long double asinhl(long double); long double atanhl(long double); long double cbrtl(long double); long double copysignl(long double, long double); long double erfl(long double); long double erfcl(long double); long double exp2l(long double); long double expm1l(long double); long double fdiml(long double, long double); long double fmal(long double, long double, long double); long double fmaxl(long double, long double); long double fminl(long double, long double); long double hypotl(long double, long double); int ilogbl(long double); long double lgammal(long double); _Longlong llrintl(long double); _Longlong llroundl(long double); long double log1pl(long double); long double logbl(long double); long lrintl(long double); long lroundl(long double); long double nanl(const char *); long double nearbyintl(long double); long double nextafterl(long double, long double); long double nexttowardl(long double, long double); long double remainderl(long double, long double); long double remquol(long double, long double, int *); long double rintl(long double); long double roundl(long double); long double scalbnl(long double, int); long double scalblnl(long double, long); long double tgammal(long double); long double truncl(long double); #endif /* _IS_C9X */ _END_C_LIB_DECL #if defined(__cplusplus) && !defined(_NO_CPP_INLINES) #if _IS_EMBEDDED #else /* _IS_EMBEDDED */ // TEMPLATE FUNCTION _Pow_int template<class _Ty> inline _Ty _Pow_int(_Ty _Left, int _Right) { // raise to integer power unsigned int _Num = _Right; if (_Right 
< 0) _Num = 0 - _Num; for (_Ty _Ans = 1; ; _Left *= _Left) { // scale and fold in factors if ((_Num & 1) != 0) _Ans *= _Left; if ((_Num >>= 1) == 0) return (0 <= _Right ? _Ans : _Ans == _Ty(0) ? HUGE_VAL : _Ty(1) / _Ans); } } #endif /* _IS_EMBEDDED */ // double INLINES, FOR C++ _C_LIB_DECL inline double cos(double _Left) { // return cosine return (_Sin(_Left, 1)); } inline double cosh(double _Left) { // return hyperbolic cosine return (_Cosh(_Left, 1)); } inline double log(double _Left) { // return natural logarithm return (_Log(_Left, 0)); } inline double log10(double _Left) { // return base-10 logarithm return (_Log(_Left, 1)); } inline double sin(double _Left) { // return sine return (_Sin(_Left, 0)); } inline double sinh(double _Left) { // return hyperbolic sine return (_Sinh(_Left, 1)); } #if _HAS_C9X inline double log2(double _Left) { // return base-2 logarithm return (_Log(_Left, -1)); } #endif /* _IS_C9X */ _END_C_LIB_DECL inline double abs(double _Left) // OVERLOADS { // return absolute value return (fabs(_Left)); } #if _IS_EMBEDDED inline double pow(double _Left, int _Right) { // raise to integer power unsigned int _Num = _Right; if (_Right < 0) _Num = 0 - _Num; for (double _Ans = 1; ; _Left *= _Left) {if ((_Num & 1) != 0) _Ans *= _Left; if ((_Num >>= 1) == 0) return (_Right < 0 ? (double)(1) / _Ans : _Ans); } } #else /* _IS_EMBEDDED */ inline double pow(double _Left, int _Right) { // raise to integer power return (_Pow_int(_Left, _Right)); } #endif /* _IS_EMBEDDED */ // float INLINES, FOR C++ _C_LIB_DECL inline float cosf(float _Left) { // return cosine return (_FSin(_Left, 1)); } inline float coshf(float _Left) { // return hyperbolic cosine return (_FCosh(_Left, 1)); } inline float logf(float _Left) { // return natural logarithm return (_FLog(_Left, 0)); } inline float log10f(float _Left) { // return base-10 logarithm return (_FLog(_Left, 1)); } inline float sinf(float _Left) { // return sine return (_FSin(_Left, 0)); } inline float sinhf(float _Left) { // return hyperbolic sine return (_FSinh(_Left, 1)); } #if _HAS_C9X inline float log2f(float _Left) { // return base-2 logarithm return (_FLog(_Left, -1)); } #endif /* _IS_C9X */ _END_C_LIB_DECL inline float abs(float _Left) // OVERLOADS { // return absolute value return (fabsf(_Left)); } inline float acos(float _Left) { // return arccosine return (acosf(_Left)); } inline float asin(float _Left) { // return arcsine return (asinf(_Left)); } inline float atan(float _Left) { // return arctangent return (atanf(_Left)); } inline float atan2(float _Left, float _Right) { // return arctangent return (atan2f(_Left, _Right)); } inline float ceil(float _Left) { // return ceiling return (ceilf(_Left)); } inline float cos(float _Left) { // return cosine return (_FSin(_Left, 1)); } inline float cosh(float _Left) { // return hyperbolic cosine return (_FCosh(_Left, 1)); } inline float exp(float _Left) { // return exponential return (expf(_Left)); } inline float fabs(float _Left) { // return absolute value return (fabsf(_Left)); } inline float floor(float _Left) { // return floor return (floorf(_Left)); } inline float fmod(float _Left, float _Right) { // return modulus return (fmodf(_Left, _Right)); } inline float frexp(float _Left, int *_Right) { // unpack exponent return (frexpf(_Left, _Right)); } inline float ldexp(float _Left, int _Right) { // pack exponent return (ldexpf(_Left, _Right)); } inline float log(float _Left) { // return natural logarithm return (_FLog(_Left, 0)); } inline float log10(float _Left) { // return base-10 logarithm 
return (_FLog(_Left, 1)); } inline float modf(float _Left, float *_Right) { // unpack fraction return (modff(_Left, _Right)); } inline float pow(float _Left, float _Right) { // raise to power return (powf(_Left, _Right)); } #if _IS_EMBEDDED inline float pow(float _Left, int _Right) { // raise to integer power unsigned int _Num = _Right; if (_Right < 0) _Num = 0 - _Num; for (float _Ans = 1; ; _Left *= _Left) {if ((_Num & 1) != 0) _Ans *= _Left; if ((_Num >>= 1) == 0) return (_Right < 0 ? (float)(1) / _Ans : _Ans); } } #else /* _IS_EMBEDDED */ inline float pow(float _Left, int _Right) { // raise to integer power return (_Pow_int(_Left, _Right)); } #endif /* _IS_EMBEDDED */ inline float sin(float _Left) { // return sine return (_FSin(_Left, 0)); } inline float sinh(float _Left) { // return hyperbolic sine return (_FSinh(_Left, 1)); } inline float sqrt(float _Left) { // return square root return (sqrtf(_Left)); } inline float tan(float _Left) { // return tangent return (tanf(_Left)); } inline float tanh(float _Left) { // return hyperbolic tangent return (tanhf(_Left)); } #if _HAS_C9X inline float acosh(float _Left) { // return hyperbolic arccosine return (acoshf(_Left)); } inline float asinh(float _Left) { // return hyperbolic arcsine return (asinhf(_Left)); } inline float atanh(float _Left) { // return hyperbolic arctangent return (atanhf(_Left)); } inline float cbrt(float _Left) { // return cube root return (cbrtf(_Left)); } inline float copysign(float _Left, float _Right) { // return copysign return (copysignf(_Left, _Right)); } inline float erf(float _Left) { // return erf return (erff(_Left)); } inline float erfc(float _Left) { // return erfc return (erfcf(_Left)); } inline float exp2(float _Left) { // return exp2 return (exp2f(_Left)); } inline float expm1(float _Left) { // return expml return (expm1f(_Left)); } inline float fdim(float _Left, float _Right) { // return fdim return (fdimf(_Left, _Right)); } inline float fma(float _Left, float _Right, float _Addend) { // return fma return (fmaf(_Left, _Right, _Addend)); } inline float fmax(float _Left, float _Right) { // return fmax return (fmaxf(_Left, _Right)); } inline float fmin(float _Left, float _Right) { // return fmin return (fminf(_Left, _Right)); } inline float hypot(float _Left, float _Right) { // return hypot return (hypotf(_Left, _Right)); } inline int ilogb(float _Left) { // return ilogb return (ilogbf(_Left)); } inline float lgamma(float _Left) { // return lgamma return (lgammaf(_Left)); } inline _Longlong llrint(float _Left) { // return llrint return (llrintf(_Left)); } inline _Longlong llround(float _Left) { // return llround return (llroundf(_Left)); } inline float log1p(float _Left) { // return loglp return (log1pf(_Left)); } inline float log2(float _Left) { // return log2 return (_FLog(_Left, -1)); } inline float logb(float _Left) { // return logb return (logbf(_Left)); } inline long lrint(float _Left) { // return lrint return (lrintf(_Left)); } inline long lround(float _Left) { // return lround return (lroundf(_Left)); } inline float nearbyint(float _Left) { // return nearbyint return (nearbyintf(_Left)); } inline float nextafter(float _Left, float _Right) { // return nextafter return (nextafterf(_Left, _Right)); } inline float nexttoward(float _Left, long double _Right) { // return nexttoward return (nexttowardf(_Left, _Right)); } inline float remainder(float _Left, float _Right) { // return remainder return (remainderf(_Left, _Right)); } inline float remquo(float _Left, float _Right, int *_Pval) { // return remquo 
return (remquof(_Left, _Right, _Pval)); } inline float rint(float _Left) { // return rint return (rintf(_Left)); } inline float round(float _Left) { // return round return (roundf(_Left)); } inline float scalbn(float _Left, int _Right) { // return scalbn return (scalbnf(_Left, _Right)); } inline float scalbln(float _Left, long _Right) { // return scalbln return (scalblnf(_Left, _Right)); } inline float tgamma(float _Left) { // return tgamma return (tgammaf(_Left)); } inline float trunc(float _Left) { // return trunc return (truncf(_Left)); } #endif /* _IS_C9X */ // long double INLINES, FOR C++ _C_LIB_DECL inline long double cosl(long double _Left) { // return cosine return (_LSin(_Left, 1)); } inline long double coshl(long double _Left) { // return hyperbolic cosine return (_LCosh(_Left, 1)); } inline long double logl(long double _Left) { // return natural logarithm return (_LLog(_Left, 0)); } inline long double log10l(long double _Left) { // return base-10 logarithm return (_LLog(_Left, 1)); } inline long double sinl(long double _Left) { // return sine return (_LSin(_Left, 0)); } inline long double sinhl(long double _Left) { // return hyperbolic sine return (_LSinh(_Left, 1)); } #if _HAS_C9X inline long double log2l(long double _Left) { // return base-2 logarithm return (_LLog(_Left, -1)); } #endif /* _IS_C9X */ _END_C_LIB_DECL inline long double abs(long double _Left) // OVERLOADS { // return absolute value return (fabsl(_Left)); } inline long double acos(long double _Left) { // return arccosine return (acosl(_Left)); } inline long double asin(long double _Left) { // return arcsine return (asinl(_Left)); } inline long double atan(long double _Left) { // return arctangent return (atanl(_Left)); } inline long double atan2(long double _Left, long double _Right) { // return arctangent return (atan2l(_Left, _Right)); } inline long double ceil(long double _Left) { // return ceiling return (ceill(_Left)); } inline long double cos(long double _Left) { // return cosine return (_LSin(_Left, 1)); } inline long double cosh(long double _Left) { // return hyperbolic cosine return (_LCosh(_Left, 1)); } inline long double exp(long double _Left) { // return exponential return (expl(_Left)); } inline long double fabs(long double _Left) { // return absolute value return (fabsl(_Left)); } inline long double floor(long double _Left) { // return floor return (floorl(_Left)); } inline long double fmod(long double _Left, long double _Right) { // return modulus return (fmodl(_Left, _Right)); } inline long double frexp(long double _Left, int *_Right) { // unpack exponent return (frexpl(_Left, _Right)); } inline long double ldexp(long double _Left, int _Right) { // pack exponent return (ldexpl(_Left, _Right)); } inline long double log(long double _Left) { // return natural logarithm return (_LLog(_Left, 0)); } inline long double log10(long double _Left) { // return base-10 logarithm return (_LLog(_Left, 1)); } inline long double modf(long double _Left, long double *_Right) { // unpack fraction return (modfl(_Left, _Right)); } inline long double pow(long double _Left, long double _Right) { // raise to power return (powl(_Left, _Right)); } #if _IS_EMBEDDED inline long double pow(long double _Left, int _Right) { // raise to integer power unsigned int _Num = _Right; if (_Right < 0) _Num = 0 - _Num; for (long double _Ans = 1; ; _Left *= _Left) {if ((_Num & 1) != 0) _Ans *= _Left; if ((_Num >>= 1) == 0) return (_Right < 0 ? 
(long double)(1) / _Ans : _Ans); } } #else /* _IS_EMBEDDED */ inline long double pow(long double _Left, int _Right) { // raise to integer power return (_Pow_int(_Left, _Right)); } #endif /* _IS_EMBEDDED */ inline long double sin(long double _Left) { // return sine return (_LSin(_Left, 0)); } inline long double sinh(long double _Left) { // return hyperbolic sine return (_LSinh(_Left, 1)); } inline long double sqrt(long double _Left) { // return square root return (sqrtl(_Left)); } inline long double tan(long double _Left) { // return tangent return (tanl(_Left)); } inline long double tanh(long double _Left) { // return hyperbolic tangent return (tanhl(_Left)); } #if _HAS_C9X inline long double acosh(long double _Left) { // return acosh return (acoshl(_Left)); } inline long double asinh(long double _Left) { // return asinh return (asinhl(_Left)); } inline long double atanh(long double _Left) { // return atanh return (atanhl(_Left)); } inline long double cbrt(long double _Left) { // return cbrt return (cbrtl(_Left)); } inline long double copysign(long double _Left, long double _Right) { // return copysign return (copysignl(_Left, _Right)); } inline long double erf(long double _Left) { // return erf return (erfl(_Left)); } inline long double erfc(long double _Left) { // return erfc return (erfcl(_Left)); } inline long double exp2(long double _Left) { // return exp2 return (exp2l(_Left)); } inline long double expm1(long double _Left) { // return expml return (expm1l(_Left)); } inline long double fdim(long double _Left, long double _Right) { // return fdim return (fdiml(_Left, _Right)); } inline long double fma(long double _Left, long double _Right, long double _Addend) { // return fma return (fmal(_Left, _Right, _Addend)); } inline long double fmax(long double _Left, long double _Right) { // return fmax return (fmaxl(_Left, _Right)); } inline long double fmin(long double _Left, long double _Right) { // return fmin return (fminl(_Left, _Right)); } inline long double hypot(long double _Left, long double _Right) { // return hypot return (hypotl(_Left, _Right)); } inline int ilogb(long double _Left) { // return ilogb return (ilogbl(_Left)); } inline long double lgamma(long double _Left) { // return lgamma return (lgammal(_Left)); } inline _Longlong llrint(long double _Left) { // return llrint return (llrintl(_Left)); } inline _Longlong llround(long double _Left) { // return llround return (llroundl(_Left)); } inline long double log1p(long double _Left) { // return loglp return (log1pl(_Left)); } inline long double log2(long double _Left) { // return log2 return (_LLog(_Left, -1)); } inline long double logb(long double _Left) { // return logb return (logbl(_Left)); } inline long lrint(long double _Left) { // return lrint return (lrintl(_Left)); } inline long lround(long double _Left) { // return lround return (lroundl(_Left)); } inline long double nearbyint(long double _Left) { // return nearbyint return (nearbyintl(_Left)); } inline long double nextafter(long double _Left, long double _Right) { // return nextafter return (nextafterl(_Left, _Right)); } inline long double nexttoward(long double _Left, long double _Right) { // return nexttoward return (nexttowardl(_Left, _Right)); } inline long double remainder(long double _Left, long double _Right) { // return remainder return (remainderl(_Left, _Right)); } inline long double remquo(long double _Left, long double _Right, int *_Pval) { // return remquo return (remquol(_Left, _Right, _Pval)); } inline long double rint(long double _Left) { // return 
rint return (rintl(_Left)); } inline long double round(long double _Left) { // return round return (roundl(_Left)); } inline long double scalbn(long double _Left, int _Right) { // return scalbn return (scalbnl(_Left, _Right)); } inline long double scalbln(long double _Left, long _Right) { // return scalbln return (scalblnl(_Left, _Right)); } inline long double tgamma(long double _Left) { // return tgamma return (tgammal(_Left)); } inline long double trunc(long double _Left) { // return trunc return (truncl(_Left)); } #endif /* _IS_C9X */ #else /* defined(__cplusplus) && !defined(_NO_CPP_INLINES) */ _C_LIB_DECL /* double MACRO OVERRIDES, FOR C */ double cos(double); double cosh(double); double log(double); double log10(double); double sin(double); double sinh(double); #define cos(x) _Sin(x, 1) #define cosh(x) _Cosh(x, 1) #define log(x) _Log(x, 0) #define log10(x) _Log(x, 1) #define sin(x) _Sin(x, 0) #define sinh(x) _Sinh(x, 1) #if _HAS_C9X double log2(double); #define log2(x) _Log(x, -1) #endif /* _IS_C9X */ /* float MACRO OVERRIDES, FOR C */ float cosf(float); float coshf(float); float logf(float); float log10f(float); float sinf(float); float sinhf(float); #define cosf(x) _FSin(x, 1) #define coshf(x) _FCosh(x, 1) #define logf(x) _FLog(x, 0) #define log10f(x) _FLog(x, 1) #define sinf(x) _FSin(x, 0) #define sinhf(x) _FSinh(x, 1) #if _HAS_C9X float log2f(float); #define log2f(x) _FLog(x, -1) #endif /* _IS_C9X */ /* long double MACRO OVERRIDES, FOR C */ long double cosl(long double); long double coshl(long double); long double logl(long double); long double log10l(long double); long double sinl(long double); long double sinhl(long double); #define cosl(x) _LSin(x, 1) #define coshl(x) _LCosh(x, 1) #define logl(x) _LLog(x, 0) #define log10l(x) _LLog(x, 1) #define sinl(x) _LSin(x, 0) #define sinhl(x) _LSinh(x, 1) #if _HAS_C9X long double log2l(long double); #define log2l(x) _LLog(x, -1) #endif /* _IS_C9X */ _END_C_LIB_DECL #endif /* defined(__cplusplus) && !defined(_NO_CPP_INLINES) */ _C_STD_END #endif /* _MATH */ #if defined(_STD_USING) using _CSTD acos; using _CSTD asin; using _CSTD atan; using _CSTD atan2; using _CSTD ceil; using _CSTD cos; using _CSTD cosh; using _CSTD exp; using _CSTD fabs; using _CSTD floor; using _CSTD fmod; using _CSTD frexp; using _CSTD ldexp; using _CSTD log; using _CSTD log10; using _CSTD modf; using _CSTD pow; using _CSTD sin; using _CSTD sinh; using _CSTD sqrt; using _CSTD tan; using _CSTD tanh; using _CSTD acosf; using _CSTD asinf; using _CSTD atanf; using _CSTD atan2f; using _CSTD ceilf; using _CSTD cosf; using _CSTD coshf; using _CSTD expf; using _CSTD fabsf; using _CSTD floorf; using _CSTD fmodf; using _CSTD frexpf; using _CSTD ldexpf; using _CSTD logf; using _CSTD log10f; using _CSTD modff; using _CSTD powf; using _CSTD sinf; using _CSTD sinhf; using _CSTD sqrtf; using _CSTD tanf; using _CSTD tanhf; using _CSTD acosl; using _CSTD asinl; using _CSTD atanl; using _CSTD atan2l; using _CSTD ceill; using _CSTD cosl; using _CSTD coshl; using _CSTD expl; using _CSTD fabsl; using _CSTD floorl; using _CSTD fmodl; using _CSTD frexpl; using _CSTD ldexpl; using _CSTD logl; using _CSTD log10l; using _CSTD modfl; using _CSTD powl; using _CSTD sinl; using _CSTD sinhl; using _CSTD sqrtl; using _CSTD tanl; using _CSTD tanhl; #if _HAS_C9X #if _FPCOMP_template using _CSTD _Rc_type; using _CSTD _Rc_widened; using _CSTD _Real_type; using _CSTD _Real_widened; using _CSTD _Combined_type; using _CSTD _FPCOMP; using _CSTD fpclassify; using _CSTD signbit; using _CSTD isfinite; using 
_CSTD isinf; using _CSTD isnan; using _CSTD isnormal; using _CSTD isgreater; using _CSTD isgreaterequal; using _CSTD isless; using _CSTD islessequal; using _CSTD islessgreater; using _CSTD isunordered; #endif /* _FPCOMP_template */ using _CSTD float_t; using _CSTD double_t; using _CSTD acosh; using _CSTD asinh; using _CSTD atanh; using _CSTD cbrt; using _CSTD erf; using _CSTD erfc; using _CSTD expm1; using _CSTD exp2; using _CSTD hypot; using _CSTD ilogb; using _CSTD lgamma; using _CSTD log1p; using _CSTD log2; using _CSTD logb; using _CSTD llrint; using _CSTD lrint; using _CSTD nearbyint; using _CSTD rint; using _CSTD llround; using _CSTD lround; using _CSTD fdim; using _CSTD fma; using _CSTD fmax; using _CSTD fmin; using _CSTD round; using _CSTD trunc; using _CSTD remainder; using _CSTD remquo; using _CSTD copysign; using _CSTD nan; using _CSTD nextafter; using _CSTD scalbn; using _CSTD scalbln; using _CSTD nexttoward; using _CSTD tgamma; using _CSTD acoshf; using _CSTD asinhf; using _CSTD atanhf; using _CSTD cbrtf; using _CSTD erff; using _CSTD erfcf; using _CSTD expm1f; using _CSTD exp2f; using _CSTD hypotf; using _CSTD ilogbf; using _CSTD lgammaf; using _CSTD log1pf; using _CSTD log2f; using _CSTD logbf; using _CSTD llrintf; using _CSTD lrintf; using _CSTD nearbyintf; using _CSTD rintf; using _CSTD llroundf; using _CSTD lroundf; using _CSTD fdimf; using _CSTD fmaf; using _CSTD fmaxf; using _CSTD fminf; using _CSTD roundf; using _CSTD truncf; using _CSTD remainderf; using _CSTD remquof; using _CSTD copysignf; using _CSTD nanf; using _CSTD nextafterf; using _CSTD scalbnf; using _CSTD scalblnf; using _CSTD nexttowardf; using _CSTD tgammaf; using _CSTD acoshl; using _CSTD asinhl; using _CSTD atanhl; using _CSTD cbrtl; using _CSTD erfl; using _CSTD erfcl; using _CSTD expm1l; using _CSTD exp2l; using _CSTD hypotl; using _CSTD ilogbl; using _CSTD lgammal; using _CSTD log1pl; using _CSTD log2l; using _CSTD logbl; using _CSTD llrintl; using _CSTD lrintl; using _CSTD nearbyintl; using _CSTD rintl; using _CSTD llroundl; using _CSTD lroundl; using _CSTD fdiml; using _CSTD fmal; using _CSTD fmaxl; using _CSTD fminl; using _CSTD roundl; using _CSTD truncl; using _CSTD remainderl; using _CSTD remquol; using _CSTD copysignl; using _CSTD nanl; using _CSTD nextafterl; using _CSTD scalbnl; using _CSTD scalblnl; using _CSTD nexttowardl; using _CSTD tgammal; #endif /* _IS_C9X */ #endif /* defined(_STD_USING) */ /* * Copyright (c) 1992-2003 by P.J. Plauger. ALL RIGHTS RESERVED. * Consult your license regarding permissions and restrictions. 
V4.02:1296 */ /* ymath.h internal header */ #ifndef _YMATH #define _YMATH #include <yvals.h> _C_STD_BEGIN _C_LIB_DECL /* MACROS FOR _FPP_TYPE */ #define _FPP_NONE 0 /* software emulation of FPP */ #define _FPP_X86 1 /* Intel Pentium */ #define _FPP_SPARC 2 /* Sun SPARC */ #define _FPP_MIPS 3 /* SGI MIPS */ #define _FPP_S390 4 /* IBM S/390 */ #define _FPP_PPC 5 /* Motorola PowerPC */ #define _FPP_HPPA 6 /* Hewlett-Packard PA-RISC */ #define _FPP_ALPHA 7 /* Compaq Alpha */ #define _FPP_ARM 8 /* ARM ARM */ #define _FPP_M68K 9 /* Motorola 68xxx */ #define _FPP_SH4 10 /* Hitachi SH4 */ #define _FPP_IA64 11 /* Intel IA64 */ #define _FPP_WCE 12 /* EDG Windows CE */ /* MACROS FOR _Dtest RETURN (0 => ZERO) */ #define _DENORM (-2) /* C9X only */ #define _FINITE (-1) #define _INFCODE 1 #define _NANCODE 2 /* MACROS FOR _Feraise ARGUMENT */ #if _FPP_TYPE == _FPP_X86 #define _FE_DIVBYZERO 0x04 #define _FE_INEXACT 0x20 #define _FE_INVALID 0x01 #define _FE_OVERFLOW 0x08 #define _FE_UNDERFLOW 0x10 #elif _FPP_TYPE == _FPP_SPARC #define _FE_DIVBYZERO 0x02 #define _FE_INEXACT 0x01 #define _FE_INVALID 0x10 #define _FE_OVERFLOW 0x08 #define _FE_UNDERFLOW 0x04 #elif _FPP_TYPE == _FPP_MIPS #define _FE_DIVBYZERO 0x02 #define _FE_INEXACT 0x01 #define _FE_INVALID 0x10 #define _FE_OVERFLOW 0x08 #define _FE_UNDERFLOW 0x04 #elif _FPP_TYPE == _FPP_S390 #define _FE_DIVBYZERO 0x08 #define _FE_INEXACT 0x01 #define _FE_INVALID 0x10 #define _FE_OVERFLOW 0x04 #define _FE_UNDERFLOW 0x02 #elif _FPP_TYPE == _FPP_PPC #define _FE_DIVBYZERO 0x02 #define _FE_INEXACT 0x01 #define _FE_INVALID 0x10 #define _FE_OVERFLOW 0x08 #define _FE_UNDERFLOW 0x04 #elif _FPP_TYPE == _FPP_HPPA #define _FE_DIVBYZERO 0x08 #define _FE_INEXACT 0x01 #define _FE_INVALID 0x10 #define _FE_OVERFLOW 0x04 #define _FE_UNDERFLOW 0x02 #elif _FPP_TYPE == _FPP_ALPHA #define _FE_DIVBYZERO 0x02 #define _FE_INEXACT 0x10 #define _FE_INVALID 0x01 #define _FE_OVERFLOW 0x04 #define _FE_UNDERFLOW 0x08 #elif _FPP_TYPE == _FPP_ARM #define _FE_DIVBYZERO 0x02 #define _FE_INEXACT 0x10 #define _FE_INVALID 0x01 #define _FE_OVERFLOW 0x04 #define _FE_UNDERFLOW 0x08 #elif _FPP_TYPE == _FPP_M68K #define _FE_DIVBYZERO 0x02 #define _FE_INEXACT 0x01 #define _FE_INVALID 0x10 #define _FE_OVERFLOW 0x08 #define _FE_UNDERFLOW 0x04 #elif _FPP_TYPE == _FPP_SH4 #define _FE_DIVBYZERO 0x08 #define _FE_INEXACT 0x01 #define _FE_INVALID 0x10 #define _FE_OVERFLOW 0x04 #define _FE_UNDERFLOW 0x02 #elif _FPP_TYPE == _FPP_IA64 #define _FE_DIVBYZERO 0x04 #define _FE_INEXACT 0x20 #define _FE_INVALID 0x01 #define _FE_OVERFLOW 0x08 #define _FE_UNDERFLOW 0x10 #elif _FPP_TYPE == _FPP_WCE #define _FE_DIVBYZERO 0x08 #define _FE_INEXACT 0x01 #define _FE_INVALID 0x10 #define _FE_OVERFLOW 0x04 #define _FE_UNDERFLOW 0x02 #else /* _FPP_TYPE == _FPP_NONE or unknown */ #undef _FPP_TYPE #define _FPP_TYPE _FPP_NONE #define _FE_DIVBYZERO 0x04 /* dummy same as Pentium */ #define _FE_INEXACT 0x20 #define _FE_INVALID 0x01 #define _FE_OVERFLOW 0x08 #define _FE_UNDERFLOW 0x10 #endif /* _FPP_TYPE */ /* TYPE DEFINITIONS */ typedef union { /* pun float types as integer array */ unsigned short _Word[8]; float _Float; double _Double; long double _Long_double; } _Dconst; /* ERROR REPORTING */ void _Feraise(int); /* double DECLARATIONS */ double _Cosh(double, double); short _Dtest(double *); short _Exp(double *, double, short); double _Log(double, int); double _Sin(double, unsigned int); double _Sinh(double, double); #ifdef _FLOAT_DATA_IS_CONST extern const _Dconst _Denorm, _Hugeval, _Inf, _Nan, _Snan; #else extern /* const */ _Dconst 
_Denorm, _Hugeval, _Inf, _Nan, _Snan; #endif /* float DECLARATIONS */ float _FCosh(float, float); short _FDtest(float *); short _FExp(float *, float, short); float _FLog(float, int); float _FSin(float, unsigned int); float _FSinh(float, float); #ifdef _FLOAT_DATA_IS_CONST extern const _Dconst _FDenorm, _FInf, _FNan, _FSnan; #else extern /* const */ _Dconst _FDenorm, _FInf, _FNan, _FSnan; #endif /* long double DECLARATIONS */ long double _LCosh(long double, long double); short _LDtest(long double *); short _LExp(long double *, long double, short); long double _LLog(long double, int); long double _LSin(long double, unsigned int); long double _LSinh(long double, long double); #ifdef _FLOAT_DATA_IS_CONST extern const _Dconst _LDenorm, _LInf, _LNan, _LSnan; #else extern /* const */ _Dconst _LDenorm, _LInf, _LNan, _LSnan; #endif #if defined(__SUNPRO_CC) /* compiler test */ float fmodf(float, float); long double fmodl(long double, long double); #endif /* defined(__SUNPRO_CC) */ #if defined(__BORLANDC__) /* compiler test */ float fmodf(float, float); float logf(float); #endif /* defined(__BORLANDC__) */ _END_C_LIB_DECL _C_STD_END #endif /* _YMATH */ /* * Copyright (c) 1992-2003 by P.J. Plauger. ALL RIGHTS RESERVED. * Consult your license regarding permissions and restrictions. V4.02:1296 */

Just explain that in slightly more detail and I will raise a support ticket with QNX.
Basically fpclassify is returning FP_NORMAL rather than FP_SUBNORMAL when passed a denormalised number (at least that's my assumption based on the results you reported). The simple test case below should allow you to check this.

Regards, John.

P.S. It's probably not a good idea to post headers that aren't open-source to a public mailing list ;-)

#include <assert.h>
#include <math.h>
#include <limits>

template <class T>
void test(T t)
{
   T denorm;
   // denorm_min should be either zero or a denormalised number:
   denorm = std::numeric_limits<T>::denorm_min();
   if(denorm != static_cast<T>(0.0))
      assert(fpclassify(denorm) == FP_SUBNORMAL);
   // any number less than min() should be denormal or zero:
   denorm = std::numeric_limits<T>::min() / 2;
   if(denorm != static_cast<T>(0.0))
      assert(fpclassify(denorm) == FP_SUBNORMAL);
}

int main()
{
   test(1.0F);
   test(1.0);
   test(1.0L);
}
participants (15)
- Boris Gubenko
- Christoph Ludwig
- Daniel James
- Daryle Walker
- Guillaume Melquiond
- Ian McCulloch
- Jim Douglas
- John Maddock
- Kevin Wheatley
- Markus Schöpflin
- Martin Wille
- Matthias Troyer
- Paul A Bristow
- Pedro Lamarão
- Robert Ramey