More Math Functions for Statistics (and TR2)

More Mathematical Functions especially for Statistics +++++++++++++++++++++++++++++++++++++++++++++++++++++ Aficionados of mathematical functions may remember that I proposed some additional functions http://www2.open-std.org/JTC1/SC22/WG21/docs/papers/2004/n1668.pdf to the choice of Walter Brown which have been added to the C++ TR1 (latest draft) http://www2.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1905.pdf The rationale for my proposed functions is that previously standardized functions do not include many functions which have very much wider utility in statistical problems, even the most elementary, like Student's t, Fisher and Chi-sqr tests. To recap, a long list of basic and essential functions were added with the C99 for which rationale is given in http://www2.open-std.org/JTC1/SC22/WG14/www/C99RationaleV5.10.pdf Briefly the total list of functions consists of 1 Original math.h functions, log, exp, pow, sin, cos... 1 So-called IEEE recommended functions, signbit, isnan, isfinite ... 2 Some simplish math functions like lgamma, tgamma, cbrt, acosh. 3 TR1 Bessel functions, Reimann, hypergeometric... 4 TR2 proposed additions, incomplete beta, Fisher, Student, Chisqr probability quantile distributions... I have now produced a first demo of a few of these functions to carry out some very simple, but almost universally applicable, statistical tests - Student's t and Fisher's F-ratio. The example is from analytical chemistry, but could equally well be of school exams, economic comparisons, measurement of lengths, brightness of stars - the list is endless. The demo shows computation of the probability that two methods of extracting tin from foodstuffs give a different result and the probability that one method is more precise than the other. It does NOT require the use of published tables. All the hard and expert work has been done by Stephen Moshier, author of a respected book on mathematical functions, and a long supported (since 1984) and extensive library of mathematical functions in C called Cephes. Moshier has kindly offered to change the library license terms to the Boost terms to support this project. I have merely packaged a few examples of the C functions as C++ functions and placed them in namespaces std::tr2 to show what they would look like in final form. The C versions are required to remain in global namespace to follow the example of the C99 functions which can be used by both C and C++ programs. I am advised that this is a requirement to get the proposal accepted by C WG14 and C++ WG21 Standard groups. I have compiled all the Cephes modules (with a few name changes using macros in the mconf.h configuration header file) and built a library from them. In unitTestfunc1.cpp, I show declarations and definitions of a few sample functions float cbrt(float x) { // Loss of speed over C float version cbrtf? // But not less accurate. // return float(::cbrt(x)); return ::cbrtf(x); // C99 implementation from Cephes, // 32-bit IEEE 754 single precision. } //float cbrt(float x) Some of the 'issues' are: 1 Floating-point format. I have assumed IEEE formats, but try to check and warn if not. All built-in types can vary in precision, especially double and long double: I have shown an example of selecting long double having 53, 80 or 128 significand bits (all IEEE formats). 2 I have only used MSVC 8.0 (but I expect older versions to work too) where long double == 64-bit double. 3 Cephes can be configured to deal with 'wrong' endianness using a macro definition in the configuration header. This might be derived from a Boost header? 4 User Defined Types I have also used Victor Shoup's NTL 128-bit and arbitrary precision library to show that the function definitions can be successfully extended to include User Defined Types such as these. There are a small but significant group of users who want a higher precision than the native long double provides, especially Microsoft compiler users where long double is identical to double and only 64-bit (about 15 decimal digit). (I'm sorry that this makes the sample code rather messy, but you can skip over the NTL part). 5 I have shown some sample tests using the Boost Unit Test framework. These are only a few spot values, calculated using Cephes DOSbox qcalc.exe 100 decimal digit calculator (output to 40 decimal digits) or NTL. These are intended merely to test a few obvious, random, or exact values, a few corner cases, and around some points at which the algorithms are know to change. They aim to check for general algorithm 'sanity' rather than attempt to evaluate accuracy in detail over the whole range. The Cephes functions have already been tested rather fully and their accuracy and tests are reported in the package. 6 In the test example, for simplicity I have concatenated all the declarations and definitions in the same file. These will of course be in separate header and/or source files. 7 I have not concerned myself with efficiency or speed and assumed that the compiler will inline and otherwise optimise away inefficiencies from wrapping. In some cases, the float versions may simply do the calculation in double and return a conversion to float: it may be the best speed and/or accuracy anyway. 8 Implementations will always have to make some compromises between size, speed and accuracy. 9 Cephes deals with errors (for example, from wrong parameter values) by the then conventional method of calling matherr & setting a global math error no, and printing a message, rather than throwing exceptions. But I note that the system-wide error numbers (XENIX) in the Microsoft error.h include files are undocumented, and conflict with the Cephes codes. And C99 (and thus TR1?) no longer REQUIRE a math error to set errno. The simplest solution would be to change the existing matherr function to set errno and NOT print any message: users would always be free to change this module to suit their other requirements. I would NOT wish to implement the C99 exception handling, nor C++ floating_point exceptions. __Detailed__ recommendations about if and how this error handling should be made more 'standard', perhaps using error.h would be welcomed. 10 Cephes could provide test and values for NaNs and infinity etc, but only for some floating- point layouts. These C99 functions are _essential_, but outside the scope of my work at present. 11 I have assumed that compilation and link option choices will deal with floating point instruction set variations, including 'intrinsic' functions like log, exp, sin, cos. This will complicate building the math library and provide potential for confusion at link time. 12 The Cephes code produces a lot of warnings when compiled in 'strict' mode. I have ignored these, although they suggest that the C code does not meet todays 'picky' standards. However a full revision would be a big undertaking with risks of introducing errors: my impression is that most of the warnings look spurious. 13 Cephes implements complex but I have not yet even thought about it ;-) 14 Finally may I reiterate that the objective of this implementation is to persuade the C and C++ Standards groups to standardize the signatures of these functions: it will be possible for better implementations to be produced in the future by both open source and commercial organisations like Dinkumware. Before I undertake the tedious task of wrapping all the proposed TR2 functions (and as many of the C99 and TR1 functions as possible), and presenting the collection for Boost review, I would like feedback from Boosters ever-critical eyes. Bear in mind that (without dealing with complex) there are nearly ***500*** function signatures! I would especially like to avoid a lot of bright new ideas emerging at the review stage, as seems to happen all too often. Speak now or forever hold thy peace, please! Thank you. Paul For those want to see code, a simple demo is at: http://www.hetp.u-net.com/public/mathFuncDemo1.cpp http://www.hetp.u-net.com/public/math2.cpp http://www.hetp.u-net.com/public/math2.hpp and two samples, simple and with UDTs of unit testing: http://www.hetp.u-net.com/public/unitTestFunc1.cpp http://www.hetp.u-net.com/public/unitTestFunc2.cpp -- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB Phone and SMS text +44 1539 561830, Mobile and SMS text +44 7714 330204 mailto: pbristow@hetp.u-net.com http://www.hetp.u-net.com/index.html http://www.hetp.u-net.com/Paul%20A%20Bristow%20info.html

After receiving a dose of my own email medicine, I have realized that it is totally unreadable :-(( A more readable version is at http://www.hetp.u-net.com/public/More%20Mathematical%20Functions%20Progress% 2030%20Nov%2005.pdf or http://tinyurl.com/b7yw2 Sorry. Your comments are most welcome. Paul -- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB Phone and SMS text +44 1539 561830, Mobile and SMS text +44 7714 330204 mailto: pbristow@hetp.u-net.com http://www.hetp.u-net.com/index.html http://www.hetp.u-net.com/Paul%20A%20Bristow%20info.html | -----Original Message----- | From: boost-bounces@lists.boost.org | [mailto:boost-bounces@lists.boost.org] On Behalf Of Paul A Bristow | Sent: 29 November 2005 15:07 | To: boost@lists.boost.org | Subject: [boost] More Math Functions for Statistics (and TR2)

Hi Paul, Thanks for doing this important work.
Some of the 'issues' are:
1 Floating-point format. I have assumed IEEE formats, but try to check and warn if not. All built-in types can vary in precision, especially double and long double: I have shown an example of selecting long double having 53, 80 or 128 significand bits (all IEEE formats).
7 I have not concerned myself with efficiency or speed and assumed
Q: is the precision of the algorithms actually so good that it makes a difference if one uses float, double or long double? that the
compiler will inline and otherwise optimise away inefficiencies from wrapping.
Good.
9 Cephes deals with errors (for example, from wrong parameter values) by the then conventional method of calling matherr & setting a global math error no, and printing a message, rather than throwing exceptions. But I note that the system-wide error numbers (XENIX) in the Microsoft error.h include files are undocumented, and conflict with the Cephes codes. And C99 (and thus TR1?) no longer REQUIRE a math error to set errno. The simplest solution would be to change the existing matherr function to set errno and NOT print any message: users would always be free to change this module to suit their other requirements. I would NOT wish to implement the C99 exception handling, nor C++ floating_point exceptions.
__Detailed__ recommendations about if and how this error handling should be made more 'standard', perhaps using error.h would be welcomed.
I guess throwing is out of the question because of C compatibility? How are error handled in the new tr1 math functions?
12 The Cephes code produces a lot of warnings when compiled in 'strict' mode. I have ignored these, although they suggest that the C code does not meet todays 'picky' standards. However a full revision would be a big undertaking with risks of introducing errors: my impression is that most of the warnings look spurious.
14 Finally may I reiterate that the objective of this implementation is to persuade the C and C++ Standards groups to standardize the signatures of these functions: it will be possible for better implementations to be produced in the future by both open
How many warnings? what type of warning? source
and commercial organisations like Dinkumware.
Right. And it will probably make a big difference to some vendors that the functions are now in Boost. When you write a revised paper, please enhance the "motivation" section by drawing from exampes in other languages. If it has been part of Visual Basic, then write it. If similar functions are used oftenin Java, then tell something about it. Consider how to answer these questions: 1. why so many functions? are they all really needed? 2. would it make more sense to standardize a usability layer on top of these functions? br Thorsten

| -----Original Message----- | From: boost-bounces@lists.boost.org | [mailto:boost-bounces@lists.boost.org] On Behalf Of Thorsten Ottosen | Sent: 03 December 2005 18:32 | To: boost@lists.boost.org | Subject: Re: [boost] More Math Functions for Statistics (and TR2) | Thanks for doing this important work. Thanks for your encouragement. | Q: is the precision of the algorithms actually so good that | it makes a | difference if one uses float, double or long double? The results shown in the package strongly suggest so, but some functions are MUCH harder to get right than others. | | I guess throwing is out of the question because of C compatibility? :-(( | How are error handled in the new tr1 math functions? C style matherr, I believe, but I have yet to see examples coded. I suspect Dinkumware may have built in options to use exceptions? | > my impression is that most of the warnings look spurious. | How many warnings? what type of warning? Dozens - mostly data loss on conversion, unsigned/signed, untidynesses like unused parameters. I don't want to alter Moshiers code at all (yet). | When you write a revised paper, please enhance the "motivation" section | by drawing from exampes in other languages. If it has been part of | Visual Basic, then write it. If similar functions are used often in Java, | then tell something about it. OK - but there isn't much to tell except that they have them. | 1. why so many functions? are they all really needed? Well of course there was some hyperbole about the number - it counts up from sin and cos to the highest functions - but they all link together so any one missing is a real pain to someone. | 2. would it make more sense to standardize a usability layer on top IMO, higher layers are entirely a user matter. The incomplete beta is to statistics as sin and cos are to geometry. One might consider something to get the moments - mean, variance, skew and kurtosis - but then you get into dealing with the many containers and iterator solutions look attractive, but then where do you put the calculated moments? I'd rather not go there - yet: the functions are probably more than I can chew ;-) Paul

[Fernando, I'm CC'ing you because of the numeric_cast<> discussion here...] Paul A Bristow wrote:
| I guess throwing is out of the question because of C compatibility?
:-((
| How are error handled in the new tr1 math functions? C style matherr, I believe, but I have yet to see examples coded. I suspect Dinkumware may have built in options to use exceptions?
Yeah, maybe a macro could control this behavior: #if BOOST_MATH_STAT_FUNTIONS_THROW ... The default could be to throw for C++ and not for C.
| > my impression is that most of the warnings look spurious. | How many warnings? what type of warning?
Dozens - mostly data loss on conversion, unsigned/signed, untidynesses like unused parameters. I don't want to alter Moshiers code at all (yet).
well, I suggest that you do a numeric_cast in debug-mode and a static_cast in release-mode. That way you don't alter his code's behavior, but get rid of the warnings. We really should have something like asserting_numeric_cast<T>( u ) in cast.hpp
| When you write a revised paper, please enhance the "motivation" section | by drawing from exampes in other languages. If it has been part of | Visual Basic, then write it. If similar functions are used often in Java, | then tell something about it.
OK - but there isn't much to tell except that they have them.
right, but the more languages that actually provide them, the more presure is put on the comittee to adopt these.
| 1. why so many functions? are they all really needed?
Well of course there was some hyperbole about the number - it counts up from sin and cos to the highest functions - but they all link together so any one missing is a real pain to someone.
good point ... put that sentence in the paper.
| 2. would it make more sense to standardize a usability layer on top
IMO, higher layers are entirely a user matter. The incomplete beta is to statistics as sin and cos are to geometry.
another great sentence :-) get it in the paper.
One might consider something to get the moments - mean, variance, skew and kurtosis - but then you get into dealing with the many containers and iterator solutions look attractive, but then where do you put the calculated moments? I'd rather not go there - yet: the functions are probably more than I can chew ;-)
right, seems fair. br Thorsten

[Thank you Thorsten for pointing me to this] Hi Thorsten Hi Paul
my impression is that most of the warnings look spurious. How many warnings? what type of warning?
Dozens - mostly data loss on conversion, unsigned/signed, untidynesses like unused parameters. I don't want to alter Moshiers code at all (yet).
well, I suggest that you do a numeric_cast in debug-mode and a static_cast in release-mode. That way you don't alter his code's behavior, but get rid of the warnings.
We really should have something like
asserting_numeric_cast<T>( u )
in cast.hpp
That would range check but only in a DEBUG build right? The problem I have with this is that it hides the fact that range checking is disabled in a release build. A better solution, IMO, is to use something like: assert( is_in_range<T>(u) ) ; T t = static_cast<T>(u) ; which makes it clear that out of range conversions are "exceptional" (unexpected) and so are "asserted" rather than "always checked". The Boost Numeric Conversion Library already provides a stand alone range checking functionality through: numeric::converter<T,S>::out_of_range(u) == cInRange But granted, that is far from terse... I guess the library should have a shortcut for that in the form of template function "is_in_range<>" I'll add that one. Best Fernando Cacciola SciSoft http://fcacciola.50webs.com/
participants (3)
-
Fernando Cacciola
-
Paul A Bristow
-
Thorsten Ottosen