Re: [boost] [math/staticstics/design] How best to name statistical functions?

10 Jul 2006

|  -----Original Message-----
|  From: boost-bounces@lists.boost.org
|  [mailto:boost-bounces@lists.boost.org] On Behalf Of Jeff Garland
|  Sent: 08 July 2006 17:49
|  To: boost@lists.boost.org
|  Subject: Re: [boost] [math/staticstics/design] How best to name statistical functions?
|  
|  John Maddock wrote:
|  > Paul Bristow has been toiling away producing some statistical functions
|  > on top of some of my Math special functions, and we've encountered a bit
|  > of a naming dilemma that I hope the ever resourceful Boosters can solve
|  > for us :-)
|  
|  Possibly better, save him from writing them, possibly?  Has 
|  he looked at Eric Niebler's statistical accumulators?

Indeed - on my TODO list.

Some further background, before you all leap in with your favourite names ;-)

This is to support my proposal:

A Proposal to add Mathematical Functions for Statistics
to the C++ Standard Library
Document number: JTC 1/SC22/WG14/N1069, WG21/N1668
Date: 11 Aug 2004

The recent WG21 paper

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2003.html

includes this response to my proposal

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1668.pdf

(which is to be reissued, revised, as N2048, but missed this mailing):

"
N1668 A Proposal to add Mathematical Functions for Statistics to the C++
Standard Library Date: 2004-08-11
Status: Open.

Lillehammer [2005-04]: The main argument against this proposal is that a
high-quality implementation would be extremely hard; this is about 150
functions, most of which have several parameters. Issue: are we willing to
standardize something with the expectation that most implementations will be
low quality? Are these functions ones where poor accuracy is acceptable? (If
so, we could do this for float only, and drop the double and long double
versions.)

Mixed interest.

No consensus for bringing this forward at this meeting. What might change
people's mind:

 1. Reasoning for why to include these functions and exclude others.

 2. A smaller set of functions.

 3. If this is intended to support an easy-to-use statistical package, then
show the interface for that statistical package first.
" 

But I think that, after John's stunning work on the incomplete beta and
gamma functions (the guts of the functions that you all need to get
information from your data using statistics), we are close to meeting the
WG21 'requirements' for accepting this proposal. His work in the sandbox is
functionally complete. I am just doing some 'grunt' work on cosmetics and on
the wrappers that provide the statistics functions in the format that is
best for the end users.

Before you jump to judgement on this issue, I invite (beg!) you to consider
the end users' needs. They are NOT mathematicians, and they are probably NOT
professional statisticians, but ordinary physicists, chemists, surgeons,
social 'scientists', beekeepers, farmers ...

Bear in mind, too, that these groups all have different customary names and
jargon for many of these functions.

So, IMO, the names have to be as helpful as possible TO THE USERS: clarity
before curtness.

There is also the complication that the distributions have both what some
call 'mass' values and what others call 'cumulative' values, and the two are
confusing and confused, especially when they have the same name! Ideally we
would have wrappers that provide BOTH variants.

For each function there are variants: complements, and inverses (more than
one inverse if there is more than one argument, something I had NOT tackled
in the list before; I only realised the need when doing the wrappers!). John
has tackled the inverse functions mainly using root-finding methods. The
incomplete beta inverse is, as usual, MUCH more difficult, and John has a
state-of-the-art solution based on the method of Professor Temme.

[For example, the 'forward' functions are useful to tell you the probability
of a hypothesis; the 'inverse' functions are useful to tell you what would be
needed to achieve a certain probability, for example the number of
measurements or samples, OR the variance (or accuracy of measurement).]

To complicate things further, there are also annoying C99 precedents in erf
and erfc; by the Boost convention of using _, the latter should be erf_c.

These are some of the reasons why I came up with the list of names below.

But, as John has explained for just ONE FUNCTION, Student's t, the list is
not really enough.

Your suggestions are most welcome.

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

Mathematical 'special' functions
(only double versions are shown; overloads for float and long double will
also be provided).
double beta_distribution(double a, double b, double x); // Beta distribution function.
double beta_incomplete(double a, double b, double x); // Incomplete beta integral.
double beta_incomplete_inv(double a, double b, double y); // Inverse of incomplete beta integral.
double binomial(unsigned int k, unsigned int n, double p); // Binomial distribution function.
double binomial_c(unsigned int k, unsigned int n, double p); // Binomial distribution function complemented.
double binomial_distribution_inv(unsigned int k, unsigned int n, double y); // Binomial distribution function inverse.
double binomial_neg_distribution(unsigned int k, unsigned int n, double p); // Negative binomial distribution.
double binomial_neg_distribution_c(unsigned int k, unsigned int n, double p); // Negative binomial distribution complement.
double binomial_neg_distribution_inv(unsigned int k, unsigned int n, double p); // Inverse of negative binomial distribution.
double chi_sqr_distribution(double df, double x); // Chi-squared distribution function.
double chi_sqr_distribution_c(double df, double x); // Chi-squared distribution function complemented.
double chi_sqr_distribution_c_inv(double df, double p); // Inverse of Chi-squared distribution function complemented.
double digamma(double x); // psi or digamma function.
double fisher_distribution(unsigned int ia, unsigned int ib, double c); // Fisher F distribution.
double fisher_distribution_c(unsigned int ia, unsigned int ib, double c); // Fisher F distribution complemented.
double fisher_distribution_c_inv(double dfn, double dfd, double y); // Inverse of complemented Fisher F distribution.
double gamma_distribution(double a, double b, double x); // Gamma probability distribution function.
double gamma_distribution_c(double a, double b, double x); // Gamma probability distribution function complemented.
double gamma_incomplete(double a, double x); // Incomplete gamma function.
double gamma_incomplete_c(double a, double x); // Incomplete gamma function complemented.
double gamma_incomplete_inv(double a, double y0); // Inverse of incomplete gamma integral.
double gamma_incomplete_c_inv(double a, double y0); // Inverse of complemented incomplete gamma integral.
double gamma(double x); // gamma function (or tgamma as in C99 math.h?)
double lgamma(double x); // log gamma function, named as in C99.
double normal_distribution(double a); // Normal distribution function.
double normal_distribution_inv(double a); // Inverse of normal distribution function.
double poisson_distribution(unsigned int k, double m); // Poisson distribution.
double poisson_distribution_c(unsigned int k, double m); // Complemented Poisson distribution.
double poisson_distribution_inv(unsigned int k, double y); // Inverse Poisson distribution.
double students_t(double df, double t); // Student's t.
double students_t_inv(double df, double p); // Inverse of Student's t.
double students_t(unsigned int df, double t); // Student's t.
double students_t_inv(unsigned int df, double p); // Inverse of Student's t.
Distribution function probabilities and quantiles
double normal_probability(double z); // Probability of quantile z.
double normal_quantile(double p); // Quantile of probability p.
double students_t_probability(double t, double df, double ncp); // Probability of quantile.
double students_t_quantile(double p, double df, double ncp); // Quantile of probability p.
double chi_sqr_probability(double x, double df, double ncp); // Probability of quantile.
double chi_sqr_quantile(double p, double df, double ncp); // Quantile of probability p.
double beta_probability(double x, double a, double b); // Probability of x, a, b.
double beta_quantile(double p, double a, double b); // Quantile of probability p.
double fisher_probability(double f, double dfn, double dfd, double ncp); // Probability of quantile.
double fisher_quantile(double p, double dfn, double dfd, double ncp); // Quantile of probability p.
double binomial_probability(double x, double n, double pr); // Probability of x.
unsigned int binomial_first(double p, unsigned int n, double r); // First k for probability >= p.
double neg_binomial_probability(double x, double n, double pr); // Probability of quantile.
double poisson_probability(double x, double lambda); // Probability of quantile.
double poisson_quantile(double p, double lambda); // Quantile of probability p.
double gamma_probability(double x, double shape, double scale); // Probability of x.
double gamma_quantile(double p, double shape, double scale); // Quantile of probability p.
double smirnov(int n, double p); // Exact Smirnov statistic.
double smirnov_inv(int n, double x); // Inverse of exact Smirnov statistic.
double kolmogorov(double); // Kolmogorov statistic.
double kolmogorov_inv(double p); // Kolmogorov statistic inverse.