Re: [boost] [math/statistics/design] How best to name statistical functions?

11 Jul 2006

Paul A Bristow wrote:
...
Well, if you regard the degrees of freedom as fixed, or the probability as
fixed (often at 95%), then yes. But I would say that they are 2D (and
others 3D) distributions.
To keep it simpler, let's go back to the Student's t, which I have
implemented (actually as templates, but ignore that for now) as
double students_t(double degrees_of_freedom, double t)
t is roughly a measure of the difference between two things (means, for
example), and this returns the probability that the things are different.
If degrees_of_freedom is small (you only measured 3 times, say),
then t can be big and still not mean much.
But if you made 100 measurements, it probably does.
When you do the inverse, you may want to say: I want to be 95% confident,
and I have already fixed the degrees_of_freedom, so what is the
corresponding value for t? This is what the ubiquitous Student's t tables do.
On the other hand, sometimes you may decide you want 95% confidence, and you
have already made some measurements of t, but you want to know how many
(more, probably) measurements (degrees_of_freedom) you would have to make to
get this 95%.
This is a common problem, and it often reveals, in drug trials for example,
that there are not enough potential patients available to carry out a trial
and achieve a 95% probability.
If you accept this, then the problem is how to name the two or three
'inverses' (and complements):
students_t_inv_t and students_t_inv_df ???

I think you're confusing *the* inverse cumulative distribution function 
with other possible inverse functions that can be defined for each 
specific distribution. This is why I really dislike a name like 
"students_t_inv_t", which tells me very little about what it is.

So let's use the Student's t distribution as an example. The Student's t
distribution is a *family* of 1-dimensional distributions that depend on
a single parameter, called "degrees of freedom". Given a value, say, D,
for the degrees of freedom, you get a density function p_D and
integrating it gives you the cumulative distribution function P_D.
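
For reference, these are the standard formulas for the Student's t family
with D degrees of freedom, written in LaTeX notation (p_D and P_D are the
symbols used above; s is just a dummy integration variable):

```latex
p_D(t) = \frac{\Gamma\!\left(\frac{D+1}{2}\right)}
              {\sqrt{D\pi}\,\Gamma\!\left(\frac{D}{2}\right)}
         \left(1 + \frac{t^2}{D}\right)^{-\frac{D+1}{2}},
\qquad
P_D(t) = \int_{-\infty}^{t} p_D(s)\,ds
```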

As I mentioned before, these should be member functions, which could be
called "density" and "cumulative".

The cumulative distribution function is a strictly increasing function and
therefore can be inverted. The inverse function could be called
"inverse_cumulative", which is a completely unambiguous name.

I would say that these three member functions should be common to all 
implemented distributions. Other common member functions might include 
"mean", "variance", and possibly others.

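To make the three common member functions concrete, here is a minimal
sketch for the Student's t case. The class name students_t_distribution is
made up for illustration and is not an existing Boost class. I am also
assuming plain numerical methods (Simpson's rule for the integral,
bisection for the inverse) purely to keep the example short; a real
implementation would presumably use the incomplete beta function, and this
version gets slow far out in the tails.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch of the proposed interface; not an existing Boost class.
class students_t_distribution {
public:
    explicit students_t_distribution(double degrees_of_freedom)
        : df_(degrees_of_freedom), log_norm_(0.0) {
        assert(df_ > 0.0);
        const double pi = 3.14159265358979323846;
        // log of Gamma((D+1)/2) / (sqrt(D*pi) * Gamma(D/2))
        log_norm_ = std::lgamma(0.5 * (df_ + 1.0)) - std::lgamma(0.5 * df_)
                    - 0.5 * std::log(df_ * pi);
    }

    // Density p_D(t).
    double density(double t) const {
        return std::exp(log_norm_ - 0.5 * (df_ + 1.0) * std::log1p(t * t / df_));
    }

    // Cumulative distribution function P_D(t), computed as 1/2 plus the
    // signed integral of the density from 0 to t (composite Simpson's rule).
    double cumulative(double t) const {
        // Even number of subintervals, with a step of at most about 0.01.
        const long n = 2 * (static_cast<long>(std::ceil(std::fabs(t) / 0.02)) + 1);
        const double h = t / n;
        double sum = density(0.0) + density(t);
        for (long i = 1; i < n; ++i)
            sum += (i % 2 == 1 ? 4.0 : 2.0) * density(i * h);
        return 0.5 + sum * h / 3.0;
    }

    // Inverse of cumulative(): the t with P_D(t) == p, found by bisection.
    double inverse_cumulative(double p) const {
        assert(p > 0.0 && p < 1.0);
        double lo = -1.0, hi = 1.0;
        while (cumulative(lo) > p) lo *= 2.0;  // widen the bracket downward
        while (cumulative(hi) < p) hi *= 2.0;  // widen the bracket upward
        for (int i = 0; i < 60; ++i) {
            const double mid = 0.5 * (lo + hi);
            if (cumulative(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

private:
    double df_;        // degrees of freedom D
    double log_norm_;  // log of the normalizing constant of the density
};
```

With this, students_t_distribution(10.0).inverse_cumulative(0.975) should
come out near the familiar table value of 2.228, and the same three names
would carry over unchanged to any other distribution.
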
Finally, you observe that it is often useful to specify the cumulative
probability for a given value of the random variable and solve for the
parameter (the "degrees of freedom" for a Student's t distribution) that
determines the distribution. Since each family of distributions depends
on a different set of parameters (for example, normal distributions
depend on two parameters, the mean and variance), the interface for this
is trickier to define. I can think of two possibilities (I prefer the
first):

1) Define ad hoc inverse functions for each specific distribution. So
for the Student's t distribution, you would define a member function of
the form:

double degrees_of_freedom(double cumulative_probability,
                          double random_variable) const;

2) Always specify distribution parameters (other than the random
variable itself) in the constructor using a tuple (a 1-tuple for the
Student's t and a 2-tuple for the normal). You could then define
templated inverse functions:

template <unsigned int index>
double inverse(double cumulative_probability, double random_variable) const;

Each function would hold all other parameters fixed (as set by the 
constructor) and solve for the parameter specified by the index.
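
Option 1 could be sketched roughly as follows for the Student's t. To keep
the example self-contained I have written degrees_of_freedom as a free
function instead of a member, and the helpers students_t_density and
students_t_cumulative are hypothetical names using simple numerical
integration. The search range of 0.1 to 10000 degrees of freedom is an
arbitrary assumption. The sketch relies on the fact that, for a positive
value of the random variable, the cumulative probability increases with
the degrees of freedom, so bisection applies.

```cpp
#include <cassert>
#include <cmath>

// Student's t density for df degrees of freedom (hypothetical helper).
double students_t_density(double df, double t) {
    const double pi = 3.14159265358979323846;
    const double log_norm = std::lgamma(0.5 * (df + 1.0)) - std::lgamma(0.5 * df)
                            - 0.5 * std::log(df * pi);
    return std::exp(log_norm - 0.5 * (df + 1.0) * std::log1p(t * t / df));
}

// Student's t cumulative distribution function: 1/2 plus the integral of the
// density from 0 to t, by composite Simpson's rule (hypothetical helper).
double students_t_cumulative(double df, double t) {
    // Even number of subintervals, with a step of at most about 0.01.
    const long n = 2 * (static_cast<long>(std::ceil(std::fabs(t) / 0.02)) + 1);
    const double h = t / n;
    double sum = students_t_density(df, 0.0) + students_t_density(df, t);
    for (long i = 1; i < n; ++i)
        sum += (i % 2 == 1 ? 4.0 : 2.0) * students_t_density(df, i * h);
    return 0.5 + sum * h / 3.0;
}

// Ad hoc inverse: the degrees of freedom D for which
// P_D(random_variable) == cumulative_probability, found by bisection.
double degrees_of_freedom(double cumulative_probability, double random_variable) {
    assert(random_variable > 0.0);  // at 0 the probability is 1/2 for every D
    double lo = 0.1, hi = 1.0e4;    // assumed search range for D
    // A solution exists only if the target lies between the two extremes.
    assert(students_t_cumulative(lo, random_variable) < cumulative_probability);
    assert(cumulative_probability < students_t_cumulative(hi, random_variable));
    for (int i = 0; i < 60; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (students_t_cumulative(mid, random_variable) < cumulative_probability)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

For example, degrees_of_freedom(0.975, 2.228) should return a value close
to 10. The asserts also show why this inverse is more delicate than
inverse_cumulative: for a given value of the random variable, only a
limited range of probabilities can be reached at all.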

(I don't like using tuples as an input type, because it means I always 
have to be very careful about the order of the parameters.)
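
For comparison, option 2 might look something like this for the normal
distribution, with the parameters passed as the 2-tuple (mean, variance).
The class name normal_distribution_sketch and the helper standard_quantile
are hypothetical, and the standard normal quantile is found by plain
bisection on std::erfc just to keep the sketch short.

```cpp
#include <cassert>
#include <cmath>
#include <tuple>

// Hypothetical sketch of option 2; not an existing Boost class.
class normal_distribution_sketch {
public:
    // Parameters are given as the tuple (mean, variance), in that order.
    explicit normal_distribution_sketch(const std::tuple<double, double>& mean_and_variance)
        : parameters_(mean_and_variance) {
        assert(std::get<1>(parameters_) > 0.0);
    }

    double cumulative(double x) const {
        const double mean = std::get<0>(parameters_);
        const double sigma = std::sqrt(std::get<1>(parameters_));
        return 0.5 * std::erfc(-(x - mean) / (sigma * std::sqrt(2.0)));
    }

    // Solves cumulative(random_variable) == cumulative_probability for the
    // parameter selected by Index (0 = mean, 1 = variance), holding the
    // other parameter at the value given to the constructor.
    template <unsigned int Index>
    double inverse(double cumulative_probability, double random_variable) const {
        static_assert(Index < 2, "a normal distribution has two parameters");
        const double z = standard_quantile(cumulative_probability);
        if (Index == 0) {
            const double sigma = std::sqrt(std::get<1>(parameters_));
            return random_variable - z * sigma;  // mean = x - z * sigma
        }
        // (x - mean) / sigma == z, so sigma must come out positive.
        const double sigma = (random_variable - std::get<0>(parameters_)) / z;
        assert(std::isfinite(sigma) && sigma > 0.0);
        return sigma * sigma;  // variance
    }

private:
    // Quantile of the standard normal distribution, found by bisection.
    static double standard_quantile(double p) {
        assert(p > 0.0 && p < 1.0);
        double lo = -40.0, hi = 40.0;
        for (int i = 0; i < 100; ++i) {
            const double mid = 0.5 * (lo + hi);
            if (0.5 * std::erfc(-mid / std::sqrt(2.0)) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    std::tuple<double, double> parameters_;
};
```

A call such as d.inverse<0>(0.975, 5.0) then solves for the mean and
d.inverse<1>(0.975, 5.0) for the variance. Nothing at the call site says
which index means what, which is exactly the ordering worry above.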

Deane

Deane Yang