
At 11:02 AM 7/11/2006, Paul A Bristow wrote:
| So let's use the Students T distribution as an example. The Students T
| distribution is a *family* of 1-dimensional distributions that depend
| on a single parameter, called "degrees of freedom".
Does the word *family* imply integral degrees of freedom? Numerically, and perhaps conceptually, it isn't - it's a continuous real. So could one also regard it as a two-parameter function f(t, v)? However, I don't think this matters here.
No, a "family of distributions" does not imply that the parameters are integral. What is frequently referred to as *the* normal distribution is also a family, parameterized by the mean and standard deviation. Transformation between members of the family is so easy that we generally transform everything into and from one member of the family, the "standard normal" distribution.

Keep in mind that a distribution is not a function, although it is associated with several functions or function-like entities. Standard usage is to consider the distributions in the family to be indexed by parameters, and therefore the associated functions to be indexed, single-parameter functions. There isn't much difference mathematically, though, between p[mu, sigma](x) and p(mu, sigma, x) (even when the indexes *are* integral), and sometimes it is useful to reframe them in that way. The point is that that is a reframing, and the standard (no, I am not imagining that it is standard) usage is to treat single-dimensional distributions as being single-dimensional.
| Given a value, say, D, for the degrees of freedom, you get a density
| function p_D and integrating it gives you the cumulative density
| function P_D.
What about the Qs? (complements)
| As I mentioned before, these should be member functions, which could
| be called "density" (also called 'mass') and "cumulative".
OTOH, many books don't mention either of these words!
But I would be very, very surprised to find many serious statistics books written in English that don't.
The whole nomenclature seems a massive muddle, with mathematicians, statisticians, and users of all sorts using different terms, and everyone thinking theirs is the 'Standard' :-(
Some variation exists due to the interdisciplinary origin and continuing interdisciplinary nature of the field, but most of the terminology is pretty standard, with some enclaves of specialized usage.
And the highest priority in my book is the END USERS, not the professionals.
Exactly -- the professionals are aware of the non-standard usage. Let's give the end users a chance of being able to use what they learned in their high school stat class.
| The cumulative density function is a strictly increasing function and
| therefore can be inverted. The inverse function could be called
| "inverse_cumulative", which is a completely unambiguous name.
But excessively long :-(
| I would say that these three member functions should be common to all
| implemented distributions. Other common member functions might include
| "mean", "variance", and possibly others.
Median, mode, variance, skewness, and kurtosis are commonly given, for example:
Skewness and kurtosis are generally defined but rarely used for distributions. Their computation on small or even moderate samples tends to be rather unstable, so comparison to the ideal distributions isn't terribly useful. I wouldn't bother with them. Mode is not uniquely defined for many distributions, nor is it that commonly used in practice for unimodal distributions (even if the references give a formula). Except for some specialized uses, these are more useful for theory than for computation -- more algebraic than numerical.

There are a lot of other possible associated functions, such as general quantiles or various confidence intervals, but I don't think many of them have general enough use to bother with for all distributions. People who need them could use the distribution as a template parameter. The only exception I would suggest would be to include the convenience of the standard deviation as well as the variance. One might stick an RNG in here, but that is redundant at this point.

As to naming of the probability functions: my personal preference would be to use what are probably the most common abbreviations for the basic functions. They are simple, compact, and standard. Maybe a little obscure for those who only took statistics in high school or some who only know cookbook statistics -- but that is what documentation is for. The ignorant are, after all, ignorant whatever choice is made, but you can do something about it by using the standard terms:

dist.pdf(x)   -- Probability Density Function; this is what looks like
                 a "bell shaped curve" for a normal distribution, for
                 example. A.k.a. "p".
dist.cdf(x)   -- Cumulative Distribution Function. P.
dist.ccdf(x)  -- Complementary Cumulative Distribution Function;
                 ccdf(x) = 1 - cdf(x).
dist.icdf(p)  -- Inverse Cumulative Distribution Function: P';
                 icdf(cdf(x)) = x and vice versa.
dist.iccdf(p) -- Inverse Complementary Cumulative Distribution Function;
                 iccdf(p) = icdf(1 - p); iccdf(ccdf(x)) = x.

Topher