
| -----Original Message-----
| From: boost-bounces@lists.boost.org
| [mailto:boost-bounces@lists.boost.org] On Behalf Of Topher Cooper
| Sent: 11 July 2006 17:32
| To: boost@lists.boost.org
| Subject: Re: [boost] [math/staticstics/design] How best to
| name statistical functions?
|
| At 11:02 AM 7/11/2006, Paul A Bristow wrote:
|
| >| So let's use the Student's t distribution as an example. The
| >| Student's t distribution is a *family* of 1-dimensional
| >| distributions that depend on a single parameter, called
| >| "degrees of freedom".
| >
| >Does the word *family* imply integral degrees of freedom?
|
| No, a "family of distributions" does not imply that the parameters
| are integral. What is frequently referred to as *the* normal
| distribution is also a family, parameterized by the mean and standard
| deviation. Transformation between members of the family is so easy
| that we generally transform everything into and from one member of
| the family, the "standard normal" distribution.
|
| Keep in mind that a distribution is not a function, although it is
| associated with several functions or function-like entities.
|
| Standard usage is to consider the distributions in the family to be
| indexed by parameters and therefore the associated functions to be
| indexed, single parameter functions. There isn't much difference
| mathematically, though, between p[mu, sigma](x) and p(mu, sigma, x)
| (even when the indexes *are* integral), and sometimes it is
| useful to reframe them in that way. The point is, that is a
| reframing, and the standard (no, I am not imagining that it is
| standard) usage is to treat single-dimensional distributions as
| being single-dimensional.

Thanks, I think I understand better now.

| >And the highest priority in my book is the END USERS,
| >not the professionals.
|
| Exactly -- the professionals are aware of the non-standard
| usage.
| Let's give the end users a chance of being able to use what
| they learned in their high school stat class.

My main objective :-))

| >| Other common member functions might include
| >| "mean", "variance", and possibly others.
| >
| >Median, mode, variance, skewness, kurtosis are commonly
| >given, for example:
| >
| >http://en.wikipedia.org/wiki/Student%27s_t
|
| Skewness and kurtosis are generally defined but rarely used for
| distributions. Their computation on small or even moderate samples
| tends to be rather unstable, so comparison to the ideal
| distributions isn't terribly useful. I wouldn't bother with them.
| Mode is not uniquely defined for many distributions, nor is it that
| commonly used in practice for unimodal distributions (even if the
| references give a formula). Except for some specialized uses, these
| are more useful for theory than for computation -- more algebraic
| than numerical.
|
| There are a lot of other possible associated functions, such as
| general quantiles or various confidence intervals, but I don't think
| many of them have general enough use to bother with for all
| distributions. People who need them could use the distribution as a
| template parameter. The only exception I would suggest would be to
| include the convenience of the standard deviation as well as the
| variance. One might stick in RNG here but that is redundant
| at this point.
|
| As to naming of the probability functions:
|
| My personal preference would be to use what are probably the most
| common abbreviations for the basic functions. They are simple,
| compact and standard. Maybe a little obscure for those who only took
| statistics in high school or some who only know cookbook statistics
| -- but that is what documentation is for.
| The ignorant are after all ignorant whatever choice is made, but
| you can do something about it by using the standard terms:
|
| dist.pdf(x) -- Probability Density Function, this is what looks like
|     a "bell shaped curve" for a normal distribution, for
|     example. A.k.a. "p"
| dist.cdf(x) -- Cumulative Distribution Function. P
| dist.ccdf(x) -- Complementary Cumulative Distribution Function;
|     ccdf(x) = 1 - cdf(x)
| dist.icdf(p) -- Inverse Cumulative Distribution Function: P';
|     icdf(cdf(x)) = x and vice versa
| dist.iccdf(p) -- Inverse Complementary Cumulative Distribution
|     Function; iccdf(p) = icdf(1-p); iccdf(ccdf(x)) = x

My instinct is that these are too abbreviated, despite their logicalness.

But this is the key problem - being clear, not curt, and yet concise.

students_t.inverse_complement_cumulative_probability certainly fails! ;-))

So we are getting to:

template <class T> // T an integral or real or floating-point type.

T distribution(T x) const; // Probability Density Function or pdf or p
T cumulative_probability(T x) const; // Cumulative Distribution Function. P

cumulative_probability is too long :-( Do we REALLY need the "cumulative" here?

T probability(T x) const; // Cumulative Distribution Function or cdf or P
T quantile(T probability) const; // Also known as Inverse Cumulative Distribution Function

What do we call

T complementary_cumulative_probability(T x) const; // Complementary Cumulative Distribution Function. Q ??? :-((

and, worse, what about the Inverse Complementary Cumulative Distribution Function?

complementary_quantile??? :-((

and the ad hoc 'extras':

static T degrees_of_freedom(T quantile, T probability); // static, so no const qualifier

So I feel we haven't QUITE got there yet.

But many thanks for your help so far.
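To make the candidate names a little more concrete, here is a minimal sketch of how they might read on a normal distribution. It is only an illustration under stated assumptions: the class name normal_sketch is hypothetical and not an existing Boost class, complementary_probability is an assumed shortening of complementary_cumulative_probability, double stands in for the template parameter T, and the quantile uses plain bisection rather than anything a real library would ship.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch only: a normal distribution exposing the member
// names floated in this thread. Not an existing Boost class.
class normal_sketch {
public:
    normal_sketch(double mean, double sd) : mean_(mean), sd_(sd) {
        assert(sd > 0.0);  // the standard deviation must be positive
    }

    double mean() const { return mean_; }
    double standard_deviation() const { return sd_; }
    double variance() const { return sd_ * sd_; }

    // Probability Density Function (pdf, "p").
    double distribution(double x) const {
        const double pi = 3.14159265358979323846;
        const double z = (x - mean_) / sd_;
        return std::exp(-0.5 * z * z) / (sd_ * std::sqrt(2.0 * pi));
    }

    // Cumulative Distribution Function (cdf, "P").
    double probability(double x) const {
        return 0.5 * std::erfc(-(x - mean_) / (sd_ * std::sqrt(2.0)));
    }

    // Complementary cdf ("Q"), computed directly instead of as
    // 1 - probability(x) so the upper tail keeps its precision.
    // (Assumed name; the thread has not settled on one.)
    double complementary_probability(double x) const {
        return 0.5 * std::erfc((x - mean_) / (sd_ * std::sqrt(2.0)));
    }

    // Inverse cdf by bisection on probability(); requires 0 < p < 1.
    double quantile(double p) const {
        assert(p > 0.0 && p < 1.0);
        double lo = mean_ - 40.0 * sd_;
        double hi = mean_ + 40.0 * sd_;
        for (int i = 0; i < 200; ++i) {
            const double mid = 0.5 * (lo + hi);
            if (probability(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    // Inverse complementary cdf, using iccdf(p) = icdf(1 - p).
    double complementary_quantile(double p) const {
        return quantile(1.0 - p);
    }

private:
    double mean_;
    double sd_;
};
```

With normal_sketch z(0.0, 1.0), the call z.quantile(0.975) comes out close to the familiar table value 1.96, and z.complementary_quantile(0.025) gives the same number, which is the iccdf(p) = icdf(1-p) identity from the list above.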
Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

PS Since everybody obviously knows far more about stats than I do,
can you also suggest fully worked examples that can be used to
demonstrate usage in a tutorial?

I'm especially keen to show how superior using this would be to the
traditional tables and fixed 95% confidence limits.
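As one possible shape for such a tutorial example, the sketch below computes a two-sided confidence interval for a mean at any confidence level the caller asks for, instead of reading a fixed 95% value from a table. It rests on assumptions that should be stated plainly: it uses the normal distribution with a known standard deviation because that needs only the standard library, whereas a small sample with an estimated standard deviation would call for Student's t. The names std_normal_cdf, std_normal_quantile, interval and mean_confidence_interval are hypothetical, and the quantile is found by simple bisection.

```cpp
#include <cassert>
#include <cmath>

// Standard normal cdf via the complementary error function.
double std_normal_cdf(double z) {
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Standard normal quantile (inverse cdf) by bisection; requires 0 < p < 1.
double std_normal_quantile(double p) {
    assert(p > 0.0 && p < 1.0);
    double lo = -40.0;
    double hi = 40.0;
    for (int i = 0; i < 200; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (std_normal_cdf(mid) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

struct interval {
    double lower;
    double upper;
};

// Two-sided confidence interval for a mean, assuming the population
// standard deviation sd is known (hypothetical helper for illustration).
interval mean_confidence_interval(double sample_mean, double sd, int n,
                                  double confidence) {
    assert(sd > 0.0 && n > 0 && confidence > 0.0 && confidence < 1.0);
    const double alpha = 1.0 - confidence;
    // The critical value is computed for the requested level rather
    // than looked up for a fixed 95%.
    const double z = std_normal_quantile(1.0 - alpha / 2.0);
    const double half_width = z * sd / std::sqrt(static_cast<double>(n));
    return {sample_mean - half_width, sample_mean + half_width};
}
```

Calling mean_confidence_interval(100.0, 15.0, 25, 0.95) uses a multiplier near 1.96, and changing the last argument to 0.99 or 0.999 widens the interval without consulting any table, which is the contrast with fixed 95% limits that a tutorial could draw out.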