
| -----Original Message-----
| From: boost-bounces@lists.boost.org
| [mailto:boost-bounces@lists.boost.org] On Behalf Of Deane Yang
| Sent: 10 July 2006 21:41
| To: boost@lists.boost.org
| Subject: Re: [boost] [math/staticstics/design] How best to
| name statistical functions?
|
| Paul A Bristow wrote:
| > | -----Original Message-----
| > | From: Kevin Lynch
| > |
| > | Why not hide the functions behind a class interface? After all, the
| > | various functions are "properties" of the distributions. Hence:
| > |
| > | class students_t {
| > |   students_t(double mu);
| > |   double P(double x);
| > |   double Q(double x);
| > |   double invP(double p); (or perhaps inverseP or Pinv or something)
| > |   .....
| > | }
| > |
| > | class normal {
| > |   normal(double mu, double sigma);
| > |   double P(double x);
| > |   double Q(double x);
| > |   double invP(double x);
| > |   ......
| > | }
| >
| > Rather interesting idea.
|
| I support Kevin's proposal rather strongly for exactly the reasons he
| states. But I'm not sure what P, Q, invP mean. I would prefer:
|
| double density(double x);
| double cumulative(double x);
| double inverse_cumulative(double y);
|
| > How would you envisage this working with Fisher, for example which has
| > degrees of freedom 1 and 2, and a variance ratio.
| >
| > Is this a 1D or 2D or 3D?
| >
| > Its inversion will return df1 (given df2 and F and Probability)
| > or df2 (given df1, F and Prob)
| > or F (given Df1 and df2 and Prob)
| >
| > Would you like to flesh out how you suggest handling all these?
| >
|
| Could you clarify your question? Isn't the F distribution still the
| probability distribution of a single real random variable? The
| cumulative and inverse cumulative density functions have a
| consistent mathematical meaning for any 1-dimensional probability
| distribution, do they not?
Well, if you regard the degrees of freedom as fixed, or the probability as
fixed (often at 95%), then yes. But I would say that they are 2D (and others
3D) distributions.

To keep it simpler, let's go back to the Student's t, which I have
implemented (actually as templates, but ignore that for now) as

  double students_t(double degrees_of_freedom, double t)

t is roughly a measure of the difference between two things (means, for
example), and this returns the probability that the things are different.
If degrees_of_freedom is small (you only measured 3 times, say), then t can
be big, but it still doesn't mean much. But if you made 100 measurements,
it probably does.

When you do the inverse, you may want to say: I want to be 95% confident,
and I have already fixed the degrees_of_freedom, so what is the
corresponding value of t? This is what the ubiquitous Student's t tables do.

On the other hand, sometimes you may decide you want 95% confidence, and you
have already made some measurements of t, but you want to know how many
(more, probably) measurements (degrees_of_freedom) you would have to make to
get this 95%. This is a common problem, and it often reveals, in drug trials
for example, that there are not enough potential patients available to carry
out a trial and achieve a 95% probability.

If you accept this, then the problem is how to name the two or three
'inverses' (and complements).

students_t_inv_t and students_t_inv_df ???

Paul

PS I also worry about the risk of code bloat. At present, I think that you
don't pay for what you don't use. We certainly don't want all the possible
functions discussed above instantiated, even for one floating-point type, if
only one function is actually used.

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com
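To make the naming question concrete, here is a minimal sketch of the forward function and the two 'inverses' as free functions. Only the names students_t, students_t_inv_t and students_t_inv_df and the signature of students_t come from the discussion above; everything else is an assumption made for illustration. In particular, it assumes students_t returns the cumulative probability P(T <= t), the argument order of the two inverses is hypothetical, and the numerics (Simpson's rule plus bisection) are a stand-in chosen for brevity, not the actual implementation.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Density of Student's t with df degrees of freedom, evaluated at x.
double students_t_density(double df, double x) {
    const double pi = 3.14159265358979323846;
    double log_norm = std::lgamma((df + 1.0) / 2.0) - std::lgamma(df / 2.0)
                      - 0.5 * std::log(df * pi);
    return std::exp(log_norm - (df + 1.0) / 2.0 * std::log1p(x * x / df));
}

// ASSUMPTION: interpreted as the cumulative probability P(T <= t).
// Integrates the density over [0, |t|] with Simpson's rule (illustrative only).
double students_t(double degrees_of_freedom, double t) {
    assert(degrees_of_freedom > 0.0);
    const int n = 2000;  // number of Simpson intervals, must be even
    const double a = std::fabs(t);
    const double h = a / n;
    double sum = students_t_density(degrees_of_freedom, 0.0)
               + students_t_density(degrees_of_freedom, a);
    for (int i = 1; i < n; ++i) {
        sum += (i % 2 ? 4.0 : 2.0) * students_t_density(degrees_of_freedom, i * h);
    }
    const double half = sum * h / 3.0;  // integral of the density from 0 to |t|
    return t >= 0.0 ? 0.5 + half : 0.5 - half;
}

// Inverse in t: given df and probability p, find t with students_t(df, t) == p.
// This is the lookup that printed Student's t tables provide.
double students_t_inv_t(double degrees_of_freedom, double p) {
    assert(p > 0.0 && p < 1.0);
    if (p < 0.5) return -students_t_inv_t(degrees_of_freedom, 1.0 - p);  // symmetry
    double lo = 0.0, hi = 1.0;
    while (students_t(degrees_of_freedom, hi) < p && hi < 1e6) hi *= 2.0;  // bracket
    for (int i = 0; i < 100; ++i) {  // bisection
        const double mid = 0.5 * (lo + hi);
        if (students_t(degrees_of_freedom, mid) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// Inverse in df: given t > 0 and probability p > 0.5, find the degrees of
// freedom at which students_t(df, t) reaches p. Returns infinity when no df
// up to 1e6 reaches p (the "not enough patients" case).
double students_t_inv_df(double t, double p) {
    assert(t > 0.0 && p > 0.5 && p < 1.0);
    double lo = 1.0, hi = 1e6;
    if (students_t(lo, t) >= p) return lo;  // one degree of freedom already suffices
    if (students_t(hi, t) < p) return std::numeric_limits<double>::infinity();
    for (int i = 0; i < 60; ++i) {  // for t > 0 the probability grows with df
        const double mid = 0.5 * (lo + hi);
        if (students_t(mid, t) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

With these, students_t_inv_t(10, 0.975) gives roughly 2.228, the usual table entry for 10 degrees of freedom, and feeding that t back into students_t_inv_df with p = 0.975 recovers about 10. Being separate free functions, a program that calls only one of them never needs the others, which is the "don't pay for what you don't use" property; a class interface would have to be checked to see that it preserves this.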