Re: [boost] [math/staticstics/design] How best to name statisticalfunctions?

11 Jul 2006

|  -----Original Message-----
|  From: boost-bounces@lists.boost.org 
|  [mailto:boost-bounces@lists.boost.org] On Behalf Of Deane Yang
|  Sent: 10 July 2006 21:41
|  To: boost@lists.boost.org
|  Subject: Re: [boost] [math/staticstics/design] How best to 
|  name statisticalfunctions?
|  
|  Paul A Bristow wrote:
|  > |  -----Original Message-----
|  > |  From:  Kevin Lynch
|  
|  > |  Why not hide the functions behind a class interface?  After all, the
|  > |  various functions are "properties" of the distributions.  Hence:
|  > |  
|  > |  class students_t {
|  > |  	students_t(double mu);
|  > |  	double P(double x);
|  > |  	double Q(double x);
|  > |  	double invP(double p); // or perhaps inverseP or Pinv or something
|  > |  	.....
|  > |  }
|  > |  
|  > |  class normal {
|  > |  	normal(double mu, double sigma);
|  > |  	double P(double x);
|  > |  	double Q(double x);
|  > |  	double invP(double x);
|  > |  	......
|  > |  }
|  > 
|  > Rather interesting idea.
|  
|  I support Kevin's proposal rather strongly for exactly the reasons he
|  states. But I'm not sure what P, Q, invP mean. I would prefer:
|  
|  double density(double x);
|  double cumulative(double x);
|  double inverse_cumulative(double y);
|  
|  > How would you envisage this working with Fisher, for example, which has
|  > degrees of freedom 1 and 2, and a variance ratio?
|  > 
|  > Is this a 1D or 2D or 3D?
|  > 
|  > Its inversion will return df1 (given df2, F and probability)
|  > or df2 (given df1, F and probability)
|  > or F (given df1, df2 and probability)
|  > 
|  > Would you like to flesh out how you suggest handling all these?
|  > 
|  
|  Could you clarify your question? Isn't the F distribution still the 
|  probability distribution of a single real random variable? The 
|  cumulative distribution function and its inverse have a 
|  consistent mathematical meaning for any 1-dimensional probability 
|  distribution, do they not?

Well, if you regard the degrees of freedom as fixed, or the probability as
fixed (often at 95%), then yes.

But I would say that they are 2D (and others 3D) distributions.

To keep it simpler, let's go back to the Student's t, which I have
implemented (actually as templates, but ignore that for now) as

double students_t(double degrees_of_freedom, double t)

Here t is roughly a measure of the difference between two things (two means,
for example), and the function returns the probability that the things are
different.

If degrees_of_freedom is small (you only measured 3 times, say), then t can
be big, but it still doesn't mean much. But if you made 100 measurements, it
probably does.
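
As a minimal sketch of what such a function could compute, assuming it
returns the cumulative probability P(T <= t) (the real templated
implementation is not shown in this message), the density can simply be
integrated numerically. The names students_t_density and
students_t_cumulative, and the fixed Simpson step count, are assumptions
made purely for illustration.

```cpp
#include <cassert>
#include <cmath>

// Density of Student's t for df degrees of freedom (df > 0, need not be an integer).
double students_t_density(double df, double t)
{
    const double pi = 3.14159265358979323846;
    // Work in logarithms so the gamma functions do not overflow for large df.
    const double log_norm = std::lgamma((df + 1) / 2) - std::lgamma(df / 2)
                          - 0.5 * std::log(df * pi);
    return std::exp(log_norm - (df + 1) / 2 * std::log1p(t * t / df));
}

// Assumed meaning: cumulative probability P(T <= t).
// Composite Simpson's rule on [0, |t|]; adequate for moderate |t| only.
double students_t_cumulative(double df, double t)
{
    assert(df > 0);
    const int n = 2000;                 // number of subintervals, must be even
    const double a = std::fabs(t);
    const double h = a / n;
    double sum = students_t_density(df, 0) + students_t_density(df, a);
    for (int i = 1; i < n; ++i)
        sum += (i % 2 ? 4 : 2) * students_t_density(df, i * h);
    const double half = sum * h / 3;    // P(0 <= T <= |t|)
    return t >= 0 ? 0.5 + half : 0.5 - half;
}
```

With this sketch, students_t_cumulative(3, 2.0) comes out smaller than
students_t_cumulative(100, 2.0): the same t carries less weight when it
rests on few measurements.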

When you do the inverse, you may want to say: I want to be 95% confident,
and I have already fixed the degrees_of_freedom, so what is the
corresponding value for t?  This is what the ubiquitous Student's t tables
do.

On the other hand, sometimes you may decide you want 95% confidence, and you
have already made some measurements of t, but you want to know how many
(more probably) measurements (degrees_of_freedom) you would have to make to
get this 95%.

This is a common problem, and it often reveals, in drug trials for example,
that there are not enough potential patients available to carry out a trial
and achieve a 95% probability.

If you accept this, then the problem is how to name the two, or three
'inverses' (and complements).

students_t_inv_t  and students_t_inv_df ???
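
Using those two candidate names, the two inverses could be sketched as
root-finding on the cumulative function. This is only an illustration under
stated assumptions: the cumulative function is taken to mean P(T <= t), it
is computed here by a simple Simpson integration, plain bisection is used
for both inverses, and the search limits and the NaN return for an
unattainable probability are my own choices, not a settled design.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Density of Student's t for df degrees of freedom (df > 0).
double students_t_density(double df, double t)
{
    const double pi = 3.14159265358979323846;
    const double log_norm = std::lgamma((df + 1) / 2) - std::lgamma(df / 2)
                          - 0.5 * std::log(df * pi);
    return std::exp(log_norm - (df + 1) / 2 * std::log1p(t * t / df));
}

// P(T <= t) by composite Simpson's rule; adequate for moderate |t| only.
double students_t_cumulative(double df, double t)
{
    const int n = 2000;                 // must be even
    const double a = std::fabs(t), h = a / n;
    double sum = students_t_density(df, 0) + students_t_density(df, a);
    for (int i = 1; i < n; ++i)
        sum += (i % 2 ? 4 : 2) * students_t_density(df, i * h);
    return t >= 0 ? 0.5 + sum * h / 3 : 0.5 - sum * h / 3;
}

// Inverse in t: given df and probability p, find t with P(T <= t) = p.
double students_t_inv_t(double df, double p)
{
    assert(df > 0 && p > 0 && p < 1);
    if (p < 0.5) return -students_t_inv_t(df, 1 - p);   // symmetry about 0
    double lo = 0, hi = 1;
    while (hi < 1e3 && students_t_cumulative(df, hi) < p) hi *= 2;  // bracket
    for (int i = 0; i < 80; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (students_t_cumulative(df, mid) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// Inverse in df: given t > 0 and probability p, find df with P(T <= t) = p.
// Returns NaN when no df in the search range can reach p for this t.
double students_t_inv_df(double t, double p)
{
    assert(t > 0 && p > 0.5 && p < 1);
    double lo = 0.01, hi = 1e6;         // arbitrary search range for df
    if (students_t_cumulative(lo, t) > p || students_t_cumulative(hi, t) < p)
        return std::numeric_limits<double>::quiet_NaN();
    for (int i = 0; i < 80; ++i) {      // cumulative rises with df when t > 0
        const double mid = 0.5 * (lo + hi);
        if (students_t_cumulative(mid, t) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

The NaN case corresponds to the situation where no number of measurements
would give the wanted probability for the t already observed.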

Paul

PS I also worry about the risk of code bloat.  At present, I think that you
don't pay for what you don't use.  We certainly don't want all the possible
functions discussed above instantiated, even for one floating-point type, if
only one function is actually used.
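
For what it is worth, a class template need not change that: member
functions of a class template are only instantiated when they are used. The
sketch below is hypothetical (normal_sketch is an invented name, the member
names follow the ones suggested in the quoted text, and the static_assert is
a deliberate tripwire added only to demonstrate the point), not a proposal
for the actual interface.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical class-template take on the quoted "normal" class.
template <class RealType>
class normal_sketch
{
public:
    normal_sketch(RealType mu, RealType sigma) : mu_(mu), sigma_(sigma)
    {
        assert(sigma > 0);
    }

    // P(X <= x), via the complementary error function from <cmath>.
    RealType cumulative(RealType x) const
    {
        return std::erfc(-(x - mu_) / (sigma_ * std::sqrt(RealType(2)))) / 2;
    }

    // Inverse of cumulative(), by plain bisection over mu +/- 40 sigma.
    RealType inverse_cumulative(RealType p) const
    {
        // Demonstration tripwire only: compilation fails if this body is
        // ever instantiated for double, so a clean build shows it was not.
        static_assert(!std::is_same<RealType, double>::value,
                      "inverse_cumulative was instantiated for double");
        RealType lo = mu_ - 40 * sigma_, hi = mu_ + 40 * sigma_;
        for (int i = 0; i < 100; ++i) {
            const RealType mid = (lo + hi) / 2;
            if (cumulative(mid) < p) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }

private:
    RealType mu_, sigma_;
};

// Uses only cumulative(); inverse_cumulative() is never instantiated for
// double, so the tripwire above stays silent.
double probability_below(double x)
{
    const normal_sketch<double> n(0.0, 1.0);
    return n.cumulative(x);
}
```

An explicit instantiation such as "template class normal_sketch<double>;"
would, by contrast, instantiate every member and trip the assertion, so the
saving depends on leaving instantiation implicit.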

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com