Re: [boost] [math/statistics/design] How best to name statistical functions?

10 Jul 2006

|  -----Original Message-----
|  From: boost-bounces@lists.boost.org
|  [mailto:boost-bounces@lists.boost.org] On Behalf Of Kevin Lynch
|  Sent: 09 July 2006 11:49
|  To: boost@lists.boost.org
|  Subject: Re: [boost] [math/statistics/design] How best to name statistical functions?
|  
|  John Maddock wrote:
|  > Paul Bristow has been toiling away producing some statistical functions on
|  > top of some of my Math special functions, and we've encountered a bit of a
|  > naming dilemma that I hope the ever resourceful Boosters can solve for us :-)
|  Why not hide the functions behind a class interface?  After all, the
|  various functions are "properties" of the distributions.  Hence:
|  
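|  class students_t {
|  public:
|  	explicit students_t(double df);  // degrees of freedom
|  	double P(double x) const;        // lower tail (the CDF)
|  	double Q(double x) const;        // upper tail, 1 - P(x)
|  	double invP(double p) const;     // or perhaps inverseP, Pinv, ...
|  	// .....
|  };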
|  
|  class normal {
|  public:
|  	normal(double mu, double sigma);  // mean and standard deviation
|  	double P(double x) const;
|  	double Q(double x) const;
|  	double invP(double p) const;      // argument is a probability
|  	// ......
|  };

Rather interesting idea.

|  
|  This interface has a few major benefits over raw functions:
|  
|  1) Since Paul is using your C++ special functions library in the
|  implementation, there's no argument on the implementation side for C
|  compatibility.  Without C compatibility as a driving force, you don't
|  need to stick with free functions and the corresponding combinatorial
|  explosion of hard to remember names.

Agreed.

|  2) A class interface also lets you carry around data specific to the
|  current "in use" distribution in one place, rather than needing to stuff
|  it into every call (the degrees of freedom in the case of Student's t,
|  the mean and standard deviation for the normal, etc.).

|  3) This "normalizes" the interface for the calls to the distribution
|  functions - every call for "P" has exactly one argument, and 
|  not two or three or four depending on the distribution in use.

How would you envisage this working with the Fisher F distribution, for example,
which has two degrees of freedom (df1 and df2) and a variance ratio F?

Is this 1D, 2D or 3D?

Its inversion will return df1 (given df2, F and the probability),
or df2 (given df1, F and the probability),
or F (given df1, df2 and the probability).

Would you like to flesh out how you would suggest handling all of these?

|  4) The consistent interface is of course easier to document, 
|  teach and learn, and easier to use.

Yes, usability is a major requirement to allow all and sundry to USE this.

|  You might also want to provide a
|  function to obtain the non-cumulative distribution value (perhaps
|  operator() or dist() or something).

Yes, most desirable, but this project is getting bigger day by day ;-)

(As an aside, John has devised a way to avoid the bloat caused by the
expectation that one can provide degrees of freedom as either an integer OR a
floating-point value. Without his meta-magic, a serious downside of a fully
templated version would be the instantiation of many variants of each function).

|  Of course, you would probably templatize, and you might want to inherit
|  from 1D or 2D abstract base classes if you plan to provide
|  multidimensional distributions (or maybe not ...) and functions that
|  operate on distributions.
|  
|  In any case, I look forward to the results....

Watch this space...

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com