Re: [boost] [math/statistics/design] How best to name statistical functions?

13 Jul 2006

|  -----Original Message-----
|  From: boost-bounces@lists.boost.org 
|  [mailto:boost-bounces@lists.boost.org] On Behalf Of Deane Yang
|  Sent: 12 July 2006 23:14
|  To: boost@lists.boost.org
|  Subject: Re: [boost] [math/statistics/design] How best to name statistical functions?
|  
|  Topher Cooper wrote:
|  > At 05:11 AM 7/12/2006, you wrote:
|  >>     T distribution(T x) const; // Probability Density Function or pdf or p
|  >>     T cumulative_probability(T x) const; // Cumulative Distribution Function. P
|  >>
|  >> cumulative_probability is too long :-(
|  >>
|  >> Do we REALLY need the cumulative here?
|  >>
|  >>     T probability(T x) const; // Cumulative Distribution Function or cdf or P
|  > 
|  > Sorry, as attractive as it seems at first blush, I think just 
|  > "probability" is a very poor choice. ...
|  
|  <explanation about why and discussion about using intervals snipped>
|  
|  I definitely do not want to use the same function name for both the
|  density function and the cumulative probability. Your point about
|  people confusing the meaning of the density function is on the mark,
|  and I think using the same function name will only exacerbate the
|  confusion.
|  
|  So I would still vote for:
|  
|  double density(double x) const;
|  
|  (Despite the origin of the word "density" in physics, it is definitely
|  used by mathematicians, statisticians, and engineers to mean exactly
|  this. And I agree that the word "distribution" is not a synonym for
|  "density".)
|  
|  On the other hand, I like the idea of using an interval type for the
|  "probability" function and requiring an explicit interval constructor
|  when calling the function, like
|  
|  student_t dist(2.0);
|  double p = dist.probability(interval(-1.0, 2.0));
|  double q = dist.probability(interval(-1.0, infinity));
|  
|  To me, syntax like this just makes it easier for me to understand
|  what's going on.
|  
|  And I agree that we shouldn't just use the Boost Interval library. I 
|  think we should define an interval class specific to the statistics 
|  library, where the left endpoint is allowed to be -infinity and the 
|  right endpoint +infinity.
|  
|  Then we get a syntax that is easy to read and understand, and we don't
|  need to come up with a good name for the cumulative or complementary
|  cumulative probability functions.

I've quickly knocked up a very rough sketch of how this might look
(attached: a zip of a .cpp file, run on MSVC 8.0).

I'm sure you can suggest improvements to this.

Seeing it used, I still quite like a single function name 'probability'
(with one parameter for the pdf and two for the cdf(s)), but I am willing
to be out-voted. Neat, but riskier.

I also attached a response from Daniel Egloff making a similar, but more
advanced proposal.

(As John notes, the downside of a class is that it is harder to extend.)

However, I am just about to go on holiday for two weeks, so I will leave you
all to discuss further, and hope you've got everything sorted out and some
example code written by the time I get back ;-))

Thanks

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com