Re: [boost] [Math Statistical Distributions] Hypergeometricdistribution

6 May 2008

      Johan Råde wrote:
...
> There is one difficulty with the two-sided Fisher exact test.
> To calculate a p-value for the left-sided test,
> you take the cdf of the hypergeometric distribution
> for the observed value of the test statistic.
> If you want to calculate a p-value for the right-sided test,
> then you just take the cdf complement for the observed value of the
> test statistic.
> But for a two-sided test, well, say that the observed value
> of the test statistic is n. Then you should sum the pdf over all k
> such that pdf(k) <= pdf(n). This means summing over both tails.
> And since the distribution is not symmetric,
> you cannot just sum over one tail and multiply by 2,
> as you do with the two-sided t-test.
> I don't see how to do that in a clean way using the current
> statistical distributions API. (Am I missing something?)

This is true of all asymmetric distributions, of course: you need to add
the two tails, calculated separately:

cdf(hypergeometric(), n) + cdf(hypergeometric(), total - n)

Ah... wait, because it's discrete, that misses out one value from the right
tail? So it should be:

cdf(hypergeometric(), n) + cdf(hypergeometric(), total - n - 1) ???
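
The off-by-one question is easiest to check on a small table. Below is a minimal sketch, using only the standard library, that tabulates the hypergeometric probabilities over their support and sums each tail explicitly. The names PmfTable, hypergeometric_table, left_tail and right_tail are hypothetical helpers made up for illustration, not part of the statistical distributions API, and the parameter names (successes, draws, population) are my own labels. It shows that for a discrete distribution P(X <= n) and P(X >= n) both contain the term at n, so 1 - P(X <= n) is P(X >= n + 1) rather than P(X >= n).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Probabilities over the support; p[i] is P(X = lo + i).
// Hypothetical helper type, for illustration only.
struct PmfTable {
    unsigned lo;
    std::vector<double> p;
};

// Tabulate the hypergeometric pmf using the ratio
//   p(k+1)/p(k) = (successes-k)(draws-k) / ((k+1)(population-successes-draws+k+1))
// and normalising at the end. Intended for small tables only: the
// unnormalised weights could overflow for very large parameters.
PmfTable hypergeometric_table(unsigned successes, unsigned draws, unsigned population) {
    assert(successes <= population && draws <= population);
    unsigned lo = (draws + successes > population) ? draws + successes - population : 0;
    unsigned hi = std::min(draws, successes);
    std::vector<double> w{1.0};
    for (unsigned k = lo; k < hi; ++k) {
        double num = double(successes - k) * double(draws - k);
        double den = double(k + 1) * double((population + k + 1) - successes - draws);
        w.push_back(w.back() * num / den);
    }
    double total = std::accumulate(w.begin(), w.end(), 0.0);
    for (double& x : w) x /= total;
    return {lo, w};
}

// P(X <= n): includes the term at n.
double left_tail(const PmfTable& t, unsigned n) {
    double s = 0.0;
    for (std::size_t i = 0; i < t.p.size(); ++i)
        if (t.lo + i <= n) s += t.p[i];
    return s;
}

// P(X >= n): also includes the term at n.
double right_tail(const PmfTable& t, unsigned n) {
    double s = 0.0;
    for (std::size_t i = 0; i < t.p.size(); ++i)
        if (t.lo + i >= n) s += t.p[i];
    return s;
}
```

With 3 successes, 4 draws and a population of 10, the probabilities for k = 0..3 are 35/210, 105/210, 63/210 and 7/210. For n = 2 the left tail is 203/210 and the right tail is 70/210; together they exceed 1 by exactly the term at n = 2, which is counted in both.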
...
> Maybe some extension to the statistical distributions API is needed.
> Something like cdf(symmetric(dist,x)) for the sum/integral of
> pdf(dist,y) over all y such that pdf(dist,y) <= pdf(dist,x).

Hmm, that's a slightly different quantity. I'm not especially familiar with
Fisher's exact test, but from what I've seen there appear to be differences
of opinion on how two-sided tests are calculated. For the second "side" you
want to sum the probabilities of all the contingency tables that are "at
least as extreme" as the one you observed, but in the other direction. One
way, as I've suggested above, is to sum all the tables that are as
*asymmetric* as the one you observe; yours is to sum all the tables with
lower or equal *probability*.

Are you certain you require the latter?  I'm sure you are... just double 
checking :-)

I don't see any easy way of doing this, except by brute force, or maybe by
doing a numeric inversion on the PDF to find the correct right-tail test
statistic value, and then using the CDFs as above?
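
A minimal sketch of the brute-force route, assuming the "lower or equal probability" definition: evaluate the pmf at every point of the support and add up the terms that do not exceed the pmf at the observed value. Everything here is illustrative. hypergeometric_pmf and two_sided_p_value are hypothetical helpers written against the standard library only, not part of the statistical distributions API, and the small relative tolerance used so that exact ties survive rounding is my own assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// log of the binomial coefficient C(n, k), via lgamma to avoid overflow.
double log_choose(unsigned n, unsigned k) {
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

// P(X = k) when drawing `draws` items without replacement from a
// population of size `population` that contains `successes` marked items.
double hypergeometric_pmf(unsigned k, unsigned successes, unsigned draws, unsigned population) {
    unsigned lo = (draws + successes > population) ? draws + successes - population : 0;
    unsigned hi = std::min(draws, successes);
    if (k < lo || k > hi) return 0.0;  // outside the support
    return std::exp(log_choose(successes, k)
                    + log_choose(population - successes, draws - k)
                    - log_choose(population, draws));
}

// Brute-force two-sided p-value: sum pmf(k) over all k in the support
// with pmf(k) <= pmf(observed).
double two_sided_p_value(unsigned observed, unsigned successes, unsigned draws, unsigned population) {
    assert(successes <= population && draws <= population);
    unsigned lo = (draws + successes > population) ? draws + successes - population : 0;
    unsigned hi = std::min(draws, successes);
    double p_obs = hypergeometric_pmf(observed, successes, draws, population);
    double tol = p_obs * 1e-7;  // assumed relative tolerance for ties
    double sum = 0.0;
    for (unsigned k = lo; k <= hi; ++k) {
        double p = hypergeometric_pmf(k, successes, draws, population);
        if (p <= p_obs + tol) sum += p;
    }
    return std::min(sum, 1.0);  // guard against rounding just above 1
}
```

For a population of 10 with 3 successes and 4 draws, two_sided_p_value(2, 3, 4, 10) sums the terms at k = 0, 2 and 3 (35/210, 63/210 and 7/210) and skips k = 1 (105/210), giving 0.5. The cost is linear in the size of the support, which is fine for ordinary contingency tables but is exactly the brute force that a cleaner API extension would avoid.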

John.

John Maddock