
Johan Råde wrote:
There is one difficulty with the two-sided Fisher exact test.
To calculate a p-value for the left-sided test, you take the cdf of the hypergeometric distribution at the observed value of the test statistic. If you want to calculate a p-value for the right-sided test, then you just take the complement of the cdf at the observed value of the test statistic.
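For concreteness, here is a minimal Python sketch of the two one-sided p-values, using only the standard library. The parameter names (r, n, N) and the helper names are assumptions made for illustration, not the statistical distributions API, and the observed value is called x to avoid a clash with the sample size. One thing to watch: for a discrete distribution the cdf complement at x is P(X > x), so this sketch sums the right tail starting from x itself in order to include the observed table.

```python
from math import comb

def hypergeom_pmf(k, r, n, N):
    """P(X = k) when drawing n items from N, of which r are 'successes'."""
    lo, hi = max(0, n + r - N), min(r, n)  # support of the distribution
    if k < lo or k > hi:
        return 0.0
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)

def left_p(x, r, n, N):
    """Left-sided p-value: P(X <= x), i.e. the cdf at x."""
    lo = max(0, n + r - N)
    return sum(hypergeom_pmf(k, r, n, N) for k in range(lo, x + 1))

def right_p(x, r, n, N):
    """Right-sided p-value: P(X >= x), which includes the observed value."""
    hi = min(r, n)
    return sum(hypergeom_pmf(k, r, n, N) for k in range(x, hi + 1))
```

Because both tails include the observed value, left_p + right_p exceeds 1 by exactly pmf(x).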
But for a two-sided test, well, say that the observed value of the test statistic is n. Then you should sum the pdf over all k such that pdf(k) <= pdf(n). This means summing over both tails. And since the distribution is not symmetric, you cannot just sum over one tail and multiply by 2, as you do with the 2-sided t-test.
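As a brute-force illustration of this definition, the following Python sketch sums the pmf over every k whose probability does not exceed that of the observed value. The function and parameter names are assumptions for illustration only. The comparison is done on the integer numerators comb(r, k) * comb(N - r, n - k), which sidesteps floating-point ties between tables of equal probability.

```python
from math import comb

def two_sided_p(x, r, n, N):
    """Sum of P(X = k) over all k with P(X = k) <= P(X = x).

    X is hypergeometric: n draws from N items, r of which are 'successes'.
    """
    lo, hi = max(0, n + r - N), min(r, n)  # support of the distribution
    if not lo <= x <= hi:
        raise ValueError("observed value is outside the support")
    # Unnormalised integer weights; dividing by comb(N, n) gives the pmf.
    w = {k: comb(r, k) * comb(N - r, n - k) for k in range(lo, hi + 1)}
    return sum(v for v in w.values() if v <= w[x]) / comb(N, n)
```

With r = 3, n = 4, N = 10 the weights are 35, 105, 63, 7 for k = 0..3. For an observed value of 0 this gives (35 + 7) / 210 = 0.2, whereas doubling the left tail would give 70 / 210, so the two really do differ for an asymmetric distribution.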
I don't see how to do that in a clean way using the current statistical distributions API. (Am I missing something?)
This is true of all asymmetric distributions of course: you need to add the two tails, calculated separately:

   cdf(hypergeometric(), n) + cdf(hypergeometric(), total - n)

Ah... wait, because it's discrete, that misses out one value from the right tail? So it should be:

   cdf(hypergeometric(), n) + cdf(hypergeometric(), total - n - 1) ???
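One possible reading of the "add the two tails" idea, sketched in Python: mirror the observed value about the ends of the support and add the tail at or below the smaller value to the tail at or above the larger one. Taking "total - n" to mean this mirror point is an assumption on my part, as are all the names below, so treat this as an illustration rather than the intended formula.

```python
from math import comb

def mirrored_tails_p(x, r, n, N):
    """Add the two tails that are at least as far from the centre of the
    support as the observed value x (hypothetical 'mirrored' definition)."""
    lo, hi = max(0, n + r - N), min(r, n)  # support of the distribution
    if not lo <= x <= hi:
        raise ValueError("observed value is outside the support")
    # Unnormalised integer weights; dividing by comb(N, n) gives the pmf.
    w = [comb(r, k) * comb(N - r, n - k) for k in range(lo, hi + 1)]
    i = x - lo                # index of the observed value
    j = len(w) - 1 - i        # index of its mirror image
    a, b = min(i, j), max(i, j)
    tails = sum(w[:a + 1]) + sum(w[b:])  # both end points are included
    # When a == b the centre value is counted twice, so cap at 1.
    return min(1.0, tails / comb(N, n))
```

With r = 3, n = 4, N = 10 and an observed value of 2, this gives 1.0, while summing the tables of lower or equal probability gives 0.5, so the two definitions can disagree substantially.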
Maybe some extension to the statistical distributions API is needed. Something like cdf(symmetric(dist,x)) for the sum/integral of pdf(dist,y) over all y such that pdf(dist,y) <= pdf(dist,x).
Hmm, that's a slightly different quantity. I'm not especially familiar with Fisher's exact test, but from what I've seen there appear to be differences of opinion on how two-sided tests are calculated? For the second "side" you want to sum the probabilities of all the contingency tables that are "at least as extreme" as the one you observed, but in the other direction. One way, as I've suggested above, is to sum all the tables that are as *asymmetric* as the one you observe; yours is to sum all the tables with lower or equal *probability*. Are you certain you require the latter? I'm sure you are... just double checking :-)

I don't see any easy way of doing this, except by brute force, or maybe by doing a numeric inversion on the PDF to find the correct right-tail test statistic value, and then using the CDFs as above?

John.
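A rough Python sketch of the "find the matching cut-off in the other tail" idea follows. It assumes the hypergeometric pmf is unimodal, so that the set of k with pdf(k) <= pdf(x) is a run at the left end of the support plus a run at the right end. The cut-offs are found here by a simple linear scan inward from each end, standing in for a proper numeric inversion, and all names are hypothetical.

```python
from math import comb

def two_sided_via_tails(x, r, n, N):
    """Two-sided p-value built from a left run and a right run of the
    support, each containing only values no more probable than x."""
    lo, hi = max(0, n + r - N), min(r, n)  # support of the distribution
    if not lo <= x <= hi:
        raise ValueError("observed value is outside the support")
    # Unnormalised integer weights; dividing by comb(N, n) gives the pmf.
    w = [comb(r, k) * comb(N - r, n - k) for k in range(lo, hi + 1)]
    cutoff = w[x - lo]

    # Scan inward from the left end while the weight stays <= cutoff.
    left, a = 0, 0
    while a < len(w) and w[a] <= cutoff:
        left += w[a]
        a += 1

    # Scan inward from the right end, never crossing the left run.
    right, b = 0, len(w) - 1
    while b >= a and w[b] <= cutoff:
        right += w[b]
        b -= 1

    return (left + right) / comb(N, n)
```

With the cut-off indices a and b in hand, the two sums could equally be obtained from a cdf and a cdf complement, which is the point of the suggestion. The result agrees with the brute-force sum over all k with pdf(k) <= pdf(x).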