[math/distributions] Problem using inv-chi**2 and quantiles
I'm trying to duplicate Excel's CHIINV to calculate confidence limits for an observed number of events. I think I'm nearly there, largely as a result of trial and error, but the boost results still significantly differ from the Excel results, so I've presumably messed up somewhere. Any advice will be *greatly* appreciated. The Excel formulas I'm using are: lo limit = CHIINV((1+$C$3)/2, 2*B6)/2 hi limit = CHIINV((1-$C$3)/2, 2*(B6+1))/2 where $C$3 is the confidence level required (95%), and B6 is the number of events observed. In Excel, 'CHIINV' is documented as:
CHIINV(probability,degrees_freedom) Probability is a probability associated with the chi-squared distribution. Degrees_freedom is the number of degrees of freedom.
For 17 events, for example, Excel is producing limits of [9.90, 27.22]. I've attached my complete boost test program below, which I hope does the same thing, but there was a large amount of guesswork here. This program produces results of [10.67, 25.98] for 17 events. Can anyone tell me what I'm doing wrong? Thanks - Paul
-----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Paul Johnson Sent: Tuesday, June 14, 2011 1:39 PM To: boost-users@lists.boost.org Subject: [Boost-users] [math/distributions] Problem using inv-chi**2 and quantiles
I'm trying to duplicate Excel's CHIINV to calculate confidence limits for an observed number of events. I think I'm nearly there, largely as a result of trial and error, but the boost results still significantly differ from the Excel results, so I've presumably messed up somewhere. Any advice will be *greatly* appreciated.
The Excel formulas I'm using are:
lo limit = CHIINV((1+$C$3)/2, 2*B6)/2 hi limit = CHIINV((1-$C$3)/2, 2*(B6+1))/2
where $C$3 is the confidence level required (95%), and B6 is the number of events observed. In Excel, 'CHIINV' is documented as:
CHIINV(probability,degrees_freedom) Probability is a probability associated with the chi-squared distribution. Degrees_freedom is the number of degrees of freedom.
For 17 events, for example, Excel is producing limits of [9.90, 27.22].
I've attached my complete boost test program below, which I hope does the same thing, but there was a large amount of guesswork here. This program produces results of [10.67, 25.98] for 17 events.
Can anyone tell me what I'm doing wrong?
Well, using EXCEL for a start ;-) No - seriously, I can't see at a glance why they differ, and since this distribution has been added recently, a mistake as always possible. I'll look more closely as soon as I can, but it won't be today (but at least you know you are on my TODO list, and can mail me privately). Paul PS As a very quick idea - Is the /2 in (1+$C$3)/2 the cause of the difference? --- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com
On 14/06/2011 14:51, Paul A. Bristow wrote:
Can anyone tell me what I'm doing wrong?
Well, using EXCEL for a start ;-)
Well, that's why I'm here :)
No - seriously, I can't see at a glance why they differ, and since this distribution has been added recently, a mistake as always possible.
I'll look more closely as soon as I can, but it won't be today (but at least you know you are on my TODO list, and can mail me privately).
Excellent - thanks very much.
PS As a very quick idea - Is the /2 in (1+$C$3)/2 the cause of the difference?
Excel can't calculate the lower limit if I take out the /2. However, if I take out the /2 in the upper limit ((1-$C$3)/2 => (1-$C$3)) then it produces an answer which is closer to the boost answer. For 17 events, Excel now produces 25.50, compared to boost's 25.98. -Paul
I'm trying to duplicate Excel's CHIINV to calculate confidence limits for an observed number of events. I think I'm nearly there, largely as a result of trial and error, but the boost results still significantly differ from the Excel results, so I've presumably messed up somewhere. Any advice will be *greatly* appreciated.
The Excel formulas I'm using are:
lo limit = CHIINV((1+$C$3)/2, 2*B6)/2 hi limit = CHIINV((1-$C$3)/2, 2*(B6+1))/2
where $C$3 is the confidence level required (95%), and B6 is the number of events observed. In Excel, 'CHIINV' is documented as:
CHIINV(probability,degrees_freedom) Probability is a probability associated with the chi-squared distribution. Degrees_freedom is the number of degrees of freedom.
For 17 events, for example, Excel is producing limits of [9.90, 27.22].
I've attached my complete boost test program below, which I hope does the same thing, but there was a large amount of guesswork here. This program produces results of [10.67, 25.98] for 17 events.
Can anyone tell me what I'm doing wrong?
quantile(boost::math::chi_squared(2*17), (0.05)/2) / 2 and quantile(boost::math::chi_squared(2*17), (0.95)/2) / 2 should reproduce your EXCEL results, but they're not formula's I recognize for chi squared confidence limits, so I'll leave it for you to decide if this is what you actually want ;-) HTH, John.
-----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Paul Johnson Sent: Tuesday, June 14, 2011 1:39 PM To: boost-users@lists.boost.org Subject: [Boost-users] [math/distributions] Problem using inv-chi**2 and quantiles
I'm trying to duplicate Excel's CHIINV to calculate confidence limits for an observed number of events. I think I'm nearly there, largely as a result of trial and error, but the boost results still significantly differ from the Excel results, so I've presumably messed up somewhere. Any advice will be *greatly* appreciated.
I've checked and for the formula John suggests Boost.Math and Excel agree - see below and attached. Excel 2007 has new versions claimed to be more accurate - see below, but I don't think this is the trouble. I think there is confusion about the inverse of the chi squared - quantile in the new terminology, and the inverse_chi_squared_distribution - a completely different distribution. (Boost.Math and R agree in tests for both these distributions). quantile(boost::math::chi_squared(2*17), (0.05)/2) / 2 = (34, 0.025) = 9.90313 =CHISQ.INV(0.025,34)/2 =9.90313 and quantile(boost::math::chi_squared(2*17), 1 - (0.05)/2) / 2 = (34, 0. 925) = 25.9829976 =CHISQ.INV(0.975,34)/2 = 25.9829976 quantile(boost::math::chi_squared(2*(17+1)), 1 - (0.05)/2) / 2 = (36, 0. 925) = =CHISQ.INV(0.975,36)/2 = 27.2186 =CHISQ.INV(0.975,36)/2 = 27.21864682 I'm also not sure what you really want here, but I suspect it isn't the inverse_chi_squared_distribution, but I'm not expert enough to know if it is chi_squared either ;-) HTH - a bit! Paul -- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com
The Excel formulas I'm using are:
lo limit = CHIINV((1+$C$3)/2, 2*B6)/2) == chiinv (0.05/2, 2 * 17) = chiinv(0.025, 34) = 19.80625294 hi limit = CHIINV((1-$C$3)/2, 2*(B6+1))/2) = chiinv(0.975, 36) = > where $C$3 is the confidence level required (95%), and B6 is the number of events observed. In Excel, 'CHIINV' is documented as:
CHIINV(probability,degrees_freedom) Probability is a probability associated with the chi-squared distribution. Degrees_freedom is the number of degrees of freedom.
For 17 events, for example, Excel is producing limits of [9.90, 27.22].
I've attached my complete boost test program below, which I hope does the same thing, but there was a large amount of guesswork here. This program produces results of [10.67, 25.98] for 17 events.
CHIINV function Returns the inverse of the right-tailed probability of the chi-squared distribution. If probability = CHIDIST(x,...), then CHIINV(probability,...) = x. Use this function to compare observed results with expected ones in order to decide whether your original hypothesis is valid. Important This function has been replaced with one or more new functions that may provide improved accuracy and whose names better reflect their usage. Although this function is still available for backward compatibility, you should consider using the new functions from now on, because this function may not be available in future versions of Excel. For more information about the new functions, see CHISQ.INV function and CHISQ.INV.RT function. Syntax CHIINV(probability,deg_freedom)The CHIINV function syntax has the following arguments (argument: A value that provides information to an action, an event, a method, a property, a function, or a procedure.): Probability Required. A probability associated with the chi-squared distribution. Deg_freedom Required. The number of degrees of freedom. CHISQ.INV function Returns the inverse of the left-tailed probability of the chi-squared distribution. The chi-squared distribution is commonly used to study variation in the percentage of something across samples, such as the fraction of the day people spend watching television. Syntax CHISQ.INV(probability,deg_freedom)The CHISQ.INV function syntax has the following arguments (argument: A value that provides information to an action, an event, a method, a property, a function, or a procedure.): Probability Required. A probability associated with the chi-squared distribution. Deg_freedom Required. The number of degrees of freedom. CHISQ.INV.RT function Returns the inverse of the right-tailed probability of the chi-squared distribution. If probability = CHISQ.DIST.RT(x,...), then CHISQ.INV.RT(probability,...) = x. Use this function to compare observed results with expected ones in order to decide whether your original hypothesis is valid. Syntax CHISQ.INV.RT(probability,deg_freedom)The CHISQ.INV.RT function syntax has the following arguments (argument: A value that provides information to an action, an event, a method, a property, a function, or a procedure.): Probability Required. A probability associated with the chi-squared distribution. Deg_freedom Required. The number of degrees of freedom.
Excellent - thanks very much to both of you. I've confirmed that the final answer of lo = quantile(boost::math::chi_squared(2*events), 0.025) / 2; hi = quantile(boost::math::chi_squared(2*(events+1)), 0.975) / 2; does the job. As to what it's used for, the Excel formulas are commonly used in public health to calculate Poisson confidence levels, using a transformation between Poisson and chi-squared (attributed to Ulm, 1990). I've also noticed that at least one of the websites that provides an online Poisson limits calculator uses the chi-squared equivalent. There's a short summary at http://www.nwpho.org.uk/sadb/Poisson%20CI%20in%20spreadsheets.pdf, which also points out that Excel's CHIINV is/was buggy, which seems to have been well-known for years. Presumably the name change in Excel 2007 is because they have a new implementation. -Paul
participants (3)
-
John Maddock
-
Paul A. Bristow
-
Paul Johnson