[random] poisson_distribution problem and fix

The poisson_distribution code will fail for any mean larger than 750 or so (on my x86_64 machine). The problem is a comparison with exp(-_mean), which evaluates as zero right around there. The easiest fix is to operate with logs instead of exponents. I changed the operator() code to be something like this:

RealType product = RealType(0);
for(result_type m = 0; ; ++m) {
    product += log(eng());
    if(product <= -_mean)
        return m;
}

This also makes it possible to get rid of the init() function and the _exp_mean private member variable.

A far better fix would be to use an algorithm other than Knuth's for this. Knuth's algorithm is simple, but O(n). Here is some code I wrote, converting some old Fortran code that a friend had got from Numerical Recipes once upon a time. It works, but would need actual testing before use in production code:

#ifndef F_POISSON_HEADER
#define F_POISSON_HEADER

#include <cmath>
#include <cstdlib>   // std::rand, RAND_MAX

namespace fpoisson {

// Uniform deviate in (0, 1); the loop rejects exact zeros.
double rand_zero_one()
{
    double r = 0;
    do {
        r = ((double)std::rand() / ((double)(RAND_MAX) + (double)(1)));
    } while (r == 0.0);
    return r;
}

// ln(Gamma(xx)) via the Lanczos-style series from the original Fortran.
double gammln(const double xx)
{
    const double cof[6] = { 76.18009173, -86.50532033, 24.01409822,
                            -1.231739516, 0.12085003e-2, -0.536382e-5 };
    const double stp = 2.50662827465;
    double x = xx - 1;
    double tmp = x + 5.5;
    tmp = (x + 0.5) * std::log(tmp) - tmp;
    double ser = 1;
    for (int j = 0; j < 6; ++j) {
        x += 1;
        ser += cof[j] / x;
    }
    return tmp + std::log(stp * ser);
}

// Poisson deviate with mean xm: direct product method for small means,
// rejection from a Cauchy (Lorentzian) envelope for larger ones.
double prand(const int xm)
{
    const double pi = 3.141592654;
    double em;
    double t;
    if (xm < 12) {
        const double g = std::exp(-xm);
        em = -1.0;
        t = 1.0;
        do {
            em += 1;
            t = t * rand_zero_one();
        } while (t > g);
        return em;
    }
    do {
    startloop:
        const double sq = std::sqrt(2.0 * xm);
        const double alxm = std::log(double(xm));
        const double g = xm * alxm - gammln(xm + 1);
        const double y = std::tan(pi * rand_zero_one());
        em = sq * y + xm;
        if (em < 0)
            goto startloop;
        em = static_cast<int>(em);
        t = 0.9 * (1.0 + std::pow(y, 2)) * std::exp(em * alxm - gammln(em + 1) - g);
    } while (rand_zero_one() > t);
    return em;
}

} // namespace fpoisson

#endif

Joel Eidsath
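For what it's worth, the threshold is easy to confirm independently of Boost: in IEEE double precision, exp(-x) underflows to exactly zero once x passes roughly 745, which matches the "750 or so" reported above. A minimal standalone check (not part of the original post):

#include <cmath>
#include <iostream>

// Scans upward until exp(-mean) is exactly zero in double precision.
// On a typical x86_64 machine this reports a threshold near 745.
int main()
{
    for (double mean = 700.0; mean <= 800.0; mean += 1.0) {
        if (std::exp(-mean) == 0.0) {
            std::cout << "exp(-mean) underflows to 0 at mean = " << mean << "\n";
            return 0;
        }
    }
    std::cout << "no underflow below 800\n";
}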

-----Original Message-----
From: Joel Eidsath
Sent: 22 December 2007 23:02
To: boost@lists.boost.org
Subject: [boost] [random] poisson_distribution problem and fix
The poisson_distribution code will fail for any mean larger than 750 or so (on my x86_64 machine). The problem is a comparison with exp(-_mean) which evaluates as zero right around there.
A far better fix would be to use an algorithm other than Knuth's for this. Knuth's algorithm is simple, but O(n). Here is some code I wrote, converting some old Fortran code that a friend had got from Numerical Recipes once upon a time. It works, but would need actual testing before use in production code:
#ifndef F_POISSON_HEADER #define F_POISSON_HEADER
<snipped>
#endif
This looks too obviously lifted from NR in C (or FORTRAN) to fit the Boost license. But the algorithm should still be usable? (But I am not qualified to judge).

ln gamma (and poisson) is available from the Math Toolkit, but this is a random requirement and you may value speed more than accuracy?

Paul

--- Paul A Bristow
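As an illustration of the Math Toolkit option mentioned here, the hand-rolled gammln above could be swapped for boost::math::lgamma; a minimal sketch, assuming Boost.Math is on the include path:

#include <boost/math/special_functions/gamma.hpp>

// Drop-in replacement for the gammln helper in the posted code:
// boost::math::lgamma computes ln|Gamma(x)|, which is all the sampler needs.
inline double gammln(const double xx)
{
    return boost::math::lgamma(xx);
}

Whether the extra accuracy is worth the cost relative to the six-term series is exactly the speed-versus-accuracy question raised above.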

Paul A. Bristow wrote:
This looks too obviously lifted from NR in C (or FORTRAN) to fit the Boost license.
Yes, as I mentioned, it's from a NR FORTRAN recipe. I don't know if the conversion from FORTRAN to C++ makes it a non-derivative work. The code certainly wouldn't work in poisson_distribution.hpp as is.
But the algorithm should still be usable? (But I am not qualified to judge).
If not, then at least the fix (in the original part of my post) should be used. Right now, using [random] poisson_distribution is slow and dangerous for any mean over 750. If it is changed to use ln instead of exp, it will only be slow. That is completely unrelated to the NR code later on in my post, which is a (serious) speed optimization.
ln gamma (and poisson) is available from the Math Toolkit, but this is a random requirement and you may value speed more than accuracy?
I don't see any way to use the Math Toolkit poisson distribution code to actually generate poisson random variables.

As a side note, the pdf function in the Math Toolkit may have a problem with exp(-mean) when k is smaller than max_factorial and mean is larger than ~750. One fix would simply be to use the same code for k > max_factorial in that case as well.

Joel Eidsath
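To make the suggested pdf fix concrete, here is a sketch (not the Math Toolkit's actual code) of evaluating the Poisson pmf entirely in log space, so that exp(-mean) is never formed on its own:

#include <boost/math/special_functions/gamma.hpp>
#include <cmath>

// pmf(k; mean) = exp(-mean) * mean^k / k!
// Evaluated as exp(k*log(mean) - mean - lgamma(k+1)) to avoid the
// intermediate exp(-mean) underflow for means beyond ~745.
double poisson_pmf(double k, double mean)
{
    double log_p = k * std::log(mean) - mean - boost::math::lgamma(k + 1.0);
    return std::exp(log_p);   // underflows only if the probability itself is tiny
}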

Joel Eidsath wrote:
Paul A. Bristow wrote:
This looks too obviously lifted from NR in C (or FORTRAN) to fit the Boost license.
Yes, as I mentioned, it's from a NR FORTRAN recipe. I don't know if the conversion from FORTRAN to C++ makes it a non-derivative work. The code certainly wouldn't work in poisson_distribution.hpp as is.
A direct or mechanical conversion almost certainly does make it a derived work of the NR original: the best way is to go back to the original reference if possible and work from there. One recent reference is: Ahrens, J.H. and Dieter, U. (1982). Computer generation of Poisson deviates from modified normal distributions. ACM Trans. Math. Software 8, 163-179.
But the algorithm should still be usable? (But I am not qualified to judge).
If not, then at least the fix (in the original part of my post) should be used.
Right now, using [random] poisson_distribution is slow and dangerous for any mean over 750. If it is changed to use ln instead of exp, it will only be slow. That is completely unrelated to the NR code later on in my post, which is a (serious) speed optimization.
ln gamma (and poisson) is available from the Math Toolkit, but this is a random requirement and you may value speed more than accuracy?
I don't see any way to use the Math Toolkit poisson distribution code to actually generate poisson random variables.
You *could* apply the poisson quantile to a uniformly distributed variable to obtain a poisson one, but I wouldn't honestly recommend it for that purpose :-/

I'll look into the Poisson PDF issue,

John.
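For completeness, a sketch of that quantile-of-a-uniform idea, using boost::math::poisson_distribution and (purely for illustration) a C++11 <random> engine; as noted above, inversion through the quantile is correct but not something to recommend for speed:

#include <boost/math/distributions/poisson.hpp>
#include <random>

// Inversion sampling: feed a uniform [0,1) draw through the Poisson quantile.
// Under Boost.Math's default discrete-quantile policy the result is an
// integer-valued double.
long poisson_by_inversion(double mean, std::mt19937& eng)
{
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    boost::math::poisson_distribution<> dist(mean);
    return static_cast<long>(boost::math::quantile(dist, unit(eng)));
}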

John Maddock wrote:
<snipped>
I'll look into the Poisson PDF issue,
You may find this useful: http://statistik.wu-wien.ac.at/unuran

Neal Becker wrote:
You may find this useful: http://statistik.wu-wien.ac.at/unuran
Many thanks for all the useful references folks have dug up for this. I've filed a ticket on this along with all the info we have so far; please add any further information you may have at http://svn.boost.org/trac/boost/ticket/1540.

Thanks, John.

Anyone attempting to generate non-uniform random numbers should consult "Non-Uniform Random Variate Generation" by Luc Devroye, which is available in full, free of charge, at: http://cg.scs.carleton.ca/~luc/rnbookindex.html

Poisson generation is covered, specifically, in chapter 10: http://cg.scs.carleton.ca/~luc/chapter_ten.pdf starting on page 501.

How far off is the Poisson distribution from the normal at a mean of 750? How large a sample would you need to have, say, a one in a thousand chance of detecting the difference? (I don't know the answer, but if there is a problem at that extreme, it's a question that should be answered before getting too fancy with "precise" solutions.)

Topher
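As a rough way to attack that question, the two distributions can be compared directly with the Math Toolkit; a sketch, assuming Boost.Math and the usual continuity correction (not part of the original post):

#include <boost/math/distributions/normal.hpp>
#include <boost/math/distributions/poisson.hpp>
#include <cmath>
#include <iostream>

// Compares Poisson(750) against Normal(750, sqrt(750)) by printing the
// absolute CDF difference at a few points around the mean.
int main()
{
    const double mean = 750.0;
    boost::math::poisson_distribution<> pois(mean);
    boost::math::normal_distribution<>  norm(mean, std::sqrt(mean));

    for (int k = 675; k <= 825; k += 25) {
        double p = boost::math::cdf(pois, k);
        double n = boost::math::cdf(norm, k + 0.5);  // continuity correction
        std::cout << "k = " << k << "  |Poisson - Normal| = "
                  << std::fabs(p - n) << "\n";
    }
}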
At 07:08 AM 12/24/2007, you wrote:

<snipped>

Well, sure. But the [random] poisson_distribution function doesn't return the normal at high means. It returns 0 (slowly!). That's just wrong. It should either be fixed or throw an exception. And as I posted above, here's a fix:

RealType product = RealType(0);
for(result_type m = 0; ; ++m) {
    product += log(eng());
    if(product <= -_mean)
        return m;
}

That's not NR code, that's just my having gotten rid of the exponentials in Knuth's algorithm by using logarithms. It actually simplifies things, as you can see. We can get rid of the init function and _exp_mean member variable in the code. Again, this still uses Knuth's algorithm, so it's tremendously slow. There is a rejection method algorithm on page 511 of the chapter_ten.pdf that you linked.

Joel Eidsath
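A self-contained version of that fix, for anyone who wants to try it outside poisson_distribution.hpp (a sketch only; the engine and the uniform draw are stand-ins for whatever [random] actually supplies):

#include <cmath>
#include <limits>
#include <random>

// Knuth's Poisson sampler in log space: instead of multiplying uniforms
// until the product drops below exp(-mean), sum their logs until the sum
// drops below -mean, so exp(-mean) is never computed.
template <class Engine>
long poisson_knuth_log(double mean, Engine& eng)
{
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    double sum = 0.0;
    for (long m = 0; ; ++m) {
        double u = unit(eng);
        if (u <= 0.0)
            u = std::numeric_limits<double>::min();  // guard against log(0)
        sum += std::log(u);
        if (sum <= -mean)
            return m;   // still O(mean) draws on average, as noted above
    }
}

Called as, say, poisson_knuth_log(750.0, eng) with a std::mt19937 eng, it returns a draw without ever forming exp(-750).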
On Dec 26, 2007 7:50 AM, Topher Cooper <topher@topherc.net> wrote:

<snipped>
participants (5)
- Joel Eidsath
- John Maddock
- Neal Becker
- Paul A Bristow
- Topher Cooper