Beta Distribution for boost::random

Jeremy Bruestle

11 Dec 2007 11 Dec '07

4:19 a.m.

Hello, I have implemented support for the beta distribution for boost::random. I was wondering who I should send it to for examination and possible acceptance into boost? The method is based on the paper "Generating Beta Variates with Nonintegral Shape Parameters" ( http://portal.acm.org/citation.cfm?id=359460.359482) by R. C. H. Cheng, which was the best performing fully general purpose method I could find. It uses a reject method whose success rate varies based on the parameters, but whose worst case expected number of samples is 4. If anyone knows of a better method please let me know. -Jeremy Bruestle

Show replies by date

Jens Seidel

11 Dec 11 Dec

7:42 a.m.

On Mon, Dec 10, 2007 at 08:19:26PM -0800, Jeremy Bruestle wrote:

...

Hello, I have implemented support for the beta distribution for boost::random. I was wondering who I should send it to for examination and possible acceptance into boost? The method is based on the paper "Generating Beta Variates with Nonintegral Shape Parameters" ( http://portal.acm.org/citation.cfm?id=359460.359482) by R. C. H. Cheng, which was the best performing fully general purpose method I could find. It

Mmh, the article contains: General permission to make fair use in teaching or research of all or part of this material is granted to individual readers and to nonprofit libraries acting for them provided that A C M ' s copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. Implementing an algorithm from the article is "fair use". But is Boost a nonprofit library? Consider that it can be used (also in parts) in commercial products! Jens

Paul A Bristow

9:27 a.m.

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Jens Seidel Sent: 11 December 2007 07:43 To: boost@lists.boost.org Subject: Re: [boost] Beta Distribution for boost::random

On Mon, Dec 10, 2007 at 08:19:26PM -0800, Jeremy Bruestle wrote:

...
Hello, I have implemented support for the beta distribution for boost::random. I was wondering who I should send it to for examination and possible acceptance into boost? The method is based on the paper "Generating Beta Variates with Nonintegral Shape Parameters" ( http://portal.acm.org/citation.cfm?id=359460.359482) by R. C. H. Cheng, which was the best performing fully general purpose method I could find. It

Mmh, the article contains:

General permission to make fair use in teaching or research of all or part of this material is granted to individual readers and to nonprofit libraries acting for them provided that A C M ' s copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

Implementing an algorithm from the article is "fair use". But is Boost a nonprofit library? Consider that it can be used (also in parts) in commercial products!

FWIW, my opinion is that this is OK - no different from lots of similar use in Boost algorithms. The original can't be in C++, so it isn't a straight copy. Patents are another can of worms... But Boost has never claimed to take any responsibility for attack by 'patent trolls' - though authors and reviewers will avoid knowingly using stuff that carries a risk of patent problems. Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

John Maddock

9:33 a.m.

Jeremy Bruestle wrote:

...

Hello, I have implemented support for the beta distribution for boost::random. I was wondering who I should send it to for examination and possible acceptance into boost?

Unfortunately the author of that library isn't around much at present, so I would suggest: * Post a copy in the vault (http://www.boost-consulting.com/vault/) so that folks can access it. * Post a Track feature request at svn.boost.org. If you have worked up docs and test cases that are all ready to be merged into the source, then you could also ask for a formal review (not sure if this qualifies for a "fast track" review, but it probably should do for a small addition like this). If accepted you would then be responsible for making all the necessary changes to the SVN codebase and maintaining the beta random-distribution. HTH, John.

Andrew Sutton

1:31 p.m.

...

Jeremy Bruestle wrote:

...
Hello, I have implemented support for the beta distribution for boost::random. I was wondering who I should send it to for examination and possible acceptance into boost?

Interesting. I took a different approach using a couple of gamma distributions. Ironically, I'm also using a rejection based technique from Cheng in 1977. Go figure. I also got bored and wrote a bunch of other random number generators and fixed some of the ones that were a little broken in Boost (Gamma and Lognormal). The results are in my own little Boost testing ground: http://warhol.sdml.cs.kent.edu/trac/miniboost/browser/trunk/boost/random I've also been slowly adding support for some of the missing Boost.Math libraries here: http://warhol.sdml.cs.kent.edu/trac/miniboost/browser/trunk/boost/ math/distributions The gumbel_distribution is a renamed version of the extreme_value_distribution. I renamed since (it looks like) there are three types of extreme value distributions (weibull, gumbel, and frechet), and also a generalized extreme value distribution. I wanted to rename fisher_f to just fisher also, but didn't get around to it. I also wrote a histogramming program to help generate plots of random numbers (using GNUPlot, for example), and comparing some basic stats against their theoretical equivalents - you know, just to make sure the random numbers look right. It's a great way of catching

...

If you have worked up docs and test cases that are all ready to be merged into the source, then you could also ask for a formal review (not sure if this qualifies for a "fast track" review, but it probably should do for a small addition like this). If accepted you would then be responsible for making all the necessary changes to the SVN codebase and maintaining the beta random-distribution.

I was waiting to finish out the Math distributions before dumping these wholesale into the sandbox. I think it would probably be a good idea to compare the implementations, figure out which is "better" - i'd probably lean towards the other since it (hopefully) doesn't depend on other variates. Incidentally, the term "library" is meant to indicate the place where you go to get books as opposed to, say, Boost. They're not placing any real restrictions on implementations. The article needs to be cited in the code or in the documentation (preferably both). Andrew Sutton asutton@cs.kent.edu

Jeremy Bruestle

7:25 p.m.

Sounds like you've got a number of good things brewing. I wrote up the beta distribution mostly out of necessity for a machine learning project I'm working on (I needed a fast NVIDIA GPU implementation, so I figured I might as well write the boost c++ one too, since I was using boost for all the other psudorandom numbers already). I was wondering if you would be interested in taking the code and bundling it with the other distributions you've already got, as it sounds like your testing capabilities are better. As far as testing, I've only done the basics, but things seem alright, with the possible concern that the range is 0 <= x <= 1 as opposed to 0 < x < 1, due to limited precision of floating point, esp when for example a is near 0 and b is very large. It's not a problem for my application, but perhaps someone who knows the IEEE floating point rules better than I could suggest a fix. At any rate, I'll email you a copy. -Jeremy On Dec 11, 2007 5:31 AM, Andrew Sutton <asutton@cs.kent.edu> wrote:

...

...
Jeremy Bruestle wrote:

...
Hello, I have implemented support for the beta distribution for boost::random. I was wondering who I should send it to for examination and possible acceptance into boost?

Interesting. I took a different approach using a couple of gamma distributions. Ironically, I'm also using a rejection based technique from Cheng in 1977. Go figure. I also got bored and wrote a bunch of other random number generators and fixed some of the ones that were a little broken in Boost (Gamma and Lognormal). The results are in my own little Boost testing ground:

http://warhol.sdml.cs.kent.edu/trac/miniboost/browser/trunk/boost/random

I've also been slowly adding support for some of the missing Boost.Math libraries here:

http://warhol.sdml.cs.kent.edu/trac/miniboost/browser/trunk/boost/ math/distributions

The gumbel_distribution is a renamed version of the extreme_value_distribution. I renamed since (it looks like) there are three types of extreme value distributions (weibull, gumbel, and frechet), and also a generalized extreme value distribution.

I wanted to rename fisher_f to just fisher also, but didn't get around to it.

I also wrote a histogramming program to help generate plots of random numbers (using GNUPlot, for example), and comparing some basic stats against their theoretical equivalents - you know, just to make sure the random numbers look right. It's a great way of catching

...
If you have worked up docs and test cases that are all ready to be merged into the source, then you could also ask for a formal review (not sure if this qualifies for a "fast track" review, but it probably should do for a small addition like this). If accepted you would then be responsible for making all the necessary changes to the SVN codebase and maintaining the beta random-distribution.

I was waiting to finish out the Math distributions before dumping these wholesale into the sandbox. I think it would probably be a good idea to compare the implementations, figure out which is "better" - i'd probably lean towards the other since it (hopefully) doesn't depend on other variates.

Incidentally, the term "library" is meant to indicate the place where you go to get books as opposed to, say, Boost. They're not placing any real restrictions on implementations. The article needs to be cited in the code or in the documentation (preferably both).

Andrew Sutton asutton@cs.kent.edu _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Martin Bonner

12 Dec 12 Dec

10:38 a.m.

From: Jeremy Bruestle

...

As far as testing, I've only done the basics, but things seem alright, with the possible concern that the range is 0 <= x <= 1 as opposed to 0 < x < 1, due to limited precision of floating point, esp when for example a is near 0 and b is very large. It's not a problem for my application, but perhaps someone who knows the IEEE floating point rules better than I could suggest a fix.

Well I don't know much about IEEE, but there's the brute force approach double result; do { result = clever_stuff(); } while (result != 0.0 && result != 1.0); return result; (Of course, that doesn't look so good when the parameters are such that result tends to be 0.0 a very large percentage of the time.) -- Martin Bonner Senior Software Engineer/Team Leader PI SHURLOK LTD Telephone: +44 1223 441434 / 203894 (direct) Fax: +44 1223 203999 Email: martin.bonner@pi-shurlok.com www.pi-shurlok.com disclaimer

John Maddock

1:39 p.m.

Andrew Sutton wrote:

...

I've also been slowly adding support for some of the missing Boost.Math libraries here:

http://warhol.sdml.cs.kent.edu/trac/miniboost/browser/trunk/boost/ math/distributions

Cool.

...

The gumbel_distribution is a renamed version of the extreme_value_distribution. I renamed since (it looks like) there are three types of extreme value distributions (weibull, gumbel, and frechet), and also a generalized extreme value distribution.

This is tricky: the there are multiple Gumbel distributions too: but both Wikipedia and Mathworld use the general term "Gumbel Distribution" to refer to the *minimum* case of the extreme value type I distribution, where as the distribution currently implemented as "extreme_value_distribution" is the *maximal* case of the extreme value type I distribution, which is the same as the ExtremeValueDistribution in Mathematica: http://reference.wolfram.com/mathematica/ref/ExtremeValueDistribution.html I really wish this wasn't so confusing, but there appears to be no easy naming choice here.

...

I wanted to rename fisher_f to just fisher also, but didn't get around to it.

I wish this had come up in the review, we did think about renaming this at one point, but never got around to it :-( If anyone has strong opinions speak up now, as it'll be next to impossible to make that breaking change after 1.35 has shipped! Cheers, John.

Andrew Sutton

13 Dec 13 Dec

2:55 p.m.

...

This is tricky: the there are multiple Gumbel distributions too: but both Wikipedia and Mathworld use the general term "Gumbel Distribution" to refer to the *minimum* case of the extreme value type I distribution, where as the distribution currently implemented as "extreme_value_distribution" is the *maximal* case of the extreme value type I distribution, which is the same as the ExtremeValueDistribution in Mathematica: http://reference.wolfram.com/mathematica/ref/ ExtremeValueDistribution.html

I really wish this wasn't so confusing, but there appears to be no easy naming choice here.

So I went back and read a little more carefully. Let me see if I have this right... Gumbel, Weibull, and Fisher-Tippet are are all Extreme Value distributions. The Fisher-Tippet distribution corresponds to a maximum extreme value distribution, and Gumbel, a minimum. Re-reading the Wikipedia entry for it, they seem to use the distributions a little more interchangeably. As far as code goes, Boost.Math has the Fisher-Tippet variant (as extreme_value), but not a true Gumbel distribution, right? That means that I've effectively written a Fisher-Tippet variate, but not really a true Gumbel variate... I guess I'll just hack away at my new Gumbel distribution and rename the number generator.

...

...
I wanted to rename fisher_f to just fisher also, but didn't get around to it.

I wish this had come up in the review, we did think about renaming this at one point, but never got around to it :-(

If anyone has strong opinions speak up now, as it'll be next to impossible to make that breaking change after 1.35 has shipped!

Since many of the distributions have alternate names, it might be worth considering an aliasing scheme or something. Derivation might work. For example: template <...> struct fisher_tippet_distribution : public extreme_value_distribution<...> { }; Template typedefs would be better. Andrew Sutton asutton@cs.kent.edu

John Maddock

3:46 p.m.

Andrew Sutton wrote:

...

So I went back and read a little more carefully. Let me see if I have this right... Gumbel, Weibull, and Fisher-Tippet are are all Extreme Value distributions. The Fisher-Tippet distribution corresponds to a maximum extreme value distribution, and Gumbel, a minimum. Re-reading the Wikipedia entry for it, they seem to use the distributions a little more interchangeably.

I'm not sure that's correct still and different sources use different conventions. Mathworld seems to treat the terms Extreme Value and Fisher Tippet as interchangable (see http://mathworld.wolfram.com/ExtremeValueDistribution.html), but if someone refers colloquially to "The Extreme Value" or "The Fisher Tippet" or "The Log-Weibul" distribution then it probably means the maximum case of the type I extreme value dist. NIST uses "Gumbel Distribution" to refer to both the Extreme Value Type I distibutions (min ans max cases). Other sources refer to only the minimum case as "The Gumbel Distribution". Mathworld refers to these as "Gumbel types" http://mathworld.wolfram.com/GumbelDistribution.html. While mathematica uses Gumbel to refer to just the minimum case. So you can call it whatever you like and still be right :-) BTW I believe the min and max cases are basically just mirror images of each other about the location parameter?

...

As far as code goes, Boost.Math has the Fisher-Tippet variant (as extreme_value), but not a true Gumbel distribution, right? That means that I've effectively written a Fisher-Tippet variate, but not really a true Gumbel variate... I guess I'll just hack away at my new Gumbel distribution and rename the number generator.

Well it's a kind of Gumbel distribution, but not what folks normally call "The Gumbel Distribution": which is the minimum case. Hope that's now slightly clearer than the proverbial mud ;-) John.

Andrew Sutton

14 Dec 14 Dec

5:04 p.m.

...

So you can call it whatever you like and still be right :-)

BTW I believe the min and max cases are basically just mirror images of each other about the location parameter?

So names aside, it's all the extreme value distribution, but there are minimum and maximum variations of it. After doing more math than I've done in some time, I'll agree to that :) Actually, the min/max case are so similar that it might be worth considering generalizing the extreme value distribution to handle both cases. It comes down to adding a third parameter, c which is either 1 or -1. Naturally, 1 for max, -1 for min. c(x) = (c * (a - x)) / b pdf(x) = exp(c(x) - exp(cx)) if c == 1, then it's the maximum case. If c == -1, then it's the minimum because (a - x) becomes (x - a). Unfortunately, the cdf's are a little different... cdf(x) = { exp(-exp(c(x)), for c == 1 1 - exp(-exp(c(x)), for c== -1 } Which is kind of fun because the cdf for the minimum is the complement of the cdf for the maximum (and vice versa for respective complements). The characteristics (outside the mean) are all the same. The mean can be redefined given as: mean = a + c * b * euler I don't know what the effect on the quantile functions would be, since I'm not sure how to compute those. I spent the last hour rewriting the extreme_value_distribution to accept a sign argument, allowing it to act as either the min or max case of the distribution. It defaults to the max. I also figured out how to tweak the random number generator to generate numbers for both cases (also taking the additional parameter). The code is here and here: http://warhol.sdml.cs.kent.edu/trac/miniboost/browser/trunk/boost/ math/distributions/extreme_value.hpp http://warhol.sdml.cs.kent.edu/trac/miniboost/browser/trunk/boost/ random/extreme_value_distribution.hpp Thoughts? Andrew Sutton asutton@cs.kent.edu

John Maddock

16 Dec 16 Dec

10:02 a.m.

New subject: More distributions

Andrew Sutton wrote:

...

...
So you can call it whatever you like and still be right :-)

BTW I believe the min and max cases are basically just mirror images of each other about the location parameter?

So names aside, it's all the extreme value distribution, but there are minimum and maximum variations of it.

After doing more math than I've done in some time, I'll agree to that :)

:-)

...

Actually, the min/max case are so similar that it might be worth considering generalizing the extreme value distribution to handle both cases. It comes down to adding a third parameter, c which is either 1 or -1. Naturally, 1 for max, -1 for min.

I think this is along the right lines, but I don't much like the +-1 parameter: especially as it's a floating point type argument. How about using an enum here: enum extreme_value_type { type_1_maximum = 1, type_1_minimum = -1 }; Then the constructor looks like: extreme_value_distribution(RealType a = RealType(0), RealType b = RealType(1), extreme_value_type s = type_1_maximum) And the user can only pass valid values for the distribution type. We also leave the door open for other extreme value types should the need arise.

...

c(x) = (c * (a - x)) / b pdf(x) = exp(c(x) - exp(cx))

if c == 1, then it's the maximum case. If c == -1, then it's the minimum because (a - x) becomes (x - a). Unfortunately, the cdf's are a little different...

cdf(x) = { exp(-exp(c(x)), for c == 1 1 - exp(-exp(c(x)), for c== -1 }

Which is kind of fun because the cdf for the minimum is the complement of the cdf for the maximum (and vice versa for respective complements). The characteristics (outside the mean) are all the same. The mean can be redefined given as:

mean = a + c * b * euler

I don't know what the effect on the quantile functions would be, since I'm not sure how to compute those.

Just reverse the CDF: Let c = sign == 1 ? log(-log(p)) ? log(-boost::math::log1p(-q, Policy())); Then x = a - (sign/b)* c; I hope :-) All these need tests and sanity checks: preferably some sanity checks from an independent source if we can find one. Mathematica would be a good choice, but I don't have access to that here :-(

...

I spent the last hour rewriting the extreme_value_distribution to accept a sign argument, allowing it to act as either the min or max case of the distribution. It defaults to the max. I also figured out how to tweak the random number generator to generate numbers for both cases (also taking the additional parameter). The code is here and here:

http://warhol.sdml.cs.kent.edu/trac/miniboost/browser/trunk/boost/ math/distributions/extreme_value.hpp http://warhol.sdml.cs.kent.edu/trac/miniboost/browser/trunk/boost/ random/extreme_value_distribution.hpp

Thoughts?

Looks good so far. Some further comments: Median looks like it needs adjusting to reflect the "sign". Skewness for min case is presumably the negative of the max case (needs checking)? Many thanks for taking this on, I notice you've started on the geometric distribution as well, that's also on my list of TODO's that I've just added to the end of here: http://svn.boost.org/trac/boost/browser/sandbox/math_toolkit/libs/math/doc/s... so more power to your elbow I say :-) BTW the sandbox/math_toolkit directory is open for breaking changes again now (Boost.Math in 1.35 is effectively frozen bar trivial fixes), so when you have these worked up feel free to commit to the sandbox. Also let me know if you get stumped for test cases. Cheers, John.

Andrew Sutton

2:39 p.m.

New subject: More distributions

...

I think this is along the right lines, but I don't much like the +-1 parameter: especially as it's a floating point type argument. How about using an enum here:

enum extreme_value_type { type_1_maximum = 1, type_1_minimum = -1 };

I had actually considered doing that, but then started feeling a little ambiguous about it since I would have been using the enumerated value in a mathematical context. But, it does make a lot more sense.

...

All these need tests and sanity checks: preferably some sanity checks from an independent source if we can find one. Mathematica would be a good choice, but I don't have access to that here :-(

I don't think my department has an installation of Mathematica, but I do know a couple of math grads. I'll see what I can come up with.

...

Median looks like it needs adjusting to reflect the "sign". Skewness for min case is presumably the negative of the max case (needs checking)?

Yup... I overlooked a negative sign from the MathWorld site.

...

BTW the sandbox/math_toolkit directory is open for breaking changes again now (Boost.Math in 1.35 is effectively frozen bar trivial fixes), so when you have these worked up feel free to commit to the sandbox. Also let me know if you get stumped for test cases.

Can do. I'm mostly waiting on my student to finish off the functions for the skew normal. After that, I'll push those distributions up to the sandbox along with the random numbers generators. On a side note, I think it might be interesting to provide a generator () function for distributions that could return random variates. Andrew Sutton asutton@cs.kent.edu

Andrew Sutton

8:18 p.m.

New subject: More distributions

...

...
...
So you can call it whatever you like and still be right :-)

BTW I believe the min and max cases are basically just mirror images of each other about the location parameter?

So names aside, it's all the extreme value distribution, but there are minimum and maximum variations of it.

I found this... It was a nice overview of some of the distribution. Section 1.2 has a nice overview of the type 1, 2 and 3 distributions and how they relate (and how they're related to Weibull, for example). I wonder if I can find this book in our math library... http://www.worldscibooks.com/mathematics/p191.html Andrew Sutton asutton@cs.kent.edu

Paul A Bristow

17 Dec 17 Dec

12:58 p.m.

New subject: More distributions

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Andrew Sutton Sent: 16 December 2007 20:18 To: boost@lists.boost.org Subject: Re: [boost] More distributions

...
...
...
So you can call it whatever you like and still be right :-)

BTW I believe the min and max cases are basically just mirror images of each other about the location parameter?

So names aside, it's all the extreme value distribution, but there are minimum and maximum variations of it.

I found this... It was a nice overview of some of the distribution. Section 1.2 has a nice overview of the type 1, 2 and 3 distributions and how they relate (and how they're related to Weibull, for example). I wonder if I can find this book in our math library...

http://www.worldscibooks.com/mathematics/p191.html

I don't have access to this, but I have K Krishnamoorthy who gives some info in Cha 25. I could send scans of the four pages diect if this would help but I suspect you know more already. He describes Type 1, II and III and gives formulae for type I cdf median ..., and his StatCalc does some calculations of parameters, so I could compute test values if you tell me what you want. Takes parameters a and b. Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

Andrew Sutton

2:41 p.m.

New subject: More distributions

...

I don't have access to this, but I have K Krishnamoorthy who gives some info in Cha 25. I could send scans of the four pages diect if this would help but I suspect you know more already.

I actually don't to have access to it either - it's not in our library system - but the first couple online sections were very informative :) Most of the information in it appears to available thru Wikipedia or MathWorld. I mainly posted the reference to make it visible in the archives if I get curious later. Andrew Sutton asutton@cs.kent.edu

Paul A Bristow

4:13 p.m.

New subject: More distributions

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Andrew Sutton Sent: 16 December 2007 20:18 To: boost@lists.boost.org Subject: Re: [boost] More distributions

I found this... It was a nice overview of some of the distribution. Section 1.2 has a nice overview of the type 1, 2 and 3 distributions and how they relate (and how they're related to Weibull, for example). I wonder if I can find this book in our math library...

http://www.worldscibooks.com/mathematics/p191.html

As you say, this is a nice overview and I begin to understand, so I've added this reference to the docs for the references, and the distributions too. However a fuller explanation will be useful when you have worked out the details of how one defines it in a C++ sense. Paul PS Also committed the change in support range (min to max) to avoid discontinutity in the curves at zero. Sorry I forgot to do this before. --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

6413

Age (days ago)

6419

Last active (days ago)

List overview

Download

16 comments

6 participants

participants (6)

Andrew Sutton
Jens Seidel
Jeremy Bruestle
John Maddock
Martin Bonner
Paul A Bristow