Random library, discrete distribution
I made a post sometime last year about requiring a distribution that
could generate enums with specified probabilities. I find I am
regularly requiring a discrete distribution like this - where I
specify the probabilities. Am I the only one who requires it, or would
it be useful to a wider audience? I'm just wondering whether to submit
a feature request for it or not?
Thanks,
Kevin Martin
This is what I'm using at the moment:
template<class IntType = int>
class discrete_distribution
{
public:
    typedef boost::uniform_int<>::input_type input_type;
    typedef IntType result_type;
    template<class Engine>
    result_type operator()(Engine& eng);
    // ... (remainder of the class was cut off in the archive)
};
er wrote:
[...] whether to submit a feature request for it or not?
A multinomial distribution? Sure, common enough that I would use it.
I am wondering if there already exists a reference wrapper for Random
Distribution. This would help in the following case:
typedef variate_generator
I am wondering if there already exists a reference wrapper for Random Distribution. This would help in the following case:
All I needed was this (the bind part that is) ...
std::generate_n(
iter,
n,
boost::bind
Kevin Martin wrote:
I made a post sometime last year about requiring a distribution that could generate enums with specified probabilities. I find I am regularly requiring a discrete distribution like this - where I specify the probabilities. Am I the only one who requires it, or would it be useful to a wider audience? I'm just wondering whether to submit a feature request for it or not?
Hi, here's something related: https://svn.boost.org/svn/boost/sandbox/statistics/importance_sampling/ https://svn.boost.org/svn/boost/sandbox/statistics/importance_weights/ https://svn.boost.org/svn/boost/sandbox/statistics/random/boost/random/multi...
On 16 Aug 2009, at 06:55, er wrote:
Hi, here's something related: https://svn.boost.org/svn/boost/sandbox/statistics/random/boost/random/multi...
I think this is definitely needed in boost::random, and it seems to do much more error checking and handling of large weights than my version does, so I will definitely change to using it once it makes it into a normal library release (and I can persuade my HPC admin to upgrade the library).

I think you should rename it categorical distribution, though, as it only samples a single trial from the distribution. I believe the multinomial distribution gives the probability of vectors arising when n trials are performed.

Thanks,
Kevin Martin
AMDG

Kevin Martin wrote:
On 16 Aug 2009, at 06:55, er wrote:
Hi, here's something related: https://svn.boost.org/svn/boost/sandbox/statistics/random/boost/random/multi...
I think this is definitely needed in boost::random, and it seems to do much more error checking and dealing with large weights than my version does, so I will definitely change to using it once it makes it into a normal library release (and I can persuade my HPC admin to upgrade the library)
I think you should rename it categorical distribution though as it only samples a single trial from the distribution. I believe the multinomial distribution gives the probability of vectors arising when n trials are performed.
It's called discrete_distribution in the new standard. Also, the alias algorithm is more efficient. See attached. (Note that I haven't tried to make this implementation numerically bulletproof.)

In Christ,
Steven Watanabe
Kevin Martin wrote:
I think you should rename it categorical distribution though as it only samples a single trial from the distribution.

Thanks, done so.
It's called discrete_distribution in the new standard. Also, the alias algorithm is more efficient. See attached. (Note that I haven't tried to make this implementation numerically bulletproof.)
discrete does not sort the weights, so this step has to be carried out
beforehand to get a comparable basis. I've added a small test file to
compare categorical and discrete, where discrete is 10x faster than
categorical at initialization and is equally fast at sampling.
However, the weights that I work with usually come like this:
w <- exp(lw+offset), where offset satisfies sum{w}
AMDG

er wrote:
er wrote:
discrete does not sort the weights, so this step has to be carried out
I meant categorical does not sort the weights.

discrete doesn't sort the weights either, if you notice. The initialization for discrete is more complex and is almost certainly slower, although it is still O(n), where n is the number of elements in the sequence of weights. I didn't try to optimize the initialization much, so it is quite likely to be several times slower than it could be.

In Christ,
Steven Watanabe
AMDG

er wrote:
It's called discrete_distribution in the new standard. Also, the alias algorithm is more efficient. See attached. (Note that I haven't tried to make this implementation numerically bulletproof.)
discrete does not sort the weights, so this step has to be carried out beforehand to get a comparable basis. I've added a small test file to compare categorical and discrete, where discrete is 10x faster than categorical at initialization and is equally fast at sampling.
Your test isn't quite right. You're comparing the speed of categorical to itself for sampling.

In Christ,
Steven Watanabe
Steven Watanabe wrote:
Your test isn't quite right. You're comparing the speed of categorical to itself for sampling.
Here's a correction:

--initialize, n = 10000
categorical : t = 0.013616
discrete    : t = 0.001571

--sample, m = 100000
categorical : t = 0.029282
discrete    : t = 0.014483
participants (3)
- er
- Kevin Martin
- Steven Watanabe