Random library, discrete distribution
I made a post sometime last year about requiring a distribution that
could generate enums with specified probabilities. I find I am
regularly requiring a discrete distribution like this - where I
specify the probabilities. Am I the only one who requires it, or would
it be useful to a wider audience? I'm just wondering whether to submit
a feature request for it or not?
Thanks,
Kevin Martin
This is what I'm using at the moment:
template<class IntType = int>
class discrete_distribution
{
public:
    typedef boost::uniform_int<>::input_type input_type;
    typedef IntType result_type;
    template<class Engine>
    result_type operator()(Engine& eng);
    // ... (remainder of the class was cut off in the archive)
};
er wrote:
[...] whether to submit a feature request for it or not?
A multinomial distribution? Sure, common enough that I would use it.
I am wondering if there already exists a reference wrapper for Random
Distribution. This would help in the following case:
typedef variate_generator
I am wondering if there already exists a reference wrapper for Random Distribution. This would help in the following case:
All I needed was this (the bind part that is) ...
std::generate_n(
iter,
n,
boost::bind
Kevin Martin wrote:
I made a post sometime last year about requiring a distribution that could generate enums with specified probabilities. I find I am regularly requiring a discrete distribution like this - where I specify the probabilities. Am I the only one who requires it, or would it be useful to a wider audience? I'm just wondering whether to submit a feature request for it or not?
Hi, here's something related: https://svn.boost.org/svn/boost/sandbox/statistics/importance_sampling/ https://svn.boost.org/svn/boost/sandbox/statistics/importance_weights/ https://svn.boost.org/svn/boost/sandbox/statistics/random/boost/random/multi...
On 16 Aug 2009, at 06:55, er wrote:
Hi, here's something related: https://svn.boost.org/svn/boost/sandbox/statistics/random/boost/random/multi...
I think this is definitely needed in boost::random, and it seems to do much more error checking and handling of large weights than my version does, so I will definitely change to using it once it makes it into a normal library release (and I can persuade my HPC admin to upgrade the library).

I think you should rename it categorical distribution, though, as it only samples a single trial from the distribution. I believe the multinomial distribution gives the probability of vectors arising when n trials are performed.

Thanks,
Kevin Martin
AMDG

Kevin Martin wrote:
On 16 Aug 2009, at 06:55, er wrote:
Hi, here's something related: https://svn.boost.org/svn/boost/sandbox/statistics/random/boost/random/multi...
I think this is definitely needed in boost::random, and it seems to do much more error checking and dealing with large weights than my version does, so I will definitely change to using it once it makes it into a normal library release (and I can persuade my HPC admin to upgrade the library)
I think you should rename it categorical distribution though as it only samples a single trial from the distribution. I believe the multinomial distribution gives the probability of vectors arising when n trials are performed.
It's called discrete_distribution in the new standard. Also, the alias algorithm is more efficient. See attached. (Note that I haven't tried to make this implementation numerically bulletproof.)

In Christ,
Steven Watanabe
Kevin Martin wrote:
I think you should rename it categorical distribution though as it only samples a single trial from the distribution.

Thanks, done so.
It's called discrete_distribution in the new standard. Also, the alias algorithm is more efficient. See attached. (Note that I haven't tried to make this implementation numerically bulletproof.)
discrete does not sort the weights, so this step has to be carried out
beforehand to get a comparable basis. I've added a small test file to
compare categorical and discrete, where discrete is 10x faster than
categorical at initialization and is equally fast at sampling.
However, the weights that I work with usually come like this:
w <- exp(lw+offset), where offset satisfies sum{w}
AMDG

er wrote:
er wrote:
discrete does not sort the weights, so this step has to be carried out
I meant categorical does not sort the weights.

discrete doesn't sort the weights either, if you notice. The initialization for discrete is more complex and is almost certainly slower, although it is still O(n), where n is the number of elements in the sequence of weights. I didn't try to optimize the initialization much, so it is quite likely to be several times slower than it could be.

In Christ,
Steven Watanabe
AMDG

er wrote:
It's called discrete_distribution in the new standard. Also, the alias algorithm is more efficient. See attached. (Note that I haven't tried to make this implementation numerically bulletproof.)
discrete does not sort the weights, so this step has to be carried out beforehand to get a comparable basis. I've added a small test file to compare categorical and discrete, where discrete is 10x faster than categorical at initialization and is equally fast at sampling.
Your test isn't quite right. You're comparing the speed of categorical to itself for sampling.

In Christ,
Steven Watanabe
Steven Watanabe wrote:
Your test isn't quite right. You're comparing the speed of categorical to itself for sampling.
Here's a correction:

--initialize, n = 10000
categorical : t = 0.013616
discrete    : t = 0.001571

--sample, m = 100000
categorical : t = 0.029282
discrete    : t = 0.014483
participants (3)
- er
- Kevin Martin
- Steven Watanabe