[random]: several issues
Hi,

After some time away I came back to the Boost random library, and several things I noticed years ago have not changed. I would like to ask what the current thinking on these issues is. Here's my list:

a) The library ships headers for random deviates of distributions that the documentation does not mention (e.g. poisson, gamma). I find it truly sad that implementation and documentation are that far out of sync.

b) Several distributions require the engine to return a uniform deviate in [0,1), while others don't have this prerequisite. I find this extremely error-prone and poorly documented (it is documented, but it should be much clearer and, especially, louder). Worse, I find it even harder (or impossible) to find out which range the engines return. I am not sure whether any engine returns this range by itself, or whether I always have to go through uniform_01. Worst of all, there is neither a compile-time check nor a run-time check that the engine's result satisfies the [0,1) requirement (as far as I have seen in the code). I absolutely fail to see why such a critical assert is completely missing, especially given the poor state of the documentation. The poisson and gamma distributions, for example, also fall into this category but are not documented at all. I consider this highly dangerous.

c) Random numbers are tightly linked to statistical distributions, which are offered by a library of their own. Wouldn't it be convenient to integrate the distribution part of the random library more closely into that library? At present the two are confusingly separate.

d) Is anyone actually responsible for the random library at the moment? Is it still actively developed and maintained, or is old code simply propagating from release to release? (No offense intended, but the issues above make me doubt that it is actively managed.)

Thanks, Thomas
AMDG Thomas Mang wrote:
b) Several distributions require the engine to return a uniform deviate in [0,1), while others don't have this prerequisite. I find this extremely error-prone and poorly documented (it is documented, but it should be much clearer and, especially, louder). Worse, I find it even harder (or impossible) to find out which range the engines return. I am not sure whether any engine returns this range by itself, or whether I always have to go through uniform_01. Worst of all, there is neither a compile-time check nor a run-time check that the engine's result satisfies the [0,1) requirement (as far as I have seen in the code). I absolutely fail to see why such a critical assert is completely missing, especially given the poor state of the documentation. The poisson and gamma distributions, for example, also fall into this category but are not documented at all. I consider this highly dangerous.
You're not supposed to use the distributions directly. boost::variate_generator works with any engine and any distribution.
c) Random numbers are tightly linked to statistical distributions, which are offered by a library of their own. Wouldn't it be convenient to integrate the distribution part of the random library more closely into that library? At present the two are confusingly separate.
Are you referring to the distributions in Boost.Math?
In Christ, Steven Watanabe
[4th posting attempt; apologies if the others ever show up] Steven Watanabe wrote:
You're not supposed to use the distributions directly. boost::variate_generator works with any engine and any distribution.
Well, in C++ there are many things one is not supposed to do, but that's not the point here. Would it really hurt that much to add an assert in the distributions that the random draw lies in [0,1)? (For the engines provided by Boost this could even be done at compile time, at least mostly, I think.) Also, if I am not supposed to use the distributions directly, then their interface is too public for my taste.
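The kind of check being asked for here could look roughly like the following sketch. It uses the standard <random> engines rather than the Boost ones, which is an assumption made to keep the example self-contained. The names unit_interval_adaptor and draw_exponential are hypothetical, and the adaptor assumes a 32-bit engine such as std::mt19937 so that the scaled value is exactly representable and strictly below 1.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical adaptor: wraps an integer engine and advertises, through
// constexpr min()/max(), that it produces doubles in [0,1).
template <class Engine>
class unit_interval_adaptor {
public:
    typedef double result_type;
    explicit unit_interval_adaptor(const Engine& engine) : engine_(engine) {}
    static constexpr double min() { return 0.0; }
    static constexpr double max() { return 1.0; }  // exclusive upper bound
    double operator()() {
        // Assumes a 32-bit engine, so the quotient is exact and below 1.
        const double span =
            static_cast<double>(Engine::max() - Engine::min()) + 1.0;
        return static_cast<double>(engine_() - Engine::min()) / span;
    }
private:
    Engine engine_;
};

// Hypothetical distribution-side function with both checks.
template <class UnitEngine>
double draw_exponential(UnitEngine& engine, double lambda) {
    // Compile-time check: rejects engines that do not advertise [0,1).
    static_assert(UnitEngine::min() == 0.0 && UnitEngine::max() == 1.0,
                  "engine must produce values in [0,1)");
    assert(lambda > 0.0);
    const double u = engine();
    // Run-time check of the same precondition on the actual draw.
    assert(u >= 0.0 && u < 1.0);
    return -std::log(1.0 - u) / lambda;
}
```

Passing a raw std::mt19937 to draw_exponential fails to compile because its max() is not 1, which is the "loud" failure asked for; wrapping it in unit_interval_adaptor makes the call valid.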
c) Random numbers are tightly linked to statistical distributions, which are offered by a library of their own. Wouldn't it be convenient to integrate the distribution part of the random library more closely into that library? At present the two are confusingly separate.
Are you referring to the distributions in Boost.Math?
Yes, Boost.Math / Statistical Distributions.

Note that, in general, a random draw can be obtained from any distribution through the inverse of its cdf; for some distributions the draw is just 'particularly' simple. From a design point of view I find it poor to offer random draws for distributions in one library and distributions without random draws in another, and to keep the two more or less separate. (Of course I can manually combine the inverse of the cdf with a uniform_01 draw to get what I want, but this is less a technical issue than an organizational one.)

My gut feeling is that the random library should only worry about generating random numbers. How to turn these random numbers into draws from distributions should be the concern of the distribution library. But of course I know the random library was developed long before the distributions library.

Thomas
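The manual route mentioned above (a uniform draw pushed through the inverse cdf) can be sketched as follows. To stay self-contained, the example uses a hand-written Weibull quantile and the standard <random> header instead of Boost.Math and uniform_01, which is an assumption. The function names weibull_quantile and draw_weibull are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical hand-written Weibull quantile (inverse cdf):
// F^{-1}(u) = scale * (-ln(1 - u))^(1/shape), valid for u in [0,1).
inline double weibull_quantile(double u, double shape, double scale) {
    assert(u >= 0.0 && u < 1.0);
    assert(shape > 0.0 && scale > 0.0);
    return scale * std::pow(-std::log(1.0 - u), 1.0 / shape);
}

// Inverse-cdf sampling: draw u uniformly from [0,1), then map it
// through the quantile function.
template <class Engine>
double draw_weibull(Engine& engine, double shape, double scale) {
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    return weibull_quantile(unit(engine), shape, scale);
}
```

With Boost.Math, the hand-written quantile would presumably be replaced by a call to its quantile function for the distribution in question; the sampling step itself stays the same.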
AMDG Thomas Mang wrote:
Well, in C++ there are many things one is not supposed to do, but that's not the point here. Would it really hurt that much to add an assert in the distributions that the random draw lies in [0,1)? (For the engines provided by Boost this could even be done at compile time, at least mostly, I think.)
This should be easy. Patches welcome :). However, the new standard eliminates variate_generator and requires that every random number engine work with every distribution. Eventually, Boost.Random needs to be brought into line with this.
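As a rough illustration of the model the new standard adopts, the sketch below calls a distribution directly on two different engines from the standard <random> header, with no variate_generator in between. The helper name poisson_sample_mean is hypothetical.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: averages n Poisson draws from any standard engine.
// The distribution is invoked on the engine directly; it is the
// distribution's job to cope with whatever range the engine returns.
template <class Engine>
double poisson_sample_mean(Engine& engine, double mean, int n) {
    assert(mean > 0.0 && n > 0);
    std::poisson_distribution<int> dist(mean);
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += dist(engine);
    return static_cast<double>(sum) / n;
}
```

The same function template works with, for example, std::mt19937 and std::minstd_rand, whose output ranges differ; in both cases the sample mean should come out close to the requested mean.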
Also, if I am not supposed to use the distributions directly, then their interface is too public for my taste.
The interface for distributions needs to be specified so that new distributions can be created that plug into the framework.
Are you referring to the distributions in Boost.Math?
Yes, boost.Math/Statistical Distributions.
Note that, in general, a random draw can be obtained from any distribution through the inverse of its cdf; for some distributions the draw is just 'particularly' simple. From a design point of view I find it poor to offer random draws for distributions in one library and distributions without random draws in another, and to keep the two more or less separate. (Of course I can manually combine the inverse of the cdf with a uniform_01 draw to get what I want, but this is less a technical issue than an organizational one.) My gut feeling is that the random library should only worry about generating random numbers. How to turn these random numbers into draws from distributions should be the concern of the distribution library. But of course I know the random library was developed long before the distributions library.
Even for distributions for which there is no simple formula for generating a random variate, there are algorithms that are much more efficient than using the inverse cdf. Also, random distributions may need to maintain state that is not needed for any other use of the distributions. There needs to be some integration between Boost.Random and Boost.Math, but I'm not exactly sure how to go about it. In Christ, Steven Watanabe
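One well-known example of a sampler that avoids the inverse cdf and keeps extra state is the Marsaglia polar method for the normal distribution: each round produces two variates, so the second is cached for the next call. The sketch below is only an illustration of that point, not the algorithm Boost.Random uses; the class name polar_normal_sampler and its has_cached() accessor are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical sampler using the Marsaglia polar method. The cached second
// variate is state that a pure "distribution" object (pdf, cdf, quantile)
// would have no use for.
class polar_normal_sampler {
public:
    polar_normal_sampler(double mean, double sigma)
        : mean_(mean), sigma_(sigma), has_cached_(false), cached_(0.0) {
        assert(sigma > 0.0);
    }

    template <class Engine>
    double operator()(Engine& engine) {
        if (has_cached_) {
            has_cached_ = false;
            return mean_ + sigma_ * cached_;
        }
        std::uniform_real_distribution<double> unit(-1.0, 1.0);
        double x, y, s;
        do {  // rejection step: keep points strictly inside the unit disc
            x = unit(engine);
            y = unit(engine);
            s = x * x + y * y;
        } while (s >= 1.0 || s == 0.0);
        const double factor = std::sqrt(-2.0 * std::log(s) / s);
        cached_ = y * factor;  // second variate, kept for the next call
        has_cached_ = true;
        return mean_ + sigma_ * x * factor;
    }

    bool has_cached() const { return has_cached_; }

private:
    double mean_;
    double sigma_;
    bool has_cached_;
    double cached_;
};
```

The first call fills the cache and the second call drains it, so two consecutive draws cost one round of the rejection loop.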
Steven Watanabe wrote:
However, the new standard eliminates variate_generator and requires that every random number engine work with every distribution. Eventually, Boost.Random needs to be brought into line with this.
I am not up to date with the C++0x proposals. Will the random library (conceptually) become part of C++0x?
Even for distributions for which there is no simple formula for generating a random variate, there are algorithms that are much more efficient than using the inverse cdf. Also, random distributions may need to maintain state that is not needed for any other use of the distributions. There needs to be some integration between Boost.Random and Boost.Math, but I'm not exactly sure how to go about it.
I am not familiar with the details of all the numerical algorithms for drawing random numbers, but I fully agree that draws should be efficient and that efficient algorithms should be used wherever applicable. Just keep in mind that there is a major difference between numerically computing the inverse cdf once you have a [0,1) input, and taking a [0,1) input and then using whatever algorithm you like to convert it into the random draw. You seem to address the former, while I am addressing the latter, that is, a common interface for the input used to obtain a random draw.

One way I could think of to integrate the two, from a pure design point of view, is to specify nested classes within the distribution class (like fisher_f_distribution<>::random) that more or less replace the distributions in the random library. There could actually be two nested classes: one that works roughly like variate_generator, and one that accepts a [0,1) number as raw input. If there is an efficient algorithm for the distribution, use it. If not, go through the inverse cdf.

And of course invocation should be possible with the generators of the random library (in which case all the information about the range of the random draws can be used, occasionally even at compile time), with a general generator that provides information about its bounds (either at run time or at compile time), and finally with a [0,1) draw obtained from anywhere else, but of course with an assert that this precondition holds.

This approach should not only ease the integration of random draws for which the random library is not set up yet (via the inverse cdf, for example; I prefer a slightly less efficient implementation over none at all), but also provide a shared interface for additional distributions not yet in math.distributions (such as multivariate normal, von Mises, etc.).

Is there any interest in helping out with this, or in developing these ideas further? (If you already know there is none and no changes would be accepted, I can save the time of thinking about it further.)

Thomas
AMDG Thomas Mang wrote:
I am not up to date with the C++0x proposals. Will the random library (conceptually) become part of C++0x?
Yes.
I am not familiar with the details of all the numerical algorithms for drawing random numbers, but I fully agree that draws should be efficient and that efficient algorithms should be used wherever applicable. Just keep in mind that there is a major difference between numerically computing the inverse cdf once you have a [0,1) input, and taking a [0,1) input and then using whatever algorithm you like to convert it into the random draw. You seem to address the former, while I am addressing the latter, that is, a common interface for the input used to obtain a random draw.

One way I could think of to integrate the two, from a pure design point of view, is to specify nested classes within the distribution class (like fisher_f_distribution<>::random) that more or less replace the distributions in the random library. There could actually be two nested classes: one that works roughly like variate_generator, and one that accepts a [0,1) number as raw input. If there is an efficient algorithm for the distribution, use it. If not, go through the inverse cdf.

And of course invocation should be possible with the generators of the random library (in which case all the information about the range of the random draws can be used, occasionally even at compile time), with a general generator that provides information about its bounds (either at run time or at compile time), and finally with a [0,1) draw obtained from anywhere else, but of course with an assert that this precondition holds.

This approach should not only ease the integration of random draws for which the random library is not set up yet (via the inverse cdf, for example; I prefer a slightly less efficient implementation over none at all), but also provide a shared interface for additional distributions not yet in math.distributions (such as multivariate normal, von Mises, etc.).
I think it would be better to keep random distributions somewhat decoupled, using something like random_distribution
Is there any interest in helping out with this, or in developing these ideas further? (If you already know there is none and no changes would be accepted, I can save the time of thinking about it further.)
Some work has already been done on this in the sandbox: https://svn.boost.org/trac/boost/browser/sandbox/statistics/dist_random In Christ, Steven Watanabe
Some work has already been done on this in the sandbox: https://svn.boost.org/trac/boost/browser/sandbox/statistics/dist_random
This and related links will be phased out. Here's a new version that's hopefully a bit cleaner: https://svn.boost.org/svn/boost/sandbox/statistics/distribution_toolkit/
participants (4)
- er
- Steven Watanabe
- Thomas Mang
- Thomas Mang