Question about Boost.Random

// The following program illustrates a problem in Boost.Random.
// Is this the desired behavior? If so, what is the reason for this?
// I use Boost version 1.31 and GCC version 3.3.3 on SuSE 9.1.

#include <iostream>
#include <vector>
#include <algorithm>
#include <boost/random.hpp>

// Definition of the random number generator
typedef boost::mt19937 engine_t;
typedef boost::uniform_int<int> distribution_t;
typedef boost::variate_generator<engine_t, distribution_t> rng_t;

int main() {
    const unsigned int N = 13, A = 0, B = 1000;
    std::vector<int> run1(N), run2(N), run3(N);

    // Generate some random integers
    {
        engine_t engine;
        distribution_t distribution(A, B);
        rng_t rng(engine, distribution);
        std::generate(run1.begin(), run1.end(), rng);
    }

    // Verify that I get the same sequence if I start again
    {
        engine_t engine;
        distribution_t distribution(A, B);
        rng_t rng(engine, distribution);
        std::generate(run2.begin(), run2.end(), rng);
    }

    // It appears that the generator restarts for each generate, since
    // the generate algorithm creates a copy of the functor so that
    // its state is lost
    {
        engine_t engine;
        distribution_t distribution(A, B);
        rng_t rng(engine, distribution);
        std::generate(run3.begin(), run3.begin() + N/2, rng);
        std::generate(run3.begin() + N/2, run3.end(), rng);
    }

    // For me it would seem more natural if the engine class were a singleton,
    // so that there could be only one object of any particular random
    // generator engine type.
    // Our purpose in using random numbers is to simulate physical processes.
    // I cannot think of a case where there would be a need for several random
    // engines, even though there can of course be many different random
    // distributions in an application.
    // Having the same random number sequence in different parts of a simulation
    // can probably create very strange behaviour, since the numbers from
    // different engines of the same type will be perfectly correlated.
    // Perhaps other applications have use for several random engines?

    for (unsigned i = 0; i < N; ++i) {
        if (i == N/2)
            std::cout << "Halfway there!" << std::endl;
        std::cout << run1[i] << " == " << run2[i]
                  << " == " << run3[i] << std::endl;
    }
    return 0;
}

-- Anders Edin, Sidec Technologies AB

On Thursday, July 8, 2004, at 11:44 AM, Anders Edin wrote:
// For me it would seem more natural if the engine class were a singleton,
// so that there could be only one object of any particular random
// generator engine type.
// Our purpose in using random numbers is to simulate physical processes.
// I cannot think of a case where there would be a need for several random
// engines, even though there can of course be many different random
// distributions in an application.
// Having the same random number sequence in different parts of a simulation
// can probably create very strange behaviour, since the numbers from
// different engines of the same type will be perfectly correlated.
// Perhaps other applications have use for several random engines?
One of the important characteristics of a pseudo-random number generator is, oddly enough, determinism. Given the same seed, you get the same sequence and can therefore reproduce the exact same results. This allows results to be checked and bugs to be tracked down reliably.

If another thread, or just a library used by your application (for example, something that uses pseudo-random numbers for dithering a graphic display), were using the same singleton engine as your application, this characteristic would be lost. The number of values drawn by the other parts of your system might change, or reseeding of the engine might happen without your knowledge. No serious statistical, scientific, or simulation work should ever be done using any kind of shared-state RNG (including the C stdlib rand function).

Topher Cooper

One of the important characteristics of a pseudo-random number generator is, oddly enough, determinism. Given the same seed, you get the same sequence and can therefore reproduce the exact same results. This allows results to be checked and bugs to be tracked down reliably.
Well, this is of course fine for debugging. However, when running a simulation, shouldn't you be thinking about statistics rather than about finding bugs?
If another thread, or just a library used by your application (for example, something that uses pseudo-random numbers for dithering a graphic display), were using the same singleton engine as your application, this characteristic would be lost. The number of values drawn by the other parts of your system might change, or reseeding of the engine might happen without your knowledge.
If you use one generator of the same type in each thread, but within the same simulation application, how do you know that the different threads are not statistically correlated? Do you set the seeds far apart? If one is not careful, the random distribution produced by the application is not the one you thought it would be.

As before, the application I have in mind is physics simulations. If you use random numbers for something else, perhaps the needs are different.

-- Anders Edin, Sidec Technologies AB

On Monday, July 12, 2004, at 03:37 AM, Anders Edin wrote:
One of the important characteristics of a pseudo-random number generator is, oddly enough, determinism. Given the same seed, you get the same sequence and can therefore reproduce the exact same results. This allows results to be checked and bugs to be tracked down reliably.
Well, this is of course fine for debugging. However, when running a simulation, shouldn't you be thinking about statistics rather than about finding bugs?
First off, are you proposing that you carefully use one PRNG without hidden global state to do your debugging and validation, then rip it out of your code and replace it with a different PRNG with a different interface to produce your actual runs with un-debugged, un-validated code? I don't think you thought that one through.

Secondly, in the three areas I mentioned -- statistical, scientific, and simulation applications -- you should never consider that you have stopped debugging. You make what appears to be a successful run and record your results. Sometime later you do a different one that does not seem fully consistent with the earlier run. What do you do? Pretend that there is no problem? Pick one of the runs at random and throw it out? Include a footnote saying that there may be something wrong with your results, but you have no idea what? Or take the seeds you recorded from each run, recreate the runs, and try to resolve the issue?

This, folks, is the simulation equivalent of cleaning your test tubes in a chemistry lab -- a fundamental, elementary lab procedure that is necessary for reliable results.
If another thread, or just a library used by your application (for example, something that uses pseudo-random numbers for dithering a graphic display), were using the same singleton engine as your application, this characteristic would be lost. The number of values drawn by the other parts of your system might change, or reseeding of the engine might happen without your knowledge.
If you use one generator of the same type in each thread, but within the same simulation application, how do you know that the different threads are not statistically correlated? Do you set the seeds far apart? If one is not careful, the random distribution produced by the application is not the one you thought it would be.
As von Neumann's famous quote has it, when you are using pseudo-random numbers you are "in a state of sin." You are taking a sequence of numbers that are anything but random and simply pretending that they are, in fact, perfectly random. You get away with it by carefully considering the characteristics of that non-randomness and, through careful analysis and testing, making sure that the non-randomness does not matter to what you are doing. If your application uses random numbers in multiple places, you must be sure that there are no meaningful correlations between the streams in those different places -- either by using a single pseudo-random number stream for all of them or by using a set of PRNGs that prove independent in statistical tests.

It is a fact of life in modern programming that our systems use many packages and libraries whose precise contents we have no control over and are frequently ignorant of. The requirement this puts on us when using pseudo-random generators is that any correlation (*or side effect*) caused by such libraries' use of pseudo-random generators (whether or not we *know* that they use them) should not have any effect on our results. The example I gave -- a PRNG used for controlling shading in a graphics package for displaying the results -- is likely to have that characteristic. Other possibilities involve interthread, interprocess, or interprocessor communication protocols, "random" keys assigned to data-structure nodes for hashing, and various algorithms that introduce randomness to make worst-case performance situations unlikely. Correlations of a simulation's PRNG with the PRNGs used in such packages are unlikely to invalidate your results (but you should always, of course, consider the possibility that they might -- one reason you should always repeat runs with at least two different PRNGs in any serious simulation).
As before, the application I have in mind is physics simulations. If you use random numbers for something else, perhaps the needs are different.
On the contrary, that is a prime example of an area where this is an absolute requirement. If you do actual physics experiments in a physics lab, you keep a careful record of *every* aspect of the experiment that you can, in order to be able to re-examine, re-analyze, and replicate the experiment as closely as possible. Why would the fact that your experiment is run in a "virtual lab" -- where you can easily record and replicate every aspect of the experiment far more precisely -- lead you to discard one of the basic principles of scientific rigor? Of course, you *could* save the entire stream of pseudo-random numbers used with every run instead, but it is so much more compact and convenient, don't you think, to just record the seed and make sure that all the packages and code necessary are kept in a well-maintained source control system.

Just to remind you of a rather famous example of this: Lorenz was doing simulations of weather systems. He discovered that when he attempted to rerun his simulations using the precise starting points he had recorded, he got entirely different results. The starting conditions had been recorded as decimal printouts, introducing a tiny rounding difference -- a "half-bit" difference in the least significant figure -- which nevertheless resulted in entirely different outcomes. His investigation of this led to the (re)discovery of the "butterfly effect" and is generally considered the beginning of modern chaos theory.
-- Anders Edin, Sidec Technologies AB
participants (2)
- Anders Edin
- Topher Cooper