[Random] normal distribution different behaviour 1.55 vs 1.57

Thomas M

18 Dec 2014

10:56 a.m.

Hi,

I have just switched an existing project from boost 1.55.0 to 1.57.0, and my test suites diagnosed a divergence in program (a simulation) results. I have tracked down the issue to different behaviour in the normal distribution random variates generator. Below is a simple program (run under VS 2012, Update 4) that outputs random variates for a uniform and a normal distribution. Comparing the output between 1.55.0 and 1.57.0, it turns out that the first bunch of uniformly distributed variates is identical, and then all other variates (both bunches of normally distributed variates as well as the second bunch of uniformly distributed variates) diverge.

Neither for 1.56.0 nor for 1.57.0 does the change-log list an update to the random numbers library, so first I am puzzled why the libraries behave differently (though [Math] has undergone some changes -> propagation to [Random]?). Second, it is not clear to me why the results are identical for the first bunch of uniformly distributed variates, while for the second, after normally distributed variates were generated, they are not. It appears that the generation of normally distributed variates changes the whole state of the random numbers engine in a different manner [e.g. multiple engine calls?]. Any insight into what is going on here is much appreciated. And foremost: is either 1.55 or 1.57 bugged, so that one should clearly be preferred over the other?
many thanks, Thomas

#include <ostream>
#include <iostream>
#include <fstream>
#include "boost/random.hpp"

void test_rng()
{
    typedef boost::mt19937 rnengine_t;
    typedef boost::uniform_01<double> uniform_distr_t;
    typedef boost::normal_distribution<double> normal_distr_t;
    typedef boost::variate_generator<rnengine_t &, uniform_distr_t> uniform_gen_t;
    typedef boost::variate_generator<rnengine_t &, normal_distr_t> normal_gen_t;

    uniform_distr_t uniformDistr;
    normal_distr_t normalDistr(0, 1);

    rnengine_t rnEngine;
    rnEngine.seed(100);

    // Both generators hold a reference to the same engine.
    uniform_gen_t uniformGen(rnEngine, uniformDistr);
    normal_gen_t normalGen(rnEngine, normalDistr);

    std::ofstream outFile("randomNumbers.txt");
    outFile.precision(16);

    int n = 20;
    for (int outer = 0; outer < 2; ++outer)
    {
        for (int i = 0; i < n; ++i)
            outFile << "Uniform-distr variate #" << (n*outer) + i + 1 << ": " << uniformGen() << std::endl;
        for (int i = 0; i < n; ++i)
            outFile << "Normal-distr variate #" << (n*outer) + i + 1 << ": " << normalGen() << std::endl;
    }
}

int main()
{
    test_rng();
    return 0;
}

Semen Trygubenko / Семен Тригубенко

18 Dec

11:29 a.m.

Hi Thomas:

On Thu, Dec 18, 2014 at 11:56:10AM +0100, Thomas M wrote:
> I have tracked down the issue to different behaviour in the normal distribution random variates generator.

I've observed this, too! The change seems to have been introduced in Boost version 1.56.

S.

--
Семен Тригубенко http://trygub.com

John Maddock

5:51 p.m.

> I've observed this, too! The change seems to have been introduced in Boost version 1.56.

Looks like the normal distribution was completely rewritten between those two releases:

https://github.com/boostorg/random/commit/f0ec97ba36c05ef00f2d29dcf66094e3f4...

Beyond that I know nothing,

John.

Thomas M

6:51 p.m.

On 18/12/2014 18:51, John Maddock wrote:
> Looks like the normal distribution was completely rewritten between those two releases: https://github.com/boostorg/random/commit/f0ec97ba36c05ef00f2d29dcf66094e3f4...
>
> Beyond that I know nothing, John.

OK, it appears that the algorithm generating the variates was completely changed (from Box-Muller to Ziggurat sampling), where the latter makes a variable number of engine calls. Thus not only do the generated normally distributed variates themselves differ, but so does the state of the engine afterwards.

A quick search yielded that the Ziggurat is faster, but can someone also comment specifically on the robustness of the provided implementation? What has been the prime motivation of the change?

Is it intended that such changes which are transparent to end-users do not become reflected in the change-logs? In my case I have the troubles that now firstly all my existing test cases are invalidated (the lesser issue) and that in general reproducibility among runs is not provided any more if I upgrade to 1.57 (the greater issue).

thanks, Thomas
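
One way to check the "variable number of engine calls" explanation is to count how often a distribution invokes the engine. The following is a minimal sketch, not code from this thread: `counting_engine` and `engine_calls_for` are hypothetical helper names, and it uses the standard library's `<random>` rather than Boost so that it stays self-contained. The same wrapper should in principle also work with `boost::normal_distribution`, but that is an assumption and untested here.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper (not part of Boost or the standard library):
// forwards to a wrapped engine and counts how many raw values are drawn.
template <class Engine>
class counting_engine {
public:
    typedef typename Engine::result_type result_type;

    explicit counting_engine(result_type seed) : engine_(seed), calls_(0) {}

    static constexpr result_type min() { return Engine::min(); }
    static constexpr result_type max() { return Engine::max(); }

    result_type operator()() {
        ++calls_;
        return engine_();
    }

    std::size_t calls() const { return calls_; }

private:
    Engine engine_;
    std::size_t calls_;
};

// Draws n variates from dist and returns how many engine calls that consumed.
// The seed default of 100 mirrors the test program earlier in the thread.
template <class Distribution>
std::size_t engine_calls_for(Distribution dist, std::size_t n, unsigned seed = 100) {
    counting_engine<std::mt19937> eng(seed);
    for (std::size_t i = 0; i < n; ++i) {
        (void)dist(eng);  // the variate itself is not needed, only the side effect
    }
    // Sanity check for this sketch: a non-degenerate distribution consumes entropy.
    assert(n == 0 || eng.calls() > 0);
    return eng.calls();
}
```

Calling, for example, `engine_calls_for(std::normal_distribution<double>(0.0, 1.0), 20)` reports how far 20 normal variates advance the engine. If two library versions report different counts for the same seed, every later draw from a shared engine will differ as well, which would match the divergence seen in the second bunch of uniform variates.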

Neal Becker

8:18 p.m.

Thomas M wrote:
> Is it intended that such changes which are transparent to end-users do not become reflected in the change-logs? In my case I have the troubles that now firstly all my existing test cases are invalidated (the lesser issue) and that in general reproducibility among runs is not provided any more if I upgrade to 1.57 (the greater issue).

I'd like to add my $0.02 here as well. This happened to me some years ago, I believe to make boost::random conform to std::random.

The breaking of tests is a bad thing. It should only be done after careful consideration. And then, it should be advertised LOUDLY in the release notes. Otherwise, some poor schmuck is going to waste a lot of time tracking down why his tests broke. And the random number generator is the last place he'd suspect.

--
Those who don't understand recursion are doomed to repeat it

Semen Trygubenko / Семен Тригубенко

11 p.m.

On Thu, Dec 18, 2014 at 07:51:34PM +0100, Thomas M wrote:
> Is it intended that such changes which are transparent to end-users do not become reflected in the change-logs? In my case I have the troubles that now firstly all my existing test cases are invalidated (the lesser issue) and that in general reproducibility among runs is not provided any more if I upgrade to 1.57 (the greater issue).

I can only second that! It took me a while to work out the cause of the discrepancies in our case, and I've read the changelogs many times, following up any mention of "breaking change" into the code to see if it could have something to do with what we've observed. We had skipped a version, and it was a lot of work to isolate and fix this issue, as we initially thought it was down to other development work that had already happened and that required the latest version of boost. If it had been in the changelog we would have found it much sooner.

On the positive side, our tests are now much more robust to that sort of change. :)

--
Семен Тригубенко http://trygub.com

oswin krause

19 Dec

7:34 a.m.

On 19.12.2014 00:00, Semen Trygubenko / Семен Тригубенко wrote:
> On the positive side, our tests are now much more robust to that sort of changes. :)

We had the same issue a few years ago when changing to boost::random - almost every test broke. It turned out that testing for specific values or only a small number of samples (the only tests which are affected by this kind of change) can mask a lot of bugs - even though some values seem to be correct, confidence intervals or measured variances can still be off. So for us a more robust test also meant a better test that discovered bugs.

But yeah: such a change should be part of the change-log.
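
A test along the lines described above could check sample moments against tolerances instead of comparing exact values. This is a minimal sketch under stated assumptions: the helper names (`sample_stats`, `collect_stats`, `mean_within`, `variance_within`) and the tolerance of 5 standard errors are illustrative choices, not anything taken from Boost, and the variance bound assumes normally distributed samples. It uses the standard `<random>` header to stay self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical helper type: sample mean and unbiased sample variance.
struct sample_stats {
    double mean;
    double variance;
};

// Draws n variates and accumulates mean/variance with Welford's online update.
template <class Distribution, class Engine>
sample_stats collect_stats(Distribution dist, Engine& eng, std::size_t n) {
    assert(n > 1);
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean
    for (std::size_t i = 1; i <= n; ++i) {
        const double x = dist(eng);
        const double delta = x - mean;
        mean += delta / static_cast<double>(i);
        m2 += delta * (x - mean);
    }
    return {mean, m2 / static_cast<double>(n - 1)};
}

// True if the sample mean lies within k standard errors (sigma / sqrt(n)) of mu.
bool mean_within(const sample_stats& s, double mu, double sigma,
                 std::size_t n, double k = 5.0) {
    return std::fabs(s.mean - mu) <= k * sigma / std::sqrt(static_cast<double>(n));
}

// True if the sample variance lies within k standard errors of sigma2.
// For normal data the standard error is roughly sigma2 * sqrt(2 / (n - 1)).
bool variance_within(const sample_stats& s, double sigma2,
                     std::size_t n, double k = 5.0) {
    const double se = sigma2 * std::sqrt(2.0 / static_cast<double>(n - 1));
    return std::fabs(s.variance - sigma2) <= k * se;
}
```

A check such as `mean_within(collect_stats(dist, eng, n), 0.0, 1.0, n)` does not depend on which algorithm produces the variates, so it would survive a change like the one discussed here. The price is that it needs a reasonably large `n` and can only detect distributional errors, not a changed stream.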

Neal Becker

11:56 a.m.

oswin krause wrote:
> But yeah: such a change should be part of the change-log.

Not sufficient! Any such change should be discussed with the boost user community first. Boost should have a policy that any breaking changes are discussed on boost-dev.

Thomas M

21 Dec

9:23 p.m.

On 19/12/2014 12:56, Neal Becker wrote:
> Not sufficient! Any such change should be discussed with the boost user community first. Boost should have a policy that any breaking changes are discussed on boost-dev.

I second that: first the proposal should be discussed, and if implemented it should be announced very loudly in the change-logs.

For what it's worth, I believe it should even have been possible to add/offer different variate generation algorithms (for any distribution) without breaking current behaviour: an extra template argument (to the distribution itself), with the current algorithm as the default type, and here we'd have (almost) the best of both worlds. Again, a discussion with the community beforehand would be useful.

As to simply making tests robust: offhand I can come up (from my own real-world work) with cases where this is not possible, e.g. when obtaining a non-small sample takes a looong time (> weeks; sorry, not waiting that long to test boost), or when there's no way around truly comparing values for [floating-point near-] equality.

cheers, Thomas
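
The "extra template argument" idea could look roughly like the sketch below. Everything here is hypothetical and for illustration only: `selectable_normal`, `box_muller` and `library_default` are invented names, not Boost or standard library API, and the Box-Muller policy is a plain textbook version, so it will not reproduce the numbers of Boost 1.55. To keep the arithmetic fully under the sketch's control it assumes an engine with 32-bit output such as `std::mt19937` and uses only 32 bits per uniform.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical policy: textbook Box-Muller, two engine calls per pair of variates.
struct box_muller {
    template <class Engine>
    double operator()(Engine& eng) {
        static_assert(Engine::min() == 0 && Engine::max() == 0xffffffffu,
                      "this sketch assumes a 32-bit engine such as std::mt19937");
        if (has_cached_) {
            has_cached_ = false;
            return cached_;
        }
        const double pi = 3.14159265358979323846;
        // Map raw 32-bit outputs into the open interval (0, 1) so log() is finite.
        const double u1 = (static_cast<double>(eng()) + 0.5) / 4294967296.0;
        const double u2 = (static_cast<double>(eng()) + 0.5) / 4294967296.0;
        const double r = std::sqrt(-2.0 * std::log(u1));
        cached_ = r * std::sin(2.0 * pi * u2);
        has_cached_ = true;
        return r * std::cos(2.0 * pi * u2);
    }
    bool has_cached_ = false;
    double cached_ = 0.0;
};

// Hypothetical opt-in policy: whatever algorithm the standard library ships.
struct library_default {
    template <class Engine>
    double operator()(Engine& eng) { return dist_(eng); }
    std::normal_distribution<double> dist_;
};

// Hypothetical distribution whose sampling algorithm is a template argument.
// The default stays fixed, so existing users keep their streams; a faster
// algorithm would be requested explicitly via the second argument.
template <class RealType = double, class Algorithm = box_muller>
class selectable_normal {
public:
    explicit selectable_normal(RealType mean = 0, RealType sigma = 1)
        : mean_(mean), sigma_(sigma) {
        assert(sigma > 0);
    }

    template <class Engine>
    RealType operator()(Engine& eng) {
        return mean_ + sigma_ * static_cast<RealType>(algorithm_(eng));
    }

private:
    RealType mean_;
    RealType sigma_;
    Algorithm algorithm_;
};
```

With this shape, `selectable_normal<>` keeps producing the same stream for a given seed as long as the default policy is left alone, while `selectable_normal<double, library_default>` opts in to whatever the library currently provides. Owning the sampling code in this way is also one possible route for cases where tests must compare exact values.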
