[random] seeding with arbitrary integers

AMDG Currently, some PRNGs cannot be seeded with certain values. For example, the seed for linear_congruential cannot be a multiple of the modulus if the offset is zero. lagged_fibonacci also cannot be seeded with 0, because it uses minstd_rand0. Since linear_congruential already tries to wrap the seed to fit in the range, I'd like to change it so that any seed works. The other generator that needs to be changed is linear_feedback_shift. linear_feedback_shift cannot be seeded with a small integer. What I've done is to add the minimum value to the seed, if the seed is too small. I'm not sure that this is a good solution, since this means that there are a few (and only a few) seeds that result in identical generators. Thoughts? Would it be better to wrap the values to make them fit in range instead? At least this would be consistent with linear_congruential. I've attached a patch with these changes and test cases. See also: http://svn.boost.org/trac/boost/ticket/3516 which prompted this investigation. In Christ, Steven Watanabe

This sounds like the functions in question will silently alter the seed so that it "just works". If this is what it is, I should say I don't like it. This means that if I do something invalid (a bad seed), I never find out about it. I would much prefer that if I pass an invalid parameter, I get either an return error code or an exception. Of course this is a pain as I have to go to the manual and figure out what I'm doing wrong. But it's much better than trying discover at some later time that things weren't doing what I thought they were. It's also a pain that this means that the documentation has to be updated to explain why some seeds are not valid. But, that should already be in there. Robert Ramey Steven Watanabe wrote:
AMDG
Currently, some PRNGs cannot be seeded with certain values. For example, the seed for linear_congruential cannot be a multiple of the modulus if the offset is zero. lagged_fibonacci also cannot be seeded with 0, because it uses minstd_rand0. Since linear_congruential already tries to wrap the seed to fit in the range, I'd like to change it so that any seed works.
The other generator that needs to be changed is linear_feedback_shift. linear_feedback_shift cannot be seeded with a small integer. What I've done is to add the minimum value to the seed, if the seed is too small. I'm not sure that this is a good solution, since this means that there are a few (and only a few) seeds that result in identical generators. Thoughts? Would it be better to wrap the values to make them fit in range instead? At least this would be consistent with linear_congruential.
I've attached a patch with these changes and test cases.
See also: http://svn.boost.org/trac/boost/ticket/3516 which prompted this investigation.
In Christ, Steven Watanabe
Index: boost/random/linear_feedback_shift.hpp =================================================================== --- boost/random/linear_feedback_shift.hpp (revision 57008) +++ boost/random/linear_feedback_shift.hpp (working copy) @@ -77,7 +77,12 @@ seed(first, last); }
- void seed(UIntType s0 = 341) { assert(s0 >= (1 << (w-k))); value = s0; } + void seed(UIntType s0 = 341) { + if(s0 < (1 << (w-k))) { + s0 += 1 << (w-k); + } + value = s0; + } template<class It> void seed(It& first, It last) { if(first == last) Index: boost/random/linear_congruential.hpp =================================================================== --- boost/random/linear_congruential.hpp (revision 57008) +++ boost/random/linear_congruential.hpp (working copy) @@ -54,12 +54,9 @@ // BOOST_STATIC_ASSERT(m == 0 || c < m);
explicit linear_congruential(IntType x0 = 1) - : _modulus(modulus), _x(_modulus ? (x0 % _modulus) : x0) + : _modulus(modulus) { - assert(c || x0); /* if c == 0 and x(0) == 0 then x(n) = 0 for all n */ - // overflow check - // disabled because it gives spurious "divide by zero" gcc warnings - // assert(m == 0 || (a*(m-1)+c) % m == (c < a ? c-a+m : c-a)); + seed(x0);
// MSVC fails BOOST_STATIC_ASSERT with std::numeric_limits at class scope #ifndef BOOST_NO_LIMITS_COMPILE_TIME_CONSTANTS @@ -77,8 +74,18 @@ // compiler-generated copy constructor and assignment operator are fine void seed(IntType x0 = 1) { - assert(c || x0); - _x = (_modulus ? (x0 % _modulus) : x0); + // wrap _x if it doesn't fit in the destination type + if(_modulus == 0) { + _x = x0; + } else { + _x = x0 % (modulus - min()); + } + // adjust to the correct range + if(_x < min()) { + _x += (modulus - min()); + } + assert(_x >= min()); + assert(_x <= max()); }
template<class It> @@ -86,8 +93,7 @@ { if(first == last) throw std::invalid_argument("linear_congruential::seed"); - IntType value = *first++; - _x = (_modulus ? (value % _modulus) : value); + seed(*first++); }
result_type min BOOST_PREVENT_MACRO_SUBSTITUTION () const { return c == 0 ? 1 : 0; } @@ -261,7 +267,7 @@ if(sizeof(T) < sizeof(uint64_t)) { return (static_cast<uint64_t>(x) << 16) | 0x330e; } else { - return(static_cast<uint64_t>(x)); + return(static_cast<uint64_t>(x)); } } static uint64_t cnv(float x) { return(static_cast<uint64_t>(x)); } Index: libs/random/instantiate.cpp =================================================================== --- libs/random/instantiate.cpp (revision 57008) +++ libs/random/instantiate.cpp (working copy) @@ -133,53 +133,63 @@ boost::gamma_distribution<RealType>(1)); }
-template<class URNG, class T> -void test_seed(URNG & urng, const T & t) { - URNG urng2(t); - BOOST_CHECK(urng == urng2); - urng2.seed(t); - BOOST_CHECK(urng == urng2); +template<class URNG, class T, class Converted> +void test_seed_conversion(URNG & urng, const T & t, const Converted &) { + Converted c = static_cast<Converted>(t); + if(static_cast<T>(c) == t) { + URNG urng2(c); + BOOST_CHECK_MESSAGE(urng == urng2, std::string("Testing seed constructor: ") + typeid(Converted).name()); + urng2.seed(c); + BOOST_CHECK(urng == urng2); + } }
// rand48 uses non-standard seeding -template<class T> -void test_seed(boost::rand48 & urng, const T & t) { +template<class T, class Converted> +void test_seed_conversion(boost::rand48 & urng, const T & t, const Converted &) { boost::rand48 urng2(t); urng2.seed(t); }
template<class URNG, class ResultType> -void instantiate_seed(const URNG &, const ResultType &) { +void test_seed(const URNG &, const ResultType & v) { + typename URNG::result_type value = static_cast<typename URNG::result_type>(v); + + URNG urng(value); + + // integral types + test_seed_conversion(urng, value, static_cast<char>(0)); + test_seed_conversion(urng, value, static_cast<signed char>(0)); + test_seed_conversion(urng, value, static_cast<unsigned char>(0)); + test_seed_conversion(urng, value, static_cast<short>(0)); + test_seed_conversion(urng, value, static_cast<unsigned short>(0)); + test_seed_conversion(urng, value, static_cast<int>(0)); + test_seed_conversion(urng, value, static_cast<unsigned int>(0)); + test_seed_conversion(urng, value, static_cast<long>(0)); + test_seed_conversion(urng, value, static_cast<unsigned long>(0)); +#if !defined(BOOST_NO_INT64_T) + test_seed_conversion(urng, value, static_cast<boost::int64_t>(0)); + test_seed_conversion(urng, value, static_cast<boost::uint64_t>(0)); +#endif + + // floating point types + test_seed_conversion(urng, value, static_cast<float>(0)); + test_seed_conversion(urng, value, static_cast<double>(0)); + test_seed_conversion(urng, value, static_cast<long double>(0)); +} + +template<class URNG, class ResultType> +void instantiate_seed(const URNG & urng, const ResultType &) { { URNG urng; URNG urng2; urng2.seed(); BOOST_CHECK(urng == urng2); } - { - int value = 127; - URNG urng(value); - - // integral types - test_seed(urng, static_cast<char>(value)); - test_seed(urng, static_cast<signed char>(value)); - test_seed(urng, static_cast<unsigned char>(value)); - test_seed(urng, static_cast<short>(value)); - test_seed(urng, static_cast<unsigned short>(value)); - test_seed(urng, static_cast<int>(value)); - test_seed(urng, static_cast<unsigned int>(value)); - test_seed(urng, static_cast<long>(value)); - test_seed(urng, static_cast<unsigned long>(value)); -#if !defined(BOOST_NO_INT64_T) - test_seed(urng, static_cast<boost::int64_t>(value)); - test_seed(urng, static_cast<boost::uint64_t>(value)); -#endif - - // floating point types - test_seed(urng, static_cast<float>(value)); - test_seed(urng, static_cast<double>(value)); - test_seed(urng, static_cast<long double>(value)); - } + test_seed(urng, 0); + test_seed(urng, 127); + test_seed(urng, 539157235); + test_seed(urng, ~0u); }
template<class URNG, class ResultType>
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

AMDG Robert Ramey wrote:
This sounds like the functions in question will silently alter the seed so that it "just works".
If this is what it is, I should say I don't like it. This means that if I do something invalid (a bad seed), I never find out about it. I would much prefer that if I pass an invalid parameter, I get either an return error code or an exception. Of course this is a pain as I have to go to the manual and figure out what I'm doing wrong. But it's much better than trying discover at some later time that things weren't doing what I thought they were.
I checked (a slightly dated) working draft and it looks the the standard does this kind of fixing up. explicit linear_congruential_engine(result_type s = default_seed); Effects: Constructs a linear_congruential_engine object. If c mod m is 0 and s mod m is 0, sets the engine’s state to 1, otherwise sets the engine’s state to s mod m. I guess the question is: what does the seed represent? It's only the initial state for the simple generators like linear_congruential etc. Since in general, seed applies some algorithm to generate the state from the seed, I don't see why the algorithm shouldn't be defined in such a way that any seed is valid.
It's also a pain that this means that the documentation has to be updated to explain why some seeds are not valid. But, that should already be in there.
The problem came up with lagged_fibonacci. There's no good reason, IMHO for lagged_fibonacci to reject any seed, since the state of a lagged_fibonacci generator is much larger than an int. On the other hand, it would be better if the seeding algorithm for lagged fibonacci never allowed distinct seeds to result in the same state. In Christ, Steven Watanabe

Steven Watanabe wrote:
AMDG
explicit linear_congruential_engine(result_type s = default_seed); Effects: Constructs a linear_congruential_engine object. If c mod m is 0 and s mod m is 0, sets the engines state to 1, otherwise sets the engines state to s mod m.
So I might be passing two different seeds and generate the same sequence of results? This would be the exact opposite of what I would be trying to do and would likely result in a silent error that would be a major bitch to find. This isn't so much a specifice critcism of this particular situation as I don't really understand the library or the standard. It's more of rant against the things that seem to trap me into spending disproportional amounts of time on a regular basis. This idea of "silently fixing things" seems to be quite popular and to me, is the cause of much grief. Robert Ramey

AMDG Robert Ramey wrote:
Steven Watanabe wrote:
explicit linear_congruential_engine(result_type s = default_seed); Effects: Constructs a linear_congruential_engine object. If c mod m is 0 and s mod m is 0, sets the engine’s state to 1, otherwise sets the engine’s state to s mod m.
So I might be passing two different seeds and generate the same sequence of results? This would be the exact opposite of what I would be trying to do and would likely result in a silent error that would be a major bitch to find.
This is only a major problem if you explicitly pass the seed: boost::minstd_rand gen(42); If the seed comes from somewhere else: boost::minstd_rand gen(time(0)); then either the library or the user has to fix it up. Currently, the statement boost::minstd_rand gen(time(0)); will work almost all the time. However, it is wrong because it will fail when time(0) returns 2147483647. Is it worse to fix things up and possibly have multiple seeds yield that same sequence or to let code like this fail randomly once in a while? It seems to me that this bug is likely to be harder to trace because it won't even be reproducible.
This isn't so much a specifice critcism of this particular situation as I don't really understand the library or the standard. It's more of rant against the things that seem to trap me into spending disproportional amounts of time on a regular basis. This idea of "silently fixing things" seems to be quite popular and to me, is the cause of much grief.
In Christ, Steven Watanabe

Steven Watanabe wrote:
This is only a major problem if you explicitly pass the seed: boost::minstd_rand gen(42);
can't this the be trapped and an an exception thrown?
If the seed comes from somewhere else: boost::minstd_rand gen(time(0)); then either the library or the user has to fix it up.
Currently, the statement boost::minstd_rand gen(time(0)); will work almost all the time.
However, it is wrong because it will fail when time(0) returns 2147483647.
again, can't this be trapped?
Is it worse to fix things up and possibly have multiple seeds yield that same sequence or to let code like this fail randomly once in a while? It seems to me that this bug is likely to be harder to trace because it won't even be reproducible.
Isn't it possible trap the problem and throw an exception? To summarize, I would like any library I use to either work as advertised or notify me that it can't handle the arguments. And I would like this notification to be as soon as possible. If it can't do this, then how can I know that my program will really work? Robert Ramey

Robert Ramey wrote:
Steven Watanabe wrote:
Currently, the statement boost::minstd_rand gen(time(0)); will work almost all the time.
However, it is wrong because it will fail when time(0) returns 2147483647.
again, can't this be trapped?
During testing, that code might work without trouble. Then, when put into production, it fails, but only on some systems (or for some customers). That's hardly helpful, but suppose you try to debug the problem. Will it ever fail on you? Only on Jan 18 2038 at 22:14:07 EST (when time() returns 2147483647). If it failed in production, and you're now debugging it, the time must be after the bad value, so will it fail for you? The point is that unless minstd_rand_gen() throws an exception that indicates the failed value, you'd be hard pressed to understand why the failure occurred. Yes, you could study the valid input range, compare that to the valid output range of time(), and determine that they don't match, but that's a pain. Even if you know about the bad input, do you really want code that worked so well for so long to suddenly start failing? Maybe. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

AMDG Robert Ramey wrote:
Steven Watanabe wrote:
This is only a major problem if you explicitly pass the seed: boost::minstd_rand gen(42);
can't this the be trapped and an an exception thrown?
The code above is correct under any version of seeding. boost::minstd_rand gen(42); boost::minstd_rand gen2(2147483647 + 42); // gen and gen2 are the same Of course its possible to throw an exception. I'm not asking what's possible. I'm trying to determine what the correct behavior ought to be. In the case above, the user is explicitly passing the seeds and can easily make sure that the seeds match any preconditions of seed. Therefore in this case, the ideal behavior is not to do any fixing up. In the cases below, the user does not have as much control over the seed. Thus, it is much easier to accidentally violate preconditions. This wouldn't matter so much in another library. But the seed for a PRNG is usually going to be random. As a result, forbidding some seeds is likely to lead to intermittent failures.
If the seed comes from somewhere else: boost::minstd_rand gen(time(0)); then either the library or the user has to fix it up.
Currently, the statement boost::minstd_rand gen(time(0)); will work almost all the time.
However, it is wrong because it will fail when time(0) returns 2147483647.
again, can't this be trapped?
Only when it happens--which is very rare.
Is it worse to fix things up and possibly have multiple seeds yield that same sequence or to let code like this fail randomly once in a while? It seems to me that this bug is likely to be harder to trace because it won't even be reproducible.
Isn't it possible trap the problem and throw an exception?
To summarize, I would like any library I use to either work as advertised or notify me that it can't handle the arguments. And I would like this notification to be as soon as possible. If it can't do this, then how can I know that my program will really work?
seed will work as advertised. Regardless of the outcome of this discussion, I will make sure that the exact behavior is documented. In Christ, Steven Watanabe

seed will work as advertised. Regardless of the outcome of this discussion, I will make sure that the exact behavior is documented.
I think that's the key here: if the documentation clearly states that the seed is treated modulo N or whatever then that would be OK for me. Whatever the status quo is unacceptable given that seeding with unspecified values (from time() or some other non-deterministic device) is so commonplace. Throwing an exception for out-of-range values might work, but as pointed out the exception would occur so infrequently in practice that your just creating another source of bugs. Robert, what if the docs clearly stated that "Safe values for the seed are 0-N, values outside this range get wrapped within it via modulo N arithmetic, therefore different seeds can result in the same sequence of random values if one of those seeds is outside of the safe range." Or something like that ;-) John.

John Maddock wrote:
seed will work as advertised. Regardless of the outcome of this discussion, I will make sure that the exact behavior is documented.
I think that's the key here: if the documentation clearly states that the seed is treated modulo N or whatever then that would be OK for me.
Whatever the status quo is unacceptable given that seeding with unspecified values (from time() or some other non-deterministic device) is so commonplace. Throwing an exception for out-of-range values might work, but as pointed out the exception would occur so infrequently in practice that your just creating another source of bugs. Robert, what if the docs clearly stated that "Safe values for the seed are 0-N, values outside this range get wrapped within it via modulo N arithmetic, therefore different seeds can result in the same sequence of random values if one of those seeds is outside of the safe range." Or something like that ;-)
Fine by me. In fact, all of you have considered it more than I have so I'm very happy to defer to your judgment on this case. I just reacted to the general idea of fixing stuff up under the covers. It turns out that I just happened to need a quick and idiot-proof number generator so I used boost.random. I think the motivation for my original complaint was: "uh - oh, do have actually have to know something about random number generation to use this library?" If so, I've got a problem. I want it to "just work". Being idiot-proof was a pre-condition in this circumstance. I realize that satisifying such a pre-condition is the motivation for the proposed solution and I'm OK with that. But when this is done "silently", it can create problems which are devilishly hard to find. Also, I found the documentation not good enough in general. I did make it work but it wasn't as easy to use as I would have hoped. Of course I could make the same criticism for many or most libraries. Of course at the root of the problem is the desire to use facilities that I have neither the knowledge nor time to implement myself. But that's what I use libraries for - so they almost have to be idiot proof to be useful to me. Just one man's opinion. Robert Ramey

AMDG Robert Ramey wrote:
Also, I found the documentation not good enough in general. I did make it work but it wasn't as easy to use as I would have hoped. Of course I could make the same criticism for many or most libraries.
Do you have any specific issues?
Of course at the root of the problem is the desire to use facilities that I have neither the knowledge nor time to implement myself. But that's what I use libraries for - so they almost have to be idiot proof to be useful to me.
Just one man's opinion.
In Christ, Steven Watanabe

Steven Watanabe wrote:
AMDG
Robert Ramey wrote:
Also, I found the documentation not good enough in general. I did make it work but it wasn't as easy to use as I would have hoped. Of course I could make the same criticism for many or most libraries.
Do you have any specific issues?
Since you asked. I think my problem is in the very first section under the title "Boost Random Number Library". I expected to find and example like: normal_distribution nd<int>(mean, standard_deviation) ... int random_number = nd(); ... Of course I realized pretty soon that I wasn't going to find that. So then I looked at the random_demo.cpp which was helpful and led me to look at library headers which wasn't all that helpful. Personally I would have found a small tutorial helpful. for example: The random number library is composed of: "generators" which generate random numbers and "distributors" which transform output from a "generator" into a particular distribution: A small complete example (from random_demo.cpp) // simulate rolling a die 10 times // construct a random simple random number generator boost::minstd_rand random_number_generator; // construct a uniform distribution for integers 1 through 6 boost::uniform<int> uniform_distribution(1,6); // hook the above together to create a die? // I'm not sure what to call this "variate_generator" didn't help me at all boost::variate_generator<boost::minstd_rand &, boost::uniform<int> & > die() // throw the die 10 times for(int i = 0; i < 10; ++i) std::out << die() << '\n' Notice the lack of typedefs and other stuff to make things more abstract". I eliminated these to make the example less ... uh abstract. To summarize, a little bit more introduction would have helped me get going faster. Robert Ramey

Robert Ramey wrote:
Steven Watanabe wrote:
AMDG
Robert Ramey wrote:
Also, I found the documentation not good enough in general. I did make it work but it wasn't as easy to use as I would have hoped. Of course I could make the same criticism for many or most libraries.
Do you have any specific issues?
Since you asked. I think my problem is in the very first section under the title "Boost Random Number Library".
I expected to find and example like:
normal_distribution nd<int>(mean, standard_deviation) ... int random_number = nd(); ...
Funny you mention this. We were just having a discussion on IRC about boost documentation for some libraries. I think Robert is right on track. Many libraries are very rich is concept descriptions but lack a point of reference to start with. I'm sure it makes perfect sense for those who are intimate with purpose and guts of the library... but concepts alone leave you floundering for some time trying to hang the abstract on something concrete as you digest the possibilities. Having a simple example makes reading through the concepts so much easier. It provides a use-case for getting your bearings on what this library might do and how you might do it. Looking at concepts after and example enriches my understanding of the library. michael -- ---------------------------------- Michael Caisse Object Modeling Designs www.objectmodelingdesigns.com

AMDG Robert Ramey wrote:
Steven Watanabe wrote:
Robert Ramey wrote:
Also, I found the documentation not good enough in general. I did make it work but it wasn't as easy to use as I would have hoped. Of course I could make the same criticism for many or most libraries.
Do you have any specific issues?
Since you asked. I think my problem is in the very first section under the title "Boost Random Number Library".
<snip>
Thanks. I've created a ticket. In Christ, Steven Watanabe

To summarize, a little bit more introduction would have helped me get going faster.
Sigh... sadly the Random number library is in serious need of a face-lift, and it's author is rairly around here these days - not that we can blame him for that - Boost shouldn't be a life sentence - but we could really use someone experienced to give the library a makeover. John.

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey Sent: Thursday, October 22, 2009 7:47 PM To: boost@lists.boost.org Subject: Re: [boost] [random] seeding with arbitrary integers
Robert Ramey wrote:
Also, I found the documentation not good enough in general. I did make it work but it wasn't as easy to use as I would have hoped. Of course I could make the same criticism for many or most libraries.
I agree strongly with this - mentioning no names to protect the guilty ;-) Thin documentation is still major barrier to use of many Boost libraries. We all need to do better - and reviewers need to consider documentation more when considering libraries. But there is still a chicken and egg problem. Few people are going to write good documentation for other people until they are confident that they will have an audience. So I repeat my view that we need a two stage acceptance process. First that the library is fundamentally acceptable, and second that cosmetic aspects and documentation are good enough. Paul --- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com

Paul A. Bristow wrote:
-----Original Message-----
So I repeat my view that we need a two stage acceptance process.
First that the library is fundamentally acceptable, and second that cosmetic aspects and documentation are good enough.
I don't see this a real change. A tiny incremental change in the review process can address this. a) review b) acceptance This almost always conditional on some more work being done. At this point the author can check into the trunk etc. c) library is merged into release branch ONLY when the original review manager certifies that it's done. Robert Ramey

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey Sent: Friday, October 23, 2009 3:46 PM To: boost@lists.boost.org Subject: Re: [boost] [random] seeding with arbitrary integers
Paul A. Bristow wrote:
-----Original Message-----
So I repeat my view that we need a two stage acceptance process.
First that the library is fundamentally acceptable, and second that cosmetic aspects and documentation are good enough.
I don't see this a real change. A tiny incremental change in the review process can address this.
a) review b) acceptance This almost always conditional on some more work being done. At this point the author can check into the trunk etc. c) library is merged into release branch ONLY when the original review manager certifies that it's done.
Sounds better to me. And improvements to the documentation toolchain (& its documentation!) are probably even more important. Paul --- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com

Does the documentation say that two different seeds will produce different sequences? If so, is there a stated guarantee when they will diverge? In general, unless you understand the guts of the algorithm -- including how a seed is turned into an initial state -- or have been given a guarantee by someone who presumably does, you should not assume that different seeds will definitely produce different results. You should be able, however, to assume that two rngs seeded with two arbitrarily chosen seeds have a very low probability of remaining in sync for very long. One mark of good quality for an implementation of an RNG is that the arbitrariness of the choice of those seeds doesn't have to be very high: seeds that are small, or negative if interpreted as signed, or powers of two or whatever don't cause problems. When writing RNGs I generally do something like xor the seed with an arbitrary mask so that rather unarbitrary seeds become much more arbitrary ones. That is a better solution than adding the minimum value to the seed, since it will not map small values into other slightly less-small values. That circumstance makes it much more likely that two carelessly chosen seeds will map to the same place. Addressing this issue, by the way, was most of my "contribution" to the initially-published/benchmark implementation of MT19937, that resulted in my name plastered all over the Net. A lot of visibility for an hour or so of code cleanup. MT19937 passes very stringent randomness tests, both theoretical and pragmatic, under the assumption that the starting point is chosen randomly. There are regions of the state space, though, where its behavior is very far from random. The state space is so large, however, the probability of a random starting point hitting such a region is vanishingly small and can be ignored -- IF the seed is chosen randomly. In particular if the 19937 bits in the state vector have very few 1 bits then the next generation of the state vector will also have very few 1 bits (though generally more than before). Certain casual choices for the 32 bit seed in the initially circulated version of the code could result in just this situation. I suggested some changes (frankly, its been a while and I don't remember exactly what) so that one would have to work very hard to figure out a 32 bit seed value that would result in one ending up in this Random Number Sargasso Sea. Topher Cooper Steven Watanabe wrote:
AMDG
Currently, some PRNGs cannot be seeded with certain values. For example, the seed for linear_congruential cannot be a multiple of the modulus if the offset is zero. lagged_fibonacci also cannot be seeded with 0, because it uses minstd_rand0. Since linear_congruential already tries to wrap the seed to fit in the range, I'd like to change it so that any seed works.
The other generator that needs to be changed is linear_feedback_shift. linear_feedback_shift cannot be seeded with a small integer. What I've done is to add the minimum value to the seed, if the seed is too small. I'm not sure that this is a good solution, since this means that there are a few (and only a few) seeds that result in identical generators. Thoughts? Would it be better to wrap the values to make them fit in range instead? At least this would be consistent with linear_congruential.
I've attached a patch with these changes and test cases.
See also: http://svn.boost.org/trac/boost/ticket/3516 which prompted this investigation.
In Christ, Steven Watanabe
------------------------------------------------------------------------
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (7)
-
John Maddock
-
Michael Caisse
-
Paul A. Bristow
-
Robert Ramey
-
Steven Watanabe
-
Stewart, Robert
-
Topher Cooper