uniform_real rather alarmingly non-random?

I seriously hope I'm doing something wrong here, but the last 20 bits in a double returned from uniform_real appear to always be zero. Is this by design? I really hope not! Here's the test code: template <class T> void test_random(T) { boost::mt19937 rnd; boost::uniform_real<T> ur_a(0.5, 1); boost::variate_generator<std::tr1::mt19937, std::tr1::uniform_real<T> > gen(rnd, ur_a); for(int j = 0; j < INT_MAX-2; ++j) { T v = gen(); // get the digits of T: unsigned long long i = (unsigned long long)std::floor(std::ldexp(v, std::numeric_limits<T>::digits)); i &= 0xfffff; // last 20 bits std::cout << std::hex << i << std::endl; } } Try calling with a double as an argument and it'll output nothing but zero's. John.

Hi John, John Maddock wrote:
I seriously hope I'm doing something wrong here, but the last 20 bits in a double returned from uniform_real appear to always be zero. Is this by design? I really hope not!
Here's the test code:
template <class T> void test_random(T) { boost::mt19937 rnd; boost::uniform_real<T> ur_a(0.5, 1); boost::variate_generator<std::tr1::mt19937, std::tr1::uniform_real<T> > gen(rnd, ur_a); for(int j = 0; j < INT_MAX-2; ++j) { T v = gen(); // get the digits of T: unsigned long long i = (unsigned long long)std::floor(std::ldexp(v, std::numeric_limits<T>::digits)); i &= 0xfffff; // last 20 bits std::cout << std::hex << i << std::endl; } }
Try calling with a double as an argument and it'll output nothing but zero's.
If memory serves, I've hit on this before - but I don't think you'll like answer - you're going to have to push the variate_generator gen (, and its dependants rnd, and ur_a) into the global namespace. Cheers, -- 18.03.2006 Manfred Doudar - Research Engineer National ICT Australia (NICTA) Research School of Information Sciences and Engineering (RSISE) The Australian National University - Canberra, ACT 0200 AUSTRALIA

John Maddock wrote:
I seriously hope I'm doing something wrong here, but the last 20 bits in a double returned from uniform_real appear to always be zero. Is this by design? I really hope not!
If memory serves, I've hit on this before - but I don't think you'll like answer - you're going to have to push the variate_generator gen (, and its dependants rnd, and ur_a) into the global namespace.
I had problems with copy constructor in my class that contained random generator, so as a quick'n'dirty solution I decided to put that generator as global variable in global space of my program. Looks like I was lucky to do that - otherwise I'd get non-random numbers? -- Janek Kozicki |

If memory serves, I've hit on this before - but I don't think you'll like answer - you're going to have to push the variate_generator gen (, and its dependants rnd, and ur_a) into the global namespace.
Sorry but I don't understand that, how does that influence runtime behaviour? John.

John Maddock wrote:
If memory serves, I've hit on this before - but I don't think you'll like answer - you're going to have to push the variate_generator gen (, and its dependants rnd, and ur_a) into the global namespace.
Sorry but I don't understand that, how does that influence runtime behaviour?
John.
Wish I could say definitively, but I would not seemingly get randomness otherwise - never looked for the reasons why at the time, but it worked - though like yourself I don't see why it would influence the runtime behaviour of the beast. However, the reasons posted by others on the list seem to sound plausible - I on the other hand did only recant from experience upon artificially manufacturing wave burst profiles from wave spectra using an IFFT, and do recall that the documentation for the mersene twister did the same. Cheers, -- Manfred 19.03.2006

John Maddock wrote:
I seriously hope I'm doing something wrong here, but the last 20 bits in a double returned from uniform_real appear to always be zero. Is this by design? I really hope not!
This seems to be dependant on the type of generator you use. Try using lagged_fibonacci. The casting and dividing involved in uniform_real appears to loose a lot of accuracy, but lagged_fibonacci generates doubles between 0 and 1, so there's no cast and it's dividing by 1. Interestingly, minstd_rand also seems to work better. Purely speculatively, I think this might be because of the ranges of values it generates - with mt19937 the random number is divided by 0xffffffff, while for minstd_rand it is divided by 0x7ffffffd. Although, I wouldn't be surprised if it performs badly by some other measure. Come to think of it, uniform_real is meant to be generating a half-open range - so maybe it should be diving by (max() - min() + 1) for integers. Looking at the concept documentation: 'For integer generators (i.e. integer T), the generated values x fulfill min() <= x <= max(), for non-integer generators (i.e. non-integer T), the generated values x fulfill min() <= x < max().' Or are you only meant to use floaing point generators when generating a floating point distribution? Daniel

This seems to be dependant on the type of generator you use. Try using lagged_fibonacci. The casting and dividing involved in uniform_real appears to loose a lot of accuracy, but lagged_fibonacci generates doubles between 0 and 1, so there's no cast and it's dividing by 1.
Good idea.
Interestingly, minstd_rand also seems to work better. Purely speculatively, I think this might be because of the ranges of values it generates - with mt19937 the random number is divided by 0xffffffff, while for minstd_rand it is divided by 0x7ffffffd. Although, I wouldn't be surprised if it performs badly by some other measure.
Come to think of it, uniform_real is meant to be generating a half-open range - so maybe it should be diving by (max() - min() + 1) for integers. Looking at the concept documentation: 'For integer generators (i.e. integer T), the generated values x fulfill min() <= x <= max(), for non-integer generators (i.e. non-integer T), the generated values x fulfill min() <= x < max().'
Or are you only meant to use floaing point generators when generating a floating point distribution?
No, but.... I've had a private mail from Jens to indicate that this is by design, or at least it's a QOI issue not a bug. In the current Boost implementation if you use a generator that produces int's, then you only get sizeof(int)*CHAR_BIT random bits in the result (which may or may not get distributed throughout the double). One solution seems to be to pack a couple of random int's into one double, but it's a hack I'd rather not get into just now. I'll try again with a lagged_fibonacci and see how it looks... Thanks all, John.

Interestingly, minstd_rand also seems to work better. Purely speculatively, I think this might be because of the ranges of values it generates - with mt19937 the random number is divided by 0xffffffff, while for minstd_rand it is divided by 0x7ffffffd. Although, I wouldn't be surprised if it performs badly by some other measure.
Come to think of it, uniform_real is meant to be generating a half-open range - so maybe it should be diving by (max() - min() + 1) for integers. Looking at the concept documentation: 'For integer generators (i.e. integer T), the generated values x fulfill min() <= x <= max(), for non-integer generators (i.e. non-integer T), the generated values x fulfill min() <= x < max().'
Or are you only meant to use floaing point generators when generating a floating point distribution?
No, but.... I've had a private mail from Jens to indicate that this is by design, or at least it's a QOI issue not a bug.
It is actually very important that uniform_real creates numbers in a half-open interval, and all random number libraries are actually implemented that way. To give just one example, the exponential distribution need to evaluate an expression like std::log(1.-u) where u is a uniform random number in the interval [0,1.). If instead u were from the interval [0,1], it could take on the value 1., leading to a zero argument to the log, and thus a problem. The half- open interval is thus the right thing to do. Matthias

Matthias Troyer wrote:
It is actually very important that uniform_real creates numbers in a half-open interval, and all random number libraries are actually implemented that way. To give just one example, the exponential distribution need to evaluate an expression like
std::log(1.-u)
where u is a uniform random number in the interval [0,1.). If instead u were from the interval [0,1], it could take on the value 1., leading to a zero argument to the log, and thus a problem. The half- open interval is thus the right thing to do.
Sorry, I was mistaken about this. I thought that the output of mt19937 was being fed directly into uniform_real, but what actually happens is that boost::variate_generator detects this situation and first uses boost::uniform_01 to correctly convert the distribution into a half-open one. This also explains a lot of the other things which were puzzling me. Daniel

This seems to be dependant on the type of generator you use. Try using lagged_fibonacci. The casting and dividing involved in uniform_real appears to loose a lot of accuracy, but lagged_fibonacci generates doubles between 0 and 1, so there's no cast and it's dividing by 1.
Just a quick update for those who are interested: using lagged_fibonacci does indeed solve the problem and allows the iostreams code I posted earlier to detect the VC8 streaming problem. Unfortunately lagged_fibonacci isn't part of TR1: the only real number generator that is (subtract_with_carry_01) appears to still leave the final 4 bits as zero :-( John.

On Mar 18, 2006, at 8:54 AM, John Maddock wrote:
This seems to be dependant on the type of generator you use. Try using lagged_fibonacci. The casting and dividing involved in uniform_real appears to loose a lot of accuracy, but lagged_fibonacci generates doubles between 0 and 1, so there's no cast and it's dividing by 1.
Just a quick update for those who are interested: using lagged_fibonacci does indeed solve the problem and allows the iostreams code I posted earlier to detect the VC8 streaming problem.
Unfortunately lagged_fibonacci isn't part of TR1: the only real number generator that is (subtract_with_carry_01) appears to still leave the final 4 bits as zero :-(
There are simple reason why lagged_fibonacci and the linear congruential generators fill the low bits with apparently random values, and the Mersenne twister does not. - The Mersenne twister produces integers modulo 2^32. Hence if you scale to a uniform real in the interval [0,1), the integer value is divided by 2^32, and the low bits remain 0. - With the 32-bit linear congruential generators, such as minstd_rand, the numbers produced are modulo 2^31-1, a prime number, and not modulo a power of 2. Hence, the low bits look random, although you actually get fewer distinct floating point values (only 2^31-1)! - The lagged Fibonacci floating point generators are by default seeded with minstd_rand0, and hence have low bits that are nonzero. If instead you were to seed it with numbers created from a Mersenne twister, you would see the same problems. Having said that I think that for most applications it is no problem that the low 20 bits remain zero. I don't know of any application requiring these low bits to be random as well. Usually 32 random bits are more than enough. If, however, you need more bits than I would recommend looking into 64 bit generators, such as the lcg64 in the SPRNG library, a wrapper for which exists in the sandbox. Matthias

On 2006-03-17, John Maddock <john@johnmaddock.co.uk> wrote:
I seriously hope I'm doing something wrong here, but the last 20 bits in a double returned from uniform_real appear to always be zero. Is this by design? I really hope not!
Take a look at the implementation of uniform_real - you'll see that it does: template<class Engine> result_type operator()(Engine& eng) { return static_cast<result_type>(eng() - eng.min()) / static_cast<result_type>(eng.max() - eng.min()) * (_max - _min) + _min; } (with suitable removal of min/max protection macros.) Since the engine is a integer based generator (probably 32 bits on most 32 bit platforms) there is no way it can generate more than 32 bits of randomness with this implementation. Hence the 20 bits of zeroes at the end of the double. What it really needs to do is detect that the engine type hasn't got enough bits and call the engine repeatedly until it has filled the result_type up... phil -- change name before "@" to "phil" for email
participants (6)
-
Daniel James
-
Janek Kozicki
-
John Maddock
-
Manfred Doudar
-
Matthias Troyer
-
Phil Richards