On Thu, Mar 5, 2015 at 2:39 PM, John Maddock <jz.maddock@googlemail.com> wrote:
First off, I notice there are no examples for generating floating point values in Boost.Random, so maybe what follows is based on a misunderstanding, or maybe not...
Let's say I generate values in [0,1) like so:
    boost::random::mt19937 engine;
    boost::random::uniform_01<FPT> dist;
    FPT x = dist(engine); // etc.
Where FPT is some floating point type.
Now my concern is that we're taking a 32-bit random integer and "stretching" it to a floating point type with rather more bits (53 for a double, maybe 113 for a long double, even more in the multi-precision world). So quantization effects will mean that there are many values which can never be generated.
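To make the quantization concrete: assuming uniform_01 consumes a single 32-bit draw per variate (which appears to be the case for mt19937), every value it produces is an exact multiple of 2^-32, and a quick check confirms it:

    #include <boost/random/mersenne_twister.hpp>
    #include <boost/random/uniform_01.hpp>
    #include <cassert>
    #include <cmath>

    int main() {
        boost::random::mt19937 engine;
        boost::random::uniform_01<double> dist;
        for (int i = 0; i < 1000000; ++i) {
            double x = dist(engine);
            double scaled = std::ldexp(x, 32);    // x * 2^32
            assert(scaled == std::floor(scaled)); // always a whole number
        }
    }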
It's true that I could use independent_bits_engine to gang together multiple random values and then pass that to uniform_01; however, that supposes we have an unsigned integer type available with enough bits. cpp_int from Boost.Multiprecision would do it, and this does work, but the conversions involved aren't particularly cheap. It occurs to me that an equivalent to independent_bits_engine but for floating point types could be much more efficient, especially in the binary floating point case.
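For reference, the independent_bits_engine route with a builtin 64-bit type looks something like this (with cpp_int you would widen the UIntType parameter instead):

    #include <boost/cstdint.hpp>
    #include <boost/random/independent_bits.hpp>
    #include <boost/random/mersenne_twister.hpp>
    #include <boost/random/uniform_01.hpp>

    // Gang two 32-bit mt19937 draws into one 64-bit draw, enough to
    // cover the 53-bit mantissa of a double.
    typedef boost::random::independent_bits_engine<
        boost::random::mt19937, 64, boost::uint64_t> engine64;

    int main() {
        engine64 engine;
        boost::random::uniform_01<double> dist;
        double x = dist(engine); // now multiples of 2^-64, rounded to double
        (void)x;
    }

And a rough sketch of what a floating point equivalent of independent_bits_engine might do in the binary case: splice two 32-bit draws together with ldexp, no wide integer type involved (the helper name is mine, not an existing Boost facility):

    #include <boost/random/mersenne_twister.hpp>
    #include <cmath>

    // Hypothetical helper: build a uniform double from two 32-bit draws
    // without an intermediate 64-bit integer type.
    template<class Engine>
    double uniform_01_64bits(Engine& eng) {
        double hi = std::ldexp(static_cast<double>(eng()), -32); // top 32 bits
        double lo = std::ldexp(static_cast<double>(eng()), -64); // bottom 32 bits
        return hi + lo; // note: can round up to exactly 1.0 in rare cases
    }

    int main() {
        boost::random::mt19937 engine;
        double x = uniform_01_64bits(engine);
        (void)x;
    }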
So I guess my questions are:
Am I worrying unnecessarily? And what is best practice in this area anyway?
Thanks, John.
I've worried about this in the past, but I've accepted that using a 64-bit integer engine instead of a 32-bit one is good enough. A 64-bit engine reasonably saturates 64-bit float conversions, and having 2^-64 probability resolution is practically enough when computing statistics on a large number of random draws (1 trillion draws << 2^64).

When using floating point random numbers there are two main error sources:

* the finite resolution of the probability engine (e.g. 32 bits in your example). This determines the number of different random values you can generate.

* the non-linearity of the float representation. This determines the number of individual values you can generate in a small interval: e.g. there are many more float values close to zero than close to 1 when you convert the mt19937 integers to floats in the interval [0,1).

Since most statistical computations involve floating point arithmetic, you'll have type 2) issues anyway. In that respect I would find it theoretically interesting (but I don't actually need it) to have random floating point numbers with a fixed exponent. That would remove the non-linearity of the float representation.
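For concreteness, both points could look roughly like this (a sketch; the fixed-exponent part is my own illustration of the idea, not an existing Boost facility):

    #include <boost/cstdint.hpp>
    #include <boost/random/independent_bits.hpp>
    #include <boost/random/mersenne_twister.hpp>
    #include <boost/random/uniform_01.hpp>
    #include <cmath>

    int main() {
        // 1) Finite resolution: a 64-bit engine saturates the 53-bit
        //    mantissa of a double.
        boost::random::mt19937_64 engine64;
        boost::random::uniform_01<double> dist;
        double x = dist(engine64);

        // 2) Fixed-exponent draw: 52 random mantissa bits with the
        //    exponent pinned, so every representable double in the
        //    binade [0.5, 1) is equally likely, uniformly spaced 2^-53.
        boost::random::independent_bits_engine<
            boost::random::mt19937, 52, boost::uint64_t> bits;
        double mantissa = static_cast<double>(bits());              // [0, 2^52)
        double y = std::ldexp(1.0 + std::ldexp(mantissa, -52), -1); // [0.5, 1)
        (void)x; (void)y;
    }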