[random] Quantization effects in generating floating point values
First off, I notice there are no examples of generating floating point values in Boost.Random, so maybe what follows is based on a misunderstanding, or maybe not... Let's say I generate values in [0,1] like so:

boost::random::mt19937 engine;
boost::random::uniform_01<boost::random::mt19937, FPT> d(engine);
FPT x = d(); // etc

where FPT is some floating point type.

Now my concern is that we're taking a 32-bit random integer and "stretching" it to a floating point type with rather more bits (53 for a double, maybe 113 for a long double, even more in the multi-precision world). So quantization effects will mean that there are many values which can never be generated.

It's true that I could use independent_bits_engine to gang together multiple random values and then pass the result to uniform_01; however, that supposes we have an unsigned integer type available with enough bits. cpp_int from Boost.Multiprecision would do it, and this does work, but the conversions involved aren't particularly cheap. It occurs to me that an equivalent of independent_bits_engine but for floating point types could be much more efficient - especially in the binary floating point case.

So I guess my questions are: Am I worrying unnecessarily? And what is best practice in this area anyway?

Thanks, John.
On Thu, Mar 5, 2015 at 2:39 PM, John Maddock <jz.maddock@googlemail.com> wrote:
First off, I notice there are no examples for generating floating point values in Boost.Random, so maybe what follows is based on a misunderstanding, or maybe not...
Let's say I generate values in [0,1] like so:
boost::random::mt19937 engine; boost::random::uniform_01<boost::random::mt19937, FPT> d(engine);
FPT x = d(); // etc
Where FPT is some floating point type.
Now my concern is that we're taking a 32-bit random integer and "stretching" it to a floating point type with rather more bits (53 for a double, maybe 113 for a long double, even more in the multi-precision world). So quantization effects will mean that there are many values which can never be generated.
It's true that I could use independent_bits_engine to gang together multiple random values and then pass the result to uniform_01; however, that supposes we have an unsigned integer type available with enough bits. cpp_int from Boost.Multiprecision would do it, and this does work, but the conversions involved aren't particularly cheap. It occurs to me that an equivalent of independent_bits_engine but for floating point types could be much more efficient - especially in the binary floating point case.
So I guess my questions are:
Am I worrying unnecessarily? and What is best practice in this area anyway?
Thanks, John.
I've worried about this in the past, but I've accepted that using a 64-bit integer engine instead of a 32-bit one is good enough. A 64-bit engine reasonably saturates 64-bit float conversions, and having 2^-64 probability resolution is practically enough when computing statistics on a large number of random draws (1 trillion draws << 2^64).

When using floating point random numbers there are two main error sources:

* the finite resolution of the probability engine (e.g. 32 bits in your example). This determines the number of different random values you can generate.

* the non-linearity of the float representation. This determines the number of individual values you can generate in a small interval. E.g. there are many more float values close to zero than close to 1 when you convert the mt19937 integers to floats in the interval [0,1].

Since most statistical computations involve floating point arithmetic, you'll have type 2) issues anyway. In that respect I would find it theoretically interesting (but I don't actually need it) to have random floating point numbers with a fixed exponent. That would remove the non-linearity of the float representation.
AMDG On 03/05/2015 06:39 AM, John Maddock wrote:
First off, I notice there are no examples for generating floating point values in Boost.Random, so maybe what follows is based on a misunderstanding, or maybe not...
Let's say I generate values in [0,1] like so:
boost::random::mt19937 engine; boost::random::uniform_01<boost::random::mt19937, FPT> d(engine);
FPT x = d(); // etc
Where FPT is some floating point type.
<aside> You're using the old interface to uniform_01 here, which is deprecated because it is inconsistent with the rest of the distributions. </aside>
Now my concern is that we're taking a 32-bit random integer and "stretching" it to a floating point type with rather more bits (53 for a double, maybe 113 for a long double, even more in the multi-precision world). So quantization effects will mean that there are many values which can never be generated.
It's true that I could use independent_bits_engine to gang together multiple random values and then pass the result to uniform_01; however, that supposes we have an unsigned integer type available with enough bits. cpp_int from Boost.Multiprecision would do it, and this does work, but the conversions involved aren't particularly cheap. It occurs to me that an equivalent of independent_bits_engine but for floating point types could be much more efficient - especially in the binary floating point case.
It's called generate_canonical.
So I guess my questions are:
Am I worrying unnecessarily? and
I don't think so. I haven't worried about it much because, as Thijs points out, using a 64-bit engine works well enough for float and double, which accounts for most use cases. For multiprecision, it could be an issue.
What is best practice in this area anyway?
I really don't know. In Christ, Steven Watanabe
<aside> You're using the old interface to uniform_01 here, which is deprecated because it is inconsistent with the rest of the distributions. </aside>
Understood. However there's nothing in the docs to say it's deprecated. In any case I could have picked any real-valued distribution for the example.
Now my concern is that we're taking a 32-bit random integer and "stretching" it to a floating point type with rather more bits (53 for a double, maybe 113 for a long double, even more in the multi-precision world). So quantization effects will mean that there are many values which can never be generated.
It's true that I could use independent_bits_engine to gang together multiple random values and then pass the result to uniform_01; however, that supposes we have an unsigned integer type available with enough bits. cpp_int from Boost.Multiprecision would do it, and this does work, but the conversions involved aren't particularly cheap. It occurs to me that an equivalent of independent_bits_engine but for floating point types could be much more efficient - especially in the binary floating point case.
It's called generate_canonical. Ah, good.
However, I don't see it in the docs anywhere? Ah... it's not listed in the docs Jamfile so it's not built in. My guess is that no one else has noticed it either?
So I guess my questions are:
Am I worrying unnecessarily? and I don't think so. I haven't worried about it much because, as Thijs points out, using a 64-bit engine works well enough for float and double, which accounts for most use cases. For multiprecision, it could be an issue.
What is best practice in this area anyway?
I really don't know.
Looks like "use generate_canonical" might be the answer? John.
On 05 Mar 2015, at 19:01, John Maddock <jz.maddock@googlemail.com> wrote:
It occurs to me that an equivalent of independent_bits_engine but for floating point types could be much more efficient - especially in the binary floating point case.

It's called generate_canonical.

Ah, good.
However, I don't see it in the docs anywhere?
Ah... it's not listed in the docs Jamfile so it's not built in. My guess is that no one else has noticed it either?
There is also a std::generate_canonical in the C++11 standard, those descriptions and specs might also help. Eg http://en.cppreference.com/w/cpp/numeric/random/generate_canonical
There is also a std::generate_canonical in the C++11 standard, those descriptions and specs might also help.
Eg http://en.cppreference.com/w/cpp/numeric/random/generate_canonical
For sure, but first you have to realise that Boost.Random supports it ;-) shuffle_output is also omitted from the docs build BTW. John.
AMDG On 03/05/2015 11:34 AM, John Maddock wrote:
shuffle_output is also omitted from the docs build BTW.
shuffle_output is deliberately omitted. The C++11 name is shuffle_order_engine. In Christ, Steven Watanabe
AMDG On 03/05/2015 11:01 AM, John Maddock wrote:
<aside> You're using the old interface to uniform_01 here, which is deprecated because it is inconsistent with the rest of the distributions. </aside>
Understood. However there's nothing in the docs to say it's deprecated.
Nothing in the documentation indicates that this works at all.
In any case I could have picked any real-valued distribution for the example.
It's called generate_canonical.
Ah, good.
However, I don't see it in the docs anywhere?
Ah... it's not listed in the docs Jamfile so it's not built in. My guess is that no one else has noticed it either?
Oops. I even wrote a doxygen comment. In Christ, Steven Watanabe
<aside> You're using the old interface to uniform_01 here, which is deprecated because it is inconsistent with the rest of the distributions. </aside> Understood. However there's nothing in the docs to say it's deprecated.
Nothing in the documentation indicates that this works at all.
Sorry, I don't understand - the class is documented, so why wouldn't it work? John.
AMDG On 03/05/2015 11:57 AM, John Maddock wrote:
<aside> You're using the old interface to uniform_01 here, which is deprecated because it is inconsistent with the rest of the distributions. </aside> Understood. However there's nothing in the docs to say it's deprecated.
Nothing in the documentation indicates that this works at all.
Sorry, I don't understand - the class is documented, so why wouldn't it work?
There's nothing wrong with using uniform_01, but the declaration is:

template<typename RealType = double> class uniform_01;

not

template<typename Engine, typename RealType> class uniform_01;

which is what you used. In Christ, Steven Watanabe
There's nothing wrong with using uniform_01, but the declaration is:
template<typename RealType = double> class uniform_01;
not
template<typename Engine, typename RealType> class uniform_01;
which is what you used.
That's what the docs say, but the "new" form is only dispatched to for native floating point types... there was also a discrepancy between what IntelliSense said and what the docs said. But we're digressing... John.
AMDG On 03/05/2015 12:40 PM, John Maddock wrote:
There's nothing wrong with using uniform_01, but the declaration is:
template<typename RealType = double> class uniform_01;
not
template<typename Engine, typename RealType> class uniform_01;
which is what you used.
That's what the docs say, but the "new" form is only dispatched to for native floating point types...
Aha. That's definitely a bug. When I wrote the dispatching code originally, I assumed that it would only have to work with builtin types. In Christ, Steven Watanabe
AMDG On 03/05/2015 12:40 PM, John Maddock wrote:
There's nothing wrong with using uniform_01, but the declaration is:
template<typename RealType = double> class uniform_01;
not
template<typename Engine, typename RealType> class uniform_01;
which is what you used.
That's what the docs say, but the "new" form is only dispatched to for native floating point types... there was also a discrepancy between what intellisense said and what the docs said. But we're digressing..
Perhaps it's time to retire the dispatching entirely. It's been quite a few years since I updated it, and the original form was practically impossible to use correctly in the first place. In Christ, Steven Watanabe
participants (4)
- John Maddock
- Steven Watanabe
- Thijs (M.A.) van den Berg
- Thijs van den Berg