On Sun, Apr 27, 2014 at 10:24 AM, Mathias Gaunard <
mathias.gaunard@ens-lyon.org> wrote:
On 27/04/14 14:58, Thijs (M.A.) van den Berg wrote:
You would still need to split the range [0,n) into non-overlapping
subsets per thread.
That's what the parallelization of the algorithm does.
(In the case in the previous mail, it's done automatically by OpenMP)
You will need some unique id per thread -either a thread id, or some
static counter that threads use when they are initialized- to do that. That
unique id can also be used to seed regular random engines, so that is to me
not a big addition.
That would defeat the point IMO.
I want to be able to write parallel code involving random number
generation where the result does not depend at all of how the code is
parallelized.
Counter-based random number generators can compute the i-th random number
in O(1) instead of the classical O(n).
That means that for a given generator with a given seed, I can compute the
i-th random number and the j-th random number in parallel. That's what I
want to do. I don't want per-thread seeds.
I've dusted off the code I wrote a couple of years ago and republished it,
now on github:
https://github.com/DEShawResearch/Random123-Boost
I've tried to respond to the discussion in this thread. In particular, the
README, and the example now emphasize CBRNG's small footprint and fast
initialization and use with Distributions. The primary example
demonstrates how to use Distributions in a thread-agnostic way.
I also removed the functions related to AES because I felt they added a lot
of clutter and introduced some technical issues (endian questions,
hardware-specific instructions and intrinsics) that would be distractions
at this point. We can certainly revisit them in the future, if and when we
agree on a framework.
Here's the new README:
Random123-Boost
===============
Proposed Random123 functions for Boost.Random
The purpose of this source tree is to develop a new family of
"Counter Based Uniform Random Number Generators" (CBURNGs) for the
Boost.Random library. CBURNGs were introduced in the paper, "Parallel
Random Numbers -- As Easy as 1, 2, 3", by Salmon, Moraes, Dror & Shaw,
which won the Best Paper award at the SC'11 conference:
http://dl.acm.org/citation.cfm?doid=2063405
also available at
http://www.thesalmons.org/john/random123/papers/random123sc11.pdf
Conventional generators (such as those in Boost.Random or the C++11
<random> library) are large and difficult to initialize. Boost's
documentation explicitly advises against frequent initialization, so
common practice is to serialize access through a single global
generator, or, in a parallel program, to instantiate one generator
per-thread.
Unlike conventional generators CBURNGs require very little storage and
are designed to be created and destroyed frequently. If created and
destroyed in an "inner loop", a good compiler can often keep their
state entirely in registers and can generate random values with a few
dozen inlined instructions. In addition, they have extremely fast,
constant-time implementations of 'discard()', which allows callers to
easily "leapfrog" through a logical stream of random numbers.
These features make them ideal for parallel computation.
Consider the following program fragment, using a conventional
RandomNumberEngine, mt19937:
using namespace boost::random;
mt_19937 rng(seed); // seed may come from the command line
normal_distribution nd;
for(size_t i=0; i Prf;
normal_distribution nd;
Prf::key_type key = {seed};
Prf prf(key); // seed may come from command line
for(size_t i=0; i
philox<2, uint64_t>
All PRFs have a key_type, a domain_type, and a range_type, all
of which are boost::arrays of the underlying value type. I.e.,
threefry::key_type = boost::array
domain_type = boost::array
range_type = boost::array
philox::key_type = boost::array
domain_type = boost::array
range_type = boost::array
For threefry and philox, initialization is extremely fast. For other
PRFs, e.g., the cryptographic AES function (not implemented), there
may be non-trivial computation associated with initialization, so
initializing PRFs is discouraged in *inner* loops, but is perfectly
reasonable at thread-scope or anywhere else that a few dozen
invocations of the generator will amortize the initialization cost.
Threefry and Philox are "pseudo-random", meaning that the outputs from
any set of inputs are practically indistiguishable from random. In
particular, one can obtain apparently random output simply by
"counting" through inputs in an entirely regular way. Strong
empirical evidence is presented in the SC11 paper for the statistical
quality of the threefry and philox functions. Furthermore, the
pseudo-random functions obtained with different keys are shown to be
statistically independent, even if the keys differ by only a single
bit or follow regular patterns. Among other things, threefry and
philox pass the entire BigCrush suite of tests.
counter_based_urng
------------------
The counter_based_urng class uses the pseudo-random property of PRFs
to perform random number generation in accordance with the
requirements of a UniformRandomNumberGenerator. It reserves some
number (by default, all) of most-significant bits of the highest-index
member of the domain_type array for its own internal use as a
"counter". It is an error for the domain_type constructor argument to
have non-zero bits in this range. Whenever new random values are
required, the "counter bits" are incremented and the PRF is called,
generating a new set of random values that will be returned by
counter_based_urng. It is an error to request random values after the
counter is exhausted. Thus, counter_based_urngs will typically have
fairly "short" sequence lengths (anything from 4 up to 2^64). This is
usually more than sufficient to provide input to one or more
Distributions, which generally call their engine a non-deterministic,
but usually small number of times. On the other hand, it is cheap and
efficient to create huge numbers (2^64 or more) of independent
counter_based_urngs on demand.
counter_based_engine
--------------------
The counter_based_engine class is a templated adapter class that
models a bona fide RandomNumberEngine. In particular, it is Seedable
in the same way as other Engines, so it can be used in any program
that expects a RandomNumberEngine. The counter_based_engine offers
two very useful properties:
1 - it has a very small size, and a very fast constructor, so it is
practical to instantiate millions of them in a large parallel
application.
2 - the discard() function is very fast and runs in constant time.
Parallel programs can use this to "leapfrog" multiple sequences over
one another in different threads, or it can be used to initialize
generators in different threads with starting points that are
separated by enough to avoid overlap.