Re: [boost] [random] new threefry random engine

6 May 2014

      There's a new version at:

https://github.com/DEShawResearch/Random123-Boost

that I think addresses your concerns.  Click on the link to see the new
README.

On Fri, May 2, 2014 at 11:43 AM, Steven Watanabe <watanabesj@gmail.com>wrote:
...
AMDG
On 04/29/2014 10:00 AM, John Salmon wrote:
...
I've dusted off the code I wrote a couple of years ago and republished
it,
...
now on github:
https://github.com/DEShawResearch/Random123-Boost

...
...
<snip>
CBRNGs overcome all these problems.  With CBRNGs, the code looks like
this (see libs/random/examples/counter_based_example.cpp for a fully
worked example):
using namespace boost::random;
    typedef threefry<4, uint32_t> Prf;
    normal_distribution nd;
    Prf::key_type key = {seed};
    Prf prf(key);  // seed may come from command line
    for(size_t i=0; i<atoms.size(); ++i){
        float boltzmannfactor = sqrt(kT/atoms[i].mass);
        Prf::domain_type d = {atoms[i].id, timestep, THERMALIZE_CTXT};
        counter_based_urng<Prf> cbrng(prf, d);
        nd.reset();
        atoms[i].vx = boltzmannfactor*nd(cbrng);
        atoms[i].vy = boltzmannfactor*nd(cbrng);
        atoms[i].vz = boltzmannfactor*nd(cbrng);
    }
Is there a good reason why this shouldn't be written as:
for(...) {
  ...
  std::seed_seq s{ seed, atoms[i].id, timestep, THERMALIZE_CTXT };
  counter_based_engine<Prf> cbrng(s);
  ...
}
Seed_seq is the wrong tool for the job.  It's a middleman who charges a
hefty commission (time and space) and then delivers your product slightly
damaged.  You give it values that you have carefully chosen so that they're
*guaranteed* to be unique.  Seed_seq then transforms them into values that
are *probably* unique, but there are no guarantees.  If you use it a lot,
the Birthday Paradox will eat away at that "probably" and  after a few
billion uses you'll be in for a nasty surprise:  two of your generators
will produce exactly the same output.

Seed_seq is an attempt to work around the problem that many generators (and
some of the most popular ones) are difficult to initialize in a way that
results in independent streams.  The CBRNGs are designed to give you
independent streams without the overhead or risk of a seed_seq.

I think you'll like the new example code better.  From the new README:

    typedef threefry<4, uint32_t> Prf;
    counter_based_engine<Prf, 32> cbeng(seed);
    for(size_t i=0; i<Natoms; ++i){
        float boltzmannfactor = sqrt(kT/atoms[i].mass);
        normal_distribution nd(0., rmsvelocity);
        Prf::domain_type base = {atoms[i].id, timestep, THERMALIZE_CTXT};
        cbeng.restart(base);
        atoms[i].vx = boltzmannfactor*nd(cbeng);
        atoms[i].vy = boltzmannfactor*nd(cbeng);
        atoms[i].vz = boltzmannfactor*nd(cbeng);
    }

You can seed the counter_based_engine directly with an arithmetic seed, or,
if you prefer,
you can use the the much larger key_type, or you can pre-initialize a Prf
and use that.

In addition to seed()-ing (which meets all the standard expectations),
counter_based_engine
can be 'restart()-ed' with a new 'base counter'.  This is how you associate
different, independent
sequences with whatever program elements you like:  threads, atoms,
whatever.
...
I don't really like making the details of the
algorithm (i.e. key and domain) part of the
primary interface.
I think it's important that they be available, but in the new version, you
don't have to use them if you don't want to.   A user *may* limit himself
to using standard seeding methods and avoid restart().  Their one
remaining advantage is that they offer constant-time discard().  The
additional features that rely on the details of the underlying
Prf are entirely optional.
...
Let's consider the code changes between the two fragments:
...
<snip>
- Since it models a URNG, cbrng can be passed as an argument to the
normal_distribution, nd.  In order to make each atom independent,
nd.reset()
is called each time through the loop.
<snip>
FWIW, I don't really think that reset was a good idea
in the first place, and it's currently a no-op in all
distributions.  To run the loop in parallel, you'd need
to make a copy of nd in every thread, anyway.
Fair enough.  In the example, I've moved the normal distribution
into the loop and removed reset().
...
...
counter_based_urng
------------------
The counter_based_urng class uses the pseudo-random property of PRFs
to perform random number generation in accordance with the
requirements of a UniformRandomNumberGenerator.  <snip>
counter_based_engine
--------------------
The counter_based_engine class is a templated adapter class that
models a bona fide RandomNumberEngine.  <snip>
Is there really a good reason to make two separate
templates?  A RandomNumberEngine is a URNG, by definition.
There was, but not any more.  There is now only a counter_based_engine.
It takes a template parameter that determines the length of individual
sequences and the number of permissible 'base counters'.
It has a 'restart()' method that resets the 'base counter'
to begin a new sequence.  And it has a few constructor() and
seed() overloads that extend the capabilities of a generic Engine.