
Boost.Random and the C++11 standard library ship with a 24- and a 48-bit RANLUX pseudo-random number generator (PRNG). I propose to add 32- and 64-bit RANLUX PRNGs because they are at least 40% faster. Additionally, the 32-bit RANLUX PRNG can be implemented as follows: #include <boost/cstdint.hpp> #include <boost/random.hpp> namespace br = boost::random; typedef br::subtract_with_carry_engine<boost::uint32_t,32,3,17> ranlux32_base; typedef br::discard_block_engine<ranlux32_base, 389, 16> ranlux32; There is C++98 benchmark code at the end of this e-mail requiring only Boost and a C++ compiler. Below I will briefly discuss the theory behind RANLUX PRNGs, where the new generators are coming from, and why the 64-bit RANLUX PRNG requires (a few lines of) new code. Drawing from RANLUX PRNGs is a two-step process. In the first step a value is drawn from a "subtract-with-borrow" (SWB) generator implementing b-bit modular arithmetic [1]. Then, values are generously discarded yielding uniformly distributed values (uniform in a dynamical systems model) [2]. The RANLUX PRNGs using 24- and 48-bit arithmetic always gave me an itch so I looked into the motivation behind this. It turns out [3] that this restriction arose likely from a) computational limitations in the 1990s and b) Lüscher's interest in the generation of floating-point numbers (single-precision floating-point numbers have 24-bit significand). SWB PRNGs come in two flavors (Type I, Type II) and they make use of three parameters called base b, short lag s, and long lag r. Using modern prime factorization methods, one can quickly come up with the 32-bit PRNG above, a Type I SWB generator with b=2^32, r=17, and s=3. This one can be implemented as shown above. To generate 64-bit values, one should use a Type II SWB with b=2^64, r=62, and s=5 (all other 64-bit generators with b=2^64 and r < 100 will be awfully slow). What distinguishes the different types of SWBs is the recurrence for the computation of variates. For Type I it is x_n := (x_{n-s} - x_{n-r} + c_n) mod b, where c_n is a carry bit. For Type II it is x_n := (x_{n-r} - x_{n-s} + c_n) mod b. (Observe the order of long lang and short lag in these formulas.) Type II SWBs are not difficult to implement but it requires either a modification of `subtract_with_carry_engine` or a new class. Let me know what you think about this proposal. Christoph Conrads [1] G. Marsaglia, A. Zaman: "A New Class of Random Number Generators". 1991. URL: https://www.jstor.org/stable/2959748 [2] Martin Lüscher: "A Portable High-Quality Random Number Generator for Lattice Field Theory Simulations". 1994. URL: https://arxiv.org/abs/hep-lat/9309020 [3] Christoph Conrads: "Faster RANLUX Pseudo-Random Number Generators". 2019. URL: https://christoph-conrads.name/faster-ranlux-pseudo-random-number-generators // Build with: // c++ -Wextra -Wall -std=c++98 -pedantic benchmark.cpp -O3 \ // -march=native -DNDEBUG -lboost_chrono -lboost_system #include <boost/chrono.hpp> #include <boost/cstdint.hpp> #include <boost/random.hpp> #include <cassert> #include <cstdio> #include <limits> #include <string> namespace chrono = boost::chrono; typedef chrono::milliseconds milliseconds; typedef chrono::process_cpu_clock my_clock; namespace br = boost::random; typedef br::subtract_with_carry_engine<boost::uint32_t,32,3,17> ranlux32_base; typedef br::discard_block_engine<ranlux32_base, 389, 16> ranlux32; struct dummy_engine { typedef boost::uint32_t result_type; static result_type max() { return std::numeric_limits<result_type>::max(); } static result_type min() { return std::numeric_limits<result_type>::min(); } result_type operator() (); result_type state_; }; dummy_engine::result_type dummy_engine::operator() () { return state_++; } std::string get_name(const dummy_engine&) { return "dummy-prng"; } std::string get_name(const br::ranlux24&) { return "ranlux24"; } std::string get_name(const br::ranlux48&) { return "ranlux48"; } std::string get_name(const ranlux32&) { return "ranlux32";} template<typename Generator> typename Generator::result_type draw(Generator& gen) __attribute__((noinline)); template<typename Generator> typename Generator::result_type draw(Generator& gen) { return gen(); } template<typename Generator> void run(boost::uintmax_t num_draws) { assert(Generator::min() == 0); unsigned w = Generator::max() == std::numeric_limits<boost::uint64_t>::max()?8u: Generator::max() == (UINTMAX_C(1)<<48) - 1u ?6u: Generator::max() == std::numeric_limits<boost::uint32_t>::max()?4u: Generator::max() == (UINTMAX_C(1)<<24) - 1u ?3u: Generator::max() == std::numeric_limits<boost::uint16_t>::max()?2u: Generator::max() == std::numeric_limits<boost::uint8_t>::max() ?1u: 0u ; Generator gen; my_clock::time_point t_0 = my_clock::now(); for(uintmax_t i = 0; i < num_draws; ++i) draw(gen); my_clock::time_point t_1 = my_clock::now(); milliseconds t_ms = chrono::duration_cast<milliseconds>(t_1 - t_0); double bytes_per_sec = w * num_draws * 1000u/double(t_ms.count()); std::string name = get_name(gen); boost::uint16_t dummy = boost::uint16_t(gen()); std::printf( "%-26s | %10lu | %20.2e | %hu\n", name.c_str(), t_ms.count(), bytes_per_sec, dummy ); } int main() { boost::uintmax_t num_draws = UINTMAX_C(100000000); run<dummy_engine>(num_draws); run<br::ranlux24>(num_draws); run<ranlux32>(num_draws); run<br::ranlux48>(num_draws); }