[BGL] Revisiting Parallel BGL

Hi, I raised the issue of parallel BFS not working correctly here: http://lists.boost.org/boost-build/2009/10/22631.php. The issue remained unresolved. I revisited it over the last two days, and the incorrectness most likely relates to the initialization of the distance property map to infinity for all non-start vertices. Hope this helps. Thanks, Sandeep
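To illustrate why that initialization matters, here is a minimal sketch of a sequential BFS (not the actual Parallel BGL code) whose correctness hinges on every non-start distance being set to "infinity" before the traversal begins; the function name and graph representation are illustrative, not from BGL:

```cpp
#include <cassert>
#include <climits>
#include <queue>
#include <vector>

// Sketch only: BFS uses dist[v] == INF as the "unvisited" marker, so
// skipping the initialization to infinity silently breaks the traversal.
std::vector<int> bfs_distances(const std::vector<std::vector<int>>& adj, int start) {
    const int INF = INT_MAX;
    std::vector<int> dist(adj.size(), INF); // the initialization in question
    dist[start] = 0;
    std::queue<int> q;
    q.push(start);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u]) {
            if (dist[v] == INF) {       // only correct if INF was set above
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }
    return dist;
}
```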

I have uploaded a Niederreiter base 2 (NB2) low discrepancy sequence generator (as described in Bratley, Fox, Niederreiter, ACM Trans. Model. Comp. Sim. 2, 195 (1992)) to the Boost Vault. Low discrepancy sequences are particularly important for quasi-random Monte Carlo integration, because they let the approximations converge roughly an order of magnitude faster. The implementation is tested against GSL (the GNU Scientific Library) and generates identical uniform [0, 1) output for all dimensions up to 12 (GSL does not support more dimensions). The implementation itself, however, supports up to 20 dimensions.

Some notes:
1) Initialization is pretty fast. Initializing the NB2 state is a lot of work, and some of it is alleviated by (I hope) judicious use of metaprogramming. I did not measure very rigorously, but my impression is that initialization is about twice as fast.
2) I am not exactly a Boost.Random guru, so I probably missed some conceptual requirements, but it works just fine with the uniform_01 template (see dim2demo.cpp).

IMO Boost.Random (like a lot of other Boost libraries) is one of the best things since sliced bread, but it sorely lacks a low discrepancy sequence generator, be it Niederreiter's, Sobol's, or something else equally effective. J.
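For readers unfamiliar with low discrepancy sequences, here is a hedged stand-in, not the uploaded NB2 generator: the base-2 radical inverse (van der Corput sequence), the one-dimensional building block of Niederreiter/Sobol-type constructions. Its successive outputs fill [0, 1) far more evenly than independent uniform draws would, which is what drives the faster quasi-Monte Carlo convergence. The function name is illustrative:

```cpp
#include <cstdint>

// Van der Corput base-2 radical inverse: mirror the bits of n across the
// radix point, so n = b_k...b_1 (binary) maps to 0.b_1...b_k in [0, 1).
double radical_inverse_base2(uint32_t n) {
    double result = 0.0;
    double f = 0.5;
    while (n) {
        if (n & 1) result += f; // bit contributes 1/2, 1/4, 1/8, ...
        f *= 0.5;
        n >>= 1;
    }
    return result;
}
```

The first few outputs for n = 1, 2, 3, 4 are 0.5, 0.25, 0.75, 0.125: each new point lands in the largest remaining gap, which is the intuitive "low discrepancy" behaviour.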

AMDG Justinas V. D. wrote:
For validation, Boost.Random uses the 10,000th value produced by a default constructed generator.
This is fine. Generators should not generally be initialized very often.
A few things that you could add: a) non-default seeding; b) stream operators.
Would you be willing to write documentation for this as well? In Christ, Steven Watanabe

2010.03.25 19:50, Steven Watanabe wrote:
AMDG
For validation, Boost.Random uses the 10,000th value produced by a default constructed generator.
It certainly matches.
Agreed.
Non-default seeding is an interesting question in this case, because it is not really clear what it means. Does it mean "allow the user to initialize the lattice as she wishes"? Then we would (potentially) lose quasirandomness, which is the whole point of this generator. Or does it mean "allow the user to initialize the current state"? That could probably be used to fast-forward to a desired state, but I am not sure it is semantically the same as "seeding". I think the latter would be The Right Thing in the case of streaming.
Would you be willing to write documentation for this as well?
Certainly. I think I will come up with something by the next week. J.

AMDG Justinas V. D. wrote:
Okay.
Would you be willing to write documentation for this as well?
Certainly. I think I will come up with something by the next week.
If you could add doxygen comments to the header, that would be great. There should probably also be some general documentation of quasi-random sequences. There isn't currently a concept for quasi-random number generators. The source for the documentation is at http://svn.boost.org/svn/boost/trunk/libs/random/doc You can view the current generated html at http://boost-sandbox.sourceforge.net/doc/html/boost_random.html In Christ, Steven Watanabe

2010.03.25 23:14, Steven Watanabe wrote:
I have uploaded the new version (v2) to the Boost Vault. Does the implementation converge to the desired state?
Would you be willing to write documentation for this as well?
I will tackle the documentation now. As everybody knows, it is the hardest part of the code to write. :-) Especially in this case, bearing in mind that the BoostBook format is something altogether new to me, but it seems grokable. J.

AMDG Justinas V. D. wrote:
I didn't look at it very closely, but I have a few notes:
* bit_count. This appears to be equivalent to std::numeric_limits::digits?
* validation is eventually going to die, since it isn't in the standard. It will be moved to the tests, since that's the only place it is used.
* I'm not sure that the extra constructor/seed overloads are a good idea. (Yes, I know I asked for them, but they seem to expose the implementation details, and I don't think they're all that useful without a corresponding way to get the state. The rationale you gave for only having the default constructor seems reasonable.)
There's a fairly decent description of what you need to get the toolchain set up at https://svn.boost.org/trac/boost/wiki/BoostDocs/GettingStarted In Christ, Steven Watanabe

2010.03.29 18:59, Steven Watanabe wrote:
Yes, thank you! I will remove the bit_count<T> cruft.
Yes, I will remove them. I do imagine, however, that the user might want to skip the first N vectors in R^Dimension. I am not sure that merits a special constructor, because it's easy to write

for( int i = 0; i < N*Dimension; ++i ) gen(); // skipping values

and a specific implementation would not be significantly faster.
Reading up. J.

2010.03.29 18:59, Steven Watanabe wrote:
The interface has settled down and it even looks fine, i.e., I am finally happy with it. Many thanks to you, your comments were enormously useful. The last version is uploaded to Vault. (No doxygen comments yet.)
Would you be willing to write documentation for this as well?
I have written the concept part. J.

AMDG Justinas V. D. wrote:
* You don't need to #include <iostream>. (I know the current Boost.Random headers do this, but <istream> and <ostream> are enough.)
* It might be clearer if you encoded the primitive polynomials in hex instead of decimal.
* Is there any hope that the algorithm would work with a 64-bit integer type? I noticed that you're using uint_fast64_t in places, and that makes me wonder whether you're assuming a 32-bit integer type, even though it's templated.
Would you be willing to write documentation for this as well? I have written the concept part.
* The link that says quasi-random number generator should not point to the Pseudo-Random Number Generator section (i.e., don't use the prng template).
* Is it necessarily correct for QuasiRandomNumberGenerator to be a refinement of PseudoRandomNumberGenerator? For example, the new standard requires several specific seed overloads, which don't make sense for a QuasiRandomNumberGenerator.
* I don't like having two ways to get the same information, i.e., dim and dimension. This is probably a dumb question, but does the dimension of a quasi-random number generator need to be provided? The number of dimensions of equidistribution is also a property of PRNGs, but they don't explicitly provide it.
* Can you define "low-discrepancy sequence" when you use it? In Christ, Steven Watanabe

2010.03.31 01:22, Steven Watanabe wrote:
I will change that.
* It might be clearer if you encoded the primitive polynomials in hex instead of decimal.
I don't know whether that would be clearer. The decimal encoding shows the nice prime property, which hex would obscure.
* Is there any hope that the algorithm would work with a 64 bit integer type?
No. Assume P and Q are polynomials; then deg(P*Q) = deg(P) + deg(Q), so the maximum degree of a polynomial product in the 20th dimension would be (with a signed 64-bit type) ((63 / 6) + 1) * 6 = 66. It is frustrating, but it falls just outside the range of a 64-bit integer type and would require 66 bits to encode. I could STATIC_ASSERT that.
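Taking the post's figures at face value, a compile-time sketch of the STATIC_ASSERT mentioned above (illustrative only, not the actual implementation) could simply encode that arithmetic:

```cpp
// With a signed 64-bit IntType there are 63 usable bits; per the post,
// the product-polynomial degree in the 20th dimension works out to
// ((63 / 6) + 1) * 6 = 66, just past what 64 bits can encode.
static_assert(((63 / 6) + 1) * 6 == 66, "degree bound from the post");
static_assert(((63 / 6) + 1) * 6 > 64, "a 64-bit type cannot encode it");
```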
Yes, you are right. The other question is: how would we check the returned random numbers if all the libraries support only 32-bit ints?
Right.
Could you please provide a link to the exact PRNG interface demanded by the C++ standard?
Yes, it is immensely important. Specific things like uniform point generation on a sphere using a chord model would require a quasi-random number generator in the 4th dimension. In general, quasi-Monte Carlo methods are formulated to be bound by a [0,1]^s unit hypercube, and dimensionality is an intrinsic part of the generator (and of the problem we have to solve). To put it simply -- you just cannot plug in whichever s-dimensional generator you want. X::dimension and X::dim() are very much in the same vein as X::max_value and X::max().
* Can you define "low-discrepancy sequence" when you use it?
Uh... Don't people have to know the definitions before they use something? :-) Something like this? A low discrepancy sequence is a sequence with the property that for all values of N, its subsequence x1, ..., xN has low discrepancy. Roughly speaking, this means that if we have a 2D low discrepancy sequence of points falling in a square, the square is covered uniformly, and the number of points falling into any part of the square is proportional to that part's area; i.e., we never have a situation where points lump together or space out. (This is a very informal wording that leaves out Lebesgue measures and so on.) J.

AMDG Justinas V. D. wrote:
Okay.
Okay. It isn't necessarily wrong to have limitations on the allowed types, but these limitations need to be documented and checked.
Very carefully. The first library to implement anything can't rely on testing against others... I think you would just have to generate lots and lots of random numbers and compare the discrepancy to the expected discrepancy. This test should probably be written anyway.
The current working draft of the standard is http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3090.pdf Note that this is not official yet.
I guess here's what makes me uncomfortable. First of all, I don't really like having min_value and max_value either. I think they seemed like a good idea at the time, but nothing in the library uses them, and they just add extra complications, because of the has_static_min dance. Your description of dimension makes it sound as though it isn't required. If it isn't required, then unless there's a way to detect its presence, it's completely useless in generic code and should not be part of the concept. If it is required, then there's no reason to have dim as well. If it's not required but detectable, we have the same interface mess as for min_value/max_value.
I think it would be enough if you conveyed the intuitive idea that a low discrepancy sequence is more evenly distributed than a truly random sequence would be. In Christ, Steven Watanabe

2010.03.31 19:41, Steven Watanabe wrote:
AMDG
[skipped]
I have modified the lattice generation code to use std::bitset. Works just fine, so we don't have limitations on IntType anymore. So much more power to users. :-)
As of now, I have a test suite that checks the generated values against the values produced by GSL. It checks THE WHOLE range, i.e., 2^31 - 1 values for each dimension. It is probably overkill, but we mathematicians are never happy with just the 10001st value, simply because it might be a coincidence, a fluke, however unlikely that would be. And this "unlikely" would require very careful exploration just to answer the question of the exact probability of two finite state automata having the same 10001st state (I hasten to add that this probability will NOT be equal to 0). Speaking of other possible niederreiter_base2 typedefs, I don't really believe we have to ensure that it produces proper quasi-random values in ALL cases, simply because there is a myriad of ways to instantiate that template. Users have to know what they are doing, and thus the responsibility is all theirs.
Aha. So I have added the discard member function and deleted those constants (min_value, max_value, etc.). It seems like we are good to go, except for the missing member function template<class Sseq> void seed(Sseq& q) and the matching constructor. My previous take that made you uneasy about the exposed internal data was something similar to this, but I don't think I will reimplement those until we have a SeedSequence concept in Boost.Random. I can see the value of such members: they would allow easier experiments with seeding, and would be left to experts only. [skipped]
The constant is not required, because dimensionality can always be detected by those templates that expect a dimensional random number generator. Dimensionality itself is very important and is required. I can imagine algorithms that want to change the underlying result_type but would want a generator that has a specific dimension. Anyway, I removed the X::dimension constant and spelled X::dim() as X::dimension(). [skipped]
I have updated concepts.qbk (diff is in Vault). What do you think? J.

AMDG Justinas V. D. wrote:
Cool.
Of course, this assumes that GSL is correct... Anyway, testing against GSL is impractical in the Boost.Random regression tests. I'll reply to the rest of your post when I find some time. In Christ, Steven Watanabe

AMDG Justinas V. D. wrote:
What exactly would this constructor do? It looks to me as though, if it just set the initial state, then (for a one-dimensional generator) the effective result would be as though you xor'ed the initial state with the normal values of the generator.
Anyway, I removed X::dimension constant and spelled X::dim() as X::dimension().
Okay.
I've tweaked the wording a little bit. The resulting patch is attached. Also, if you use LaTeX in the Doxygen comments, enclose it in \f$ ... \f$ so it will be processed. If you want macros to be expanded by Doxygen, you need to add them to the predefined macros section of the Doxyfile. The following test case breaks because min and max are not defined correctly:

#include <boost/random/niederreiter_base2.hpp>
#define BOOST_TEST_MAIN
#include <boost/test/unit_test.hpp>

BOOST_AUTO_TEST_CASE(test_niederreiter_base2_1d)
{
    boost::niederreiter_base2_1d gen;
    for(int i = 0; i < 100; ++i) {
        int val = gen();
        BOOST_CHECK_GE(val, (gen.min)());
        BOOST_CHECK_LE(val, (gen.max)());
    }
}

Finally, since the algorithm does a lot of bit-twiddling, would uint32_t be more appropriate than int? In Christ, Steven Watanabe

2010.04.09 23:46, Steven Watanabe wrote:
This could be used, for example, to initialize nextq to specific values that nextq would be unlikely to acquire otherwise. This scenario is admittedly a little contrived.
Patched version in Vault. The fix was very trivial.
Finally, since the algorithm does a lot of bit-twiddling, would uint32_t be more appropriate than int?
You mean those typedefs of niederreiter_base2? It does not really matter. You could, for example, instantiate the template class niederreiter_base2 like this: niederreiter_base2<long, D, 1, (1 << 63)> gen; [1] Scaling down to the [0, 1]^D unit hypercube, you would get more precision this way. (Of course, you could use uint32_t too, and the typedef would look like niederreiter_base2<uint32_t, D, 1, (1U << 32) /*!*/> [2], but g++ complains about "left shift count >= width of type" and spits out a warning at the instantiation site.) I think this option is best left to users, who know what they want.

Speaking of the discard member function, I improved it in a way that makes more sense to me. According to the draft, discard(z) ought to behave as if z consecutive operator() invocations were executed. I refined this to Dimension*z, so discard(z) now actually discards z consecutive s-dimensional vectors (s = Dimension). In the 1-dimensional case the behaviour is the same as dictated by the standard. J.

[1] On the x86_64 platform, long has 64 bits.
[2] Tested with long and uint32_t, too; the aforementioned examples produce the exact same sequence as the int instantiation (scaled down to [0, 1]^D with uniform_real), the only difference being the precision of the division.

AMDG Justinas V. D. wrote:
I guess my point was that since the sequences produced with different seeds will be strongly correlated, I don't see that it really matters. On the other hand, this behavior should make it pretty easy to show that you'll still get a low-discrepancy sequence if you use a different seed, so at least it's safe. (I'm not absolutely certain that this is true, since I haven't worked through the math.)
long is not guaranteed to be 64 bits (I know that it isn't on some systems). int is not guaranteed to be 32 bits, either. If you rely on the number of bits, then you should use uint32_t or int32_t.
I do find it odd that you use (1 << 31) which is a negative number. I'd prefer it if you used <uint32_t, D, 1, ((uint32_t)1 << 31)>
I'm not sure that this is a good idea. If you want discard to operate on a whole vector, then shouldn't operator() return a whole vector too? I'm guessing that returning a vector from operator() would make using the generator more inconvenient. In Christ, Steven Watanabe

2010.04.11 17:44, Steven Watanabe wrote:
Done. You were right, as usual. I distinctly remember int being a 16-bit type in the times of old, when the 286 reigned supreme. Patched and updated.
Yes, all very valid points. My rationale behind the vector-discarding version was that I can do it in O(1), whereas consecutive operator() invocations unnecessarily make it an O(n) operation. Nevertheless, with a bit of twiddling it is possible to achieve O(1) for z consecutive elements too (already done). So now we have, as the draft dictates:

X u = v;
for( int i = 0; i < N; ++i )
    u();
v.discard(N);
assert( u() == v() );

Cheers, Justinas
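A toy model (not niederreiter_base2 itself, all names are illustrative) of the discard contract being discussed: after v.discard(N), v must be in the same state as a copy that called operator() N times. When the state is a simple function of a call counter, the fast-forward really is O(1), which is the optimization described above:

```cpp
#include <cstdint>

// Minimal generator whose state is just a call counter, so discard(z)
// can jump ahead in constant time instead of looping z times.
struct counter_gen {
    uint32_t state;
    counter_gen() : state(0) {}
    uint32_t operator()() { return state++ * 2654435761u; } // hash the counter
    void discard(uint32_t z) { state += z; }                // O(1) fast-forward
};
```

Usage mirrors the snippet from the draft: copy the generator, advance one copy with N calls and the other with discard(N), and the next outputs agree.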
participants (3)
- Justinas V. D.
- Sandeep Gupta
- Steven Watanabe