On 20 Apr 2014, at 01:39, Steven Watanabe wrote:
AMDG
On 04/19/2014 04:35 PM, Thijs van den Berg wrote:
What’s your view on limiting the rounds to <= 20 for the template
I still don't like it. If all else fails, you can use the optimized version for rounds <= 20, and the slow version for rounds > 20.
Ok, that’s a good last resort.
and providing only the 20-round version as a typedef?
I'd favor providing both. There are plenty of inferior algorithms in Boost.Random. I would anticipate that anyone who wants to use an algorithm other than mt19937 would have some idea of the tradeoffs.
Ok.
I have addressed most of the other points you’ve mentioned, but I have not managed to solve the performance issue of a generic-rounds version.
In theory it could be optimized. What compiler and optimization settings are you using? In particular, are you using -funroll-loops (GCC)? The version you show unrolls the loop 4x. What if you unroll 8x and kill the constant arrays? What about 20x and eliminate the % 5 in the key addition? Or 40x and eliminate both?
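To make the unrolling idea above concrete, here is a minimal sketch of how a template can turn the round index into a compile-time constant, so that the `% 5` in the key addition and the lookup into the constant array can be folded by the compiler. Everything in it is an assumption for illustration: the names `mix_loop`, `Unrolled`, `rotl64` and `kRot`, the two-word state, the rotation constants and the key schedule are hypothetical and are not the generator under discussion.

```cpp
#include <cstdint>

// Hypothetical rotation constants, for illustration only.
constexpr unsigned kRot[4] = {14, 16, 52, 57};

// Portable rotate-left; the masks keep both shift counts in [0, 63].
inline std::uint64_t rotl64(std::uint64_t x, unsigned n) {
    return (x << (n & 63)) | (x >> ((64 - n) & 63));
}

// Reference version: the round index is a runtime value, so the modulo
// and the array index are computed inside the loop.
inline void mix_loop(std::uint64_t& a, std::uint64_t& b,
                     const std::uint64_t (&key)[5], unsigned rounds) {
    for (unsigned r = 0; r < rounds; ++r) {
        if (r % 4 == 0) {                 // key addition every 4 rounds
            a += key[(r / 4) % 5];
            b += key[(r / 4 + 1) % 5];
        }
        a += b;
        b = rotl64(b, kRot[r % 4]) ^ a;
    }
}

// Unrolled version: R is a template parameter, so R % 4, (R / 4) % 5 and
// kRot[R % 4] are constant expressions in every instantiation.
template <unsigned R, unsigned Rounds>
struct Unrolled {
    static void apply(std::uint64_t& a, std::uint64_t& b,
                      const std::uint64_t (&key)[5]) {
        if (R % 4 == 0) {                 // resolved at compile time
            a += key[(R / 4) % 5];
            b += key[(R / 4 + 1) % 5];
        }
        a += b;
        b = rotl64(b, kRot[R % 4]) ^ a;
        Unrolled<R + 1, Rounds>::apply(a, b, key);
    }
};

// Terminating case: all rounds have been emitted.
template <unsigned Rounds>
struct Unrolled<Rounds, Rounds> {
    static void apply(std::uint64_t&, std::uint64_t&,
                      const std::uint64_t (&)[5]) {}
};
```

A call such as `Unrolled<0, 20>::apply(a, b, key)` should produce the same result as `mix_loop(a, b, key, 20)`. Whether it is actually faster depends on the compiler and flags, so it would need to be timed.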
Yes, good ideas. I think I’m going to make a template version that eliminates the array index computations. Until now I was using the default build of the performance tool in random with Clang-503.0.38 on Apple Darwin. I’m now compiling and timing it manually. One thing I found earlier is that the compiler recognises rotl only with -O3. However, that’s a different issue from the loop vs. unrolled version.
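For reference on the rotl point, this is a sketch of the usual portable rotate-left idiom that optimizers can pattern-match. The name `rotate_left` is hypothetical and this is not necessarily the form used in the code being discussed. Whether it compiles down to a single rotate instruction depends on the compiler and optimization level, as the -O3 observation above suggests.

```cpp
#include <cstdint>

// Rotate a 64-bit value left by n bits.
// Masking both shift counts avoids undefined behaviour when n is 0 or >= 64,
// since shifting a 64-bit value by 64 or more bits is undefined in C++.
inline std::uint64_t rotate_left(std::uint64_t x, unsigned n) {
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}
```

One way to check what the compiler did is to inspect the generated assembly at each optimization level rather than relying on timings alone.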
In Christ, Steven Watanabe