AMDG

On 04/19/2014 04:35 PM, Thijs van den Berg wrote:
> What’s your view on limiting the round to <=20 for the template
I still don't like it. If all else fails, you can use the optimized version for rounds <= 20, and the slow version for rounds > 20.
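A minimal sketch of that fallback, assuming C++11 for the fixed-width types. Everything here is hypothetical: `toy_round` is only a stand-in for the real round function, and `mixer`, `unrolled`, and `mix_generic` are illustrative names, not the proposed interface. The only point is the compile-time dispatch on the round count.

```cpp
#include <cstddef>
#include <cstdint>

// Toy mixing step standing in for one round of the real generator
// (hypothetical; NOT the round function under discussion).
inline std::uint64_t toy_round(std::uint64_t x, std::size_t r)
{
    x += 0x9E3779B97F4A7C15ULL + r;
    return (x << 13) | (x >> 51);   // rotate left by 13
}

// Slow path: a plain run-time loop that works for any round count.
template<std::size_t Rounds>
std::uint64_t mix_generic(std::uint64_t x)
{
    for (std::size_t r = 0; r < Rounds; ++r)
        x = toy_round(x, r);
    return x;
}

// Fast path: recursion on a compile-time index, so every round number
// is a constant and the compiler can flatten the whole chain.
template<std::size_t R, std::size_t Rounds>
struct unrolled {
    static std::uint64_t apply(std::uint64_t x)
    { return unrolled<R + 1, Rounds>::apply(toy_round(x, R)); }
};
template<std::size_t Rounds>
struct unrolled<Rounds, Rounds> {
    static std::uint64_t apply(std::uint64_t x) { return x; }
};

// Dispatch: fast path for Rounds <= 20, slow path otherwise.
template<std::size_t Rounds, bool Small = (Rounds <= 20)>
struct mixer {
    static std::uint64_t apply(std::uint64_t x)
    { return unrolled<0, Rounds>::apply(x); }
};
template<std::size_t Rounds>
struct mixer<Rounds, false> {
    static std::uint64_t apply(std::uint64_t x)
    { return mix_generic<Rounds>(x); }
};
```

With this layout `mixer<20>::apply(x)` and `mixer<32>::apply(x)` share one interface, so the template stays unrestricted and only the implementation differs above the threshold.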
> and providing only the 20 round as a typedef?
I'd favor providing both. There are plenty of inferior algorithms in Boost.Random. I would anticipate that anyone who wants to use an algorithm other than mt19937 would have some idea of the tradeoffs.
> I have addressed most other points you’ve mentioned, but the performance issue of a generic rounds version has failed me.
In theory it could be optimized. What compiler and optimization settings are you using? In particular, are you using -funroll-loops (GCC)? The version you show unrolls the loop 4x. What if you unroll 8x and kill the constant arrays? What about 20x and eliminate the % 5 in the key addition? Or 40x and eliminate both?

In Christ,
Steven Watanabe
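To illustrate the unrolling question above: once a loop body is unrolled over a full period of a modulo-indexed access, every index becomes a compile-time constant and the `%` disappears. The sketch below is a toy, assuming C++11. `add_keys_loop` and `add_keys_unrolled5` are hypothetical names, and the arithmetic is a placeholder rather than the real key addition. In this toy every step touches the key, so the period is 5 steps. The 20x figure in the message depends on how often the key addition occurs in the real round loop, which is not shown here.

```cpp
#include <cstddef>
#include <cstdint>

// Looped form: the key index is computed with % 5 on every step.
inline std::uint64_t add_keys_loop(std::uint64_t x,
                                   const std::uint64_t (&key)[5],
                                   std::size_t steps)
{
    for (std::size_t s = 0; s < steps; ++s)
        x = (x ^ key[s % 5]) * 3 + s;
    return x;
}

// Unrolled 5x: s is always a multiple of 5 at the top of the main loop,
// so the key indices are the constants 0..4 and no % is needed.
inline std::uint64_t add_keys_unrolled5(std::uint64_t x,
                                        const std::uint64_t (&key)[5],
                                        std::size_t steps)
{
    std::size_t s = 0;
    for (; s + 5 <= steps; s += 5) {
        x = (x ^ key[0]) * 3 + (s + 0);
        x = (x ^ key[1]) * 3 + (s + 1);
        x = (x ^ key[2]) * 3 + (s + 2);
        x = (x ^ key[3]) * 3 + (s + 3);
        x = (x ^ key[4]) * 3 + (s + 4);
    }
    // Remainder (fewer than 5 steps): k counts up from 0, which equals
    // s % 5 here because s starts at a multiple of 5.
    for (std::size_t k = 0; s < steps; ++s, ++k)
        x = (x ^ key[k]) * 3 + s;
    return x;
}
```

Both functions compute the same value for any step count. When the step count is a template parameter and a multiple of the period, the remainder loop can be dropped entirely.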