boost::random uniform_int bias and speed

Looking inside boost::random, I noticed a few things that I think could be better, and since I could not find any mention of them in the list's archive, I'll ask here.

1. The case brange / _range <= 4 is implemented by rejection, directly. But the average number of rejects could be decreased (by a factor of up to 4) by rescaling before rejection, i.e. (pseudo-code; see the C++ sketch after this message):

    k = (brange+1) / (_range+1)
    rejection_thresh = k * (_range+1)
    keep generating random numbers until result < rejection_thresh
    return result / k

The probability of a reject is then less than _range/brange, always less than 1/2, and can reach 0 (when (brange+1) is a multiple of (_range+1)). Was this possible speedup considered useless?

2. The case brange / _range > 4: since direct rejection could mean very many retries in this case, it is handled by uniform_smallint (which always uses exactly one number generation). But with the rejection scheme above there is no need for a separate case: the average number of generated numbers stays fairly close to 1, so the generator's speed is equivalent to uniform_smallint's. Moreover, a rejection scheme completely avoids any added bias, whereas uniform_smallint's added bias can reach 1/32, making it unsuitable for high-precision statistics programs.

Why not use such a "rescaled" rejection scheme as I describe, rather than uniform_smallint's possibly biased computation?
-- Samuel
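For concreteness, here is a minimal C++ rendering of the pseudo-code above. The function name rescaled_rejection, the choice of a 32-bit base engine, and the 64-bit intermediate arithmetic (used so that brange+1 stays representable) are illustrative assumptions, not part of the proposal or of boost::random itself.

    #include <cstdint>
    #include <random>   // std::mt19937, used here only as an example base engine

    // Sketch of the "rescaled rejection" scheme from the pseudo-code above.
    // Assumptions: the base engine has a 32-bit output range, so brange+1 fits
    // in 64-bit arithmetic, and range <= brange.
    // Returns a value uniformly distributed over [0, range] (inclusive).
    template <class Engine>
    std::uint32_t rescaled_rejection(Engine& eng, std::uint32_t range)
    {
        const std::uint64_t brange_plus_1 =
            static_cast<std::uint64_t>((eng.max)() - (eng.min)()) + 1;
        const std::uint64_t k      = brange_plus_1 / (std::uint64_t(range) + 1);
        // largest multiple of (range+1) not exceeding brange+1
        const std::uint64_t thresh = k * (std::uint64_t(range) + 1);

        for (;;) {
            const std::uint64_t x =
                static_cast<std::uint64_t>(eng()) - (eng.min)();
            if (x < thresh)
                // accept: exactly k base values map to each output value
                return static_cast<std::uint32_t>(x / k);
            // reject and retry; the rejection probability is
            // (brange_plus_1 - thresh) / brange_plus_1, which is always < 1/2
        }
    }

    // Example:
    //   std::mt19937 eng;
    //   std::uint32_t r = rescaled_rejection(eng, 2);   // uniform over {0, 1, 2}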

On Monday 23 August 2004 15:35, krempp@crans.ens-cachan.fr wrote:
> Plus, using a rejection scheme would completely avoid any added bias (contrary to uniform_smallint in which the added bias can reach 1/32, thus unsuitable for high-precision statistics programs)

To be more precise: I meant that one number can be up to 1/32 more probable (relatively) than some other, which amounts to a bias of about 1/32/range on each frequency. (In practice, the biggest bias is 0.52%, for a range of 3.)
-- Samuel

On Monday 23 August 2004 15:35, krempp@crans.ens-cachan.fr wrote:
> Plus, using a rejection scheme would completely avoid any added bias (contrary to uniform_smallint in which the added bias can reach 1/32, thus unsuitable for high-precision statistics programs)
The bias of uniform_int is definitely a problem. With the current uniform_int, which delegates to uniform_smallint, the observed frequencies of random numbers in {0..2} based on mt19937 are about as biased as I expected (~0.5%):

    frequency[0] : 0.335951
    frequency[1] : 0.331972
    frequency[2] : 0.332077

These are frequencies over 100e6 iterations and are significantly consistent (at least 3 digits have already converged). With the proposed change, the same test (see the sketch after this message) yields:

    frequency[0] : 0.333284
    frequency[1] : 0.333346
    frequency[2] : 0.33337

There is now almost 100 times less bias (and it will converge to 0 as the number of iterations increases). I attach the patch. I can commit it, along with the other one I sent about the overflow bug, if there is no objection to those patches.
-- Samuel
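For reference, a frequency test of the kind described above could look roughly like the sketch below. It uses the current Boost.Random spelling (boost::random::mt19937 and boost::random::uniform_int_distribution), not the 2004 interface discussed in this thread, and today's uniform_int_distribution already uses unbiased rejection, so this reproduces only the shape of the test, not the old bias; the default seed and the iteration count are taken from the post.

    #include <boost/random/mersenne_twister.hpp>
    #include <boost/random/uniform_int_distribution.hpp>
    #include <cstdio>

    int main()
    {
        const long long N = 100000000;              // 100e6 iterations, as in the post
        boost::random::mt19937 rng;                 // default-seeded Mersenne Twister
        boost::random::uniform_int_distribution<int> dist(0, 2);   // numbers in {0..2}

        long long count[3] = { 0, 0, 0 };
        for (long long i = 0; i < N; ++i)
            ++count[dist(rng)];

        for (int v = 0; v < 3; ++v)
            std::printf("frequency[%d] : %g\n", v, double(count[v]) / double(N));
        return 0;
    }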