Mersenne Twister 19937 suffers from "lattice effect"?

Hello, I tried to implement a two-dimensional multi-modal gaussian mixture and used the "mt19937" as the underlying generator. After I drew some samples (from the gaussian mixture) the outcome was not as I expected. After some bug tracking I suppose that the reason is something that is called "lattice effect" in German math literature. A pseudo-random generator suffers from a lattice effect, if one draws a sequence of samples, groups them together into N-dimensional vectors and these N-dimensional vectors are not uniformly distributed in the N-dimensional euclidean space. Does anybody know if the boost implementation has this problem? In the following, I will present my problem (and my algorithm) more elaborately, especially how I implemented my multi-model gaussian that is the origin of my problem: Lets assume we want a 2-dimensial gaussian mixture with M gaussian. 1) Create three lists (stl vector objects in my case), each with M entries. The first list stores the weight of each gaussian. The weights sum up to 1. The second list stores the mean value of each gaussian, i.e. a two-dimensional vector (in the mathematical sense, not the stl vector). The third list stores a 2-by-2 covariance matrix for each gaussian 2) Create a "mt19937" object, a "discrete_distribution" object and a "normal_distribution" object The "mt19937" is the common source of randomness for "discrete_distribution" and "normal_distribution". The "discrete_distribution" produces integers between 0 and M (number of gaussians) according the the given weights The "normal_distribution" is parameteized with mean = 0 and variance = 1, i.e. it is the standard normal distribtion. 3) Drawing one sample from the gaussian mixture - Draw one sample from the mt19937 and feed it into the discrete_distribution object. Denote the result by "index". - Draw a second sample from the mt19937 and feed it into the normal_distribution object. Denote the result by "u". - Draw a third sample from the mt19937 and feed it into the normal_distribution object. Denote the result by "v". - Pick up the mean vector and covariance matrix from the lists with respect to "index" - Transform the vector (u,v) according to the selected mean vector and covariance matrix. Denote the result "(x,y)". - Return (x,y) Lets remark that three samples are drawn from mt19937 to obtain one sample from the gaussian mixture. The result of this algorithm depends on the numbers of gaussians that build the mixture. 1) If M equals 1, the result looks fine. The (x,y) samples form an ellipsoid in the 2-dimensional euclidean space 2) If M equals 2, one gaussian looks like an ellipsoid, but it is too small. The second "gaussian" looks like a torus. Obviously all samples (u,v) with large coordinates are assigned to one gaussian, all (u,v) near zero are assigned to the other gaussian 3) If M equals 3, the "gaussians" look like segments of an ellipsoid. In each ellipsoid one third is missing. 4) For higher M the result is not correct either, but not as vivid and easily to explain as the "small" cases. As soon as I switch to random_device instead of mt19937 the phenomena disappears. ---------------------------------------------------------------------- Matthias Nagel Willy-Andreas-Allee 1, Zimmer 506 76131 Karlsruhe Telefon: +49 721 8695-1506 Telefax: +49 721 8695-4506 Mobil: +49 151 15998774 e-Mail: matthias.nagel@student.kit.edu ICQ: 499797758 Skype: nagmat84

AMDG On 01/26/2012 10:57 AM, Matthias Nagel wrote:
Hello,
I tried to implement a two-dimensional multi-modal gaussian mixture and used the "mt19937" as the underlying generator. After I drew some samples (from the gaussian mixture) the outcome was not as I expected. After some bug tracking I suppose that the reason is something that is called "lattice effect" in German math literature. A pseudo-random generator suffers from a lattice effect, if one draws a sequence of samples, groups them together into N-dimensional vectors and these N-dimensional vectors are not uniformly distributed in the N-dimensional euclidean space. Does anybody know if the boost implementation has this problem?
mt19937 is equidistributed in 623 dimensions. The algorithm is standard. Any implementation is going to have the same problems. The only differences will be the seeding and the distribution algorithms.
In the following, I will present my problem (and my algorithm) more elaborately, especially how I implemented my multi-model gaussian that is the origin of my problem:
Lets assume we want a 2-dimensial gaussian mixture with M gaussian.
1) Create three lists (stl vector objects in my case), each with M entries. The first list stores the weight of each gaussian. The weights sum up to 1. The second list stores the mean value of each gaussian, i.e. a two-dimensional vector (in the mathematical sense, not the stl vector). The third list stores a 2-by-2 covariance matrix for each gaussian
2) Create a "mt19937" object, a "discrete_distribution" object and a "normal_distribution" object The "mt19937" is the common source of randomness for "discrete_distribution" and "normal_distribution". The "discrete_distribution" produces integers between 0 and M (number of gaussians) according the the given weights The "normal_distribution" is parameteized with mean = 0 and variance = 1, i.e. it is the standard normal distribtion.
3) Drawing one sample from the gaussian mixture - Draw one sample from the mt19937 and feed it into the discrete_distribution object. Denote the result by "index". - Draw a second sample from the mt19937 and feed it into the normal_distribution object. Denote the result by "u". - Draw a third sample from the mt19937 and feed it into the normal_distribution object. Denote the result by "v". - Pick up the mean vector and covariance matrix from the lists with respect to "index" - Transform the vector (u,v) according to the selected mean vector and covariance matrix. Denote the result "(x,y)". - Return (x,y)
That's not quite how the distributions work. discrete_distribution uses two inputs to produce one output. normal_distribution uses two inputs to produce to outputs.
Lets remark that three samples are drawn from mt19937 to obtain one sample from the gaussian mixture.
The result of this algorithm depends on the numbers of gaussians that build the mixture.
1) If M equals 1, the result looks fine. The (x,y) samples form an ellipsoid in the 2-dimensional euclidean space
2) If M equals 2, one gaussian looks like an ellipsoid, but it is too small. The second "gaussian" looks like a torus. Obviously all samples (u,v) with large coordinates are assigned to one gaussian, all (u,v) near zero are assigned to the other gaussian
3) If M equals 3, the "gaussians" look like segments of an ellipsoid. In each ellipsoid one third is missing.
4) For higher M the result is not correct either, but not as vivid and easily to explain as the "small" cases.
As soon as I switch to random_device instead of mt19937 the phenomena disappears.
Can you post small self-contained code for this? In Christ, Steven Watanabe

Dear all,
On Thu, 26 Jan 2012 14:58:14 -0800
Steven Watanabe
Can you post small self-contained code for this?
so far I was not aware of any serious statistical problem with the Mersenne Twister. Therefore, I also would like to see some self-contained code to reproduce and possibly investigate the problem. Heiko -- -- Die Menschen stolpern nicht über Berge, sondern über Maulwurfshügel. -- (Konfuzius, chin. Philosoph, 551-479 v.Chr.) -- Cluster Computing @ http://www.clustercomputing.de -- Number Crunch Blog @ http://numbercrunch.de

Am Freitag 27 Januar 2012, 10:28:21 schrieb Heiko Bauke:
Dear all,
On Thu, 26 Jan 2012 14:58:14 -0800 Steven Watanabe
wrote: Can you post small self-contained code for this?
so far I was not aware of any serious statistical problem with the Mersenne Twister. Therefore, I also would like to see some self-contained code to reproduce and possibly investigate the problem.
Heiko
No problem. I will create a self-contained example, but I don't have the time until Monday. ---------------------------------------------------------------------- Matthias Nagel Willy-Andreas-Allee 1, Zimmer 506 76131 Karlsruhe Telefon: +49 721 8695-1506 Telefax: +49 721 8695-4506 Mobil: +49 151 15998774 e-Mail: matthias.nagel@student.kit.edu ICQ: 499797758 Skype: nagmat84
participants (3)
-
Heiko Bauke
-
Matthias Nagel
-
Steven Watanabe