
At 03:58 PM 1/30/2007, you wrote:
For now, it seems that you use the default variance implementation, which should be the naive estimator computed from the sum and the sum of squares.
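The naive estimator described here can be sketched in Python as follows. This is a minimal illustration, not any library's actual implementation. The function name `naive_variance` and its `ddof` parameter (1 divides by n - 1, 0 divides by n) are hypothetical choices made for this example.

```python
def naive_variance(xs, ddof=1):
    """Variance from a running sum and sum of squares (the 'naive' estimator).

    Hypothetical helper for illustration: ddof=1 divides by n - 1,
    ddof=0 divides by n.
    """
    n = 0
    s = 0.0   # running sum of x
    ss = 0.0  # running sum of x**2
    for x in xs:
        n += 1
        s += x
        ss += x * x
    if n - ddof <= 0:
        raise ValueError("not enough data points")
    # Subtracting two large, nearly equal numbers: this is where precision is lost.
    return (ss - s * s / n) / (n - ddof)
```

For example, `naive_variance([4.0, 7.0, 13.0, 16.0])` returns 30.0. Adding 1e9 to every value leaves the true variance at 30, but in IEEE double precision the two sums in the final subtraction are both rounded to multiples of 512. As a result, the function no longer returns anything close to 30.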
Why would you supply a poor implementation as the default when the alternative (West's algorithm) is efficient, easy to implement, and much more precise? I can think of only two reasons for including the naive algorithm at all. The first is the rare case where a small gain in performance matters more than a potentially very large loss of precision. The second is when you are using exact arithmetic (e.g., rational numbers instead of floating point, or values that are all integers).

The "pathological cases" where the naive algorithm does poorly are those in which the sum of squares (and the square of the sum) is large relative to the variance, and this is very often the case. It happens when the variance is small relative to the mean, or when more than a few terms are involved. How small relative to the mean, or how many terms, depends on how much precision you really need in your variance. The naive formula subtracts two large, nearly equal quantities, and because those quantities are squares, the error mounts quickly.

For the record, West's algorithm (starting from M[0] = 0 and T[0] = 0, for k = 1, ..., n):

    t1   = x[k] - M[k-1]
    t2   = t1 / k
    M[k] = M[k-1] + t2
    T[k] = T[k-1] + (k-1)*t1*t2

    Mean{x[1]...x[n]} = M[n]
    Var{x[1]...x[n]}  = T[n] / m

where m = n - 1 for the unbiased estimator of the population variance, or m = n for the biased estimator. The biased estimator has lower variance and is the maximum-likelihood estimator for normally distributed data.

Topher Cooper
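A direct transcription of West's recurrence into Python might look like the sketch below. The function name `west_mean_variance` and the `ddof` parameter are illustrative assumptions. Here `ddof=1` corresponds to m = n - 1 and `ddof=0` to m = n. Like the naive version, it is a single pass with constant state (k, M, T).

```python
def west_mean_variance(xs, ddof=1):
    """Running mean and variance via West's update.

    Hypothetical helper for illustration. Returns (mean, variance);
    ddof=1 gives m = n - 1, ddof=0 gives m = n.
    """
    k = 0
    M = 0.0  # running mean, M[k]
    T = 0.0  # running sum of squared deviations, T[k]
    for x in xs:
        k += 1
        t1 = x - M               # t1 = x[k] - M[k-1]
        t2 = t1 / k              # t2 = t1 / k
        M += t2                  # M[k] = M[k-1] + t2
        T += (k - 1) * t1 * t2   # T[k] = T[k-1] + (k-1)*t1*t2
    if k - ddof <= 0:
        raise ValueError("not enough data points")
    return M, T / (k - ddof)
```

On the values 1e9+4, 1e9+7, 1e9+13, 1e9+16 this returns a mean of 1e9+10 and a variance of exactly 30.0. The update works with deviations from the running mean rather than with raw squares, so the large common offset never gets squared.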