
On Jan 31, 2007, at 7:47 PM, Eric Niebler wrote:
james.jones@firstinvestors.com wrote:
From: Matthias Troyer <troyer@phys.ethz.ch>
These were easy, trickier are robust estimators of variance and moments, even trickier will be median estimators where I currently even do not see how this should be done without storing most of the time series.
Isn't this already a problem with sequences? Suppose you're storing the current median, and a new value comes along. What's the new median - without checking all the previous values? There are statistical estimates for this, but I don't know any exact way other than essentially resorting the data and checking the new median.
To calculate an exact median, you're correct. Of course, nothing stops you from writing an accumulator that calculates the exact median by storing all the samples seen so far. There are approximate median algorithms that don't need to do that, though, and that's what Matthias is referring to. Those might be tricky to combine.
Correct. These approximate (e.g. the p_square methods) algorithms are impossible to combine. Also many of the algorithms for correlated sequences of data are hard or impossible to combine. Matthias