
Michael Stevens wrote:
I am very interested in this framework, so I have started to take a look. General accumulators are something I could make use of myself. I really like the conceptual design of the framework, and how it allows accumulators to be inter-dependent.
Great! Glad you like it.
After a quick browse through the documentation I decided to take a look at the code. In particular I was interested in the numerics.
I think the current implementation has some serious numerical weaknesses. I looked at two algorithms, 'sum' and 'variance':
In 'sum' I expected to see a compensated summation; this is numerically a lot better than just adding the numbers together.
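For reference, the standard scheme here is Kahan summation, which carries a running correction term for the low-order bits lost in each addition. A minimal standalone sketch of the idea (my own illustration, not code from the library):

    #include <vector>

    // Kahan (compensated) summation: "comp" holds the low-order
    // bits that were lost when the last sample was added to "sum".
    double compensated_sum(std::vector<double> const &data)
    {
        double sum = 0.0, comp = 0.0;
        for (double x : data)
        {
            double y = x - comp;    // apply the previous correction
            double t = sum + y;     // low-order bits of y are lost here
            comp = (t - sum) - y;   // recover exactly what was lost
            sum = t;
        }
        return sum;
    }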
'sum' is one I implemented. I'm not surprised to learn there are better approaches. The framework allows for different implementation strategies for the statistics, though. Using the extensibility features, you can define your own "compensated_sum" accumulator and declare that it satisfies the "sum" feature (so that "compensated_sum" and "sum" are indistinguishable from the POV of dependency resolution), and even come up with clever syntax for it, like:

    accumulator_set< double, features< sum(compensated) > > acc;

You might even try writing "compensated_sum" yourself and submitting it, just to see what happens. :-)

The questions of what the default "sum" should do, and what alternate implementations should be provided, are open.
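Roughly, such an extension might look like the following, modeled on the extension examples in the documentation. This is an untested sketch: the names (compensated_sum_impl, tag::compensated_sum) are placeholders, and the extra step of registering it as an implementation of the "sum" feature is omitted.

    #include <boost/accumulators/accumulators.hpp>

    namespace boost { namespace accumulators {

    namespace impl {
        // Kahan-compensated sum, written as an accumulator.
        template<typename Sample>
        struct compensated_sum_impl : accumulator_base
        {
            typedef Sample result_type;

            template<typename Args>
            compensated_sum_impl(Args const &args)
              : sum(args[sample | Sample()]), comp()
            {}

            template<typename Args>
            void operator ()(Args const &args)
            {
                Sample y = args[sample] - comp;
                Sample t = sum + y;
                comp = (t - sum) - y;   // low-order bits lost in t
                sum = t;
            }

            result_type result(dont_care) const { return sum; }

        private:
            Sample sum, comp;
        };
    }

    namespace tag {
        struct compensated_sum : depends_on<>
        {
            typedef impl::compensated_sum_impl<mpl::_1> impl;
        };
    }

    namespace extract {
        extractor<tag::compensated_sum> const compensated_sum = {};
    }
    using extract::compensated_sum;

    }} // namespace boost::accumulators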
The 'variance' accumulator has a lazy calculation of variance using the formula \sigma_n^2 = M_n^{(2)} - \mu_n^2. This formula is specifically cited for its poor performance in the presence of rounding error; indeed, it may even return negative results.
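A better-behaved alternative is Welford's online update, which accumulates the squared deviations from the running mean directly and so avoids the cancellation in M_n^{(2)} - \mu_n^2. A minimal sketch of the recurrence (my illustration, not a patch):

    #include <cstddef>

    // Welford's online algorithm: numerically stable running variance.
    struct welford_variance
    {
        std::size_t n = 0;
        double mean = 0.0;
        double m2 = 0.0;   // running sum of squared deviations from the mean

        void operator()(double x)
        {
            ++n;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);   // note: uses the updated mean
        }

        // Population variance, matching \sigma_n^2 above; never negative.
        double variance() const { return n ? m2 / n : 0.0; }
    };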
Any chance of getting your statistics guys to take a look at the numerics of the solutions? If people were to use the library as is, they would be in for nasty surprises!
I'll forward this message off to the stats guys. This would certainly be a good issue to re-raise once the review starts.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com