
This is the first part of what will probably end up a multi-part review.

First the headline - I do think that the library should be included in Boost - although I do have some reservations as noted below.

This part-review covers the documentation and a few simple tests I've conducted. Later I'll try writing some accumulators so that I can comment better on the design.

Documentation review
~~~~~~~~~~~~~~~~~~~~

Tutorial:
~~~~~~~~~

I'm fairly sure that the whole accumulator_set_wrapper business could be completely avoided with a TR1 conforming reference_wrapper - which unfortunately we don't have in Boost just yet - none the less this may be worthwhile mentioning.

The "push all the data in this sequence" idiom seems like a very common use case to me, is it not worthwhile supporting this directly via a couple of member functions:

template <class It> accumulator_set& accumulator_set::push_range(It begin, It end);
template <class Seq> accumulator_set& accumulator_set::push_sequence(Seq const& s);

Typo in "This will calcualte the left"

Statistic accumulators:
~~~~~~~~~~~~~~~~~~~~~~~

Kurtosis
~~~~~~~~

This is going to confuse folks - I know it's got Paul and I into trouble already! You're returning the "Kurtosis Excess" rather than the "Kurtosis Proper". In the Math Toolkit stats functions we called these functions "kurtosis" and "kurtosis_excess", and I'd favour similar naming here, or at least a strong warning to the effect that there are multiple definitions in the literature - otherwise unwary users will get into all kinds of trouble.

Differing implementations?
~~~~~~~~~~~~~~~~~~~~~~~~~~

feature_of and as_feature get mentioned in the tutorial, along with a sentence to say that there can be different implementations of the same feature, but there is no explanation of how this works, and the links to the reference pages for as_feature and feature_of are distinctly detail free :-(

This all leads on to the algorithms used for the mean and variance etc: I assume that these are of the "naive" variety? Perfectly satisfactory for simple cases, but there are well known pathological situations where these fail.

I'm also missing accessors for: the standard deviation, "unbiased" variance, and the unbiased (N-1) standard deviation. I realise I could write these myself, but they would seem to be common enough cases to warrant inclusion in the library. As it stands, I had to jump through hoops to get data I could compare with "known good values".

BTW, I assume that there are no accumulators for Autocorrelation Coefficient calculation?

Which leads on to a quick comparison I ran against the "known good" data here: http://www.itl.nist.gov/div898/strd/univ/homepage.html The test program is attached, and outputs the relative error in the statistics calculated. Each test is progressively harder than the previous one; the output is:

PI data: Error in mean is: 0 Error in SD is: 3.09757e-016
Lottery data: Error in mean is: 6.57202e-016 Error in SD is: 0
Accumulator 2 data: Error in mean is: 9.25186e-015 Error in SD is: 2.71685e-012
Accumulator 3 data: Error in mean is: 5.82076e-016 Error in SD is: 0.0717315
Accumulator 4 data: Error in mean is: 9.87202e-015 Error in SD is: -1.#IND

As you can see, the calculated standard deviation gets progressively worse; in the final case, the computed variance is actually -2, which is clearly nonsensical (and hence the NaN when using it to compute the standard deviation) :-(

I haven't tried to debug these cases: stepping through the accumulator code is not something I would recommend to anyone frankly (having tried it previously).

No doubt other torture tests could be constructed for the mean - exponentially distributed data where you see one or two large values, followed by a large number of very tiny values springs to mind - this is the sort of situation that the Kahan summation algorithm gets employed for.

I'm not sure what one should do about this. These certainly illustrate the kinds of trap that the unwary can fall into. It's particularly a problem for the "scientist in a rush" who may not be either a C++ or statistical expert (whether (s)he thinks they are or not!) but needs a library to quickly and reliably calculate some stats. Better documentation would certainly help, but guidance on (or provision of - perhaps by default) better algorithms might be welcome as well.

That's all for now,

Regards, John.
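For reference, a minimal sketch of the Kahan (compensated) summation algorithm mentioned above; this is the textbook algorithm, not anything taken from the Accumulators library:

#include <cstddef>
#include <vector>

// Kahan summation: carry the low-order bits lost by each addition in a
// separate compensation term, so a long run of tiny values is not swallowed
// by a large running total.
double kahan_sum(std::vector<double> const &x)
{
    double sum = 0.0;
    double c = 0.0;                      // running compensation
    for (std::size_t i = 0; i != x.size(); ++i)
    {
        double const y = x[i] - c;       // apply the correction to the next term
        double const t = sum + y;        // low-order bits of y may be lost here...
        c = (t - sum) - y;               // ...recover them algebraically
        sum = t;
    }
    return sum;
}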

John Maddock wrote:
Which leads on to a quick comparison I ran against the "known good" data here: http://www.itl.nist.gov/div898/strd/univ/homepage.html The test program is attached, and outputs the relative error in the statistics calculated.
Oops, forgot the attachment, here it is. John.

John Maddock wrote:
John Maddock wrote:
Which leads on to a quick comparison I ran against the "known good" data here: http://www.itl.nist.gov/div898/strd/univ/homepage.html The test program is attached, and outputs the relative error in the statistics calculated.
Oops, forgot the attachment, here it is.
After some more investigation:

1) I fell into the N vs N-1 standard deviation trap myself; corrected test file attached - hopefully right this time! The output is now:

PI data: Error in mean is: 0 Error in SD is: 3.09757e-016
Lottery data: Error in mean is: 6.57202e-016 Error in SD is: 0
Accumulator 2 data: Error in mean is: 9.25186e-015 Error in SD is: 0.000499625
Accumulator 3 data: Error in mean is: 5.82076e-016 Error in SD is: 0.071196
Accumulator 4 data: Error in mean is: 9.87202e-015 Error in SD is: -1.#IND

2) I re-ran the last calculation using NTL::RR at 1000-bit precision; the final test case does give a sensible answer now rather than a NaN. But...

3) The results for standard deviation (taken as the square root of the variance) are still off. In the last "torture test" set of data from http://www.itl.nist.gov/div898/strd/univ/data/NumAcc4.dat I see:

Test       | Result        | Rel Error
Your code  | 0.1046776164  | 0.04677616448
Naive RMSD | 0.09995003803 | 0.0004996197271
True Value | 0.1           | 0

The "Naive RMSD" just does a very naive "root mean square deviation from the mean" calculation. I believe (but haven't checked) that the remaining difference between this "naive" calculation and the true value results from the inputs having inexact binary representations - I would need to lexical_cast everything from a string representation to an NTL::RR rather than storing as an intermediate double to verify. Can't be bothered to test this at present I'm afraid :-(

It's still rather alarming though that the "naive" method appears to be 100 times more accurate than the accumulator.

Hoping I'm doing something wrong,

yours, John.
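For reference, a sketch of the kind of two-pass calculation the "Naive RMSD" label refers to; this is a reconstruction from John's description rather than his attached test program, and the (n - 1) divisor is an assumption (the NIST reference values use the unbiased form):

#include <cmath>
#include <cstddef>
#include <vector>

// Two-pass "root mean square deviation from the mean": compute the mean
// first, then the RMS deviation from it.
double naive_rmsd(std::vector<double> const &x)
{
    std::size_t const n = x.size();

    double sum = 0.0;
    for (std::size_t i = 0; i != n; ++i)
        sum += x[i];
    double const mean = sum / n;

    double sq = 0.0;
    for (std::size_t i = 0; i != n; ++i)
        sq += (x[i] - mean) * (x[i] - mean);

    return std::sqrt(sq / (n - 1));      // divisor assumed; see note above
}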

In the tutorial it says:

" Even the extractors can accept named parameters. In a bit, we'll see a situation where that is useful."

But I can't find any further mention of extractors with named parameters. I ask because I believe it may indeed be useful - for example, a student's t test could defer requiring the acceptable risk level until the extractor is actually called, so the same test could be conducted multiple times with different risk levels just by calling the extractor multiple times.

One other thing I've missed is the ability to "reset" an accumulator so it can be used more than once. Is this possible? Potentially it might save quite a bit of construction cost in some cases?

Thanks, John.

John Maddock wrote:
In the tutorial it says:
" Even the extractors can accept named parameters. In a bit, we'll see a situation where that is useful."
But I can't find any further mention of extractors with named parameters. I ask because I believe it may indeed be useful - for example, a student's t test could defer requiring the acceptable risk level until the extractor is actually called, so the same test could be conducted multiple times with different risk levels just by calling the extractor multiple times.
Whoops. I had intended to document how to use an external accumulator for the weight statistics, which can then be passed in as a parameter at extraction time. (Essentially, dependencies within an accumulator set can be resolved by promising to pass in the required accumulator at extraction time.) But apparently I never got around to it. I'll fix that.
One other thing I've missed is the ability to "reset" an accumulator so it can be used more than once. Is this possible? Potentially it might save quite a bit of construction cost in some cases?
Would you like to be able to reset an accumulator, or the whole set? Nothing stops you from writing an accumulator with a reset() member function. You can use accumulator_set::extract() to fetch the accumulator corresponding to a certain feature and then call reset() on it. It might be nice to have an accumulator_set::reset() member function that resets all the accumulators in order. But what should happen to accumulators that don't define a reset() function? Destroy and reconstruct them in-place? That could be trouble. Skipping them would be trouble, too. -- Eric Niebler Boost Consulting www.boost-consulting.com
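For what it's worth, a rough sketch of the single-accumulator route Eric describes, with a made-up feature ("my_sum") whose implementation carries the reset() member John asks about. The extension boilerplate follows the library's documented pattern as I understand it, so treat the exact spellings (accumulator_base, dont_care, depends_on<>, find_accumulator - Eric writes accumulator_set::extract() above) as assumptions rather than gospel:

#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>

namespace boost { namespace accumulators {

namespace impl {
    // A trivial running-sum accumulator, purely for illustration; the only
    // interesting part is the reset() member at the bottom.
    template<typename Sample>
    struct my_sum_impl : accumulator_base
    {
        typedef Sample result_type;

        my_sum_impl(dont_care) : sum_(Sample()) {}

        template<typename Args>
        void operator ()(Args const &args) { this->sum_ += args[sample]; }

        template<typename Args>
        result_type result(Args const &) const { return this->sum_; }

        void reset() { this->sum_ = Sample(); }

    private:
        Sample sum_;
    };
} // namespace impl

namespace tag {
    struct my_sum : depends_on<>
    {
        typedef accumulators::impl::my_sum_impl<mpl::_1> impl;
    };
} // namespace tag

}} // namespace boost::accumulators

int main()
{
    using namespace boost::accumulators;

    accumulator_set<double, stats<tag::my_sum> > acc;
    acc(1.0);
    acc(2.0);

    // Fetch the accumulator behind the feature and clear it in place.
    find_accumulator<tag::my_sum>(acc).reset();
    acc(3.0);
}

For resetting a whole set, simply assigning a freshly constructed accumulator_set over the old one should work wherever the individual accumulators are assignable, at the cost of the reconstruction John was hoping to avoid.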

Eric Niebler wrote:
Would you like to be able to reset an accumulator, or the whole set? Nothing stops you from writing an accumulator with a reset() member function. You can use accumulator_set::extract() to fetch the accumulator corresponding to a certain feature and then call reset() on it.
It might be nice to have an accumulator_set::reset() member function that resets all the accumulators in order. But what should happen to accumulators that don't define a reset() function? Destroy and reconstruct them in-place? That could be trouble. Skipping them would be trouble, too.
I was thinking of the whole set, yes, but I also see how this could be problematic :-( Thanks for the responses, John.

I'm including Matthias Troyer on this, in the hopes he might have some feedback about your kurtosis suggestion and about your torture test. John Maddock wrote:
This is the first part of what will probably end up a multi-part review.
First the headline - I do think that the library should be included in Boost - although I do have some reservations as noted below.
Thanks, John.
This part-review covers the documentation and a few simple tests I've conducted. Later I'll try writing some accumulators so that I can comment better on the design.
Documentation review
~~~~~~~~~~~~~~~~~~~~
Tutorial:
~~~~~~~~~
I'm fairly sure that the whole accumulator_set_wrapper business could be completely avoided with a TR1 conforming reference_wrapper - which unfortunately we don't have in Boost just yet - none the less this may be worthwhile mentioning.
Really? I guess I'm not familiar enough with TR1 reference_wrapper. It forwards operator(), up to N args?
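For what it's worth, a sketch of the idiom under a reference_wrapper that fully forwards operator() (written here with C++11's std::ref for concreteness; whether TR1's version does exactly this is the open question Eric raises):

#include <algorithm>
#include <functional>   // std::ref
#include <vector>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/mean.hpp>

int main()
{
    using namespace boost::accumulators;

    accumulator_set<double, stats<tag::mean> > acc;

    std::vector<double> data;
    data.push_back(1.0);
    data.push_back(2.0);
    data.push_back(3.0);

    // A forwarding reference_wrapper lets the set be passed by reference
    // straight into a standard algorithm, with no wrapper class needed.
    std::for_each(data.begin(), data.end(), std::ref(acc));

    double const m = mean(acc);   // 2.0
    (void)m;
}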
The "push all the data in this sequence" idiom seems like a very common use case to me, is it not worthwhile supporting this directly via a couple of member functions:
template <class It> accumulator_set& accumulator_set::push_range(It begin, It end);
template <class Seq> accumulator_set& accumulator_set::push_sequence(Seq const& s);
Good idea.
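A minimal sketch of the same idea as free functions layered over the set's operator(); the names are borrowed from John's proposal, and a member-function version would of course live inside accumulator_set itself:

// Push every element of a range / sequence into an accumulator set (or any
// other function object that accepts a sample).
template <class AccumulatorSet, class It>
AccumulatorSet &push_range(AccumulatorSet &acc, It begin, It end)
{
    for (; begin != end; ++begin)
        acc(*begin);                     // same as pushing each sample by hand
    return acc;
}

template <class AccumulatorSet, class Seq>
AccumulatorSet &push_sequence(AccumulatorSet &acc, Seq const &s)
{
    return push_range(acc, s.begin(), s.end());
}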
Typo in "This will calcualte the left"
Yep, thanks.
Statistic accumulators:
~~~~~~~~~~~~~~~~~~~~~~~
Kurtosis
~~~~~~~~
This is going to confuse folks - I know it's got Paul and I into trouble already! You're returning the "Kurtosis Excess" rather than the "Kurtosis Proper". In the Math Toolkit stats functions we called these functions "kurtosis" and "kurtosis_excess", and I'd favour similar naming here, or at least a strong warning to the effect that there are multiple definitions in the literature - otherwise unwary users will get into all kinds of trouble.
Matthias?
Differing implementations? ~~~~~~~~~~~~~~~~~~~~~~~~~~
feature_of and as_feature get mentioned in the tutorial, along with a sentence to say that there can be different implementations of the same feature, but there is no explanation of how this works, and the links to the reference pages for as_feature and feature_of are distinctly detail free :-(
Yes, I need to document these.
This all leads on to the algorithms used for the mean and variance etc: I assume that these are of the "naive" variety? Perfectly satisfactory for simple cases, but there are well known pathological situations where these fail.
Very naive, yes.
I'm also missing accessors for: the standard deviation, "unbiased" variance, and the unbiased (N-1) standard deviation. I realise I could write these myself, but they would seem to be common enough cases to warrant inclusion in the library. As it stands, I had to jump through hoops to get data I could compare with "known good values".
These would be nice to have. I'm gladly accepting patches. :-)
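In the meantime, a sketch of what the missing accessors could look like as free helpers layered over the existing features rather than as new accumulators. They assume the set contains tag::count and tag::variance, and that tag::variance returns the biased (divide-by-N) estimator - that last point is an assumption worth checking against the docs:

#include <cmath>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/count.hpp>
#include <boost/accumulators/statistics/variance.hpp>

// Standard deviation as the square root of whatever variance the set holds.
template <typename AccumulatorSet>
double standard_deviation(AccumulatorSet const &acc)
{
    return std::sqrt(boost::accumulators::variance(acc));
}

// Unbiased (N-1) variance, obtained by rescaling the biased estimator.
template <typename AccumulatorSet>
double unbiased_variance(AccumulatorSet const &acc)
{
    double const n = static_cast<double>(boost::accumulators::count(acc));
    return boost::accumulators::variance(acc) * n / (n - 1);
}

// Unbiased (N-1) standard deviation.
template <typename AccumulatorSet>
double unbiased_standard_deviation(AccumulatorSet const &acc)
{
    return std::sqrt(unbiased_variance(acc));
}

Used as, e.g., standard_deviation(acc) on an accumulator_set<double, stats<tag::count, tag::variance> >.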
BTW, I assume that there are no accumulators for Autocorrelation Coefficient calculation?
(This is the point at which I reveal my statistical naiveté.) Huh?
Which leads on to a quick comparison I ran against the "known good" data here: http://www.itl.nist.gov/div898/strd/univ/homepage.html The test program is attached, and outputs the relative error in the statistics calculated. Each test is progressively harder than the previous one; the output is:
PI data: Error in mean is: 0 Error in SD is: 3.09757e-016
Lottery data: Error in mean is: 6.57202e-016 Error in SD is: 0
Accumulator 2 data: Error in mean is: 9.25186e-015 Error in SD is: 2.71685e-012
Accumulator 3 data: Error in mean is: 5.82076e-016 Error in SD is: 0.0717315
Accumulator 4 data: Error in mean is: 9.87202e-015 Error in SD is: -1.#IND
As you can see, the calculated standard deviation gets progressively worse; in the final case, the computed variance is actually -2, which is clearly nonsensical (and hence the NaN when using it to compute the standard deviation) :-(
I haven't tried to debug these cases: stepping through the accumulator code is not something I would recommend to anyone frankly (having tried it previously).
No doubt other torture tests could be constructed for the mean - exponentially distributed data where you see one or two large values, followed by a large number of very tiny values springs to mind - this is the sort of situation that the Kahan summation algorithm gets employed for.
I'm not sure what one should do about this. These certainly illustrate the kinds of trap that the unwary can fall into. It's particularly a problem for the "scientist in a rush" who may not be either a C++ or statistical expert (whether (s)he thinks they are or not!) but needs a library to quickly and reliably calculate some stats. Better documentation would certainly help, but guidance on (or provision of - perhaps by default) better algorithms might be welcome as well.
I'm hoping that Matthias can comment on your test and results. (Matthias, John posted his test in later msgs to the boost-devel list.) I implemented the accumulators very naively according to equations provided by Matthias. It's possible I needed to be more clever with the implementations, or that Matthias was unconcerned about (or unaware of) the way the statistical accumulators would perform on torture tests such as these. The framework accommodates many different implementations of each feature. The idea was to allow quick-and-dirty approximations as well as slower, more accurate implementations. I would LOVE for a statistical expert to contribute some highly accurate statistical accumulators. Perhaps they can be based on the good work you've done in this area. (Nudge, nudge. :-) -- Eric Niebler Boost Consulting www.boost-consulting.com

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Eric Niebler Sent: 30 January 2007 18:35 To: boost@lists.boost.org; Matthias Troyer Subject: Re: [boost] Accumulators review
I'm including Matthias Troyer on this, in the hopes he might have some feedback about your kurtosis suggestion.
Some useful links are http://en.wikipedia.org/wiki/Kurtosis and most definitely, as always, Weisstein, Eric W. "Kurtosis." From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/Kurtosis.html

As one who has been rescued from this pit by John more than once, I strongly recommend using kurtosis for the 'proper' version, and always saying kurtosis excess (kurtosis - 3) whenever applicable. We have eventually done this with the Math Toolkit distribution properties. Unwisely IMO, Wikipedia have not done this, and there is much confusion, and some mistakes that I have corrected.

Paul

PS and about your torture test - I don't think this should be the focus of the review. There are horses for courses, and the not-always-accurate version should be fine for most applications. (If the data are going to be as unreasonable as the torture data are, one could argue that they are wrong! and that they should never have been used to calculate a simple mean etc. anyway - it needs a robust mean?).

--- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com
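For the record, the two conventions in play, written out explicitly (these are the standard textbook definitions, not something lifted from either library's documentation):

\[
\text{kurtosis} = \frac{\mu_4}{\sigma^4}, \qquad
\text{kurtosis excess} = \frac{\mu_4}{\sigma^4} - 3,
\]

where \(\mu_4\) is the fourth central moment and \(\sigma^2\) the variance; a normal distribution has kurtosis 3 and kurtosis excess 0, which is why the two conventions are so easily confused.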

On 30 Jan 2007, at 20:07, Paul A Bristow wrote:
PS and about your torture test - I don't think this should be the focus of the review.
There are horses for courses, and the not-always-accurate version should be fine for most applications. (If the data are going to be as unreasonable as the torture data are, one could argue that they are wrong!
and that they should never have been used to calculate a simple mean etc. anyway - it needs a robust mean?).
The accumulator library can easily be extended with more robust estimators.

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Matthias Troyer Sent: 30 January 2007 22:08 To: boost@lists.boost.org Subject: Re: [boost] Accumulators review
On 30 Jan 2007, at 20:07, Paul A Bristow wrote:
PS and about your torture test - I don't think this should be the focus of the review.
There are horses for courses, and the not-always-accurate version should be fine for most applications. (If the data are going to be as unreasonable as the torture data are, one could argue that they are wrong!
and that they should never have been used to calculate a simple mean etc. anyway - it needs a robust mean?).
The accumulator library can easily be extended with more robust estimators.
Indeed - its strength is that it is a framework ready for such extensions.

Paul

--- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

Paul A Bristow wrote:
PS and about your torture test - I don't think this should be the focus of the review.
There are horses for courses, and the not-always-accurate version should be fine for most applications. (If the data are going to be as unreasonable as the torture data are, one could argue that they are wrong!
In case it wasn't clear from my post, I actually agree with this. The library can be extended with more/better algorithms at a later date anyway; the important thing at this stage is the framework. What is important though is documentation: most people would miss the subtle difference between the "immediate" and "naive" variance calculations, and potentially (although very rarely) fall into all kinds of hard-to-spot traps. John.

John Maddock wrote:
Paul A Bristow wrote:
PS and about your torture test - I don't think this should be the focus of the review.
There are horses for courses, and the not-always-accurate version should be fine for most applications. (If the data are going to be as unreasonable as the torture data are, one could argue that they are wrong!
In case it wasn't clear from my post, I actually agree with this. The library can be extended with more/better algorithms at a later date anyway; the important thing at this stage is the framework.
Right! :-)
What is important though is documentation: most people would miss the subtle difference between the "immediate" and "naive" variance calculations, and potentially (although very rarely) fall into all kinds of hard-to-spot traps.
Agree 100% -- the docs should be clearer on this point. I may be following up with you and Matthias offline for ideas about how best to document the complexity and error of the various statistical accumulators. But a big WARNING up front couldn't hurt, either. -- Eric Niebler Boost Consulting www.boost-consulting.com

On Jan 31, 2007, at 10:26 AM, John Maddock wrote:
Paul A Bristow wrote:
PS and about your torture test - I don't think this should be the focus of the review.
There are horses for courses, and the not-always-accurate version should be fine for most applications. (If the data are going to be as unreasonable as the torture data are, one could argue that they are wrong!
In case it wasn't clear from my post, I actually agree with this. The library can be extended with more/better algorithms at a later date anyway; the important thing at this stage is the framework.
What is important though is documentation: most people would miss the subtle difference between the "immediate" and "naive" variance calculations, and potentially (although very rarely) fall into all kinds of hard-to-spot traps.
It might be useful to have variance(accurate) as an alternative, or just to allow the accurate option for all features. Matthias

On Jan 30, 2007, at 7:35 PM, Eric Niebler wrote:
Statistic accumulators:
~~~~~~~~~~~~~~~~~~~~~~~

Kurtosis
~~~~~~~~

This is going to confuse folks - I know it's got Paul and I into trouble already! You're returning the "Kurtosis Excess" rather than the "Kurtosis Proper". In the Math Toolkit stats functions we called these functions "kurtosis" and "kurtosis_excess", and I'd favour similar naming here, or at least a strong warning to the effect that there are multiple definitions in the literature - otherwise unwary users will get into all kinds of trouble.
Matthias?
OK with me.
This all leads on to the algorithms used for the mean and variance etc: I assume that these are of the "naive" variety? Perfectly satisfactory for simple cases, but there are well known pathological situations where these fail.
Very naive, yes.
Actually tag::variance is the naive implementation while tag::variance (immediate) is a more accurate implementation.
BTW, I assume that there are no accumulators for Autocorrelation Coefficient calculation?
(This is the point at which I reveal my statistical naiveté.) Huh?
In the design phase we had discussed support for correlated samples, and this will be a major extension project once the basic library for uncorrelated samples is accepted. In addition to autocorrelation coefficients with arbitrary lags, the accumulators for correlated data will need to include autocorrelation time estimators and error estimators for correlated samples. I plan to work on this over the next year.
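For concreteness, the usual sample estimate of the lag-\(k\) autocorrelation coefficient John asked about is (standard definition, not a commitment about any future interface):

\[
r_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2},
\]

with the autocorrelation-time and error estimators for correlated samples then built on top of the \(r_k\).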
Which leads on to a quick comparison I ran against the "known good" data here: http://www.itl.nist.gov/div898/strd/univ/homepage.html The test program is attached, and outputs the relative error in the statistics calculated. Each test is progressively harder than the previous one; the output is:

PI data: Error in mean is: 0 Error in SD is: 3.09757e-016
Lottery data: Error in mean is: 6.57202e-016 Error in SD is: 0
Accumulator 2 data: Error in mean is: 9.25186e-015 Error in SD is: 2.71685e-012
Accumulator 3 data: Error in mean is: 5.82076e-016 Error in SD is: 0.0717315
Accumulator 4 data: Error in mean is: 9.87202e-015 Error in SD is: -1.#IND

As you can see, the calculated standard deviation gets progressively worse; in the final case, the computed variance is actually -2, which is clearly nonsensical (and hence the NaN when using it to compute the standard deviation) :-(

I haven't tried to debug these cases: stepping through the accumulator code is not something I would recommend to anyone frankly (having tried it previously).

No doubt other torture tests could be constructed for the mean - exponentially distributed data where you see one or two large values, followed by a large number of very tiny values springs to mind - this is the sort of situation that the Kahan summation algorithm gets employed for.

I'm not sure what one should do about this. These certainly illustrate the kinds of trap that the unwary can fall into. It's particularly a problem for the "scientist in a rush" who may not be either a C++ or statistical expert (whether (s)he thinks they are or not!) but needs a library to quickly and reliably calculate some stats. Better documentation would certainly help, but guidance on (or provision of - perhaps by default) better algorithms might be welcome as well.
I'm hoping that Matthias can comment on your test and results. (Matthias, John posted his test in later msgs to the boost-devel list.) I implemented the accumulators very naively according to equations provided by Matthias. It's possible I needed to be more clever with the implementations, or that Matthias was unconcerned about (or unaware of) the way the statistical accumulators would perform on torture tests such as these.
The framework accommodates many different implementations of each feature. The idea was to allow quick-and-dirty approximations as well as slower, more accurate implementations. I would LOVE for a statistical expert to contribute some highly accurate statistical accumulators. Perhaps they can be based on the good work you've done in this area. (Nudge, nudge. :-)
I will take a look, but am a bit busy the next couple of days since I have to pack everything for the whole family including our baby for a one-month trip to Santa Barbara. I'll comment in more detail later. For now it seems that you use the default variance implementation, which should be the naive estimator from the sum and sum of squares. The variance(immediate) feature should be more accurate (see libs/accumulators/doc/html/boost/accumulators/impl/immediate_variance_impl.html for the equation used). Matthias

Matthias Troyer wrote:
This all leads on to the algorithms used for the mean and variance etc: I assume that these are of the "naive" variety? Perfectly satisfactory for simple cases, but there are well known pathological situations where these fail.
Very naive, yes.
Actually tag::variance is the naive implementation while tag::variance (immediate) is a more accurate implementation.
OK I switched my torture test to use the "immediate" versions of the mean and variance, and got all the right answers :-) So with some suitable warnings in the documentation about which to use when, I'm happy now. Thanks, John.
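For anyone reading along later, the switch amounts to something like the following; the tag::mean(immediate) and tag::variance(immediate) spellings are taken from Matthias's messages and the docs of the version under review, and the sample values are arbitrary:

#include <cstddef>
#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/mean.hpp>
#include <boost/accumulators/statistics/variance.hpp>

int main()
{
    using namespace boost::accumulators;

    // Request the "immediate" implementations; the plain tag::mean and
    // tag::variance features select the default (lazy, sum-based) ones.
    accumulator_set<double, stats<tag::mean(immediate), tag::variance(immediate)> > acc;

    double const data[] = { 1.2, 2.3, 3.4, 4.5, 5.6 };   // arbitrary samples
    for (std::size_t i = 0; i != sizeof(data) / sizeof(data[0]); ++i)
        acc(data[i]);

    // The same extractors work whichever implementation provides the feature.
    std::cout << "mean     = " << mean(acc) << "\n";
    std::cout << "variance = " << variance(acc) << "\n";
}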

John Maddock wrote:
Matthias Troyer wrote:
This all leads on to the algorithms used for the mean and variance etc: I assume that these are of the "naive" variety? Perfectly satisfactory for simple cases, but there are well known pathological situations where these fail.
Very naive, yes. Actually tag::variance is the naive implementation while tag::variance (immediate) is a more accurate implementation.
OK I switched my torture test to use the "immediate" versions of the mean and variance, and got all the right answers :-) So with some suitable warnings in the documentation about which to use when, I'm happy now.
Whew! :-) Thanks for doing these tests, John. If you send me the latest and greatest, I'll turn it into a regression test. -- Eric Niebler Boost Consulting www.boost-consulting.com

On 1/31/07, Eric Niebler <eric@boost-consulting.com> wrote:
John Maddock wrote:
Matthias Troyer wrote:
This all leads on to the algorithms used for the mean and variance etc: I assume that these are of the "naive" variety? Perfectly satisfactory for simple cases, but there are well known pathological situations where these fail.
Very naive, yes. Actually tag::variance is the naive implementation while tag::variance (immediate) is a more accurate implementation.
OK I switched my torture test to use the "immediate" versions of the mean and variance, and got all the right answers :-) So with some suitable warnings in the documentation about which to use when, I'm happy now.
In the interest of users not getting burned by this, perhaps the default behavior should be the stable algorithm with something like tag::variance(fast_and_dangerous) being, well, the fast and dangerous algorithm. —Ben

At 03:58 PM 1/30/2007, you wrote:
For now it seems that you use the default variance implementation which should be the naive estimator from the sum and sum of squares.
Why would you supply a poor implementation as the default when the alternative (West's algorithm) is efficient, easily implemented, and much more precise? The only reasons I can think of for including the naive algorithm at all are those rare cases where a small increase in performance is more important than a potentially very large loss in precision, or where you are using exact arithmetic (e.g., if instead of floating point you are using rational numbers, or if all your values are integers).

The "pathological cases" where the naive algorithm does poorly are when the sum of squares (and the square of the sum) is large relative to the variance, and this is very frequently the case. It can occur when the variance is small relative to the mean or when there are more than a few terms involved. How small relative to the mean, or how many terms, depends on how much precision you really care about in your variance. Because we are dealing with squares, the error mounts pretty quickly.

For the record, West's algorithm:

t1 = x[k] - M[k-1]
t2 = t1/k
M[k] = M[k-1] + t2
T[k] = T[k-1] + (k-1)*t1*t2

Mean{X[1]...X[n]} = M[n]
Var{X[1]...X[n]} = T[n]/m

where "m" = (n-1) for the unbiased estimator of the population variance, or = n for the minimal-variance estimator of the population variance.

Topher Cooper
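A direct transcription of those recurrences into code, as a sanity check on the algorithm itself; the class name and interface are made up and have nothing to do with the library's internals:

#include <cstddef>

// West's recurrence, exactly as given above: M accumulates the running mean,
// T the running sum of squared deviations from it.
class west_accumulator
{
public:
    west_accumulator() : n_(0), M_(0.0), T_(0.0) {}

    void operator()(double x)
    {
        ++n_;
        double const t1 = x - M_;           // t1 = x[k] - M[k-1]
        double const t2 = t1 / n_;          // t2 = t1 / k
        M_ += t2;                           // M[k] = M[k-1] + t2
        T_ += (n_ - 1) * t1 * t2;           // T[k] = T[k-1] + (k-1)*t1*t2
    }

    double mean() const { return M_; }

    // m = n-1 gives the unbiased estimator, m = n the minimal-variance one.
    double variance(bool unbiased = true) const
    {
        std::size_t const m = unbiased ? n_ - 1 : n_;
        return T_ / m;
    }

private:
    std::size_t n_;
    double M_;
    double T_;
};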

On Jan 31, 2007, at 4:40 PM, Topher Cooper wrote:
At 03:58 PM 1/30/2007, you wrote:
For now it seems that you use the default variance implementation which should be the naive estimator from the sum and sum of squares.
Why would you supply a poor implementation as the default when the alternative (West's algorithm) is efficient, easily implemented, and much more precise? The only reasons I can think of for including the naive algorithm at all are those rare cases where a small increase in performance is more important than a potentially very large loss in precision, or where you are using exact arithmetic (e.g., if instead of floating point you are using rational numbers, or if all your values are integers).
Yes, this is the reason - maybe one could just make the more accurate version the default?

At 03:57 PM 1/31/2007, you wrote:
Yes, this is the reason - maybe one could just make the more accurate version the default?
The default should be the one that won't get you into trouble -- i.e. the accurate version. Especially when the cost is one extra addition, two extra multiplies, and a divide per iteration -- rarely a significant cost compared to the process generating the samples (e.g., an input operation or a non-uniform RNG). Users who know enough and care enough to choose the dirty version intelligently could do so. Eventually you could specialize to default to the naive method if the accumulators are using exact arithmetic and you can be sure you won't overflow (remember, the naive method requires roughly twice the number of bits to store the final cumulants). Topher
participants (6)
-
Ben FrantzDale
-
Eric Niebler
-
John Maddock
-
Matthias Troyer
-
Paul A Bristow
-
Topher Cooper