[Accumulators] Are all statistics lazily evaluated by default?

Hi all,
I really like the range of statistical properties that the accumulators library supports, but I am a little unclear about the laziness properties of the library.
Suppose I wish to offer an interface that lets the user send a bunch of data and request the k-th moment, where k is specified by the user at runtime. Will the accumulator_set
accumulator_set< value_type , ba::tag::moments > acc;
where k_max >= k (but the inequality may be strict) be efficient? I.e., will I only do the calculations necessary to get the k-th moment, not the k_max-th?
The reason I ask is that the documentation makes a distinction between variance and lazy_variance, but this distinction does not seem to be made for other statistics.
Thanks in advance for any assistance,
Pete

Hi,
The accumulators library doesn't mandate laziness or eagerness. As you've noticed, some accumulators come in lazy or eager flavors. Each accumulator defines an operator() that accepts a sample and a result() function that extracts the result. Whether the bulk of the work gets done in operator() (eager) or in result() (lazy) is up to you.
HTH,
--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
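To make the operator()/result() split concrete, here is a minimal standalone sketch in plain C++. It does not use Boost.Accumulators and is not the library's implementation; the class names lazy_variance_sketch and eager_variance_sketch are hypothetical, and both compute the population variance. The "lazy" class only updates running sums per sample and assembles the variance in result(), while the "eager" class keeps the variance current on every sample (using a Welford-style update, which is an assumption chosen for illustration).

```cpp
#include <cstddef>

// Hypothetical "lazy" flavor: operator() does the minimum per sample
// (two running sums); the variance itself is assembled in result().
class lazy_variance_sketch {
public:
    void operator()(double x) {
        ++n_;
        sum_ += x;
        sum_sq_ += x * x;
    }
    double result() const {
        if (n_ == 0) return 0.0;
        const double mean = sum_ / n_;
        return sum_sq_ / n_ - mean * mean;  // E[x^2] - E[x]^2
    }
private:
    std::size_t n_ = 0;
    double sum_ = 0.0;
    double sum_sq_ = 0.0;
};

// Hypothetical "eager" flavor: operator() keeps the variance up to date
// on every sample (Welford-style update), so result() just returns it.
class eager_variance_sketch {
public:
    void operator()(double x) {
        ++n_;
        const double delta = x - mean_;
        mean_ += delta / n_;
        m2_ += delta * (x - mean_);  // running sum of squared deviations
        variance_ = m2_ / n_;
    }
    double result() const { return variance_; }
private:
    std::size_t n_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;
    double variance_ = 0.0;
};
```

Both classes expose the same two operations, so a caller cannot tell them apart from the interface alone; the only way to see which one is eager is to look at where the arithmetic sits.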

Thanks for the reply, Eric. It's great that the framework doesn't force things one way or the other. I was particularly interested in some of the statistics supplied with the library. With your handy hint for determining laziness (i.e. operator() or result()), I see in the code that moment<> is eager. It's just a nit, but the documentation for those statistics would be improved IMO if they stated that - this might be as easy as a blanket statement that supplied statistics are eager unless otherwise stated.
For my purposes, I need lazy_moment<>, which thanks to the framework you've come up with is very straightforward to implement. Then things like lazy_skewness and lazy_kurtosis will rapidly follow. These will have near identical implementations to the existing skewness and kurtosis statistics, with the exception that all moment<n>s will be replaced by lazy_moment<n>s. Half of me thinks there could be value in templating over the "moment type" in such cases, but perhaps that is over-engineering.
In any case, if I did offer up lazy versions, might there be interest in including them in the library itself?
Pete

Pete Bartlett wrote:
Thanks for the reply, Eric. It's great that the framework doesn't force things one way or the other. I was particularly interested in some of the statistics supplied with the library. With your handy hint for determining laziness (i.e. operator() or result() ), I see in the code that moment<> is eager. It's just a nit, but the documentation for those statistics would be improved IMO if they stated that - this might be as easy as a blanket statement that supplied statistics are eager unless otherwise stated.
Agreed, the docs would be improved if this information were provided.
For my purposes, I need lazy_moment<> which thanks to the framework you've come up with is very straightforward to implement. Then things like lazy_skewness and lazy_kurtosis will rapidly follow. These will have near identical implementations to the existing skewness and kurtosis statistics with the exception that all moment<n>s will be replaced by lazy_moment<n>s. Half of me thinks there could be value in templating over the "moment type" in such cases but perhaps that is over-engineering.
If you can find a way to use templates to eliminate needless code duplication, I wouldn't call that over-engineering.
In any case, if I did offer up lazy versions, might there be interest in including them in the library itself?
Patches are welcome. If you add new accumulators, you'll also need to submit patches for the docs and the tests, though.
--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
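As a rough illustration of the "templating over the moment type" idea discussed in this thread, here is a standalone sketch in plain C++. It is not Boost.Accumulators code and not a proposed patch: eager_moments, lazy_moments, and the free function skewness are hypothetical names, and storing the samples is just one assumed way of deferring the work to query time. The point is only that the skewness formula is written once and works with either moment source.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical eager source: updates the running sums of x, x^2 and x^3
// on every sample, whether or not they are ever requested.
class eager_moments {
public:
    void operator()(double x) {
        ++n_;
        double p = x;
        for (int k = 0; k < 3; ++k) { sums_[k] += p; p *= x; }
    }
    // k must be 1, 2 or 3 (no bounds checking in this sketch).
    double moment(int k) const { return n_ ? sums_[k - 1] / n_ : 0.0; }
private:
    double sums_[3] = {0.0, 0.0, 0.0};
    std::size_t n_ = 0;
};

// Hypothetical lazy source: only stores the samples; the k-th raw moment
// is computed when asked for, with k chosen at runtime.
class lazy_moments {
public:
    void operator()(double x) { samples_.push_back(x); }
    double moment(int k) const {
        if (samples_.empty()) return 0.0;
        double sum = 0.0;
        for (double x : samples_) sum += std::pow(x, k);
        return sum / samples_.size();
    }
private:
    std::vector<double> samples_;
};

// Skewness expressed in raw moments, written once for any moment source.
// The result is not meaningful when the variance is zero.
template <class Moments>
double skewness(const Moments& m) {
    const double mean = m.moment(1);
    const double m2 = m.moment(2);
    const double m3 = m.moment(3);
    const double var = m2 - mean * mean;
    return (m3 - 3.0 * m2 * mean + 2.0 * mean * mean * mean)
           / (var * std::sqrt(var));
}
```

The trade-off in this sketch is memory for time: the lazy source pays nothing per sample beyond a push_back but keeps every sample, while the eager source uses constant memory but always computes all three power sums.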