[Accumulators] Are all statistics lazily evaluated by default?

pete＠pcbartlett.com

11 May 2009

9:51 a.m.

Hi all,

I really like the range of statistical properties that the accumulators library supports, but I am a little unclear about the laziness properties of the library.

Suppose I wish to offer an interface that lets the user send a bunch of data and request the k-th moment, where k is specified by the user at runtime. Will the accumulator_set

accumulator_set< value_type , ba::tag::moments<k_max> > acc;

where k_max >= k (but the inequality may be strict) be efficient? I.e., will I do only the calculations necessary to get the k-th moment, not the k_max-th?

The reason I ask is that the documentation makes a distinction between variance and lazy_variance, but this distinction does not seem to be made for other statistics.

Thanks in advance for any assistance,
Pete

Eric Niebler

12 May

12:59 a.m.

pete@pcbartlett.com wrote:

Hi all,

I really like the range of statistical properties that the accumulators library supports, but I am a little unclear about the laziness properties of the library.

Suppose I wish to offer an interface that lets the user send a bunch of data and request the k-th moment, where k is specified by the user at runtime. Will the accumulator_set

accumulator_set< value_type , ba::tag::moments<k_max> > acc;

where k_max >= k (but the inequality may be strict) be efficient? I.e., will I do only the calculations necessary to get the k-th moment, not the k_max-th? The reason I ask is that the documentation makes a distinction between variance and lazy_variance, but this distinction does not seem to be made for other statistics.

Hi,

The accumulators library doesn't mandate laziness or eagerness. As you've noticed, some accumulators come in lazy or eager flavors. Each accumulator defines an operator() that accepts a sample and a result() function that extracts the result. Whether the bulk of the work gets done in operator() (eager) or in result() (lazy) is up to you.

HTH,

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
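To make the operator()/result() split concrete, here is a minimal sketch in plain C++. It is a toy illustration, not Boost.Accumulators code: the class names eager_moments and lazy_moments, and the choice to pass the order k to result() at runtime, are assumptions made for this example. The lazy variant is only one possible reading of "lazy" (it stores every sample and defers all arithmetic). Both variants assume at least one sample has been pushed before result() is called.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy illustration only; these are NOT Boost.Accumulators classes.

// Eager: every sample updates the power sums for all orders 1..KMax,
// so result(k) is a cheap lookup, but unused orders are still paid for.
template <std::size_t KMax>
class eager_moments {
    std::array<double, KMax> sums_{};  // sums_[k-1] holds the sum of x^k
    std::size_t count_ = 0;
public:
    void operator()(double x) {
        double p = 1.0;
        for (std::size_t k = 0; k < KMax; ++k) {
            p *= x;          // p is now x^(k+1)
            sums_[k] += p;
        }
        ++count_;
    }
    // k must be in 1..KMax; at() throws std::out_of_range otherwise.
    double result(std::size_t k) const { return sums_.at(k - 1) / count_; }
};

// Lazy: operator() only records the sample; result(k) does the arithmetic
// for the one order requested, at the cost of storing all samples.
class lazy_moments {
    std::vector<double> samples_;
public:
    void operator()(double x) { samples_.push_back(x); }
    double result(std::size_t k) const {
        double sum = 0.0;
        for (double x : samples_) sum += std::pow(x, static_cast<double>(k));
        return sum / samples_.size();
    }
};
```

With this layout, the eager class does KMax multiplications per sample regardless of which order is later requested, while the lazy class does no arithmetic until result(k) is called and then touches each stored sample once.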

Pete Bartlett

8:43 p.m.

Thanks for the reply, Eric. It's great that the framework doesn't force things one way or the other.

I was particularly interested in some of the statistics supplied with the library. With your handy hint for determining laziness (i.e. whether the work is done in operator() or result()), I see in the code that moment<> is eager. It's just a nit, but the documentation for those statistics would be improved IMO if it stated that. This might be as easy as a blanket statement that supplied statistics are eager unless otherwise stated.

For my purposes, I need lazy_moment<>, which, thanks to the framework you've come up with, is very straightforward to implement. Then things like lazy_skewness and lazy_kurtosis will rapidly follow. These will have near-identical implementations to the existing skewness and kurtosis statistics, with the exception that all moment<n>s will be replaced by lazy_moment<n>s. Half of me thinks there could be value in templating over the "moment type" in such cases, but perhaps that is over-engineering.

In any case, if I did offer up lazy versions, might there be interest in including them in the library itself?

Pete
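The idea of templating over the "moment type" can be sketched as follows. This is a hedged, standalone illustration in plain C++ and does not use the Boost.Accumulators extension mechanism: eager_source, lazy_source, and the free function skewness are hypothetical names invented here. The point is only that one skewness formula, written against a moment<N>() query, can serve both an eager and a lazy provider without duplication. The sketch assumes a non-empty, non-constant sample (otherwise the variance in the denominator is zero).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical moment providers (illustration only, not Boost types).
// Each accepts samples via operator() and answers moment<N>() queries.

struct eager_source {  // keeps running power sums for orders 1..3
    double s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t n = 0;
    void operator()(double x) { s1 += x; s2 += x * x; s3 += x * x * x; ++n; }
    template <int N>
    double moment() const {
        static_assert(N >= 1 && N <= 3, "only orders 1..3 are tracked");
        const double s = (N == 1) ? s1 : (N == 2) ? s2 : s3;
        return s / n;
    }
};

struct lazy_source {  // stores samples; powers are computed on request
    std::vector<double> xs;
    void operator()(double x) { xs.push_back(x); }
    template <int N>
    double moment() const {
        double sum = 0.0;
        for (double x : xs) sum += std::pow(x, N);
        return sum / xs.size();
    }
};

// One skewness formula shared by both providers:
// (m3 - 3*m2*m1 + 2*m1^3) / (m2 - m1^2)^(3/2), with m_k the k-th raw moment.
template <class Source>
double skewness(const Source& src) {
    const double m1 = src.template moment<1>();
    const double m2 = src.template moment<2>();
    const double m3 = src.template moment<3>();
    const double var = m2 - m1 * m1;
    return (m3 - 3.0 * m2 * m1 + 2.0 * m1 * m1 * m1) / (var * std::sqrt(var));
}
```

A kurtosis function could be written against the same moment<N>() query in the same way, which is the kind of duplication a shared template would avoid.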

Eric Niebler

14 May

4:14 p.m.

Pete Bartlett wrote:

Thanks for the reply, Eric. It's great that the framework doesn't force things one way or the other. I was particularly interested in some of the statistics supplied with the library. With your handy hint for determining laziness (i.e. operator() or result() ), I see in the code that moment<> is eager. It's just a nit, but the documentation for those statistics would be improved IMO if they stated that - this might be as easy as a blanket statement that supplied statistics are eager unless otherwise stated.

Agreed, the docs would be improved if this information were provided.

For my purposes, I need lazy_moment<> which thanks to the framework you've come up with is very straightforward to implement. Then things like lazy_skewness and lazy_kurtosis will rapidly follow. These will have near identical implementations to the existing skewness and kurtosis statistics with the exception that all moment<n>s will be replaced by lazy_moment<n>s. Half of me thinks there could be value in templating over the "moment type" in such cases but perhaps that is over-engineering.

If you can find a way to use templates to eliminate needless code duplication, I wouldn't call that over-engineering.

In any case, if I did offer up lazy versions, might there be interest in including them in the library itself?

Patches are welcome. If you add new accumulators, you'll also need to submit patches for the docs and the tests, though.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
