[Review] Review of the Accumulators library begins today, Jan 29


John Phillips

29 Jan 2007

2:15 p.m.

The formal review of the Boost.Accumulators library, submitted by Eric Niebler, begins today and ends on Wednesday, February 7.

The library is available from http://boost-consulting.com/vault/index.php?directory=Math%20-%20Numerics

From the documentation:

Boost.Accumulators is both a library for incremental statistical computation as well as an extensible framework for incremental calculation in general. The library deals primarily with the concept of an accumulator, which is a primitive computational entity that accepts data one sample at a time and maintains some internal state. These accumulators may offload some of their computations on other accumulators, on which they depend. Accumulators are grouped within an accumulator set. Boost.Accumulators resolves the inter-dependencies between accumulators in a set and ensures that accumulators are processed in the proper order.

Your comments may be brief or lengthy. If you identify problems along the way, please note if they are minor, serious, or showstoppers.

Here are some questions you might want to answer in your review:

• What is your evaluation of the design?
• What is your evaluation of the implementation?
• What is your evaluation of the documentation?
• What is your evaluation of the potential usefulness of the library?
• Did you try to use the library? With what compiler? Did you have any problems?
• How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
• Are you knowledgeable about the problem domain?

And finally, every review should answer this question:

• Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.

All interested parties are encouraged to submit a review. These can be sent to the developer list, the user list, or if you don't want to share your review with the general public it can be sent directly to me.

Thanks to all for the work you will do on this review.

John Phillips
Review Manager

Niitsuma Hirotaka

30 Jan

9:10 a.m.

Boost.Accumulators is quite a useful library. Until I found it, I had used Torch (http://www.torch.ch). Torch also provides a general framework for various statistical and machine-learning methods. Boost.Accumulators is a more general framework, and it lets us describe various statistical methods easily. In this framework, the statistical methods become readable components. (Torch's components are not readable.)

However, in order to add a component that requires covariate input, we need to modify statistics_fwd.hpp. (When I wrote a join_histgram component, I needed to modify statistics_fwd.hpp and the other components it depends on, density.hpp and so on.) I think this is not good. If possible, it would be better to just add a file that describes a statistical method and include that file.

It seems that a similar framework can be built with boost::signal:

    boost::signal1<void, double> sig;
    sig.connect( &mean );
    sig.connect( &variant );
    sig.connect( &sum );
    for_each(vec.begin(),vec.end(),boost::bind<void>( boost::ref(sig ),_1) );

In this case, we can add new components by just adding a file.

But Boost.Accumulators is better than boost::signal in many respects: it can describe dependencies among components, and it can separate the incremental part from the final part.

The documentation is good. …

I will switch from Torch to Boost.Accumulators.

Request: I cannot find a way to bind covariate1. Please give an example of binding covariate1. The following code has a compile error.

    double d=0.0;
    accumulators::accumulator_set<double, accumulators::stats<
        accumulators::tag::covariance<double , accumulators::tag::covariate1> > >
        acc(accumulators::sample = d,accumulators::covariate1 = d);

    std::for_each(
        boost::make_zip_iterator( boost::make_tuple(beg1, beg2) ),
        boost::make_zip_iterator( boost::make_tuple(end1, end2) ),
        boost::bind<void>( boost::ref(acc),(_1, accumulators::covariate1 = _2 ) ) );

John Phillips

12:35 p.m.

Just to make sure I don't read any intent into your post, do you recommend the accumulators library be added to boost?

John (Review Manager)

Niitsuma Hirotaka

12:56 p.m.


Just to make sure I don't read any intent into your post, do you recommend the accumulators library be added to boost?

John (Review Manager)

Yes, I would like to vote for adding it to Boost.

John Phillips

2:26 p.m.

Thanks for the reply.

John

Eric Niebler

5:55 p.m.

Niitsuma Hirotaka wrote:

However, in order to add a component that requires covariate input, we need to modify statistics_fwd.hpp. (When I wrote a join_histgram component, I needed to modify statistics_fwd.hpp and the other components it depends on, density.hpp and so on.) I think this is not good. If possible, it would be better to just add a file that describes a statistical method and include that file.

I'm not sure I understand. Why did you have to modify statistics_fwd.hpp? You shouldn't have to.

It seems that a similar framework can be built with boost::signal:

    boost::signal1<void, double> sig;
    sig.connect( &mean );
    sig.connect( &variant );
    sig.connect( &sum );
    for_each(vec.begin(),vec.end(),boost::bind<void>( boost::ref(sig ),_1) );

In this case, we can add new components by just adding a file.

But Boost.Accumulators is better than boost::signal in many respects: it can describe dependencies among components, and it can separate the incremental part from the final part.

The boost::signal approach is all dynamically bound (slow) and does not ensure that accumulators are updated in an order that corresponds to the dependency graph.
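The contrast can be sketched without either library. In the minimal stand-in below, `count_acc`, `sum_acc`, `mean_set`, and `mean_of` are hypothetical names, not Boost.Accumulators types. The sketch only illustrates the idea that a dependent statistic (the mean) reads the state of the accumulators it depends on, and that every update is an ordinary call resolved at compile time in a fixed order, rather than a walk over a signal's slot list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not Boost.Accumulators types.
struct count_acc {
    std::size_t n = 0;
    void operator()(double) { ++n; }
};

struct sum_acc {
    double total = 0.0;
    void operator()(double x) { total += x; }
};

// A tiny "accumulator set": members are updated in a fixed, statically
// known order, so anything that depends on them sees consistent state.
struct mean_set {
    count_acc count;
    sum_acc sum;

    void operator()(double x) {
        count(x);  // dependencies are updated first
        sum(x);
    }

    // The mean keeps no state of its own; it is extracted from the
    // accumulators it depends on.
    double mean() const {
        assert(count.n > 0);
        return sum.total / static_cast<double>(count.n);
    }
};

inline double mean_of(const std::vector<double>& xs) {
    mean_set acc;
    for (double x : xs) acc(x);
    return acc.mean();
}
```

Calling `mean_of` on a vector feeds the samples one at a time and extracts the mean at the end, which is the incremental/final split mentioned in the quoted text.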

The documentation is good. …

I will switch from Torch to Boost.Accumulators.

Request: I cannot find a way to bind covariate1. Please give an example of binding covariate1. The following code has a compile error.

    double d=0.0;
    accumulators::accumulator_set<double, accumulators::stats<
        accumulators::tag::covariance<double , accumulators::tag::covariate1> > >
        acc(accumulators::sample = d,accumulators::covariate1 = d);

    std::for_each(
        boost::make_zip_iterator( boost::make_tuple(beg1, beg2) ),
        boost::make_zip_iterator( boost::make_tuple(end1, end2) ),
        boost::bind<void>( boost::ref(acc),(_1, accumulators::covariate1 = _2 ) ) );

Boost.Bind is a wonderful library, but due to limitations in the language, there are some things it just doesn't handle well. Rather than tell you how to write the bind expression (it involves taking the address of the operator= of the covariate1 named parameter, yuk!), I'll suggest you write a plain for loop.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
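A plain loop of the kind suggested above might look like the following sketch. To keep it self-contained it uses a hypothetical stand-in, `covariance_acc`, that computes the population covariance from running sums; it is not the library's implementation. With Boost.Accumulators itself, the loop body would pass the second value through the named parameter, as noted in the comment.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a covariance accumulator. With
// Boost.Accumulators the loop body below would instead be written as
//     acc(*it1, covariate1 = *it2);
struct covariance_acc {
    std::size_t n = 0;
    double sum_x = 0.0, sum_y = 0.0, sum_xy = 0.0;

    void operator()(double x, double y) {
        ++n;
        sum_x += x;
        sum_y += y;
        sum_xy += x * y;
    }

    // Population covariance: E[xy] - E[x]E[y].
    double result() const {
        assert(n > 0);
        const double inv = 1.0 / static_cast<double>(n);
        return sum_xy * inv - (sum_x * inv) * (sum_y * inv);
    }
};

// Walk two sequences in lock step with an ordinary for loop; no bind
// expression or zip iterator is needed.
inline double covariance_of(const std::vector<double>& xs,
                            const std::vector<double>& ys) {
    assert(xs.size() == ys.size());
    covariance_acc acc;
    std::vector<double>::const_iterator it1 = xs.begin();
    std::vector<double>::const_iterator it2 = ys.begin();
    for (; it1 != xs.end(); ++it1, ++it2) {
        acc(*it1, *it2);
    }
    return acc.result();
}
```

The loop is longer than the one-line `std::for_each` attempt, but each step is explicit, and nothing has to be expressed as a bind placeholder.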

Niitsuma Hirotaka

4 Feb

9:45 a.m.

However, in order to add a component that requires covariate input, we need to modify statistics_fwd.hpp. (When I wrote a join_histgram component, I needed to modify statistics_fwd.hpp and the other components it depends on, density.hpp and so on.) I think this is not good. If possible, it would be better to just add a file that describes a statistical method and include that file.

I'm not sure I understand. Why did you have to modify statistics_fwd.hpp? You shouldn't have to.

Let us consider the case where covariance depends on p_square_cumulative_distribution and p_square_cumulative_distribution_of_variates.

The attached files are an implementation of covariance that depends on p_square_cumulative_distribution_of_variates. Please try to compile these files without replacing statistics_fwd.hpp with the attached file. In my environment, I cannot compile these files without modifying statistics_fwd.hpp. In order to compile, we need to add the following lines to statistics_fwd.hpp:

    namespace impl {
        template<typename Sample, typename Tag = tag::sample>
        struct p_square_cumulative_distribution_impl;
    }

Eric Niebler

6 Feb

7:13 p.m.

Niitsuma Hirotaka wrote:

Let us consider the case where covariance depends on p_square_cumulative_distribution and p_square_cumulative_distribution_of_variates.

The attached files are an implementation of covariance that depends on p_square_cumulative_distribution_of_variates. Please try to compile these files without replacing statistics_fwd.hpp with the attached file. In my environment, I cannot compile these files without modifying statistics_fwd.hpp. In order to compile, we need to add the following lines to statistics_fwd.hpp:

    namespace impl {
        template<typename Sample, typename Tag = tag::sample>
        struct p_square_cumulative_distribution_impl;
    }

The forward declaration must appear *somewhere*. It doesn't have to be in statistics_fwd.hpp. As an end user of the statistical accumulators library, you do not have to modify any of the files in the library if you want to extend it by implementing your own stats. Your code would work just as well if you moved the forward declaration out of statistics_fwd.hpp and into your covariance2.hpp.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
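The general C++ point can be illustrated with a small sketch. All names here (`lib`, `my_stat`, `my_stat_impl`, `demo_my_stat`) are hypothetical and only mimic the shape of the situation; they are not the library's headers. The forward declaration, including its default template argument, sits in the user's own header, and a tag can refer to the implementation before the implementation is defined.

```cpp
#include <cassert>

// Stands in for the library's own headers (hypothetical names).
namespace lib {
namespace tag { struct sample {}; }
namespace impl {}
}

// From here on: the user's own header, e.g. a hypothetical my_stat.hpp.
namespace lib {
namespace impl {
    // The forward declaration lives in the user's file, not in the
    // library's *_fwd.hpp. The default argument is given here, once.
    template<typename Sample, typename Tag = tag::sample>
    struct my_stat_impl;
}
namespace tag {
    // The tag can name the implementation before it is defined.
    struct my_stat {
        template<typename Sample>
        struct apply { typedef impl::my_stat_impl<Sample> type; };
    };
}
namespace impl {
    // The definition; the default argument is not repeated.
    template<typename Sample, typename Tag>
    struct my_stat_impl {
        Sample largest;
        my_stat_impl() : largest() {}
        void operator()(Sample const& s) { if (s > largest) largest = s; }
        Sample result() const { return largest; }
    };
}
}

inline int demo_my_stat() {
    lib::tag::my_stat::apply<int>::type acc;
    acc(3);
    acc(7);
    acc(5);
    assert(acc.result() >= 5);
    return acc.result();
}
```

Nothing in the stand-in "library" part had to change to make this compile, which is the property being claimed for user-defined statistics.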

Ronald Garcia

1 Feb

8:35 p.m.

On Jan 29, 2007, at 9:15 AM, John Phillips wrote:

...

The formal review of the Boost.Accumulators library, submitted by Eric Niebler, begins today and ends on Wednesday, February 7.

This review is based on reading through the documentation for the accumulators library as well as reviews of the library written by others (including threads of conversation).

This library seems to me to be clearly useful for addressing matters that involve streaming data. The design is sound and takes advantage of existing Boost libraries (in particular Parameter and MPL) to provide a clean and extensible interface.

Assuming that those who have experience using the library approve, that the accuracy issues are addressed, and that interfaces for Sequences and Ranges are added, I vote that the library be accepted into Boost.

ron

Paul A Bristow

5 Feb

2:38 p.m.

On 29 January 2007 14:15, John Phillips wrote:

The formal review of the Boost.Accumulators library, submitted by Eric Niebler, begins today and ends on Wednesday, February 7. The library is available from http://boost-consulting.com/vault/index.php?directory=Math%20-%20Numerics

Here are some questions you might want to answer in your review: . What is your evaluation of the design?

It looks to be a rather useful framework for handling incrementally arriving data. It may also prove convenient where the data could just as well be handled in a plain array, vector or similar format. (But how does it compare with using Boost circular buffer?)

...

. What is your evaluation of the implementation?

I can see why it is done this way and believe it is sound, but at a price of looking pretty intimidating, though probably easy enough in practice. I worry slightly about compile time and perhaps run time costs, but worth it. Basic testing looks fine. It would be a mistake now to focus too much on the detailed implementation/accuracy of actual functions - it is the framework that matters now. However there is a danger, as with all statistical calculations, of confusing data with information. For example, hardly any useful *information* on skewness or kurtosis is likely to emerge from a handful of values, even if mathematically accurate. But the framework is templated so that one can envisage a floating-point type that adds uncertainty estimates as well as central values. So we would know that the skew is -0.5 (but + or - a very lot).

...

. What is your evaluation of the documentation?

Looks good, and has a good structure - but I haven't used it 'in anger' - when things are always missing/unclear.

...

. What is your evaluation of the potential usefulness of the library?

I am uncertain if it will meet a niche market - real-time systems seem the obvious applications, and yet I fear that those working on these small machines - the 'toaster market' ;-) - may take fright at the apparent complexity, especially if the chip lacks built-in floating point (but has a UDFPT).

...

. Did you try to use the library?

Not much - some examples worked OK with MSVC 8.0.

...

. How much effort did you put into your evaluation?

Quickish read of docs, and some code and ran a few examples.

...

. Are you knowledgeable about the problem domain?

Not especially.

...

And finally, every review should answer this question: . Do you think the library should be accepted as a Boost library?

Yes - definitely.

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

Eric Niebler

6 Feb

7:21 p.m.

Paul A Bristow wrote:

Here are some questions you might want to answer in your review: . What is your evaluation of the design?

It looks to be a rather useful framework for handling incrementally arriving data.

It may also prove convenient where the data could just as well be handled in a plain array, vector or similar format. (But what about compared to using Boost circular buffer?)

I'm not sure I understand the question. Could you rephrase?

...

...
. What is your evaluation of the implementation?

I can see why it is done this way and believe it is sound, but at a price of looking pretty intimidating, though probably easy enough in practice.

I worry slightly about compile time and perhaps run time costs, but worth it.

Compile times are long, true. The runtime costs should be zero or close to it, and it's been reported here that on msvc, the runtime cost is indeed zero, at least for small-ish accumulator sets.

...

Basic testing looks fine.

It would be a mistake now to focus too much on the detailed implementation/accuracy of actual functions - it is the framework that matters now.

However there is a danger, as with all statistical calculations, of confusing data with information.

For example, hardly any useful *information* on skewness or kurtosis is likely to emerge from a handful of values, even if mathematically accurate.

True, but that's hardly a failing of Boost.Accumulators. :-)

...

But the framework is templated so that one can envisage a floating-point type that adds uncertainty estimates as well as central values. So we would know that the skew is -0.5 (but + or - a very lot).

...
. What is your evaluation of the documentation?

Looks good, and has a good structure - but I haven't used it 'in anger' - when things are always missing/unclear.

...
. What is your evaluation of the potential usefulness of the library?

I am uncertain if it will meet a niche market - real-time systems seem the obvious applications, and yet I fear that those working on these small machines - the 'toaster market' ;-) - may take fright at the apparent complexity, especially if the chip lacks built-in floating point (but has a UDFPT).

I would think in those markets, Boost.Accumulators would be especially appealing because of the pay-only-for-what-you-use nature of templates.

And finally, every review should answer this question: . Do you think the library should be accepted as a Boost library?

Yes - definitely.

Thanks, Paul.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

Cromwell Enage

7 Feb

7:13 a.m.


What is your evaluation of the design?

Simple and straightforward. Its extensibility is a big plus. I am not a statistician, however, so I cannot judge its usability in that regard. I'm curious about the design of weighted samples in an accumulator_set. Can operations other than multiplication be applied to the weight? I also echo John Maddock's request for pushing the elements of a sequence to an accumulator. By logical extension, the ability to add objects of the same accumulator_set type together should be considered as well.

...

What is your evaluation of the implementation?

Nicely done. I'm sure it's first-rate.

...

What is your evaluation of the documentation?

Aack. In addition to the gentler introductory tutorial previously suggested by others, I would also like to see a motivation and/or rationale that is more explanatory than the "old adage", e.g. "Why Not Just A for Loop?" or "Going Beyond std::accumulate". Usually, I expect the reference documentation to be categorized by class and/or function instead of by header. Staring at a long list of #includes, even as well organized as they are, does not raise my confidence in my ability to comprehend the inner workings of a library like this one.

...

What is your evaluation of the potential usefulness of the library?

Even outside the field of statistics, I sense its great value in large-scale applications. However, for small-scale programs (like the neural network example I recently added to my as-yet-unannounced automata library), it's hard to beat the equivalent for loops in terms of readability and efficiency.

...

Did you try to use the library?

Tried and succeeded.

...

With what compiler?

GCC 3.4.5 (MinGW special)

...

Did you have any problems?

Not at this time, no.

...

How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I'd say a quick reading. Enough to comprehend the tutorials, then a few excursions within the reference material.

...

Are you knowledgeable about the problem domain?

I am familiar with the basics, so I'm not that intimidated by "kurtosis" and other finer details.

...

Do you think the library should be accepted as a Boost library?

The shape of its documentation is its biggest weakness right now, but it is outweighed by the robustness of its design. I know that the documentation can be improved and I trust that it will be improved. I vote yes.

Cromwell D. Enage

Eric Niebler

7:50 p.m.

Cromwell Enage wrote:

...

...
What is your evaluation of the design?

Simple and straightforward. Its extensibility is a big plus. I am not a statistician, however, so I cannot judge its usability in that regard.

I'm curious about the design of weighted samples in an accumulator_set. Can operations other than multiplication be applied to the weight?

Each accumulator is free to do anything with the weight parameter that it sees fit, including ignore it completely. But weighted samples have a well understood meaning in statistics, and giving it a different meaning would probably lead to confusion.
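For reference, the conventional statistical meaning of a weight can be sketched as a weighted mean, sum(w * x) / sum(w). The `weighted_mean_acc` type below is a hypothetical stand-in written for illustration, not the library's implementation; with Boost.Accumulators the weight would be passed as a named parameter, as noted in the comment.

```cpp
#include <cassert>

// Hypothetical stand-in, not a Boost.Accumulators type. With the library,
// a weighted sample is passed as:  acc(x, weight = w);
struct weighted_mean_acc {
    double sum_w = 0.0;   // running sum of weights
    double sum_wx = 0.0;  // running sum of weight * sample

    void operator()(double x, double w) {
        sum_w += w;
        sum_wx += w * x;
    }

    // Weighted mean: sum(w * x) / sum(w).
    double result() const {
        assert(sum_w != 0.0);
        return sum_wx / sum_w;
    }
};

inline double demo_weighted_mean() {
    weighted_mean_acc acc;
    acc(1.0, 1.0);  // sample 1 with weight 1
    acc(3.0, 3.0);  // sample 3 with weight 3
    return acc.result();  // (1*1 + 3*3) / (1 + 3)
}
```

An accumulator that did something other than multiply by the weight would still compile in a scheme like this, which is why the conventional meaning is worth keeping.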

...

I also echo John Maddock's request for pushing the elements of a sequence to an accumulator. By logical extension, the ability to add objects of the same accumulator_set type together should be considered as well.

The first is a simple extension. The second, no less useful, is less simple, but not impossible, AFAIK.
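Both requests can be sketched for the simplest case. The `mean_acc` type below is a hypothetical stand-in, not a library type: `push` feeds a whole range one sample at a time, and `operator+=` merges two accumulators. Merging is trivial here because the state is just a count and a sum; statistics whose state is not a plain sum would presumably need more care.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mean accumulator; not a library type.
struct mean_acc {
    std::size_t n = 0;
    double sum = 0.0;

    void operator()(double x) { ++n; sum += x; }

    // Request 1: push every element of a sequence.
    template<typename Iter>
    void push(Iter first, Iter last) {
        for (; first != last; ++first) (*this)(*first);
    }

    // Request 2: combine two accumulators of the same type.
    mean_acc& operator+=(const mean_acc& other) {
        n += other.n;
        sum += other.sum;
        return *this;
    }

    double result() const {
        assert(n > 0);
        return sum / static_cast<double>(n);
    }
};

inline double demo_merge() {
    const std::vector<double> a = {1.0, 2.0};
    const std::vector<double> b = {3.0, 6.0};
    mean_acc left, right;
    left.push(a.begin(), a.end());
    right.push(b.begin(), b.end());
    left += right;          // same state as pushing all four samples
    return left.result();
}
```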

...

...
What is your evaluation of the implementation?

Nicely done. I'm sure it's first-rate.

...
What is your evaluation of the documentation?

Aack.

In addition to the gentler introductory tutorial previously suggested by others, I would also like to see a motivation and/or rationale that is more explanatory than the "old adage", e.g. "Why Not Just A for Loop?" or "Going Beyond std::accumulate".

Agreed.

...

Usually, I expect the reference documentation to be categorized by class and/or function instead of by header. Staring at a long list of #includes, even as well organized as they are, does not raise my confidence in my ability to comprehend the inner workings of a library like this one.

I actually agree with this, but it's not a problem specific to the Accumulators reference section. I'm using the standard Doxygen/BoostBook integration that is part of Boost's documentation tool chain. It has been observed before that the header-based categorization is less than ideal, but nobody has stepped up to improve it. This would be an opportunity for someone to make a huge contribution to Boost. Takers?

...

...
What is your evaluation of the potential usefulness of the library?

Even outside the field of statistics, I sense its great value in large-scale applications. However, for small-scale programs (like the neural network example I recently added to my as-yet-unannounced automata library), it's hard to beat the equivalent for loops in terms of readability and efficiency.

Hand-coded loops are the gold standard for performance, that's true, but ideally a higher-level abstraction should be more readable, not less. And templates let us have the abstraction without the penalty. IMO, it's a matter of familiarity. People new to STL might feel that a for-loop is more readable than a call to std::transform(), for instance, but not me.
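The readability comparison is easy to make concrete. The two functions below (the names `square`, `squares_loop`, and `squares_transform` are made up for this sketch) compute the same result, once with a hand-written loop and once with std::transform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

inline double square(double x) { return x * x; }

// Hand-written loop: the indexing and the bounds are spelled out.
inline std::vector<double> squares_loop(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = square(in[i]);
    }
    return out;
}

// Algorithm call: the intent ("apply square to each element") is named.
inline std::vector<double> squares_transform(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), square);
    return out;
}

inline void check_equivalence() {
    const std::vector<double> in = {1.0, 2.0, 3.0};
    assert(squares_loop(in) == squares_transform(in));
}
```

Which version reads better is largely the familiarity question raised above; the two are interchangeable in behavior.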

Do you think the library should be accepted as a Boost library?

The shape of its documentation is its biggest weakness right now, but it is outweighed by the robustness of its design. I know that the documentation can be improved and I trust that it will be improved. I vote yes.

Thanks, Cromwell.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

Paul A Bristow

8 Feb

4:27 p.m.

On 07 February 2007 19:50, Eric Niebler wrote:

...

It has been observed before that the header-based categorization is less than ideal, but nobody has stepped up to improve it. This would be an opportunity for someone to make a huge contribution to Boost. Takers?

I would also re-iterate my wish for a general INDEX of documentation(s). Even with documentation that I have partly written, I have resorted to using the Adobe Reader Find facility on the pdf version in order to locate where a piece of information is hiding!

For the Boost docs as a whole, Google is helpful, but of course covers ALL the Boost docs. For documentation on one's own machine, Google Desktop may be some help - but again it is usually not specific enough.

If anyone has any ideas or experience, I feel this would be really useful.

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

Robert Ramey

5:46 p.m.

Paul A Bristow wrote:

...

If anyone has any ideas or experience, I feel this would be really useful.

Perhaps you might consider a system similar to that used by the serialization library. I made a separate pane to hold the document outline/navigation. Subsequent to that, Jonathan Turkanis submitted a better, more general version of this. I've always hoped that someone would write an XSLT script that would automatically generate this navigation pane for each library as well as for Boost as a whole. It seems to me that Jonathan already did the heavy lifting here and with just a little bit more effort this could be accomplished. It hasn't happened but maybe someday ...

Robert Ramey

Hans Meine

7 Feb

9:48 a.m.

New subject: Accumulators library documentation review

Hi Eric, John,

I wanted to send a full review for the library, but it looks as if I need a boost CVS checkout for that (for boost/typeof/typeof.hpp), and I am very short on time right now. So I at least send what I already have: Lots of comments on the documentation:

Comments on the Documentation
=============================

(Disclaimer: In some of the following, I purposely pretended to be dumber than I am, in order to show where the explanations could use some improvements. ;-) )

In the "Passing Optional Parameters" section, I read about tail and tail_variate. Not being an expert in statistics, I had problems understanding the example, so I tried to find out what tail_variate<> does. Unfortunately, I could only find out that it takes three template parameters (VariateType, VariateTag, LeftRight), but I could not even see what they mean. LeftRight is passed to tail<>, and then to tail_cache_size_named_arg<>, but I still do not see what the left/right is for.

-> Update: I found the answers in the Statistical Accumulators Library's documentation. It would be much better AFAICS if the links of tail<> and tail_variate<> would point to that appropriate documentation.

Having said that, I think it could be useful to include links to external sites explaining some of the statistics terms, if you know such. I bet there are more people like me, who have only little more than basic knowledge in statistics but who are interested in the more complicated features once they start using your library.

A comment in a code example reads:

"All accumulators should inherit from accumulator_base."

Add a small note somewhere why this is necessary / what is inherited? Also does "should" mean that this is mandatory? (Update: OK, now I saw the note later down in the docs about the default operator(). Still I don't know if that's all.)

Then you wrote:

"Although not necessary, it can be a good idea to put your accumulator implementations in the boost::accumulators::impl namespace. This namespace pulls in any operators defined in the boost::numeric::operators namespace with a using directive. The Numeric Operators Sub-Library defines some additional overloads that will make your accumulators work with all sorts of data types."

It does not look clean to me to put my own stuff in someone else's namespaces; is there a problem with just "using namespace numeric::operators;"? (OK, which? ;-) )

Documentation of the Accumulator Concept: Maybe link back to "Optional Accumulator Member Functions" for post_construct and on_drop? (As it stands, one can understand the concept definition only after reading much of the documentation before, although one might want to quickly check the concepts - possibly when coming from the first reference to the concepts in the docs, which is before the explanations.)

In the Feature Concept, I cannot understand the explanation of F::is_weight_accumulator - it seems to me as if a second occurrence of S (template parameter?) is missing somewhere? Ah, now I see:

"The weight accumulators are made external if the weight type is specified using the external<> template."

Still, that part of the API is missing an example, isn't it? I have difficulties understanding when and how to use external<>.

At the end, just above the TODO, there are two items:

* Mapping multiple impls to the same feature with feature_of
* Creating aliases for features with as_feature

I guess these are also placeholders for future doc snippets?

The Statistical Accumulators Library
====================================

I was very glad to find formulas and article references for non-trivial accumulators like the p^2 median estimators. How about sth. like that for more of the accumulators (e.g. "kurtosis", or even for mean and variants like immediate_mean)?

I also wondered why you always write

...

For implementation details, see foo_impl.

and "hide" the formulas there?

I think the statistical accumulators reference docs can be greatly improved with a better structuring.

Also, I very much dislike the look in my browser (Konqueror), but I guess it may be the fault of some "standard boost CSS". Looking into it right now, it seems that since "The Statistical Accumulators Library" is a subsection of the user's guide, all sections inside it become either <h3> or <h4> and thus nearly-indistinguishably small (hardly larger font sizes than the usual text). That should probably be discussed in a separate thread to get some attention by the responsible...

Greetings,
Hans

Eric Niebler

8:26 p.m.

New subject: Accumulators library documentation review

Hans Meine wrote:

...

Hi Eric, John,

I wanted to send a full review for the library, but it looks as if I need a boost CVS checkout for that (for boost/typeof/typeof.hpp), and I am very short on time right now.

Yes, sorry. It works with the RC of 1.34 and with CVS HEAD.

...

So I at least send what I already have: Lots of comments on the documentation:

Comments on the Documentation
=============================

(Disclaimer: In some of the following, I purposely pretended to be dumber than I am, in order to show where the explanations could use some improvements. ;-) )

In the "Passing Optional Parameters", I read about tail and tail_variate. Not being an expert in statistics, I had problems understanding the example, so I tried to find out what tail_variate<> does. Unfortunately, I could only find out that it takes three template parameters (VariateType, VariateTag, LeftRight), but I could not even see what they mean. LeftRight is passed to tail<>, and then to tail_cache_size_named_arg<>, but I still do not see what the left/right is for. -> Update: I found the answers in the Statistical Accumulators Library's documentation. It would be much better AFAICS if the links of tail<> and tail_variate<> would point to that appropriate documentation.

Noted.

...

Having said that, I think it could be useful to include links to external sites explaining some of the statistics terms, if you know such. I bet there are more people like me, who have only little more than basic knowledge in statistics but who are interested in the more complicated features once they start using your library.

It would be nice to include a link to a good online statistics reference. Can anybody recommend one?

...

A comment in a code example reads:

...
All accumulators should inherit from accumulator_base.

Add a small note somewhere why this is necessary / what is inherited? Also does "should" mean that this is mandatory? (Update: OK, now I saw the note later down in the docs about the default operator(). Still I don't know if that's all.)

I'll fix that.
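
The "default operator()" point can be illustrated with a small sketch. This is not the Boost class: `accumulator_base_sketch`, `sum_sketch`, `count_sketch`, `lazy_mean_sketch` and `mean_of` are hypothetical stand-ins, and the only thing taken from the thread is that the base supplies a default `operator()`. The assumption is that the default simply ignores the sample, so an accumulator with no per-sample work does not have to write one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the role accumulator_base plays; NOT the Boost class.
struct accumulator_base_sketch {
    // Default behaviour (assumed): ignore the sample.
    void operator()(double) {}
};

// Does per-sample work, so it supplies its own operator() that hides the default.
struct sum_sketch : accumulator_base_sketch {
    double total = 0.0;
    void operator()(double x) { total += x; }
};

struct count_sketch : accumulator_base_sketch {
    std::size_t n = 0;
    void operator()(double) { ++n; }
};

// Keeps no state of its own: the result is computed on demand from the
// accumulators it depends on, so the inherited no-op operator() is enough.
struct lazy_mean_sketch : accumulator_base_sketch {
    double result(const sum_sketch& s, const count_sketch& c) const {
        assert(c.n > 0);  // a mean needs at least one sample
        return s.total / static_cast<double>(c.n);
    }
};

// Feeds each sample to every accumulator, dependencies first, then asks for the mean.
inline double mean_of(const double* first, const double* last) {
    sum_sketch s;
    count_sketch c;
    lazy_mean_sketch m;
    for (const double* p = first; p != last; ++p) {
        s(*p);
        c(*p);
        m(*p);  // resolves to the inherited no-op
    }
    return m.result(s, c);
}
```

Because every type answers `operator()`, the driving loop can treat all accumulators uniformly, whether or not they do anything per sample.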

...

Then you wrote:

...
Although not necessary, it can be a good idea to put your accumulator implementations in the boost::accumulators::impl namespace. This namespace pulls in any operators defined in the boost::numeric::operators namespace with a using directive. The Numeric Operators Sub-Library defines some additional overloads that will make your accumulators work with all sorts of data types.

It does not look clean to me to put my own stuff in someone else's namespaces; is there a problem with just "using namespace numeric::operators;"? (OK, which? ;-) )

Yeah, that's a bit untidy. I should change the recommendation to putting your accumulator implementations into your own impl namespace and putting a "using namespace boost::numeric::operators;" directive in your namespace.
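
A minimal sketch of that arrangement, written without the Boost headers so it stands alone. `numeric_ops_sketch` is a hypothetical stand-in for boost::numeric::operators, and `my_lib::impl::mean_sketch` is an invented accumulator; neither name comes from the library. The point it shows is why the using directive matters: the sample type here is std::vector<double>, so argument-dependent lookup only searches namespace std and would never find operators defined elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for boost::numeric::operators: arithmetic for a
// type (std::vector<double>) that has no such operators of its own.
namespace numeric_ops_sketch {
    inline std::vector<double>& operator+=(std::vector<double>& a,
                                           const std::vector<double>& b) {
        assert(a.size() == b.size());
        for (std::size_t i = 0; i < a.size(); ++i) a[i] += b[i];
        return a;
    }
    inline std::vector<double> operator/(std::vector<double> a, std::size_t n) {
        for (double& v : a) v /= static_cast<double>(n);
        return a;
    }
}

// Your own namespace, not the library's.
namespace my_lib { namespace impl {
    // Makes the operators above visible to unqualified lookup in this namespace.
    using namespace numeric_ops_sketch;

    template <typename Sample>
    struct mean_sketch {
        explicit mean_sketch(const Sample& zero) : sum_(zero), n_(0) {}
        void operator()(const Sample& x) { sum_ += x; ++n_; }
        Sample result() const {
            assert(n_ > 0);  // a mean needs at least one sample
            return sum_ / n_;
        }
    private:
        Sample sum_;
        std::size_t n_;
    };
}}
```

The same template works for `double` through the built-in operators and for `std::vector<double>` through the pulled-in overloads, for example `my_lib::impl::mean_sketch<std::vector<double>> m(std::vector<double>(2, 0.0));`.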

...

Documentation of the Accumulator Concept: Maybe link back to "Optional Accumulator Member Functions" for post_construct and on_drop? (As it stands, one can understand the concept definition only after reading much of the documentation before, although one might want to quickly check the concepts - possibly when coming from the first reference to the concepts in the docs, which is before the explanations.)

Good suggestion.

...

In the Feature Concept, I cannot understand the explanation of F::is_weight_accumulator - it seems to me as if a second occurrence of S (template parameter?) is missing somewhere?

Ah, now I see:

...
The weight accumulators are made external if the weight type is specified using the external<> template.

Still, that part of the API is missing an example, isn't it? I have difficulties understanding when and how to use external<>.

Yes, John already noted that I was missing an example of using an external weight accumulator. That should make it clear(er).

...

At the end, just above the TODO, there are two items:

...
* Mapping multiple impls to the same feature with feature_of
* Creating aliases for features with as_feature

I guess these are also placeholders for future doc snippets?

Whoops. Yes, that's a TODO list that never got fleshed out. Thanks for spotting that.

...

The Statistical Accumulators Library
====================================

I was very glad to find formulas and article references for non-trivial accumulators like the p^2 median estimators. How about sth. like that for more of the accumulators (e.g. "kurtosis", or even for mean and variants like immediate_mean)?

I'll see if I can convince someone who knows LaTeX to write some more formulas for me.
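
For the accumulators named in the question, the textbook definitions could be typeset roughly as below. These are the standard formulas for the sample mean, its one-sample-at-a-time update, and excess kurtosis expressed through raw moments. Whether the library's mean, immediate_mean and kurtosis implementations follow exactly these expressions is an assumption, and the symbols are chosen here for illustration.

```latex
% Sample mean of x_1, ..., x_n, and the equivalent incremental update
\[
  \hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} x_i,
  \qquad
  \hat{\mu}_n = \hat{\mu}_{n-1} + \frac{x_n - \hat{\mu}_{n-1}}{n}
\]

% Raw moments
\[
  \hat{m}^{(k)}_n = \frac{1}{n}\sum_{i=1}^{n} x_i^{k}
\]

% Excess kurtosis: fourth central moment over squared variance, minus 3
\[
  \hat{g}_2 =
  \frac{\hat{m}^{(4)}_n - 4\,\hat{m}^{(3)}_n\hat{\mu}_n
        + 6\,\hat{m}^{(2)}_n\hat{\mu}_n^{2} - 3\,\hat{\mu}_n^{4}}
       {\left(\hat{m}^{(2)}_n - \hat{\mu}_n^{2}\right)^{2}} - 3
\]
```

The numerator is the fourth central moment expanded with the binomial theorem, using the fact that the first raw moment equals the mean; the denominator is the squared variance.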

...

I also wondered why you always write

...
For implementation details, see foo_impl.

and "hide" the formulas there?

That's due to some shortcomings in our documentation tool chain. It's pretty easy to get doxygen to generate the formulas from LaTeX embedded in C++ comments. Anything else involves hacking LaTeX and our tool chain. I know that's a lame answer, sorry. :-P

...

I think the statistical accumulators reference docs can be greatly improved with a better structuring.

As I mentioned in a previous reply, I use Boost's Doxygen/BoostBook tool chain to generate the reference section. I'd hate to abandon that approach, because it gives all Boost docs a uniform structure. It so happens that the structure is uniformly /bad/, and making it good would take a lot of work. Not sure what to do about that.

...

Also, I very much dislike the look in my browser (Konqueror), but I guess it may be the fault of some "standard boost CSS". Looking into it right now, it seems that since "The Statistical Accumulators Library" is a subsection of the user's guide, all sections inside it become either <h3> or <h4> and thus nearly-indistinguishably small (hardly larger font sizes than the usual text). That should probably be discussed in a separate thread to get some attention by the responsible...

That sounds like a bug in boost.css. Could you report it to boost-docs@lists.sourceforge.net?

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
