
I'm wondering if anyone has started a statistical analysis library at all? It's something I might be interested in starting if not. It could include features such as:

sample mean, alpha-trimmed mean, weighted mean, geometric mean, harmonic mean, mode, median, midrange
mean deviation, sample std deviation, RMS, sample range, interquartile range
higher-order sample moments
maximum likelihood estimators (and other estimators for various distributions), confidence intervals, hypothesis testing
linear regression, even ANOVA

Is anyone interested in such a lib? (I think I saw a probability lib somewhere; it could build on that as well.)

Chris
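For a sense of scale, a few of the descriptive statistics listed above can be sketched with nothing but the C++ standard library. This is only an illustration: the function names are hypothetical, they are not taken from any existing Boost library, and the trimming rule (drop floor(alpha * n) values from each end) is one common convention among several.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper names for illustration; not an existing library API.

// Arithmetic mean of a non-empty sample.
double sample_mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    return std::accumulate(xs.begin(), xs.end(), 0.0) / static_cast<double>(xs.size());
}

// Sample standard deviation using the n - 1 denominator.
double sample_std_dev(const std::vector<double>& xs) {
    assert(xs.size() >= 2);
    const double m = sample_mean(xs);
    double ss = 0.0;
    for (double x : xs) ss += (x - m) * (x - m);
    return std::sqrt(ss / static_cast<double>(xs.size() - 1));
}

// Median: the middle value of the sorted sample, or the average of the
// two middle values when the sample size is even.
// The sample is taken by value so the caller's data is not reordered.
double median(std::vector<double> xs) {
    assert(!xs.empty());
    std::sort(xs.begin(), xs.end());
    const std::size_t n = xs.size();
    return n % 2 == 1 ? xs[n / 2] : 0.5 * (xs[n / 2 - 1] + xs[n / 2]);
}

// Alpha-trimmed mean: drop floor(alpha * n) values from each end of the
// sorted sample and average what is left. Requires 0 <= alpha < 0.5.
double alpha_trimmed_mean(std::vector<double> xs, double alpha) {
    assert(!xs.empty() && alpha >= 0.0 && alpha < 0.5);
    std::sort(xs.begin(), xs.end());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(xs.size());
    const std::ptrdiff_t k = static_cast<std::ptrdiff_t>(alpha * static_cast<double>(n));
    const double sum = std::accumulate(xs.begin() + k, xs.end() - k, 0.0);
    return sum / static_cast<double>(n - 2 * k);
}
```

With the sample {1, 2, 3, 100} and alpha = 0.25, one value is dropped from each end, so the trimmed mean is the average of 2 and 3.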

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Chris Fairles Sent: 02 August 2007 15:45 To: boost@lists.boost.org Subject: [boost] interest in a stats library?
A statistics library is, of course, always of interest, but first of all you will want to look in the Boost Vault at John Maddock's Math Toolkit:

http://tinyurl.com/3y6jds

This has been reviewed and accepted, and we are currently nearing completion of the work suggested during the review to add fine-grained control of error handling. It should be in the next Boost release and be available for use in the sandbox soon. You can still start digesting the extensive documentation now ;-)

This toolkit has all the math functions and distributions you could ever want for producing more detailed statistics packages, such as the one you list. (A Windows applet to calculate all the properties of distributions has also been produced and will also be freely available when we find a home for it.)

It would probably also be useful to add the probability/likelihood concepts from Brook Milligan's boost.probability-0.2.2.tar.gz. The main documentation is available at http://biology.nmsu.edu/software/probability/

You will also want to study Eric Niebler's Time Series accumulator framework, currently under review. This deals with getting your data, perhaps arriving in real time, into the 'right place'.

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

As Paul has already said, there is quite a lot of work already done in this area: Eric Niebler's accumulator and time series libs handle a lot of the data collection and descriptive statistic calculation tasks, and the Math/Distributions library provides all the underlying math code needed to perform tests etc., as well as some fairly comprehensive examples of doing so.

There is still a need to tie some of this together with some high-level interfaces for ANOVA, hypothesis testing etc. though, should you wish to rise to the challenge :-)

HTH, John.
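To make the idea of a high-level interface concrete, here is a minimal sketch of what a one-sample t-test front end might look like. Everything in it is an assumption made for illustration: `TTestResult` and `one_sample_t_test` are hypothetical names, not part of the Math Toolkit or the accumulators library. It stops at the test statistic and degrees of freedom; turning those into a p-value needs the Student's t CDF, which is the part a distributions library would supply.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for a one-sample t-test.
struct TTestResult {
    double t_statistic;              // (mean - mu0) / (s / sqrt(n))
    std::size_t degrees_of_freedom;  // n - 1
};

// Hypothetical interface: tests the null hypothesis that the population
// mean equals mu0. Requires at least two observations that are not all equal.
TTestResult one_sample_t_test(const std::vector<double>& xs, double mu0) {
    assert(xs.size() >= 2);
    const double n = static_cast<double>(xs.size());

    double sum = 0.0;
    for (double x : xs) sum += x;
    const double mean = sum / n;

    double ss = 0.0;
    for (double x : xs) ss += (x - mean) * (x - mean);
    const double s = std::sqrt(ss / (n - 1.0));  // sample standard deviation
    assert(s > 0.0);

    return TTestResult{(mean - mu0) / (s / std::sqrt(n)), xs.size() - 1};
}
```

A caller would compare `t_statistic` against a critical value, or feed it to a Student's t distribution with `degrees_of_freedom` degrees of freedom to obtain a p-value.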

Thanks guys. I took a quick look and it looks like most of the low level stuff is done through math utils, probability and time series. I'll dig into those libs a little more and find out what implementation details there are when interfacing with them.

Cheers,
Chris

On 2 Aug 2007, at 08:45, Chris Fairles wrote:
> I'm wondering if anyone has started a statistical analysis library at all? Something I might be interested in starting if not. It could include features such as:
>
> sample mean, alpha-trimmed mean, weighted-mean, geometric-mean, harmonic-mean, mode, median, midrange. mean deviation, sample std deviation, RMS, sample range, interquartile range higher order sample moments maximum likelihood estimators (and other estimators for various distributions),

This should all be in the accumulator library, or very easy to add.

> confidence intervals,

can be added easily

> hypothesis testing, linear regression, ANOVA even

should be no problem to add that on top of the accumulator and math toolkit libraries.

Matthias
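As a rough indication of how small such an addition could be, the following is a sketch of simple (one-predictor) linear regression by ordinary least squares, written against the standard library only. The names `LinearFit` and `fit_line` are hypothetical and do not come from the accumulator or math toolkit libraries; a real version built on accumulators would presumably update the sums incrementally rather than take whole vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: the fitted line is y = intercept + slope * x.
struct LinearFit {
    double slope;
    double intercept;
};

// Ordinary least squares fit for one predictor.
// Requires equally sized inputs, at least two points, and non-constant xs.
LinearFit fit_line(const std::vector<double>& xs, const std::vector<double>& ys) {
    assert(xs.size() == ys.size() && xs.size() >= 2);
    const double n = static_cast<double>(xs.size());

    double sum_x = 0.0, sum_y = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sum_x += xs[i];
        sum_y += ys[i];
    }
    const double mean_x = sum_x / n;
    const double mean_y = sum_y / n;

    // Centered sums of squares and cross products.
    double sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sxx += (xs[i] - mean_x) * (xs[i] - mean_x);
        sxy += (xs[i] - mean_x) * (ys[i] - mean_y);
    }
    assert(sxx > 0.0);

    const double slope = sxy / sxx;
    return LinearFit{slope, mean_y - slope * mean_x};
}
```

For points lying exactly on y = 1 + 2x, such as (0, 1), (1, 3), (2, 5), (3, 7), the fit recovers a slope of 2 and an intercept of 1.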
participants (4)
- Chris Fairles
- John Maddock
- Matthias Troyer
- Paul A Bristow