Hi, does the Boost Math library support statistical calculations (e.g. median, standard deviation, etc.)?
From the list below, it does not appear that the Boost Math library has any statistics algorithms. If not, can someone please recommend a C/C++ library that does statistics?
Thank you.

http://www.boost.org/libs/libraries.htm#Algorithms

Math and numerics
* math - Several contributions in the domain of mathematics, from various authors.
* numeric/conversion - Optimized Policy-based Numeric Conversions, from Fernando Cacciola.
* integer - Headers to ease dealing with integral types.
* interval - Extends the usual arithmetic functions to mathematical intervals, from Guillaume Melquiond, Hervé Brönnimann and Sylvain Pion.
* math/common_factor - Greatest common divisor and least common multiple, from Daryle Walker.
* math/octonion - Octonions, from Hubert Holin.
* math/quaternion - Quaternions, from Hubert Holin.
* math/special_functions - Mathematical special functions such as atanh, sinc, and sinhc, from Hubert Holin.
* multi_array - Multidimensional containers and adaptors for arrays of contiguous data, from Ron Garcia.
* operators - Templates ease arithmetic classes and iterators, from Dave Abrahams and Jeremy Siek.
* random - A complete system for random number generation, from Jens Maurer.
* rational - A rational number class, from Paul Moore.
* uBLAS - Basic linear algebra for dense, packed and sparse matrices, from Joerg Walter and Mathias Koch.
Does Boost Math Library support Statistic Calculation? (e.g. median, std deviation, etc)?
No, not at present, but I've been playing with the attached (very simple) code for this purpose. I may make this a Boost submission at some point (or then again may not!). In the meantime, feel free to use it for whatever purpose you want, and let me know whether it meets your needs.

You'll notice it doesn't calculate the median, but it does do mean, variance, min/max values, rms mean etc. It doesn't do the median because it would have to remember all the values, so if that's the statistic you want I suggest you just sort your data :-)

Regards, John Maddock
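For readers who only want the median, the "just sort your data" suggestion can be sketched as below. This is a minimal illustration using only the standard library; the function name `median` and its by-value signature are assumptions made here, not code from the attachment.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of Boost: median by sorting a copy.
// The vector is taken by value so the caller's data keeps its order.
double median(std::vector<double> values)
{
    if (values.empty())
        throw std::invalid_argument("median of an empty data set");

    std::sort(values.begin(), values.end());

    const std::size_t mid = values.size() / 2;
    if (values.size() % 2 == 1)
        return values[mid];                        // odd count: the middle element

    return (values[mid - 1] + values[mid]) / 2.0;  // even count: mean of the two middle elements
}
```

Unlike the running statistics, this needs every value in memory at once, which is the limitation described above.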
yinglcs2@yahoo.com wrote:
Hi,
Does Boost Math Library support Statistic Calculation? (e.g. median, std deviation, etc)?
From here, it does not appear Boost Math library calculate has statistic algorithm. If not, can someone please recommend some c/c++ library which does statistics?
I have some simple stuff in the sandbox under the "stat" directory. You can view its usage here:

http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/libs/stat/...
http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/libs/stat/test/regression.cpp?rev=1.1&view=markup

-Thorsten
I have some simple stuff in the sandbox under the "stat" directory. You can view its usage here:
Otto: first off, we really need something like this, in Boost and in the std as well IMO. It really galls me that my desktop calculator has more math functions than <cmath> has, and wait for it: my calculator is 25 years old !!!!!!!!

However, I'm not sure that either of us has the right interface yet. I'm particularly concerned that you're making these algorithms: it means that if you want to access more than one statistic you have to make multiple passes over the data. The advantage of the "make it an object" approach is that pretty much all the stats you could want are accessible after a single pass over the data. More than that:

* You can pause at any time and read off the stats, and then continue adding more data if you want.
* An extension of the above would be to make the stats object serialisable.
* Two or more objects can be "added" together to obtain the stats for the combined data without reaccessing the original data. Imagine a weather station gathering temperature data over time: hourly stats can be combined into daily or weekly stats without going back to the original data, which may be either discarded (unlikely) or stored in offline storage.

Unfortunately, this method is prone to numerical overflow/underflow :-( Knuth has some fancy algorithms that I believe avoid that, but then you lose the simplicity and checkpointing/additive behaviour of the "scorecard" based system. So as usual, there ain't no free lunch!

John.
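The "scorecard" object described above can be sketched roughly as follows. This is not the attached code: the class name `running_stats` and its members are hypothetical, and it deliberately uses the simple sum and sum-of-squares bookkeeping, so it shares the numerical weakness mentioned in the message.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical "scorecard" accumulator: it stores running totals only,
// never the individual values.
class running_stats
{
public:
    // Feed one value; results can be read off at any point afterwards.
    void add(double x)
    {
        ++count_;
        sum_ += x;
        sum_sq_ += x * x;
        min_ = std::min(min_, x);
        max_ = std::max(max_, x);
    }

    // "Add" two scorecards: the result describes the combined data set.
    running_stats& operator+=(const running_stats& other)
    {
        count_ += other.count_;
        sum_ += other.sum_;
        sum_sq_ += other.sum_sq_;
        min_ = std::min(min_, other.min_);
        max_ = std::max(max_, other.max_);
        return *this;
    }

    // The accessors below are meaningful only once count() > 0.
    std::size_t count() const { return count_; }
    double min() const { return min_; }
    double max() const { return max_; }
    double mean() const { return sum_ / count_; }
    double rms() const { return std::sqrt(sum_sq_ / count_); }

    // Population variance as E[x^2] - (E[x])^2. Subtracting two large,
    // nearly equal numbers is where this approach loses precision.
    double variance() const
    {
        const double m = mean();
        return sum_sq_ / count_ - m * m;
    }

private:
    std::size_t count_ = 0;
    double sum_ = 0.0;
    double sum_sq_ = 0.0;
    double min_ = std::numeric_limits<double>::infinity();
    double max_ = -std::numeric_limits<double>::infinity();
};
```

In the weather-station example, one `running_stats` per hour could be combined with `+=` into a daily object without revisiting the raw readings.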
John Maddock wrote:
I have some simple stuff in the sandbox under the "stat" directory. You can view its usage here:
Otto: first off we really need something like this, in Boost and in the std as well IMO. It really galls me that my desktop calculator has more math functions than <cmath> has, and wait for it: my calculator is 25 years old !!!!!!!!
I would support the need for such a library as I have ended up writing something similar to this for nearly every project I have undertaken.
...The advantage of the "make it an object" approach is that pretty much all stats you could want are accessible after a single pass over the data. More than that you can:
* Pause at any time and read off the stats, and then continue adding more data if you want.
IMHO that is the right approach.
* An extension of the above would be to make the stats object serialisable. * Two or more objects can be "added" together to obtain the stats for the combined data without reaccessing the original data: imagine a weather station gathering temperature data over time: hourly stats can be combined into daily or weekly stats without going back to the original data - which may be either discarded (unlikely) or stored in offline storage.
Just think of the uses for processing stats and network stats in high reliability systems... Can't wait :-) Jim
John Maddock wrote:
I have some simple stuff in the sandbox under the "stat" directory. You can view its usage here:
Otto: first off we really need something like this, in Boost and in the std as well IMO. It really galls me that my desktop calculator has more math functions than <cmath> has, and wait for it: my calculator is 25 years old !!!!!!!!
I don't disagree with that :-) I don't plan to work on this myself in the next year; I have enough to do. We should have an official list of volunteers who are looking for ways to contribute.
However, I'm not sure that either of us has the right interface yet:
I'm not satisfied with my own use of tuple as return values, since it is so easy to forget which tuple element is what. I would prefer named tuples or simply

template< class T >
struct least_square_result
{
    T slope, intersection, correlation;
};

etc.
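To illustrate the named-result idea, here is a small sketch of a simple linear regression that returns such a struct. Only the struct comes from the message above; the function name `least_squares`, its vector-based signature and the two-pass computation are assumptions for illustration, not the sandbox code.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

template <class T>
struct least_square_result
{
    T slope, intersection, correlation;
};

// Hypothetical function: fits y = slope * x + intersection.
// If all x values are equal (or all y values are equal) the divisions
// below are by zero and the result fields are not meaningful.
template <class T>
least_square_result<T> least_squares(const std::vector<T>& x, const std::vector<T>& y)
{
    if (x.size() != y.size() || x.size() < 2)
        throw std::invalid_argument("need two equally sized ranges of at least 2 points");

    const T n = static_cast<T>(x.size());

    // First pass: the two means.
    T sum_x = 0, sum_y = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum_x += x[i];
        sum_y += y[i];
    }
    const T mean_x = sum_x / n;
    const T mean_y = sum_y / n;

    // Second pass: sums of squared and crossed deviations from the means.
    T sxx = 0, syy = 0, sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const T dx = x[i] - mean_x;
        const T dy = y[i] - mean_y;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }

    least_square_result<T> result;
    result.slope = sxy / sxx;
    result.intersection = mean_y - result.slope * mean_x;
    result.correlation = sxy / std::sqrt(sxx * syy);
    return result;
}
```

A caller then writes `r.slope` or `r.correlation` instead of remembering which tuple index holds which value.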
I'm particularly concerned that you're making these algorithms, it means that if you want to access more than one statistic you have to make multiple passes over the data.
Right. It's a trade-off. OTOH, if you only call one algorithm, you don't want to pay for accumulation that is not used. So maybe we need algorithms that take iterators and algorithms that take some kind of accumulator object.
The advantage of the "make it an object" approach is that pretty much all stats you could want are accessible after a single pass over the data. More than that you can:
* Pause at any time and read off the stats, and then continue adding more data if you want.
Right, this could be very useful.
* An extension of the above would be to make the stats object serialisable. * Two or more objects can be "added" together to obtain the stats for the combined data without reaccessing the original data: imagine a weather station gathering temperature data over time: hourly stats can be combined into daily or weekly stats without going back to the original data - which may be either discarded (unlikely) or stored in offline storage.
Unfortunately: this method is prone to numerical overflow/underflow :-(
How can this be different than just accumulating it all from scratch? (Or is it the accumulator method in general that is error-prone?) -Thorsten
Right. It's a trade-off. OTOH, if you only call one algorithm, you don't want to pay for accumulation that is not used. So maybe we need algorithms that take iterators and algorithms that take some kind of accumulator object.
Or you could parameterise the accumulator so it only accumulates what you need.
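One way such a parameterised accumulator might look is sketched below, assuming a C++11 compiler. Every name here (`stat_set`, `count_feature`, `sum_feature`, `max_feature`, the free function `mean`) is hypothetical and invented for illustration; it is not the interface of any library discussed in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

// Each feature keeps exactly one running quantity.
struct count_feature
{
    std::size_t count = 0;
    void add(double) { ++count; }
};

struct sum_feature
{
    double sum = 0.0;
    void add(double x) { sum += x; }
};

struct max_feature
{
    double max = -std::numeric_limits<double>::infinity();
    void add(double x) { max = std::max(max, x); }
};

// The set inherits only from the features it was asked for, so nothing
// else is stored or updated.
template <class... Features>
struct stat_set : Features...
{
    void operator()(double x)
    {
        // Pack expansion inside an array initialiser: calls add(x) on
        // every feature base class, in the order they were listed.
        int expand[] = { 0, (static_cast<Features&>(*this).add(x), 0)... };
        (void)expand;
    }
};

// Compiles only for sets that contain both sum_feature and count_feature.
template <class Set>
double mean(const Set& s)
{
    return s.sum / s.count;
}
```

With this sketch, `stat_set<count_feature, sum_feature>` pays for a counter and a sum and nothing more, while asking for `mean` on a set without those two features fails at compile time.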
Unfortunately: this method is prone to numerical overflow/underflow :-(
How can this be different than just accumulating it all from scratch? (Or is it the accumulator method in general that is error-prone?)
Yes, the method we're both using, I think, is the "schoolboy" accumulate-the-sum-of-the-squares method. It's perfectly good enough for many purposes, but Knuth has an alternative that accumulates the sum of the differences from the "working" mean only; see the last method at http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

I haven't investigated this at all except to note that it exists, and actually, looking again, it might support most of the operations that the current methodology does (not sure about rms mean and the S N-1 "unbiased" variance though).

John.
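As a rough illustration of the running-mean method referred to here (commonly called Welford's algorithm), the sketch below keeps a working mean and the sum of squared differences from it. The class name `welford_stats` is hypothetical, and the merge step uses the pairwise combination formula usually attributed to Chan, Golub and LeVeque, which is an addition not mentioned in the thread.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical accumulator using a working mean instead of raw sums.
class welford_stats
{
public:
    void add(double x)
    {
        ++count_;
        const double delta = x - mean_;   // distance from the old mean
        mean_ += delta / count_;          // update the working mean
        m2_ += delta * (x - mean_);       // uses both the old and the new mean
    }

    // Combine two accumulators without revisiting the original data.
    welford_stats& operator+=(const welford_stats& other)
    {
        if (other.count_ == 0)
            return *this;
        const std::size_t total = count_ + other.count_;
        const double delta = other.mean_ - mean_;
        m2_ += other.m2_
             + delta * delta * (static_cast<double>(count_) * other.count_ / total);
        mean_ += delta * other.count_ / total;
        count_ = total;
        return *this;
    }

    std::size_t count() const { return count_; }
    double mean() const { return mean_; }

    // Population variance; needs count() >= 1.
    double variance() const { return m2_ / count_; }

    // "N-1" sample variance; needs count() >= 2.
    double sample_variance() const { return m2_ / (count_ - 1); }

    // rms recovered from the identity E[x^2] = variance + mean^2.
    double rms() const { return std::sqrt(variance() + mean_ * mean_); }

private:
    std::size_t count_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;   // sum of squared differences from the working mean
};
```

Under these assumptions, both the N-1 variance and the rms mean remain available, and the additive behaviour is kept at the cost of a slightly more involved merge.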
John Maddock wrote:
I have some simple stuff in the sandbox under the "stat" directory. You can view its usage here:
Otto: first off we really need something like this, in Boost and in the std as well IMO. It really galls me that my desktop calculator has more math functions than <cmath> has, and wait for it: my calculator is 25 years old !!!!!!!!
However, I'm not sure that either of us has the right interface yet: I'm particularly concerned that you're making these algorithms, it means that if you want to access more than one statistic you have to make multiple passes over the data. The advantage of the "make it an object" approach is that pretty much all stats you could want are accessible after a single pass over the data. More than that you can:
* Pause at any time and read off the stats, and then continue adding more data if you want.
I have been working on a statistical library based on an incremental
calculation model. The syntax is like this:
// define an accumulator set for calculating mean and max
// of a sequence of doubles
accumulator_set
Eric Niebler wrote:
I have been working on a statistical library based on an incremental calculation model. The syntax is like this:
// define an accumulator set for calculating mean and max
// of a sequence of doubles
accumulator_set
> acc;

// push some data into the accumulator
acc(1.2);
acc(2.3);
acc(3.4);

// fetch some intermediate results
std::cout << mean(acc) << std::endl;

/* push more data, etc. */
Some statistics may depend on the results of other statistics, in which case they are calculated automatically and in the correct order.
The set of statistical accumulators is extensible, of course.
I hope to have a version available online Real Soon Now.
I've now put what I have in the Boost File Vault. You can find it under the name stats.tar.gz at the following location: http://www.boost-consulting.com/vault/index.php?directory=Math%20-%20Numerics& No docs, yet -- working on that right now. But the tests might give some inkling of how to use it. Feedback welcome. -- Eric Niebler Boost Consulting www.boost-consulting.com
I've now put what I have in the Boost File Vault. You can find it under the name stats.tar.gz at the following location:
http://www.boost-consulting.com/vault/index.php?directory=Math%20-%20Numerics&
No docs, yet -- working on that right now. But the tests might give some inkling of how to use it.
Very cool, go boy go! :-) I haven't done much more than quickly browse the docs and a couple of test programs, but it looks like you have everything covered to me. John.
John Maddock wrote:
I've now put what I have in the Boost File Vault. You can find it under the name stats.tar.gz at the following location:
http://www.boost-consulting.com/vault/index.php?directory=Math%20-%20Numerics&
No docs, yet -- working on that right now. But the tests might give some inkling of how to use it.
Very cool, go boy go! :-)
I haven't done much more than quickly browse the docs and a couple of test programs, but it looks like you have everything covered to me.
Can one calculate covariance and simple linear regression (correlation, intersection, slope)? -Thorsten
Thorsten Ottosen wrote:
John Maddock wrote:
I've now put what I have in the Boost File Vault. You can find it under the name stats.tar.gz at the following location:
http://www.boost-consulting.com/vault/index.php?directory=Math%20-%20Numerics&
No docs, yet -- working on that right now. But the tests might give some inkling of how to use it.
Very cool, go boy go! :-)
I haven't done much more than quickly browse the docs and a couple of test programs, but it looks like you have everything covered to me.
Can one calculate covariance and simple linear regression (correlation, intersection, slope)?
Not yet, but adding new statistical accumulators is pretty easy. Patches would be graciously accepted. -- Eric Niebler Boost Consulting www.boost-consulting.com
On 1/12/06, Eric Niebler wrote:
Can one calculate covariance and simple linear regression (correlation, intersection, slope)?
Not yet, but adding new statistical accumulators is pretty easy. Patches would be graciously accepted.
I'm working on converting a median accumulator algorithm I wrote a little while back to your framework. It does however require some parameters to optimize the performance. How is that handled in your framework? Mike Gibson megibson@gmail.com
Mike Gibson wrote:
On 1/12/06, Eric Niebler wrote:
Can one calculate covariance and simple linear regression (correlation, intersection, slope)?
Not yet, but adding new statistical accumulators is pretty easy. Patches would be graciously accepted.
I'm working on converting a median accumulator algorithm I wrote a little while back to your framework.
Great!
It does however require some parameters to optimize the performance. How is that handled in your framework?
I'm actively working on the docs, updated daily at http://boost-sandbox.sf.net/libs/accumulators. In particular, have a look at http://tinyurl.com/dpzmt.

Basically, you'll use named parameters from the Boost.Parameter library. Have a look at how order_impl is implemented to accept the cache_size named parameter. I need to add an Extensibility section to the docs that describes this in detail.

-- Eric Niebler Boost Consulting www.boost-consulting.com
John Maddock wrote:
I have some simple stuff in the sandbox under the "stat" directory. You can view its usage here:
Otto: first off we really need something like this, in Boost and in the std as well IMO.
First, I'm strongly interested in such a library for statistics calculations in std or Boost. I'm going to test what you're working on frequently.

Second, I think it may be interesting to take a look at the R Project, http://www.r-project.org. Maybe, instead of reinventing the wheel, we could provide a kind of interoperability between C++ and R. I have in mind something like Boost.Python. R is a very rich statistical computing package, which could shorten the time-to-market of a statistical package for Boost.

What do you think?

Cheers -- Mateusz Łoskot http://mateusz.loskot.net
First, I'm strongly interested in such a library for statistics calculations in std or Boost. I'm going to test what you're working on frequently.
Second, I think it may be interesting to take a look at the R Project, http://www.r-project.org. Maybe, instead of reinventing the wheel, we could provide a kind of interoperability between C++ and R. I have in mind something like Boost.Python. R is a very rich statistical computing package, which could shorten the time-to-market of a statistical package for Boost.
What do you think?
I think it needs someone to step up and volunteer to write it :-) John.
John Maddock wrote:
First, I'm strongly interested in such a library for statistics calculations in std or Boost. I'm going to test what you're working on frequently.
Second, I think it may be interesting to take a look at the R Project, http://www.r-project.org. Maybe, instead of reinventing the wheel, we could provide a kind of interoperability between C++ and R. I have in mind something like Boost.Python. R is a very rich statistical computing package, which could shorten the time-to-market of a statistical package for Boost.
What do you think?
I think it needs someone to step up and volunteer to write it :-)
Yes, but unfortunately I'm not the right person to start such work. I'm pretty new to Boost and to meta-programming in C++, so the design (the first steps) could be a challenge for me. In spite of that, I'm still ready to add my 5 cents to such a statistical library project for Boost.

Cheers -- Mateusz Łoskot http://mateusz.loskot.net
participants (7)
- Eric Niebler
- Jim Douglas
- John Maddock
- Mateusz Łoskot
- Mike Gibson
- Thorsten Ottosen
- yinglcs2@yahoo.com