Re: [boost] math statistical distribution: multivariate gaussian

19 Nov 2008


Stjepan Rajko wrote:
...
2008/11/19 Thijs van den Berg <thijs@sitmo.com>:
...
I see a couple of things that we could start working on; perhaps you have
other/additional ideas!
* define a name for the sandbox folder & create it.
How about sandbox/multivariate_distributions ?
very good!
...
If this is geared towards Boost.Math, we should probably consider that
library's directory structure.  Currently, the statistical
distribution files appear to be placed as follows:
include files: ..../boost/include/math/distributions
docs: ..../libs/math/distributions/doc/sf_and_dist
tests: ..../libs/math/distributions/test
examples: ..../libs/math/distributions/example
I'm not sure whether we should re-use all of those directories or
change some (or all) of them.
I agree, let's keep it simple for a start (we are in the sandbox). How about putting the
doc & code directly in sandbox/multivariate_distributions? We can always
split things up once we have a start.
...
...
* start a doc in that folder where we collect the details: interfaces,
functions, equations, algorithms.
Can that be done in LaTeX, compiled to PDF? What would be a good doc format?
I would recommend quickbook, like John suggested.  I can set up the
basic files for a starting docs build once we decide on the directory
structure.
I like quickbook too; I'm almost done installing the generation tools.
It would be great if you could set up the basic files!
...
...
----
These first two are probably the best way to start. John Maddock suggested
starting with docs, and I agree; that should be covered by these
first two points! More things that we will need to do:
* define a list of generic functions for generic multivariate densities
(non-member properties) along the lines of this:
http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_too...
I'd suggest following John's advice and starting with the subset of
that list that applies to multivariate distributions.  If we start
adding things and this ends up in Boost.Math, then the same things we
add for multivariate distributions should also probably be added for
the univariate distributions (if they apply) for consistency.
Very good, but there will indeed be some things that only apply to
multivariate distributions.
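To make the "non-member properties" idea concrete, here is a minimal sketch of what a multivariate normal could look like in the style of Boost.Math's univariate non-member accessors (`pdf(dist, x)`, `mean(dist)`, `mode(dist)`, ...). Everything here is hypothetical and for illustration only: the `mvd_sketch` namespace, the `multivariate_normal` class, the `covariance` and `dimension` accessors, and the use of `std::vector` as the matrix type are assumptions, not an agreed interface. The density is evaluated through a Cholesky factor L (covariance = L L^T), so no explicit matrix inverse or determinant is needed. A multivariate `cdf` is deliberately left out, since it has no simple closed form.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

namespace mvd_sketch {  // hypothetical namespace, not part of Boost

using vector_type = std::vector<double>;
using matrix_type = std::vector<vector_type>;  // square, row-major

class multivariate_normal {
public:
    multivariate_normal(vector_type mean, matrix_type covariance)
        : mean_(std::move(mean)), cov_(std::move(covariance)), chol_(factor(cov_)) {
        assert(cov_.size() == mean_.size());
    }
    const vector_type& mean_vector() const { return mean_; }
    const matrix_type& covariance_matrix() const { return cov_; }
    const matrix_type& cholesky_factor() const { return chol_; }

private:
    // Lower-triangular L with covariance = L * L^T (Cholesky factorization).
    static matrix_type factor(const matrix_type& a) {
        const std::size_t n = a.size();
        matrix_type l(n, vector_type(n, 0.0));
        for (std::size_t j = 0; j < n; ++j) {
            double d = a[j][j];
            for (std::size_t k = 0; k < j; ++k) d -= l[j][k] * l[j][k];
            if (d <= 0.0) throw std::domain_error("covariance is not positive definite");
            l[j][j] = std::sqrt(d);
            for (std::size_t i = j + 1; i < n; ++i) {
                double s = a[i][j];
                for (std::size_t k = 0; k < j; ++k) s -= l[i][k] * l[j][k];
                l[i][j] = s / l[j][j];
            }
        }
        return l;
    }
    vector_type mean_;
    matrix_type cov_;   // declared before chol_, so it is initialized first
    matrix_type chol_;
};

// Non-member accessors, mirroring the univariate Boost.Math style.
inline std::size_t dimension(const multivariate_normal& d) { return d.mean_vector().size(); }
inline const vector_type& mean(const multivariate_normal& d) { return d.mean_vector(); }
inline const vector_type& mode(const multivariate_normal& d) { return d.mean_vector(); }  // mode == mean
inline const matrix_type& covariance(const multivariate_normal& d) { return d.covariance_matrix(); }

// pdf = (2 pi)^(-n/2) |Sigma|^(-1/2) exp(-(x - mu)^T Sigma^-1 (x - mu) / 2).
// Forward substitution solves L z = x - mu, so the quadratic form is z^T z,
// and log|Sigma| = 2 * sum(log L_ii).
inline double pdf(const multivariate_normal& d, const vector_type& x) {
    const vector_type& mu = d.mean_vector();
    const matrix_type& l = d.cholesky_factor();
    const std::size_t n = mu.size();
    assert(x.size() == n);
    vector_type z(n);
    double quad = 0.0, half_log_det = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double s = x[i] - mu[i];
        for (std::size_t k = 0; k < i; ++k) s -= l[i][k] * z[k];
        z[i] = s / l[i][i];
        quad += z[i] * z[i];
        half_log_det += std::log(l[i][i]);
    }
    const double pi = std::acos(-1.0);
    return std::exp(-0.5 * quad - half_log_det -
                    0.5 * static_cast<double>(n) * std::log(2.0 * pi));
}

}  // namespace mvd_sketch
```

With this layout, a standard bivariate normal evaluated at the origin gives 1/(2 pi), the same value a reader would check by hand.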
...
...
Some things that I need to implement (as a user) for another project
can be seen in this list:
http://www.cs.toronto.edu/~roweis/notes/gaussid.pdf
...and there are many more things related to multivariate
Gaussians. For example, a lot of machine learning projects work with multivariate
Gaussians and need parameter estimation from data. Some of these things
might be too specific to add to Boost distributions, and could fill up a
whole "Gaussian lib" by themselves! I don't know.
I think parameter estimation from data would be a very useful thing to
add, but if we do we should keep all distributions in mind.
Yes, and that's a big extension! For consistency, this would imply that
all current univariate distributions would also need parameter estimation.
Another feature used a lot (in code) is drawing random samples from
distributions. Another possibility I see is to keep some of those things
outside this scope and put them somewhere else.
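Both features discussed here are small for the Gaussian case, which may help judge whether they belong in scope. The sketch below shows maximum-likelihood estimation (sample mean, and covariance normalized by N rather than N - 1) and sampling via x = mu + L z with z drawn from N(0, I) and L a lower-triangular factor of the covariance. The names `mvd_sketch`, `gaussian_fit`, `fit_gaussian` and `sample_gaussian` are hypothetical, and L is assumed to be supplied by the caller (for example, a Cholesky factor). Only the standard `<random>` facilities are used.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

namespace mvd_sketch {  // hypothetical namespace, not part of Boost

using vector_type = std::vector<double>;
using matrix_type = std::vector<vector_type>;  // square, row-major

struct gaussian_fit {
    vector_type mean;
    matrix_type covariance;
};

// Maximum-likelihood estimates: sample mean and covariance divided by N.
// (Divide by N - 1 instead for the unbiased covariance estimate.)
inline gaussian_fit fit_gaussian(const std::vector<vector_type>& samples) {
    assert(!samples.empty());
    const std::size_t n = samples.front().size();
    const double count = static_cast<double>(samples.size());
    gaussian_fit fit{vector_type(n, 0.0), matrix_type(n, vector_type(n, 0.0))};
    for (const auto& x : samples) {
        assert(x.size() == n);
        for (std::size_t i = 0; i < n; ++i) fit.mean[i] += x[i] / count;
    }
    for (const auto& x : samples)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                fit.covariance[i][j] += (x[i] - fit.mean[i]) * (x[j] - fit.mean[j]) / count;
    return fit;
}

// One draw x = mu + L z with z ~ N(0, I); L is lower triangular with
// Sigma = L L^T, so x ~ N(mu, Sigma).
template <class Engine>
vector_type sample_gaussian(const vector_type& mu, const matrix_type& l, Engine& eng) {
    std::normal_distribution<double> std_normal(0.0, 1.0);
    const std::size_t n = mu.size();
    vector_type z(n);
    for (auto& zi : z) zi = std_normal(eng);
    vector_type x(mu);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k <= i; ++k) x[i] += l[i][k] * z[k];
    return x;
}

}  // namespace mvd_sketch
```

Drawing many samples with `sample_gaussian` and passing them to `fit_gaussian` should recover the mean and covariance up to sampling noise, which also makes a convenient round-trip test.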
...
...
We might also look at other mathematical packages like Matlab, R, and Octave to
see what they do with multivariate distributions.
* using that list, we will see what type of matrix operators we will need,
and that will let us decide between adding a dependency on
ublas (or other libraries) and keeping it free of external dependencies by
implementing the operations ourselves.
I think the concept-based way is the way to go.  We can let the user
provide the matrix type, as long as it provides the operations we
need.  Maybe we can use ublas matrices as the default type if it is
sufficient, since that is already in boost (and header-only), and
maybe test with some other libraries just to make sure we're not
requiring syntax that is too ublas-specific.
Yes I agree!
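As an illustration of the concept-based approach, the sketch below writes a Cholesky factorization against a small set of documented requirements instead of a concrete matrix class. The requirement set is an assumption modeled on what uBLAS's `matrix` already offers: `size1()`, `size2()`, element access via `operator()(i, j)`, and copy construction. For brevity the element type is assumed to be `double`. `dense_matrix` is a hypothetical stand-in so the example compiles without Boost; `boost::numeric::ublas::matrix<double>` should satisfy the same requirements, but that has not been tested here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace mvd_sketch {  // hypothetical namespace, not part of Boost

// Assumed Matrix requirements (modeled on uBLAS):
//   m.size1(), m.size2()  -> number of rows / columns
//   m(i, j)               -> read/write element access (double)
//   Matrix(const Matrix&) -> copy construction
template <class Matrix>
Matrix cholesky_lower(const Matrix& a) {
    assert(a.size1() == a.size2());
    Matrix l(a);  // factor in place on a copy; only the lower triangle of a is read
    const std::size_t n = l.size1();
    for (std::size_t j = 0; j < n; ++j) {
        double d = l(j, j);
        for (std::size_t k = 0; k < j; ++k) d -= l(j, k) * l(j, k);
        if (d <= 0.0) throw std::domain_error("matrix is not positive definite");
        l(j, j) = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = l(i, j);
            for (std::size_t k = 0; k < j; ++k) s -= l(i, k) * l(j, k);
            l(i, j) = s / l(j, j);
        }
        for (std::size_t k = j + 1; k < n; ++k) l(j, k) = 0.0;  // clear upper triangle
    }
    return l;
}

// Hypothetical minimal matrix type satisfying the requirements above.
class dense_matrix {
public:
    dense_matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}
    std::size_t size1() const { return rows_; }
    std::size_t size2() const { return cols_; }
    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

}  // namespace mvd_sketch
```

Because the algorithm only touches the interface listed in the comment, testing it against a second matrix library (as suggested above) is a cheap way to catch any accidental reliance on ublas-specific syntax.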
...
Best,
Stjepan