
Stjepan Rajko wrote:
2008/11/19 Thijs van den Berg <thijs@sitmo.com>:
I see a couple of things that we could start working on; perhaps you have other or additional ideas!
* define a name for the sandbox folder & create it.
How about sandbox/multivariate_distributions ?
very good!
If this is geared towards Boost.Math, we should probably consider that library's directory structure. Currently, the statistical distributions files appear to be placed as follows:
  include files: ..../boost/include/math/distributions
  docs: ..../libs/math/distributions/doc/sf_and_dist
  tests: ..../libs/math/distributions/test
  examples: ..../libs/math/distributions/example
I'm not sure whether we should re-use all of those directories or change some (or all) of them.
I agree. Let's keep it simple for a start (we are in the sandbox) and put the docs and code directly in sandbox/multivariate_distributions. We can split things up once we have a start.
* start a doc in that folder where we collect the details: interfaces, functions, equations, algorithms. Can that be done in LaTeX and compiled to PDF? What would be a good doc format?
I would recommend quickbook, like John suggested. I can set up the basic files for a starting docs build once we decide on the directory structure.
I like quickbook too, and I'm almost done installing the generation tools. It would be great if you could set up the basic files!
---- These first two are probably the best way to start. John Maddock suggested starting with docs, and I agree; these first two points should cover that. More things that we will need to do are:
* define a list of generic functions for generic multivariate densities (non-member properties) along the lines of this: http://www.boost.org/doc/libs/1_37_0/libs/math/doc/sf_and_dist/html/math_too...
I'd suggest following John's suggestion in starting with the subset of that list that applies to multivariate distributions. If we start adding things and this ends up in Boost.Math, then the same things we add for multivariate distributions should also probably be added for the univariate distributions (if they apply) for consistency.
Very good, but there will indeed be some things that only apply to multivariate distributions.
Some things that I need to implement, as a user, for another project can be seen in this list: http://www.cs.toronto.edu/~roweis/notes/gaussid.pdf. There are many more operations in use related to multivariate Gaussians. For example, a lot of machine learning projects work with multivariate Gaussians, and they need parameter estimation from data. Some of these things might be too specific to add to the Boost distributions, and could fill up a whole "Gaussian lib" by themselves. I don't know.
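To make the non-member idea concrete, here is a minimal sketch of how the existing `pdf(dist, x)` and `mean(dist)` style might carry over when the argument becomes a vector. The type `bivariate_normal` and its fields are hypothetical, made up for illustration only; nothing here is an existing or agreed Boost.Math interface.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical distribution type, for illustration only.
struct bivariate_normal {
    std::array<double, 2> mu;     // component means
    std::array<double, 2> sigma;  // component standard deviations, both > 0
    double rho;                   // correlation, strictly between -1 and 1
};

// Non-member accessor: the mean is now a vector rather than a scalar.
inline std::array<double, 2> mean(const bivariate_normal& d) {
    return d.mu;
}

// Non-member density evaluated at a point x in the plane.
inline double pdf(const bivariate_normal& d, const std::array<double, 2>& x) {
    assert(d.sigma[0] > 0 && d.sigma[1] > 0 && std::fabs(d.rho) < 1);
    const double pi = 3.14159265358979323846;
    // Standardized coordinates.
    const double z0 = (x[0] - d.mu[0]) / d.sigma[0];
    const double z1 = (x[1] - d.mu[1]) / d.sigma[1];
    const double one_minus_rho2 = 1.0 - d.rho * d.rho;
    // Quadratic form in the exponent.
    const double q = (z0 * z0 - 2.0 * d.rho * z0 * z1 + z1 * z1) / one_minus_rho2;
    // Normalizing constant: 2 * pi * sqrt(det(covariance)).
    const double norm = 2.0 * pi * d.sigma[0] * d.sigma[1] * std::sqrt(one_minus_rho2);
    return std::exp(-0.5 * q) / norm;
}
```

Some univariate properties (a scalar quantile, for instance) have no obvious counterpart here, which is the kind of thing the function list would need to settle.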
I think parameter estimation from data would be a very useful thing to add, but if we do we should keep all distributions in mind.
Yes, and that is a big extension! For consistency, this would imply that all current univariate distributions would also need parameter estimation. Another feature used a lot (in code) is drawing random samples from distributions. Another possibility I see is to keep some of those things outside this scope and put them somewhere else.
We might also look at other mathematical packages such as Matlab, R, and Octave to see what they do with multivariate distributions.
* using that list, we will see what kind of matrix operations we need. That will let us decide between taking a dependency on ublas (or another library) and keeping the code free of external dependencies by implementing the operations ourselves.
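As a rough picture of what parameter estimation from data could involve for a multivariate Gaussian, the sketch below computes the sample mean vector and the unbiased sample covariance matrix. The names `gaussian_estimate` and `estimate_gaussian` are hypothetical, and plain `std::vector` stands in for whatever vector and matrix types are eventually chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type, for illustration only.
struct gaussian_estimate {
    std::vector<double> mean;                     // length d
    std::vector<std::vector<double>> covariance;  // d x d
};

// Estimate mean and covariance from n >= 2 samples of equal dimension d.
inline gaussian_estimate estimate_gaussian(
        const std::vector<std::vector<double>>& samples) {
    assert(samples.size() >= 2);
    const std::size_t n = samples.size();
    const std::size_t d = samples[0].size();

    gaussian_estimate e{std::vector<double>(d, 0.0),
                        std::vector<std::vector<double>>(
                            d, std::vector<double>(d, 0.0))};

    // First pass: sample mean.
    for (const std::vector<double>& s : samples) {
        assert(s.size() == d);
        for (std::size_t i = 0; i < d; ++i) {
            e.mean[i] += s[i] / static_cast<double>(n);
        }
    }

    // Second pass: unbiased sample covariance (divides by n - 1).
    for (const std::vector<double>& s : samples) {
        for (std::size_t i = 0; i < d; ++i) {
            for (std::size_t j = 0; j < d; ++j) {
                e.covariance[i][j] += (s[i] - e.mean[i]) * (s[j] - e.mean[j])
                                      / static_cast<double>(n - 1);
            }
        }
    }
    return e;
}
```

Whether the estimator should divide by n - 1 or by n (the maximum-likelihood choice) is exactly the sort of detail the docs would have to pin down for every distribution.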
I think the concept-based way is the way to go. We can let the user provide the matrix type, as long as it provides the operations we need. Maybe we can use ublas matrices as the default type if it is sufficient, since that is already in boost (and header-only), and maybe test with some other libraries just to make sure we're not requiring syntax that is too ublas-specific.
Yes I agree!
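A small sketch of the concept-based approach, assuming the only requirement placed on the matrix type is element access through `a(i, j)`: the quadratic form x^T A x (the kind of term that appears in the exponent of a Gaussian density) is written once as a template. The `dense_matrix` class is a hypothetical stand-in used to keep the example free of dependencies; ublas matrices also offer `operator()(i, j)`, so they should fit the same template, though that is not tested here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Works with any Matrix offering a(i, j) and any Vector offering
// size() and operator[].
template <class Matrix, class Vector>
double quadratic_form(const Matrix& a, const Vector& x) {
    double result = 0.0;
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            result += x[i] * a(i, j) * x[j];
        }
    }
    return result;
}

// Hypothetical minimal row-major matrix, for illustration only.
class dense_matrix {
public:
    dense_matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};
```

Testing the same template against a second matrix library would show quickly whether the requirements have drifted toward ublas-specific syntax.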
Best,
Stjepan