RE: [boost] Proposed Boost Library for Data Series Analysis

Response to Matthew Hurd's post:
Efficiency is often paramount with these types of apps. Being able to do finite difference approaches, backwards and forwards, and in absolute terms is important (often you only want the latest data point).
Agreed.
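Here is a minimal C++17 sketch of the backward/forward distinction and of evaluating only the latest point. It assumes the series is held in a std::vector<double>. The function names are hypothetical and are not part of the proposed library:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Backward difference at index i: x[i] - x[i-1]. Requires i >= 1.
inline double backward_difference(const std::vector<double>& x, std::size_t i) {
    assert(i >= 1 && i < x.size());
    return x[i] - x[i - 1];
}

// Forward difference at index i: x[i+1] - x[i]. Requires i + 1 < x.size().
inline double forward_difference(const std::vector<double>& x, std::size_t i) {
    assert(i + 1 < x.size());
    return x[i + 1] - x[i];
}

// "Absolute" evaluation at the latest point only: one backward difference on
// the last two elements, without computing the whole differenced series.
inline double latest_backward_difference(const std::vector<double>& x) {
    assert(x.size() >= 2);
    return backward_difference(x, x.size() - 1);
}

// Whole-series backward differences. The result is one element shorter than
// the input because the first point has no predecessor.
inline std::vector<double> backward_differences(const std::vector<double>& x) {
    std::vector<double> out;
    for (std::size_t i = 1; i < x.size(); ++i)
        out.push_back(x[i] - x[i - 1]);
    return out;
}
```

When only the newest value is wanted, the latest-point version costs one subtraction. The whole-series version produces n-1 values.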
You have the incremental case covered in your notes. Being able to calculate it at a point in an absolute sense, optional caching of results, working with series of unknown lengths and accuracy / validity enabling would be nice too.
Agreed.
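The following sketch contrasts incremental and absolute evaluation for an n-point moving average. IncrementalMovingAverage and moving_average_at are hypothetical names. The incremental version keeps the window contents and a running sum as its state, so each new point costs O(1). The absolute version recomputes the window ending at a given index directly from the series:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Incremental n-point moving average: each push() updates a running sum
// instead of re-summing the whole window.
class IncrementalMovingAverage {
public:
    explicit IncrementalMovingAverage(std::size_t window) : window_(window) {
        assert(window > 0);
    }

    // Adds the newest observation and evicts the oldest once the window is full.
    void push(double x) {
        buffer_.push_back(x);
        sum_ += x;
        if (buffer_.size() > window_) {
            sum_ -= buffer_.front();
            buffer_.pop_front();
        }
    }

    // True once n observations have been seen.
    bool valid() const { return buffer_.size() == window_; }

    // Average of the current window; only meaningful when valid().
    double value() const { return sum_ / static_cast<double>(buffer_.size()); }

private:
    std::size_t window_;
    std::deque<double> buffer_;
    double sum_ = 0.0;
};

// Absolute evaluation: the n-point average ending at index i, computed
// directly from the series. Requires i + 1 >= n.
inline double moving_average_at(const std::vector<double>& x, std::size_t n, std::size_t i) {
    assert(n > 0 && i < x.size() && i + 1 >= n);
    double sum = 0.0;
    for (std::size_t k = i + 1 - n; k <= i; ++k) sum += x[k];
    return sum / static_cast<double>(n);
}
```

A running sum can accumulate floating-point rounding error over very long series. Periodically re-summing the window is one way around that.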
By accuracy / validity enabling I mean that, for say your example of a 5-step moving average, the result is not valid for the first four points, as these are not an accurate average, so the valid part of the result is shorter than the input.
The value for the first four elements is invalid. The user has to adjust for this when iterating over the returned vector. Example (five-day moving average):
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <vector>
    #include <boost/lambda/lambda.hpp>
    using namespace boost::lambda;  // for the placeholder _1
    std::vector<double> out;
    std::transform(in.begin(), in.end(), std::back_inserter(out), moving_average(5));
    // out[0] .. out[3] are NA: the five-point window is not yet full
    std::for_each(out.begin() + 4, out.end(), std::cout << _1 << '\n');
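One way to carry the validity information with the result, rather than leaving it to the caller, is sketched below in C++17. It assumes std::optional is acceptable as the "NA" marker. The names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Simple n-point moving average over a whole series. The first n-1 outputs
// are std::nullopt because their window is not yet full, so the result keeps
// the input's length but carries validity explicitly.
inline std::vector<std::optional<double>>
moving_average_with_validity(const std::vector<double>& in, std::size_t n) {
    assert(n > 0);
    std::vector<std::optional<double>> out;
    out.reserve(in.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        sum += in[i];
        if (i >= n) sum -= in[i - n];      // drop the value that left the window
        if (i + 1 >= n)
            out.push_back(sum / static_cast<double>(n));
        else
            out.push_back(std::nullopt);   // window not yet full: invalid
    }
    return out;
}

// Number of leading outputs that are invalid for an n-point window.
constexpr std::size_t invalid_prefix(std::size_t n) { return n > 0 ? n - 1 : 0; }
```

invalid_prefix is constexpr, so for a fixed window length the count of invalid leading points is available at compile time. This is along the lines of the compile-time validity calculations suggested in the next paragraph.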
Another simple example would be an exponential moving average, where the number of significant digits required determines how far back in the series you need to go. At another level, a quality-of-service setting for some functors, somewhere on the fast-to-accurate spectrum, is often appropriate.
Perhaps expression templates might help here. Compile time results for validity calcs for series of known lengths would be helpful where possible.
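For the exponential moving average case, the required history can be worked out directly. Assume the usual recursion y[t] = alpha*x[t] + (1-alpha)*y[t-1]. Then the observation k steps back has weight alpha*(1-alpha)^k, and all observations N or more steps back carry a combined weight of (1-alpha)^N. The smallest adequate N is therefore ceil(log(tolerance) / log(1 - alpha)). In the sketch below, ema_lookback is a hypothetical name, and tolerance is a rough stand-in for the required number of significant digits (e.g. 1e-4 for about four):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Smallest N such that the truncated tail weight (1-alpha)^N <= tolerance,
// i.e. how many points of history an EMA needs before older data is negligible.
inline std::size_t ema_lookback(double alpha, double tolerance) {
    assert(alpha > 0.0 && alpha < 1.0);
    assert(tolerance > 0.0 && tolerance < 1.0);
    return static_cast<std::size_t>(
        std::ceil(std::log(tolerance) / std::log(1.0 - alpha)));
}
```

With alpha = 0.5 and tolerance = 1e-3, this gives N = 10, since 0.5^10 = 1/1024.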
Agreed. Expression templates could be used very effectively here. I will look into this.
It would be nice if the algorithms could be simply specified, similar to your approach, and then "mounted" into the appropriate framework. Perhaps SFINAE or another technique could be used to determine an algorithm's capabilities. That is, the framework could juggle the most appropriate methods to use based on the results required, e.g. use incremental if you can, otherwise absolute.
Based on what I've done in the past, you also end up wanting arbitrary n-dimensional structures with other data types flowing through as well. You want to be able to associate ids/names with data items and keep them associated through sorting/ranking, e.g. tag the series with data codes, rank them, and you can get the code for the best. You also want grouping of n-dimensional constructs, splitting of dimensions, and "slicing" operators that operate across the current dimension, which, with the appropriate framework, are one and the same, e.g. choose the minimum or sum the results of a bunch of functions.
Then soon you want to split the computation amongst processors and machines, and you want to treat the computation as a dataflow graph and parallelise it in some way, which I did a few years ago with BGL's precursor, the GGCL. A topological sort gets you most of the way there... Also, you can then use things like Metis to partition your computation graph in nice ways.
I think one way to approach it would be an expression template approach with some adaptability in terms of absolute and incremental analytics with some accuracy (at least something like the number of points being chewed up and not valid or some such). Everything else could come on top of this.
Sounds very powerful. Is there any additional web information to help me incorporate these ideas?
You need to ensure that you can easily reuse code out there. No one wants to spend their life writing the zillions of numerical algorithms already out there. If you want a vector ARMA model, for example, there are only a couple out there I think, and they are in FORTRAN. Wrap it, don't write it, I'd hope.
Agreed.
Thank you for your comments and for spending the time to look at the library.
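The capability detection suggested in the quoted paragraph above (use incremental if the algorithm supports it, otherwise absolute) can be sketched with the C++17 detection idiom, which relies on SFINAE. This sketch assumes an incremental algorithm exposes push() and value() and an absolute one exposes evaluate(). That convention and all the names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// True if Algo has a member push(double); taken here to mean "incremental".
template <typename Algo, typename = void>
struct has_incremental : std::false_type {};

template <typename Algo>
struct has_incremental<Algo, std::void_t<decltype(std::declval<Algo&>().push(0.0))>>
    : std::true_type {};

// Latest value of an analytic over a series: feed the points one at a time if
// the algorithm is incremental, otherwise fall back to absolute evaluation.
template <typename Algo>
double latest(Algo algo, const std::vector<double>& series) {
    if constexpr (has_incremental<Algo>::value) {
        for (double x : series) algo.push(x);
        return algo.value();
    } else {
        return algo.evaluate(series);
    }
}

// Hypothetical incremental algorithm: running sum.
struct RunningSum {
    double total = 0.0;
    void push(double x) { total += x; }
    double value() const { return total; }
};

// Hypothetical absolute-only algorithm: last element of the series.
struct LastValue {
    double evaluate(const std::vector<double>& s) const {
        assert(!s.empty());
        return s.back();
    }
};
```

The choice is made at compile time, so an algorithm that lacks push() never has the incremental branch instantiated for it.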
Tom Brinkman