RE: [boost] Proposed Boost Library for Data Series Analysis

3 Jun 2004

Response to Matthew Hurd's post:
...
...
Efficiency is often paramount with these types of apps. Being able to
do finite difference approaches, backwards and forwards, and in
absolute terms is important (often you only want the latest data
point).

Agreed.
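To make the "absolute" case concrete: a finite difference at a single index, such as the latest point, needs only the neighbouring values, not a pass over the whole series. This is a minimal sketch; backward_difference and forward_difference are hypothetical names, not part of the proposed library:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helpers, not part of the proposed library.

// Backward difference at index i: x[i] - x[i-1]. Requires 1 <= i < size.
double backward_difference(const std::vector<double>& x, std::size_t i) {
    if (i == 0 || i >= x.size())
        throw std::out_of_range("backward_difference: need 1 <= i < size");
    return x[i] - x[i - 1];
}

// Forward difference at index i: x[i+1] - x[i]. Requires i + 1 < size.
double forward_difference(const std::vector<double>& x, std::size_t i) {
    if (i + 1 >= x.size())
        throw std::out_of_range("forward_difference: need i + 1 < size");
    return x[i + 1] - x[i];
}

// Usage: only the latest point is needed, so this is one subtraction.
//   std::vector<double> prices = {10.0, 10.5, 10.25, 11.0};
//   double change = backward_difference(prices, prices.size() - 1);  // 0.75
```

Both functions are O(1), so asking for the latest point never costs more than a single subtraction, however long the series is.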
...
...
You have the incremental case covered in your notes. Being able to
calculate it at a point in an absolute sense, optional caching of
results, working with series of unknown lengths and accuracy /
validity enabling would be nice too.

Agreed.
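Here is one way "calculate it at a point in an absolute sense" and "optional caching of results" might fit together. It is a sketch under my own assumptions (a trailing-window mean over a std::vector), and mean_at and cached_mean are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical sketch, not the proposed library's API.
// Absolute: mean of the `window` points ending at index i, computed
// directly, without walking the series from the start.
double mean_at(const std::vector<double>& x, std::size_t i,
               std::size_t window) {
    assert(window > 0 && i < x.size() && i + 1 >= window);
    double sum = 0.0;
    for (std::size_t k = i + 1 - window; k <= i; ++k) sum += x[k];
    return sum / static_cast<double>(window);
}

// Optional caching: results already computed are remembered by index.
class cached_mean {
public:
    cached_mean(const std::vector<double>& x, std::size_t window)
        : x_(x), window_(window) {}

    double at(std::size_t i) {
        auto it = cache_.find(i);
        if (it != cache_.end()) return it->second;  // cache hit
        double v = mean_at(x_, i, window_);
        cache_.emplace(i, v);
        return v;
    }

    std::size_t cached() const { return cache_.size(); }

private:
    const std::vector<double>& x_;  // the series must outlive this object
    std::size_t window_;
    std::map<std::size_t, double> cache_;
};
```

Because each result depends only on points up to its own index, appending new points to a series of unknown length does not invalidate the cache. The wrapper holds a reference to the vector object itself, so push_back on the series is safe as long as the vector outlives the wrapper.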
...
...
By accuracy / validity enabling I mean, say, for your example of a
5-step moving average, it is not valid for the first four points, as
these are not an accurate average, so there is a length modification
for the result.

The values of the first four elements are invalid. The user has to
adjust for this when iterating over the returned vector.

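Example (five-day moving average):

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
#include <boost/lambda/lambda.hpp>

using boost::lambda::_1;

// `in` is the input series; moving_average is the proposed
// library's functor.
std::vector<double> out;
std::transform(in.begin(), in.end(),
               std::back_inserter(out),
               moving_average(5));

// out[0] .. out[3] are NA: fewer than five points have been
// seen, so they are not true five-day averages.

// Print only the valid results, skipping the first four.
std::for_each(out.begin() + 4, out.end(), std::cout << _1 << '\n');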
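The moving_average functor in the example belongs to the proposed library, and its definition isn't shown here. A minimal stand-in, assuming NA is represented as a quiet NaN and the functor keeps its window as internal state, might look like this. Constructed with window length n, it emits NaN for the first n - 1 points:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>
#include <limits>

// Hypothetical stand-in for the proposed library's moving_average.
// Returns NaN (standing in for NA) until n points have been seen;
// callers can test for it with std::isnan.
class moving_average {
public:
    explicit moving_average(std::size_t n) : n_(n), sum_(0.0) {
        assert(n > 0);
    }

    double operator()(double x) {
        window_.push_back(x);
        sum_ += x;
        if (window_.size() > n_) {  // slide: drop the oldest point
            sum_ -= window_.front();
            window_.pop_front();
        }
        if (window_.size() < n_)    // not enough history yet
            return std::numeric_limits<double>::quiet_NaN();
        return sum_ / static_cast<double>(n_);
    }

private:
    std::size_t n_;
    double sum_;                 // running sum of the current window
    std::deque<double> window_;  // the last n points
};
```

std::transform takes the functor by value, so the window state lives in its copy for the duration of the call. The running sum keeps each update O(1) but can accumulate rounding error over very long series; periodically recomputing the sum from the window is one remedy.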
...
...
Another simple example would be an exponential moving average, where
the number of significant digits required will determine how far back
you need to go in the series. At another level, quality of service for
some functors (the fast-or-accurate spectrum) is often appropriate.
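For the exponential moving average, the lookback can be worked out up front. With the usual recurrence s_t = alpha * x_t + (1 - alpha) * s_{t-1}, the point k steps back carries weight alpha * (1 - alpha)^k, so keeping only the latest N points ignores total weight (1 - alpha)^N. If "d significant digits" is read as "ignored weight at most 10^-d" (an assumption that holds roughly when the inputs are of similar magnitude), then N = ceil(d * ln 10 / -ln(1 - alpha)). A sketch, with ema_lookback as a hypothetical helper:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper. For s_t = alpha * x_t + (1 - alpha) * s_{t-1},
// the latest N points carry total weight 1 - (1 - alpha)^N, so
// truncating to N points ignores weight (1 - alpha)^N.
// Returns the smallest N with (1 - alpha)^N <= 10^-digits.
std::size_t ema_lookback(double alpha, int digits) {
    assert(alpha > 0.0 && alpha < 1.0 && digits > 0);
    // (1 - alpha)^N <= 10^-d  <=>  N >= d * ln(10) / -ln(1 - alpha)
    double n = digits * std::log(10.0) / -std::log1p(-alpha);
    return static_cast<std::size_t>(std::ceil(n));
}
```

For example, ema_lookback(0.5, 3) gives 10, since 0.5^10 (about 0.00098) is the first power of 0.5 at or below 0.001. A "fast" quality-of-service setting could simply ask for fewer digits.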
...
...
Perhaps expression templates might help here. Compile-time results
for validity calcs for series of known lengths would be helpful where
possible.
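As a small illustration of the compile-time part (not expression templates as such): for a series of known length N and a W-point window, the number of valid outputs is N - (W - 1), clamped at zero, and the compiler can compute it. A C++11 sketch; valid_outputs and valid_result_array are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: with N points and a W-point window, the first
// W - 1 outputs are NA, so the valid count is known at compile time.
template <std::size_t N, std::size_t W>
struct valid_outputs {
    static_assert(W > 0, "window must be non-empty");
    // Leading NA points, clamped to the series length.
    static constexpr std::size_t invalid = (W - 1 < N) ? W - 1 : N;
    // Usable results.
    static constexpr std::size_t value = N - invalid;
};

// A result buffer sized exactly, with no runtime length check.
template <std::size_t N, std::size_t W>
using valid_result_array = std::array<double, valid_outputs<N, W>::value>;

static_assert(valid_outputs<10, 5>::value == 6, "10 points, 5-step window");
static_assert(valid_outputs<3, 5>::value == 0, "too short for any result");
```

Series whose length is only known at run time would fall back to the same arithmetic done at run time.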
...
...
It would be nice if the algorithms could be simply specified, similar
to your approach, and then "mounted" into the appropriate framework.
Perhaps SFINAE or another technique could be used to determine an
algorithm's capabilities. That is, the framework could juggle the most
appropriate methods to use based on the results required, e.g. use
incremental if you can, otherwise absolute.
...
...
Based on what I've done in the past, you also end up wanting arbitrary
n-dimensional structures with other data types flowing through as
well. Being able to associate ids/names with data items and keep them
associated through sorting/ranking matters too: e.g. tag the series
with data codes, rank them, and you can get the code for the best. You
also want grouping of n-dimensional constructs, splitting of
dimensions, and "slicing" operators that operate across the current
dimension, which, with the appropriate framework, are one and the
same, e.g. choose the minimum of, or sum, the results of a bunch of
functions.
...
...
Then soon you want to split the computation amongst processors and
machines, and you want to treat the computation as a dataflow graph
and parallelise it in some way, which I did a few years ago with
BGL's precursor, the GGCL. A topological sort gets you most of the way
there... Also, you can then use things like Metis to partition your
computation graph in nice ways.
...
...
I think a way to approach it would be an expression template approach
with some adaptability in terms of absolute and incremental analytics,
with some notion of accuracy (at least something like the number of
points being chewed up and not valid, or some such). Everything else
could come on top of this.

Agreed. Expression templates could be used very effectively here. I
will look into this.

Sounds very powerful. Is there any additional web information to help
me incorporate these ideas?
...
...
You need to ensure that you can easily reuse code out there. No one
wants to spend their life writing the zillions of numerical algorithms
already out there. If you want a vector ARMA model, for example, there
are only a couple out there I think, and they are in FORTRAN: wrap it,
don't write it, I'd hope.

Agreed. 
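On the SFINAE suggestion for determining an algorithm's capabilities: one possible shape, under assumptions of my own, is a trait that detects an incremental update(double) member and a dispatcher that uses it when present and falls back to an absolute at(series, index) call otherwise. All names here (has_update, update, at, latest, running_sum, last_value) are hypothetical, and the detection idiom uses C++11 decltype, which postdates this thread:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical capability trait: true if Algo has an incremental
// member callable as update(double). Detected via SFINAE.
template <typename Algo, typename = void>
struct has_update : std::false_type {};

template <typename Algo>
struct has_update<Algo, decltype(void(std::declval<Algo&>().update(0.0)))>
    : std::true_type {};

// Incremental path: feed points one at a time. The series is replayed
// here for illustration; a live feed would call update() per new point.
template <typename Algo>
double latest(Algo& a, const std::vector<double>& x, std::true_type) {
    double r = 0.0;
    for (double v : x) r = a.update(v);
    return r;
}

// Absolute path: compute the value at the last index directly.
template <typename Algo>
double latest(Algo& a, const std::vector<double>& x, std::false_type) {
    return a.at(x, x.size() - 1);
}

// Use incremental if the algorithm supports it, otherwise absolute.
// The choice is made at compile time. Assumes x is non-empty.
template <typename Algo>
double latest(Algo& a, const std::vector<double>& x) {
    assert(!x.empty());
    return latest(a, x, has_update<Algo>{});
}

// Two toy algorithms (hypothetical) to exercise both paths.
struct running_sum {  // incremental form only
    double total = 0.0;
    double update(double v) { total += v; return total; }
};

struct last_value {  // absolute form only
    double at(const std::vector<double>& x, std::size_t i) const {
        return x[i];
    }
};
```

The framework never asks the algorithm author to register anything: the presence or absence of a member is enough to pick the path, and the unused path is never instantiated.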

Thank you for your comments and for spending the time to look at the
library.


Tom Brinkman
