
On 12/12/06, Jeff Garland <jeff@crystalclearsoftware.com> wrote:
Eric Niebler wrote:
I'm pleased to announce the availability of a new library for computing with time series (http://en.wikipedia.org/wiki/Time_series). From the documentation:
Looks very nicely thought out. I'm continually re-inventing series containers and would like to stop ;-)

I can see how the discretization type could help many types of applications, though it doesn't suit most of the styles I'm used to dealing with. This is because many time series have missing points at similar periods of time. For example, your library seems to have been developed with financial data at daily and longer time frames in mind, though it is obviously not limited to this. Holidays in different markets come into play in the sequences even though the discretization would be the same.

To solve this I like the concept of "clocks" from intensional programming. Basically, if two series use the same clock then indexed offsets into the sequence make sense; otherwise a matching procedure has to be used, of which the most typical is:

    matched time = most recent time <= reference time

Some input clock has to be the reference time, which is also used for the output. It is not the only way. For example, sometimes only what I call correlated matching makes sense, that is, the time exists in both inputs (or in all of them if there are more than two).

This way you get the benefit of direct sequencing when clocks are the same and fast lookups when they are not. Fast lookups are based on the discretization. It is like looking up a name in the phone book: if it starts with a V you go near the back. Calculate the density (number of points / period), make an educated guess as to the location, and binary search from there. This scheme mixes microsecond data and annual data quite freely.

Algorithms may chew up degrees of freedom and shorten series, but the clocks will remain the same. For example, a simple moving average over 10 days will not be relevant on the first 9 points. You've chewed up 9 points and your output may reflect this. This is just a simple case. Windowing functions can chew up points both forwards and backwards. Some algorithms have accuracy requirements that impose minimum input requirements.
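
To make the matching idea above concrete, here is a rough sketch of both matching rules, assuming a clock is nothing more than a sorted vector of integer timestamps. The names match_most_recent and correlated_times are made up for illustration and are not part of Eric's library; the lookup makes the phone-book guess from the average density and then binary searches a bracket around it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical helper: index of the most recent time <= ref in a sorted
// clock, or nullopt when ref is earlier than every point on the clock.
std::optional<std::size_t> match_most_recent(const std::vector<long long>& times,
                                             long long ref)
{
    assert(std::is_sorted(times.begin(), times.end()));
    const std::size_t n = times.size();
    if (n == 0 || ref < times.front()) return std::nullopt;
    if (ref >= times.back()) return n - 1;

    // Phone-book guess: pretend points are evenly spread over the period.
    const double span = static_cast<double>(times.back() - times.front());
    const double frac = static_cast<double>(ref - times.front()) / span;
    const std::size_t guess =
        std::min(n - 1, static_cast<std::size_t>(frac * static_cast<double>(n - 1)));

    // Widen a bracket around the guess with doubling steps.
    // Invariant: times[lo] <= ref < times[hi].
    std::size_t lo = 0, hi = n - 1;
    if (times[guess] <= ref) {
        lo = guess;
        for (std::size_t step = 1; lo + step < hi; step *= 2) {
            if (times[lo + step] > ref) { hi = lo + step; break; }
            lo += step;
        }
    } else {
        hi = guess;
        for (std::size_t step = 1; lo + step < hi; step *= 2) {
            if (times[hi - step] <= ref) { lo = hi - step; break; }
            hi -= step;
        }
    }

    // Binary search inside the bracket for the first time > ref.
    const auto it = std::upper_bound(times.begin() + lo, times.begin() + hi, ref);
    return static_cast<std::size_t>(it - times.begin()) - 1;
}

// Hypothetical helper for "correlated" matching: keep only the times that
// exist on both (sorted) clocks.
std::vector<long long> correlated_times(const std::vector<long long>& a,
                                        const std::vector<long long>& b)
{
    std::vector<long long> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

When two series share a clock none of this is needed and plain index offsets apply. The guess only pays off when the points are spread reasonably evenly; with badly clustered data the doubling steps fall back towards an ordinary binary search.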
A simple case of such a minimum input requirement is determining the number of points you need to get a certain accuracy for an exponential moving average, which deals with a weighted sum of infinitely many points.

Where this puppy ends up being quite different is that you want times, real times, associated with the series. The obvious thing to do is tuple them, but this messes up passing blocks of data around efficiently to things that only want to deal with sequences and don't care about the time. Sometimes, though, timed tuples make more efficient sense. So you need flexible mappings and alternative types of containers per situation.

So, when I look at this lib, it looks like a neat way to capture singly clocked series, but it also appears that it may be meant to handle multiple clocks, given that it may be handling daily financial data where holidays come into play, judging by the acknowledgements section crediting Zürcher Kantonalbank. That is, it seems the discretization is a proxy for a clock that suits a particular use.

I'm not sure if you can see a way to consider a more flexibly "clocked" POV rather than your current discretization scheme. It seems quite different from, but tantalisingly close to, what you have. Thus I'd think of this library as more of a series library than a time series library, if such a distinction weren't just made up by me ;-)

Regards,

Matt.
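
For the "chewed up" points and the exponential moving average case mentioned above, a minimal sketch follows. It assumes the usual EMA form with smoothing factor alpha, where the point k steps back carries weight alpha * (1 - alpha)^k, so the weight left out after N points is (1 - alpha)^N. The function names and the tolerance parameter eps are invented for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: how many outputs a simple moving average produces
// from n inputs. A window of w chews up the first w - 1 points.
std::size_t sma_output_length(std::size_t n, std::size_t window)
{
    assert(window >= 1);
    return n >= window ? n - window + 1 : 0;
}

// Hypothetical helper: smallest N such that the EMA weight ignored by
// truncating to N points, (1 - alpha)^N, is at most eps.
std::size_t ema_points_needed(double alpha, double eps)
{
    assert(alpha > 0.0 && alpha < 1.0);
    assert(eps > 0.0 && eps < 1.0);
    // (1 - alpha)^N <= eps  <=>  N >= log(eps) / log(1 - alpha)
    return static_cast<std::size_t>(
        std::ceil(std::log(eps) / std::log(1.0 - alpha)));
}
```

For instance, with alpha = 0.5 and a tolerance of 1% this asks for 7 points, and a 10-day simple moving average over 30 points yields 21 outputs. An algorithm could report numbers like these so that the output series is shortened on the same clock rather than padded with meaningless values.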