Re: [boost] [ANN] Boost.Time_series: feedback requested

From: "Matt Hurd" <matthurd@acm.org>
Not really. A clock is a sequence of times (ptimes, for example). Each value point in the series has a corresponding time point, sometimes referred to as the "valid start time", as this is the time from which the data becomes the truth. Logically, a series is simply a sequence of (Vs, Data) pairs.
If series have the same underlying clock you can freely use index operations, which are fast. If they don't have the same underlying clock, you have to match based on time points.
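To make the distinction concrete, here is a minimal sketch, not the Boost.Time_series API, in which a series is just a vector of (time, value) pairs and its clock is the sequence of times; the `Series`, `same_clock`, and `add` names are illustrative assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: a series is a sequence of (valid-start-time, value)
// pairs; the sequence of times is the series' "clock".
using Series = std::vector<std::pair<long, double>>;

// True when both series tick on exactly the same clock, so positional
// (index) operations are safe and fast.
inline bool same_clock(const Series& a, const Series& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i].first != b[i].first) return false;
    return true;
}

// Add two series: index-aligned when the clocks agree, otherwise matched
// on time points (here: keeping only times present in both inputs).
inline Series add(const Series& a, const Series& b) {
    Series out;
    if (same_clock(a, b)) {
        for (std::size_t i = 0; i < a.size(); ++i)
            out.emplace_back(a[i].first, a[i].second + b[i].second);
        return out;
    }
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first < b[j].first) ++i;
        else if (b[j].first < a[i].first) ++j;
        else {
            out.emplace_back(a[i].first, a[i].second + b[j].second);
            ++i; ++j;
        }
    }
    return out;
}
```

The fast path is a plain indexed loop; only the mismatched-clock path pays for time-point matching.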
Your discretization scheme, say for daily stock data from 20 different countries, would invite trouble: the different clocks, arising from different holiday schedules, could not be mixed by index offsets even though they are all daily. Similarly, for stocks at a single exchange, data points go missing due to suspensions and other reasons, so the clocks are often slightly different.
Without going to clocks, you could simply call them different discretizations in your scheme by assigning different integers, thus ensuring they don't mix. However, you often do wish to mix such things.
There's an important difference between *missing data* and *different sampling rates*. In the face of missing data (say, because of holidays or suspensions) but the same sampling rate (say, daily), you should be able to use the same indexes for all the data; some data will just be null-valued.

Different applications may handle nulls differently. For some, interpolation might be appropriate. But I had an application a while back in which nulls needed to be handled by *accumulating* values on the series that wasn't null until both series had values again. (This is particularly useful in portfolio performance measurement.) So interpolation certainly isn't *always* appropriate.

Using discretization w/ interpolation makes sense when analyzing series with different sampling rates, but I'm not convinced that it does just because of nulls in one series.

- James Jones
Administrative Data Mgmt.
Webmaster / Data Architect
375 Raritan Center Pkwy, Suite A
Edison, NJ 08837
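The accumulate-until-both-observed idea above can be sketched as follows. This is an illustrative assumption of how such a rule might look, not code from the proposed library: two series share one daily clock, nulls mark missing observations, and a combined point is emitted only once both sides have been observed, carrying each side's accumulated value across the gap.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch: two daily series on the same clock (same indexes),
// with nulls for missing observations. Instead of interpolating, each
// side's values accumulate across the gap, and a combined point is
// emitted only when both sides are observed again.
using MaybeSeries = std::vector<std::optional<double>>;

std::vector<std::pair<double, double>>
accumulate_until_both(const MaybeSeries& a, const MaybeSeries& b) {
    std::vector<std::pair<double, double>> out;
    double acc_a = 0.0, acc_b = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        if (a[i]) acc_a += *a[i];
        if (b[i]) acc_b += *b[i];
        if (a[i] && b[i]) {            // both observed: emit and reset
            out.emplace_back(acc_a, acc_b);
            acc_a = acc_b = 0.0;
        }
    }
    return out;
}
```

Summation is used here for simplicity; a performance-measurement application might instead compound returns across the gap, but the alignment logic is the same.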

On 14/12/06, james.jones@firstinvestors.com <james.jones@firstinvestors.com> wrote: <snip> So interpolation certainly isn't *always* appropriate.
For sure. My original post referred to being able to choose what I call a matching algorithm for inputs with different clocks. Having spent a few years writing these kinds of apps, >99% of the time for my needs, when clocks differed, the match used was the most recent time <= the matching time. Though others certainly make sense. The next most often used was what I called a correlated match, where the output stream reflects only those times at which all inputs had a time in common. Some of the philosophy there was to reflect the actions of a reactive dataflow system.

Using nulls would be both a complementary and an alternative approach. I've typically avoided nulls for missing data as it complicates integration with third-party libraries too much: most math, scientific or stats libs either don't support nulls or have different ways of representing missing values or nulls. That said, I think it makes sense to allow the support of nulls in a time series library, as it is a design that makes sense. I could also imagine a matching algorithm that "injects" nulls for non-matched times. That complicates things a bit when you want to include algorithms and representations that are not null-aware, as you'd like the type system to reject such compositions at compile time.

Regards,

Matt.
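The "most recent time <= matching time" rule above can be sketched as a sampling step: resample one series onto another's clock, carrying the latest earlier value forward. This is a minimal illustrative sketch (the `Series` and `match_most_recent` names are assumptions, not library API):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// A series as a sorted sequence of (time, value) pairs.
using Series = std::vector<std::pair<long, double>>;

// Sample b at the times of `clock_of`: for each requested time t, take the
// most recent b-value whose time is <= t. Times before b's first point
// produce no output here (a null-injecting variant could emit nulls
// instead).
Series match_most_recent(const Series& clock_of, const Series& b) {
    Series out;
    std::size_t j = 0;
    std::optional<double> last;
    for (const auto& p : clock_of) {
        const long t = p.first;
        while (j < b.size() && b[j].first <= t) last = b[j++].second;
        if (last) out.emplace_back(t, *last);
    }
    return out;
}
```

The correlated match described above is the stricter rule: it would emit a point only at times present in every input, rather than carrying stale values forward.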
participants (2)
- james.jones@firstinvestors.com
- Matt Hurd