
On 8/12/07, Eric Niebler <eric@boost-consulting.com> wrote:
Stjepan Rajko wrote:
On a less nit-picky note though, I still can't find a single outside reference in which something that assigns a value to a whole real set interval is called a time series. Eric, you indicated that your choice for using range runs (as opposed to just points, I assume) was that this yielded superior generic algorithms. But in the floating point case, this is causing most of your structures to represent something that is not really a time series. In rethinking the floating point case, are any of the strategies you are considering looking to put all of your structures in line with the mathematical notion of what a time series is?
Would you agree that the time series types that use integral offsets are isomorphic to what a time series is, in the mathematical sense? Are
Yes
there any time series (in the math sense) that are not representable using integral offsets?
I think that most time series, and especially those used in practice, could be approximated fairly well using an integral-offset series with the appropriate discretization. The problems are:
1) if you don't know much about the series a priori, you might not know what to set the discretization to initially, and might never get a good idea of what the discretization really should be.
2) say you have a series coming at you, and the time intervals between the samples keep getting smaller and smaller, repeatedly beating your discretization no matter how small it is. You would have to keep fine_graining, which doesn't seem efficient.
3) a user simply might not want to use discretization.
But this is not really a problem when it comes to your lib, because the sparse series with floating point offsets would do the trick.
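Problem 2 can be illustrated with a small sketch. It is plain standard C++ and does not use the library under review; `to_offset`, `count_collisions` and `halving_times` are hypothetical names introduced only for this illustration, and rounding down to the nearest offset is an assumption. It counts how many samples fall onto an integral offset that is already occupied when the gaps between samples keep halving.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper (not the library's API): map a real-valued time onto
// an integral offset by rounding down to a multiple of the discretization.
inline std::int64_t to_offset(double time, double discretization) {
    assert(discretization > 0.0);
    return static_cast<std::int64_t>(std::floor(time / discretization));
}

// Count the samples that land on an offset already used by an earlier sample.
inline std::size_t count_collisions(const std::vector<double>& times,
                                    double discretization) {
    std::map<std::int64_t, std::size_t> hits;
    for (double t : times) {
        ++hits[to_offset(t, discretization)];
    }
    std::size_t collisions = 0;
    for (const auto& entry : hits) {
        collisions += entry.second - 1;
    }
    return collisions;
}

// Sample times whose spacing halves at every step: 1, 1.5, 1.75, 1.875, ...
inline std::vector<double> halving_times(std::size_t n) {
    std::vector<double> times;
    double t = 1.0;
    double gap = 0.5;
    for (std::size_t i = 0; i < n; ++i) {
        times.push_back(t);
        t += gap;
        gap *= 0.5;
    }
    return times;
}
```

With a discretization of 0.25, the first six of these samples already produce collisions. Refining to 1/32 separates those six, but the seventh sample collides again, so the series would have to be fine-grained once more.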
In one of your posts, you mentioned something along the lines of making the Point concept a first class citizen of the library - IIUTC, that would be a good approach. Furthermore, I think that the RangeRun should be rethought, so that a RangeRun is in effect equivalent to a countable set of Points even in the floating point case (where by "a countable set", I mean "a countable set significantly smaller than the one including every Point indexable by a floating point number between the start offset and end offset"). If not, then I see this as a Time_series+something else library, which is fine. But with time series, I think a continuous interval is much less useful than a way to specify a number of discrete points in an interval.
I'm having a hard time seeing how this is any different than using a series with integral offsets and a floating point discretization. The time series library provides this functionality already. Can you clarify what you're suggesting?
I don't think that the integral offsets + floating point discretization approach always works (mostly given the problems I list above). But again, sparse_series with floating point offsets can be used instead. I gave a slightly more specific example of what I am suggesting in http://tinyurl.com/ywu53v, but also see below.
I agree that the time series types that use floating point offsets are not very time series-ish in the math sense. But some have expressed the strong opinion during this review that the functionality they provide is useful.
Please don't get me wrong - I also think that the floating point offset series are useful as they are. For example, I think it's really useful to be able to multiply a sparse_series with a piecewise_constant_series, to accomplish something like "multiply all the samples in [0, 100) by 10, and all samples in [100, 200) by 20". Also, I agree with Steven in that the floating point offset series can be divided into two categories - sparse/dense (and delta), which are pretty consistent with the mathematical concept of a time series as they are, with the exception of their pre_runs and post_runs, and the rest, which are closer to modeling a piecewise constant function.

What keeps nagging me is that all these "others" are not time series. I wouldn't even call them "series", although they are a series of tuples, because they so much better reflect a piecewise constant function. What I do see coming out of the RangeRun concept is a potentially wonderful foundation for Boost.MathFunction - but in order to get there, it would need to grow (for example, somehow supporting all flavors of open/closed/half-open intervals). So all these "others", I see in this limbo - they are not time series, but they are useful to have with time series, and they are almost really nice implementations of piecewise constant functions (and with the potential to implement any function I think, using the RangeRun concept) but not quite there either.

So what I'm mostly suggesting is:

* whatever is supposed to be a time series - make it a true time series. At the end of the day, anything that is a time series should be convertible to a sequence of discrete time points with values attached, and nothing more. Integral offset versions of the series are there. Floating point versions of dense, sparse, and delta series are also there, except for their pre_runs and post_runs.

* whatever is not a time series - call it something else, or make it clear in the documentation that it behaves as something else in certain circumstances (like floating point offsets). I am not disputing the fact that they are useful, and not suggesting they be removed from the library - they are definitely useful in conjunction with time series.

* alternatively - make everything a time series, which would require you to revisit the RangeRun concept so that it is always convertible to a countable set of discrete (value, time) pairs. The utility I see here is the following: from a time series perspective, I can't just say "All samples in [0, 100) have value 10". I have to specify exactly where all these samples lie. Allowing me to do this concisely using a modified RangeRun would be very useful, since I wouldn't have to specify each of the possibly numerous samples separately, nor would they have to be stored separately.
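One possible shape for such a modified RangeRun is sketched below. This is only an illustration in standard C++, not the library's RangeRun concept: the `sampled_run` type, its members and the `expand` function are hypothetical, and the choice of a fixed step between samples is an assumption. The run stores a start, a step, a count and a single value, yet it can always be converted to a finite sequence of discrete points.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical run type: `count` samples, all with the same value, located at
// start, start + step, start + 2 * step, ... The covered times form the
// half-open interval [start, start + step * count).
struct sampled_run {
    double start;
    double step;
    std::size_t count;
    double value;

    // Time of the i-th sample in the run.
    double time_at(std::size_t i) const {
        assert(i < count);
        return start + step * static_cast<double>(i);
    }
};

// Convert the run into explicit (time, value) points. Only this conversion
// needs storage proportional to the number of samples.
inline std::vector<std::pair<double, double>> expand(const sampled_run& run) {
    assert(run.step > 0.0);
    std::vector<std::pair<double, double>> points;
    points.reserve(run.count);
    for (std::size_t i = 0; i < run.count; ++i) {
        points.emplace_back(run.time_at(i), run.value);
    }
    return points;
}
```

For instance, `sampled_run{0.0, 0.5, 200, 10.0}` would stand for "200 samples with value 10, spaced 0.5 apart, in [0, 100)" while storing only four numbers.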
I guess what I'm missing is a use case for non-integral offsets. Any reanalysis of what floating point offsets mean has to start there. Is it
I hope the above makes a case for some of that. I think in a lot of cases, users will not want to deal with discretization or any involved transforms and just want to use their (value, time) pairs as they are.
simply the desire to index into a sequence of points and interpolate between them in some way? If that's the case, then support for floating point offsets can be dropped in favor of a flexible interpolating facade. (IMO, something like that is needed anyway.)
I have to think about that... I wasn't thinking about that case.
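For reference, an interpolating facade of the kind mentioned above might look roughly like the following. This is a minimal sketch in standard C++ under stated assumptions: `linear_interpolator` is a hypothetical name, the points are assumed to be sorted by time, linear interpolation is just one of many possible schemes, and clamping to the first and last value outside the sampled range is an arbitrary choice. Nothing here is taken from the library under review.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical facade: wraps discrete (time, value) points and answers
// queries at arbitrary real-valued times by linear interpolation.
class linear_interpolator {
public:
    using point = std::pair<double, double>;  // (time, value)

    explicit linear_interpolator(std::vector<point> points)
        : points_(std::move(points)) {
        assert(!points_.empty());
        assert(std::is_sorted(points_.begin(), points_.end(),
                              [](const point& a, const point& b) {
                                  return a.first < b.first;
                              }));
    }

    double operator()(double t) const {
        // Assumption: clamp to the end values outside the sampled range.
        if (t <= points_.front().first) return points_.front().second;
        if (t >= points_.back().first) return points_.back().second;

        // First point whose time is strictly greater than t.
        auto hi = std::upper_bound(points_.begin(), points_.end(), t,
                                   [](double lhs, const point& rhs) {
                                       return lhs < rhs.first;
                                   });
        auto lo = hi - 1;
        double w = (t - lo->first) / (hi->first - lo->first);
        return lo->second + w * (hi->second - lo->second);
    }

private:
    std::vector<point> points_;
};
```

The underlying series stays a plain sequence of discrete points; only the facade knows anything about times between samples.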
If someone really needs a way to say, "This signal really has the value of X in the time interval [Y,Z)" where Y and Z are floating point values, then continuous floating point runs are the way to go. That seems like a reasonable thing to want, even if it doesn't fit the mathematical definition of "time series".
It is a *very* reasonable thing to want. And it fits the mathematical definition of a function with a domain in the real numbers very well ;-)

Best regards,

Stjepan
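To make the "value X in the time interval [Y,Z)" case concrete, here is a minimal sketch of a piecewise constant function over half-open floating point intervals, together with the "multiply all the samples in [0, 100) by 10, and all samples in [100, 200) by 20" use mentioned earlier in the thread. It is standard C++ only; `constant_run`, `piecewise_constant` and `scale` are hypothetical names, and returning a default value outside all runs is an assumption rather than the library's behavior.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical run: the function has `value` on the half-open interval [from, to).
struct constant_run {
    double from;
    double to;
    double value;
};

// A function on the real line that is constant on each run and takes a
// default value everywhere else (assumption).
class piecewise_constant {
public:
    explicit piecewise_constant(std::vector<constant_run> runs,
                                double default_value = 0.0)
        : runs_(std::move(runs)), default_value_(default_value) {
        for (const auto& run : runs_) {
            assert(run.from < run.to);
        }
    }

    double operator()(double t) const {
        for (const auto& run : runs_) {
            if (run.from <= t && t < run.to) return run.value;
        }
        return default_value_;
    }

private:
    std::vector<constant_run> runs_;
    double default_value_;
};

// Multiply each discrete (time, value) sample by the function's value at the
// sample's time. The result is still a sequence of discrete points.
inline std::vector<std::pair<double, double>> scale(
    std::vector<std::pair<double, double>> samples,
    const piecewise_constant& factor) {
    for (auto& sample : samples) {
        sample.second *= factor(sample.first);
    }
    return samples;
}
```

In this sketch the discrete samples and the function on the reals are separate types, which is one way of keeping the two notions discussed in this thread apart.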