
Eric, thanks for all of your responses - they clarify things quite a bit, and your rationale seems more justified. I think most importantly:

On 8/7/07, Eric Niebler <eric@boost-consulting.com> wrote:
Stjepan Rajko wrote:
IMO, this library deals with two different problem domains - time_series and piecewise constant functions. I also think that these two domains are too different to be stuck in the same bucket. I think that a general time_series does not need to address the "run" concept - perhaps there can be a notion of a "weight" assigned to each sample, which can represent time duration or other things. That would make it behave equivalently to the run for the "integral" (which should really just be a sum for time_series, IMO). Also, I think that for the piecewise constant functions, runs should at least have the option of being open/half-open intervals. With all this in mind, to some extent, I believe that sparse_series and dense_series (which I see as time series) should be treated differently than the rest of the containers (which I see as piecewise constant functions).
You've hit on something important -- I agree time_series currently has a split personality, but I don't agree that it's the sparse/dense vs. piecewise constant thing. It's the integral vs. floating-point offset thing. And I think those problems are fixable.
Ah! That is a valuable perspective indeed - with integral offsets, the word "run" makes more sense (it is the repetition of the same value), and the library now looks like a perfectly well-behaved time series library. But you definitely need floating-point offsets for a lot of applications :-( So, what do you do?

One strategy that would seem valid to me (conceptually; I don't know what sort of chaos this would bring to the implementation) would be to modify the concept of a run so that, instead of "a value sample of a certain duration", it means "a value regularly sampled between two offsets at a certain period". So, to specify a non-trivial run, you'd need a beginning offset, an end offset, and the period at which the actual samples are taken. A run (10, 0, 1, 0.5) then means there are three samples - (10, 0), (10, 0.5), and (10, 1).

The samples would then still be discrete rather than sometimes continuous (which makes me happy :-)), and you no longer run into the open/closed problem because you are dealing with discrete time offsets - if you don't want the (10, 1) sample you can just specify (10, 0, 0.5, 0.5). You could also express the same thing as (10, 0, 0.9, 0.5), which is kind of ugly, so it may even be better to have the user provide the number of samples in a run rather than the period - so (10, 0, 1, 3) would mean "3 evenly spaced samples between 0 and 1 of value 10".

Just ideas. Something like the above, and I'm sold. In any case, I am now definitely in agreement with the integer-offset parts of the library.

Best regards,

Stjepan
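
P.S. To make the two run encodings above concrete, here is a rough sketch of how each might expand into discrete samples. Everything in it is hypothetical and only for illustration: the `sample` alias and the functions `expand_by_period` and `expand_by_count` are my own names, not part of the time_series library, and the small rounding tolerance is an assumption rather than a considered design.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type for illustration: a discrete sample as (value, offset).
using sample = std::pair<double, double>;

// Run given as (value, begin, end, period): samples at begin, begin + period, ...
// The end offset is included only when it falls on the sampling grid.
std::vector<sample> expand_by_period(double value, double begin, double end, double period)
{
    assert(period > 0.0 && end >= begin);
    // Assumed tolerance so an end point on the grid is not lost to rounding.
    const auto steps =
        static_cast<std::size_t>(std::floor((end - begin) / period + 1e-9));
    std::vector<sample> out;
    for (std::size_t i = 0; i <= steps; ++i)
        out.emplace_back(value, begin + static_cast<double>(i) * period);
    return out;
}

// Run given as (value, begin, end, count): count evenly spaced samples,
// with the first at begin and the last at end.
std::vector<sample> expand_by_count(double value, double begin, double end, std::size_t count)
{
    assert(count >= 1 && end >= begin);
    std::vector<sample> out;
    if (count == 1) {
        out.emplace_back(value, begin);  // a single sample sits at begin
        return out;
    }
    const double step = (end - begin) / static_cast<double>(count - 1);
    for (std::size_t i = 0; i < count; ++i)
        out.emplace_back(value, begin + static_cast<double>(i) * step);
    return out;
}
```

With the period form, (10, 0, 0.5, 0.5) and (10, 0, 0.9, 0.5) expand to the same two samples, which is the redundancy mentioned above; the count form has exactly one spelling for each set of samples.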