Re: [boost] Time Series review - 7/30-8/8

7 Aug 2007

Eric, thanks for all of your responses - they clarify things quite a
bit, and your rationale now seems better justified.  I think most
importantly:

On 8/7/07, Eric Niebler <eric@boost-consulting.com> wrote:
...
Stjepan Rajko wrote:
...
IMO, this library deals with two different problem domains -
time_series and piecewise constant functions.  I also think that these
two domains are too different to be stuck in the same bucket.  I think
that a general time_series does not need to address the "run" concept
- perhaps, there can be a notion of a "weight" assigned to each
sample, which can represent time duration or other things.  That would
make it behave equivalently to the run for the "integral" (which should
really just be a sum for time_series, IMO).   Also, I think that for
the piecewise constant functions, runs should at least have the option
of being open/half open intervals.  With all this in mind, to some
extent, I believe that sparse_series and dense_series (which I see as
time series) should be treated differently than the rest of the
containers (which I see as piecewise constant functions).
You've hit on something important -- I agree time_series currently has a
split personality, but I don't agree that it's the sparse/dense vs.
piecewise constant thing. It's the integral vs. floating-point offset
thing. And I think those problems are fixable.
Ah!  That is a valuable perspective indeed - with integral offsets,
the word "run" makes more sense (it is the repetition of the same
value), and the library now looks like a perfectly well-behaved time
series library.  But you definitely need floating-point offsets for a
lot of applications :-(  So, what do you do?  One strategy that
would seem valid to me (conceptually, I don't know what sort of chaos
this would bring to the implementation) would be to change the concept
of a run from "a value sample of a certain duration" to "a value
regularly sampled between two offsets at a certain period".  So, to
specify a non-trivial run, you'd need a value, a beginning offset, an
end offset, and the period at which the actual samples are taken.  So
a run (10, 0, 1, 0.5) means there are three samples - (10, 0),
(10, 0.5), and (10, 1).  The samples would then still be discrete
rather than sometimes continuous (which makes me happy :-)), and you
no longer run into the open/closed problem because you are dealing
with discrete time offsets - if you don't want the (10, 1) sample you
can just specify (10, 0, 0.5, 0.5).  You could also express the same
thing as (10, 0, 0.9, 0.5), which is kind of ugly, so it may even be
better to have the user provide the number of samples in a run, rather
than the period - so (10, 0, 1, 3) would mean "3 evenly spaced samples
between 0 and 1 of value 10".  Just ideas.
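
To make the two notations concrete, here is a rough sketch of how such
a run might be expanded into discrete samples.  All of the names
(sample, counted_run, expand, samples_in_period_run) are made up for
illustration and are not part of the library under review; the sample
count for the period form is assumed to be floor((end - begin) /
period) + 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; none of these names come
// from the Time Series library.
using sample = std::pair<double, double>;   // (value, offset)

// A run given as (value, begin, end, count): "count evenly spaced
// samples between begin and end, all with the same value".
struct counted_run
{
    double      value;
    double      begin;
    double      end;
    std::size_t count;   // must be at least 1
};

// Expand a counted run into its discrete (value, offset) samples.
std::vector<sample> expand(counted_run const& r)
{
    assert(r.count >= 1);
    std::vector<sample> out;
    out.reserve(r.count);
    if (r.count == 1)
    {
        // A single sample sits at the beginning offset.
        out.emplace_back(r.value, r.begin);
        return out;
    }
    double const step = (r.end - r.begin) / static_cast<double>(r.count - 1);
    for (std::size_t i = 0; i < r.count; ++i)
        out.emplace_back(r.value, r.begin + static_cast<double>(i) * step);
    return out;
}

// Number of samples implied by the period form (value, begin, end,
// period), assuming samples at begin, begin + period, ... up to end.
std::size_t samples_in_period_run(double begin, double end, double period)
{
    assert(period > 0.0 && end >= begin);
    return static_cast<std::size_t>(std::floor((end - begin) / period)) + 1;
}
```

With these definitions, expand(counted_run{10.0, 0.0, 1.0, 3}) yields
the samples (10, 0), (10, 0.5), (10, 1), and both (0, 0.5, 0.5) and
(0, 0.9, 0.5) in the period form describe the same two samples.  The
period form also depends on a floating-point division landing on the
right side of a whole number, which is another reason the count form
may be the safer interface.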

Something like the above, and I'm sold.  In any case, I am now
definitely in agreement with the integer-offset parts of the library.

Best regards,

Stjepan