Re: [boost] Time Series review - 7/30-8/8

7 Aug 2007

      On 7/31/07, John Phillips <phillips@delos.mps.ohio-state.edu> wrote:
...
My apologies for the delay in this posting, but the review period for
the Time Series library submitted by Eric Neibler runs from Monday, July
30 until Wednesday, August 8. From the documentation:
I have started to review the library, and at this point I am confused
:-)  So I thought I would share my thoughts in case someone can
de-confuse me and I can produce a more useful review.

To preface, I would be *very* interested in having a time series
library in Boost.  I use time series frequently in my work, although I
wouldn't call myself an expert.

That said, it took me a while to understand the language of the
documentation.  The whole "runs, discretization, coarseness" business
didn't readily make sense to me.  The lightbulb finally went off when
I saw:

"The integral of a series is calculated by multiplying the value of
each run in the series by the run's length, summing all the results,
and multiplying the sum by the series' discretization."

Aha! Now I knew exactly what run and discretization meant - it is as
if you are modeling a piecewise constant function with something like
a scalable "x" axis.

I went on to experiments with some dense, sparse, and piecewise
constant series to see how they behave.  The first thing that struck
me as odd was:

   dense_series<double, daily> dense(start = 0, stop = 2);
   sparse_series< double, daily, double > sparse;

   sparse = dense; // Compilation error! (GCC 4.0.1/darwin)

However, if I change the discretization type to int, all is good.  Is
there a reason?

Before assigning to sparse, I populated the dense series with:

ordered_inserter< dense_series< double > > in_d( dense );
    in_d(10,0)(11,1)(12,2).commit();

After assigning to sparse, I was a little surprised at what sparse was
spitting back out via the [] operator.  Basically, sparse[0] == 10,
sparse[1] == 11, sparse[2] == 12, and seemingly 0 everywhere else, but
the docs say:

sparse_series<>
    A sparse series where runs have unit length.

So how come I get sparse[0.1] == sparse[-0.1] == 0?  Shouldn't there
be a unit length interval around 0 where sparse is 10?  I think you
may have commented on this recently, but I found it rather unexpected,
given what the docs say (otherwise, perfectly good behavior).

I moved on to:

piecewise_constant_series< double, int, double > pc;
pc = dense;

And now more weirdness... pc[x] == 10 for x \in [0, 1] (inclusive)!!!
i.e., pc[0] == pc[1] == 10, but pc[1.01] == 11.

The first reason why this strikes me as odd is, if I start with a
discrete time series (as dense_series models very nicely), and put it
into something that expands it to a piecewise constant function, the
choice of translating "value 10 at time 0" to "value 10 at interval
[0, 1]" is not obvious... if I wanted approximation, I would be more
likely to assign to a particular pc[x] the value of the closest point
in dense - i.e. I'd be more likely to set pc[0.9]==11 because 0.9 is
closer to 1 than to 0.  That strategy would make it difficult to deal
with the infinite pre-run and post-run, though, and you could argue
that assignment is not about this kind of approximation but about
something else - so you may want to specify in the docs what the
semantics of conversion through assignment is.

The second reason why I find this odd is, I would at least expect that
for an explicitly specified point such as dense[1], pc[1] would have
the same value after assignment.

Anyway, at this point I stopped experimenting as it seemed to me that
there was something fundamental that I wasn't getting.
...
From reading the docs, I do have some other comments:
"It may seem odd that even the unit series have a value parameter.
This is because 1 is not always convertible to the series' value type.
Consider a delta_unit_series< std::vector< int > >. You may decide
that for this series, the unit value should be std::vector< int >(3,
1). For unit series with scalar value types such as int, double and
std::complex<>, the value parameter is ignored."

Question: if I was dealing with something that needs to have the "1"
specified, couldn't I just use delta_series instead of
delta_unit_series?

About discretization:  your prenamed discretizations start with with
daily as mpl::int<1>.  That's great if that's how the integrals need
to be calculated.  However, in a lot of cases one would want to have
"secondly" as mpl::int<1>.  Could you provide another set of prenamed
discretizations that provide that option?  Although, if someone used
both of them in the same app, then compile-time checking would
consider them equivalent, no?  Could the units lib be used for this
instead?

About the docs:  time series can be expressed very clearly visually -
I think that the docs would benefit greatly from some visual examples.

I guess that's it for now.  Sorry about my confusion :-)  I can try to
give a little more feedback if I start to understand the fundamentals
of the library better.  Back to bed for me... I actually tried to go
to bed earlier but couldn't sleep because I was thinking about this
lib.  So I woke up to write this.  Weird.

Stjepan

Re: [boost] Time Series review - 7/30-8/8

Stjepan Rajko