Re: [boost] Time Series review

9 Aug 2007

      Paul A Bristow wrote:
...
...
-----Original Message-----
From: boost-bounces@lists.boost.org 
[mailto:boost-bounces@lists.boost.org] On Behalf Of John Phillips
Sent: 31 July 2007 13:31
To: boost@lists.boost.org
Cc: boost-users@lists.boost.org
Subject: [boost] Time Series review - 7/30-8/8
* What is your evaluation of the design?
Fundamentally sound and potentially useful in many domains, not just the finance one for which it was designed.
* What is your evaluation of the implementation?
Looks OK.  Some details of difficult areas like FP that can almost certainly be improved. (See also some detailed notes and
questions below)
Agreed.
...
* What is your evaluation of the documentation?
Good, but far too steep to tempt most potential users.  Needs an outline, a tutorial, and more examples, especially several outside
finance.
Yes, I'm getting that message loud and clear. :-)
...
* What is your evaluation of the potential usefulness of the library?
Although there are many problem for which a plain C array of doubles will suffice, it is surprisingly how soon more complicated data
structures emerge, and that is where I see this package making the difficult much easier.
* Did you try to use the library?  With what compiler?  Did you have any problems?
No, time/ priorities did not permit, but I hope to.
* How much effort did you put into your evaluation?
A quicker reading than I would wish.
* Are you knowledgeable about the problem domain?
Somewhat, but with physical measurements, not finance.
* Am I confident that the library will be maintained.
Yes, definitely.
* Do you think the library should be accepted as a Boost library?
Definitely.
I note that a number of critical comments have been made of the library:  I don't think any of these are showstoppers and I think it
would be unfortunate if this prevented acceptance of the library.  I am confident that these criticisms can be addressed.
Thank you. I also am confident the FP problem can be dealt with.
...
(As an aside, I think it would also set an unfortunate example to reject a high quality library that has been kindly 'donated' by a
commercial organisation.  No doubt it took some effort to persuade them to do this, and that it took a *lot of effort for them to
make a decision to do so*! I believe we want to encourage other organisations to make freely available, through Boost, work that
they have funded.)
IMHO, if time_series is accepted, I hope it's for its merits and not 
because it was donated by anyone.
...
Some detailed observations:
1  Is the Time series is restricted to time? - the series time unit could be anything, voltage, age, - so not an ideal name?
The data structures are fairly general, that's true, which is why in 
another message I suggested moving them into the range_run_storage 
namespace so they can be reused. A "TimeSeries" is an 
InfiniteRangeRunStorage with a discretization, and the Time_series 
library includes a bunch of algorithms that are specific to the time 
series domain.
...
So I am tempted to suggest Data_series or even Boost.Series?
But a rose by any other name would smell as sweet ;-)
(Can I claim a brownie point for the first Shakespeare quote in a Boost review? ;-)
+1 brownie point ;-)
...
However there is plenty of precedent as you give http://en.wikipedia.org/wiki/Time_series
and time is the most common 'variable'.
2  How would one signify missing data???  Use NaN? as zero?
NaN works if your data is floating point, some magic number otherwise. 
You could even use optional<T> as your value type if you were so 
inclined, so long as you define the appropriate operations on that type.
...
3  The series types are rather tersely defined.
The learning curve would be flattened by more explanation
and examples at the point of definition, aiming for lucidity as well as accuracy.
Agreed.
...
4  The delta_series should at least mention Dirac! Why does Heaviside get his name on a series and Dirac not? Wikipedia link would
be good.
A link to wikipedia would be good. I wouldn't want Dirac to get jealous 
of Heaviside.
...
3  Despite the warning, I am sure many will make the mistake of not committing.
A stronger admonishment might be useful?
"You must .commit()  the returned inserter when you are done with it."
I could do that, but I've been getting lots of negative feedback about 
.commit(), and I think it might make sense to provide a higher-level 
wrapper. I could imagine something like:

   sparse_series<int> s;
   assign_ordered(s)
       (1, 2) (2, 4) (3, 6) (4, 8);

Here, assign_ordered() returns a type that is movable and automatically 
.commit()'s in its destructor.

I might also rip out the inserter-as-an-output-iterator duality, as it 
may end up leading people astray.
...
4  I'd like to see examples with each definition, and preferably not all from finance, lest people imagin that it is only suitable
for finance when it is actually at least as useful for all things physical and even sociological.
5  It may astonish Boosters to know that there are very many potential users who will not understand "This operation is amortized
O(1)".  Links like [@http://en.wikipedia.org/wiki/Amortized_analysis amortized 0(1)] would help them.
Good point.
...
6  A proof of application using circular buffer would be most valuable.  There are very many people with an endless torrent of data
hitting them, without an infinite supply of memory.  Showing how to get information from the data flood could sell time_series to
them.
7  A single example, in a single finance arena (nice and commented) is really not enough.
8  The docs really needed a much more gentle tutorial introduction, so that users can see the wood for the metaprocessing-trees.
Right.
...
Typos - I only spotted one in Zeros and Sparse Data
accomlish should be accomplish.
Thanks for your feedback.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com