
I want to say at the outset that I work with a lot of people who do signal processing, and so that is the main use case my colleagues and I would have for Boost.TimeSeries. That has definitely colored my view of the library, for what that's worth.

* What is your evaluation of the design?

Why is a series with only one nonzero element called a delta series? I would expect a delta series to be the elementwise differences between two series, or some such. In fact, a little Googling revealed many references to delta series that were the differences of two series, but none like delta_series<>. Is this just my ignorance of the financial uses of time series?

I'd like to see an overload of the coarse_grain() function allowing the specification of a DownSampler. I have a need for coarse resampling that preserves the largest value in the original set of finer samples, and I'm sure many others would like to use their favorite convolution-of-oversamples technique to move from high to low sample frequency (see the sketch at the end of this section). In fact, the ability to specify the method used when moving up or down in sample frequency should be a first-class feature of the coarse_grain() and fine_grain() functions; the default behavior is a gross simplification for many users' purposes. Please see my note elsewhere on the lack of UpSampler concept docs, which makes writing a custom sampler more difficult than it needs to be.

The TimeSeries concept docs state that range_run_storage::pre_value( s ) returns "The value associated with the pre_run, if there is one." Does it return TimeSeries< S >::zero_type() otherwise? Is there any way for the user to tell the difference between a return value that happens to be equal to zero_type() and one that was never set? The same questions apply to post_value.

I'm not really sure why dense_series<> does not allow floating-point offsets. I understand that it is supposed to resemble std::vector, but there is implicitly a relationship between the indices of the underlying vector and the times those indices represent. It would be possible (and quite convenient) for dense_series<> to hold a discretization_type that represents the start offset, and an interval, and perform the mapping for me. The lack of this mapping means that for a dense time series of arbitrary discretization (say 0.5), I have to multiply all my times by 0.5 in user code when using the series. The library should take care of this for me; the fact that the underlying storage is vector-like should not force me to treat dense_series<> discretizations differently from those of all other series.

I find it a little confusing that discretization is both a template parameter and a ctor parameter. It would perhaps be clearer if the template parameter Discretization were named DiscretizationType or something similar.

In general, I think this library would be far more usable if the relationships among discretization, offsets, and runs/intervals were clarified. What I'm getting at is that the current structure makes it really unclear what DiscretizationType=double, discretization=2.1, offset=3.2 means for a run. Where is the run? At 3.2, or at 2.1 * 3.2? The fact that I have to think about this at all is probably just a failure of documentation. Nonetheless, it would be best if I could specify that a sample exists at offset X, where X is double, int, or units::seconds, without worrying about any other details, including discretization. That is, discretization seems useful to me only for regularly-spaced time series, and seems like noise for arbitrarily-spaced time series. In addition, a sample should be representable either as a point like 3.14 or as a run like [3.14, 4.2).
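To make the mapping I have in mind concrete, here is a minimal sketch of the bookkeeping I would like dense_series<> to own. Every name in it is hypothetical, invented for this review; none of it is library API:

    #include <cstddef>

    // A plain-C++ sketch of an index-to-time mapping.  start_ and
    // interval_ are hypothetical members, not Boost.TimeSeries API.
    struct dense_time_mapping
    {
        double start_;     // time of sample 0, e.g. 0.0
        double interval_;  // sample spacing, e.g. 0.5

        // The mapping I currently have to do in user code: with
        // start_ = 0.0 and interval_ = 0.5, index 7 means time 3.5.
        double time_at(std::size_t index) const
        {
            return start_ + index * interval_;
        }
    };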
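And returning to the coarse_grain() point above, this is the kind of downsampling functor I would like to be able to supply -- again a sketch with an invented interface, since no such customization point exists in the library today:

    #include <algorithm>
    #include <cassert>

    // Hypothetical: reduce one window of finer samples to a single
    // coarser value by keeping the largest sample in the window.
    struct max_downsampler
    {
        template<typename Iter>
        double operator()(Iter first, Iter last) const
        {
            assert(first != last);  // assume a window is never empty
            return *std::max_element(first, last);
        }
    };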
* What is your evaluation of the implementation?

I didn't look too closely at the implementation, due to lack of time, but I would like to know why the functors in boost::range_run_storage are const references to singletons in an anonymous namespace. What is that supposed to accomplish that other techniques do not? One of the great things about Boost is learning new advanced techniques and discovering best practices for sticky coding situations; without the why, this technique is lost on me, and probably on others. Note that I'm not raising any issue with this implementation choice -- I'm just looking to understand why it was done.

* What is your evaluation of the documentation?

The docs could use a gentler intro. What I mean is that they appear to be somewhat inverted: moving from foundations to the things that build on them, and from the general to the specific, would be better. For example, delta_series<> has the following description on the first page of the user manual: "A series which has exactly one run of unit length." At that point I have no precise idea what a run is, and following the link gives a much more comprehensible definition in the reference section: "boost::time_series::delta_series -- A Mutable_TimeSeries that has a distinct value at some unique offset, and zero elsewhere". Having now read the Range-Run Abstraction section, I think it belongs before any mention of runs, specific time series models, etc. It was only on the second pass that I fully understood the material presented up front in Series Containers.

In general, the documentation is skewed towards financial use, but doesn't need to be -- the library is useful for other purposes as well. For instance, when the predefined resolution types are presented, it seems that they are somehow necessary, or that the lack of a "seconds" or "milliseconds" resolution typedef might be a concern. Further investigation indicates that the Discretization can in fact be virtually any numeric type (is that correct?). Placing these at the beginning of the documentation made several of my signal processing colleagues think this library would only be good for financial users. I suggest a more general emphasis, plus a specific subsection that says something like "this was originally written for financial use, and here are some features that facilitate such use ...". For example, here is what I wrote on my first pass through the docs; I realize what the situation is now, but it shows how the docs led me astray a bit: "Discretization is only done on days on up. What about seconds, minutes, hours, milliseconds (the one I need) discretization? Are series with different discretizations usable together without explicit user conversion? Also, if mpl::int_<360> is used as a conversion factor for yearly, it's wrong for many users' purposes. If it's an arbitrary number, how about using mpl::int_<0>, mpl::int_<1>, etc., and adding some other time increments?"

In the period_sums() description in the manual section, "summed" is misspelled "summer".

I wanted to figure out whether I could supply my own sampler functor to fine_grain(), but it is unclear from the documentation what an UpSampler template parameter is supposed to look like. Looking at the fine_grain() reference is no help, due to the lack of concept documentation, and following the samplers::sampler_base<> link was no help either, since it appears not to be documented beyond a listing of its API. I finally found out what I needed to do to write a sampler only by looking at the piecewise_upsample implementation.
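For illustration, this is the general shape I went looking for and never found spelled out in the docs. To be clear, the interface below is invented for this review -- it is not the library's actual UpSampler concept:

    #include <cstddef>

    // Hypothetical upsampler: write one coarse sample's value to each
    // of the n finer offsets covered by its run (piecewise-constant).
    struct repeat_upsampler
    {
        template<typename Value, typename OutIter>
        void operator()(Value const& value, std::size_t n, OutIter out) const
        {
            for (std::size_t i = 0; i != n; ++i)
                *out++ = value;
        }
    };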
The coarse_grain() function says that it requires the coarser grain to be a multiple of the finer grain. I assume that should read "integer multiple"; is that right?

After finding the above doc issues, I did a quick survey of the functions in the manual's Algorithms section. Most of them appear to have complete concept docs, but there are some deficiencies.

Partial concept docs:

- integrate (missing Series requirements)

Low-to-no concept docs:

- coarse_grain (missing all requirements except that the discretizations must be integer multiples of one another -- and it never uses the word "integer")
- fine_grain (the same omissions as coarse_grain)
- invert_heaviside (no requirements at all)

The rest of the algorithm detailed docs have concept requirements, but they would be much easier to use if the concepts were links to the relevant concept docs; as it is now, I have to do a bit of searching to find each one listed. This applies generally to all references to concepts throughout the docs -- even in the concept docs themselves, I find names of concepts that I must then look up by going back to the TOC, since they are not links.

The fact that "The only series type that does not support floating-point offsets is dense_series<>" should be mentioned not just in the manual, but also in the Discretization requirements of the dense_series<> reference docs.

There appears to be no rationale section. Even listing the usage cases alluded to in the Acknowledgements section might help illuminate the choices that were made. Specific things I'd like to see in a rationale:

- Why was commit() chosen, instead of simpler self-contained insertions?
- Why were ordered_inserters chosen instead of more STL-like insertion via iterators and member functions?
- Why do ordered_inserters wipe out the previous contents of a time series, instead of appending/inserting?
- Why can't elements and/or ranges of elements be removed from or added to an existing time series?
- Why is the TimeSeries concept interface split between accessors in the boost::sequence namespace and those in the boost::range_run_storage namespace?

* What is your evaluation of the potential usefulness of the library?

I think it is potentially quite useful. However, I think its primary usefulness is not as a financial time series library, yet as I mentioned earlier, its current docs make it sound as though that is nearly all it is good for. In addition, I am forced to ask how a time series library is more useful for signal processing than a std::vector plus an extrinsic discretization value. The answer I came up with is that Boost.TimeSeries is really only advantageous when you have arbitrary spacing between elements, or when you want to use two representations of time series in one algorithm. That is, using Boost.TimeSeries' two-series for_each() is almost certainly better than a custom -- and probably complicated -- loop everywhere I need to operate on two time series. However, these cases are relatively rare in signal processing; it is much more common to simply loop over all the samples and do some operation on each element. That can be accomplished just as well with std::for_each or std::transform.
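For instance, the everyday case looks like this in plain standard C++, with the discretization kept alongside the data rather than inside it (apply_gain is just an example name):

    #include <algorithm>
    #include <vector>

    // Apply one operation to every sample -- the most common pattern
    // in our signal processing code.
    void apply_gain(std::vector<double>& samples, double gain)
    {
        std::transform(samples.begin(), samples.end(), samples.begin(),
                       [gain](double s) { return gain * s; });
    }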
The question then becomes, "Does using Boost.TimeSeries introduce clarifying abstractions, or conceptual noise?" The consensus among my colleagues is that it is the latter.

Some specific signal-processing usability concerns:

- For many signal processing tasks, the time series used is too large to fit in memory. The usual solution is a circular buffer or similar structure that keeps around just the part you need at the moment. The Boost.TimeSeries series types seem unable to accommodate this mode of operation.

- Two of the most potentially useful parts of Boost.TimeSeries for certain kinds of signal processing are coarse_grain() and fine_grain(). In the case of coarse_grain(), the library does not allow the use of an arbitrary functor for downsampling, and in the case of fine_grain(), it makes the equivalent upsampling difficult.

- It might be instructive to both the Boost.TimeSeries developers and some of its potential users if certain common signal-processing algorithms were implemented with the library, even if just in the documentation. For example, how might one implement a sliding-window normalizer over densely populated, millisecond-resolution data? What if this normalization used more than two time series to do its work? It may well be possible within the current framework, but a) it's not really clear from the documentation how to do it, and b) the documentation almost seems to have a bias against that kind of processing. If problems like that were examined by the developers, it might well inform the design by helping them find new generalizations and so forth. Likewise, if methods for using Boost.TimeSeries to do this kind of work were present in the documentation, the library would have immediate appeal to a whole new range of folks in the scientific computing field.

* Did you try to use the library? With what compiler? Did you have any problems?

No.

* How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

In-depth reading of the docs, API, and some of the implementation. I ran out of time to evaluate much code before the review ended.

* Are you knowledgeable about the problem domain?

Somewhat. I work at a place that does a lot of signal processing using time series, and though I work with these time series from time to time, it's not my core area.

* Do you think the library should be accepted as a Boost library?

As it stands, no. If there were clearly-defined relationships between samples and their extents and offsets; better support for large and/or piecewise-mutable time series; a rolling-window algorithm; and better customizability of coarse_grain() and fine_grain(), I would probably change my vote.

Zach Laine