
I want to say at the outset that I work with a lot of people that do signal processing, and so that is the main use case my colleagues and I would have for using Boost.TimeSeries. That has definitely colored my view of Boost.TimeSeries, for what that's worth.

* What is your evaluation of the design?

Why is a series with only one nonzero element called a delta series? I would expect a delta series to be the elementwise differences between two series or some such. In fact, a little Googling revealed many references to delta series that were the differences of two series, but none like delta_series<>. Is this just my ignorance of the financial uses of time series?

I'd like to see an overload of the coarse_grain() function allowing the specification of a DownSampler. I have a need for coarse resampling that preserves the largest value in the original set of finer samples, and I'm sure many others would like to use their favorite convolution-of-oversamples technique to move from high to low sample frequency. In fact, the ability to specify the method used when moving up or down in sample frequency should be a first-class feature of the coarse_grain() and fine_grain() functions. The default behavior is a gross simplification for many users' purposes. Please see my note elsewhere on the lack of UpSampler concept docs, making writing a custom one more difficult than it needs to be.

The TimeSeries concept docs state that range_run_storage::pre_value( s ) returns "The value associated with the pre_run, if there is one." Does it return TimeSeries< S >::zero_type() otherwise? Is there any way for the user to tell the difference between a return value that happens to be equal to zero_type() and one that was never set? These same questions apply to post_value.

I'm not really sure why dense_series<> does not allow floating point offsets. I understand that it is supposed to resemble std::vector. However, there is implicitly a relationship between the indices of the underlying vector and the times represented by the indices. It is possible (and quite convenient) for dense_series<> to hold a discretization_type that represents the start offset, and an interval, and perform the mapping for me. The lack of this mapping means that for dense time series of arbitrary discretization (say 0.5), I have to multiply all my times by 0.5 in user code when using the time series. I feel the library should take care of this for me; the fact that the underlying storage is vector-like should not force me to treat dense_series<> discretizations differently from all other series' discretizations.

I find it a little confusing that discretization is both a template parameter and a ctor parameter. It would perhaps be a bit more clear if template-parameter Discretization were DiscretizationType or something.

In general, I think that this library would be far more usable if there were more clarification of the relationships among discretization, offsets, and runs/intervals. What I'm getting at here is that the current structure makes it really unclear what DiscretizationType=double, discretization=2.1, offset=3.2 means for a run. Where is the run? At 3.2, or at 2.1 * 3.2? The fact that I have to think about this is probably simply a failure of documentation. Nonetheless, it would be best if it were possible to specify that a sample exists at offset X, where X is double, int, or units::seconds, without worrying about any other details, including discretization. That is, discretization seems useful to me only for regularly-spaced time series, and seems like noise for arbitrarily-spaced time series. In addition, a sample should be representable as a point like 3.14 or a run like [3.14, 4.2).

* What is your evaluation of the implementation?

I didn't look too closely at the implementation, due to lack of time, but I would like to know why the functors in boost::range_run_storage are const references to singletons in an anonymous namespace. What is that supposed to accomplish that other techniques do not? One of the great things about boost is learning new advanced techniques and discovering best practices for sticky coding situations. Without the why, this technique is lost on me, and probably others. Note that I'm not raising any issue with this implementation choice -- I'm just looking to understand why this was done.

* What is your evaluation of the documentation?

The docs could use a gentler intro. What I mean by this is that the docs appear to be somewhat inverted. Moving from foundations to things that build upon those foundations, and from the general to the specific, would be better. For example, delta_series<> has the following description on the first page of the user manual: "A series which has exactly one run of unit length." I have no precise idea at this point what a run is, and moreover following the link gives a much more comprehensible definition in the reference section: "boost::time_series::delta_series -- A Mutable_TimeSeries that has a distinct value at some unique offset, and zero elsewhere". After reading the Range-Run Abstraction section, I think it would be appropriate to place that before mentions of runs, specific time series models, etc. It was only on the second pass that I fully understood the material presented up front in Series Containers.

In general, the documentation is skewed towards financial use, but doesn't need to be -- the library is useful for other purposes as well. For instance, when the predefined resolution types are presented, it seems that these are somehow necessary, or that the lack of a "seconds" or "milliseconds" resolution typedef might be a concern. Further investigation indicates that in fact the Discretization can be virtually any numeric type (is that correct?). Placing these at the beginning of the documentation made several of my signal processing colleagues think this library would only be good for financial users. I suggest a more general emphasis, and a specific subsection that says something like "this was originally written for financial use, and here are some features that facilitate such use ...". For example, here is what I wrote on my first pass through the docs; I realize what the situation is now, but here is how the docs led me astray a bit: "Discretization is only done on days on up. What about seconds, minutes, hours, milliseconds (the one I need) discretization? Are series with different discretizations usable together without explicit user conversion? Also, if mpl::int_<360> is used as a conversion factor for yearly, it's wrong for many users' purposes. If it's an arbitrary number, how about using mpl::int_<0>, mpl::int_<1>, etc., and adding some other time increments?"

In period_sums() description in the manual section, "summed" is misspelled "summer".

I wanted to figure out whether I could supply my own sampler functor to fine_grain(), but it is unclear from the documentation what an UpSampler template parameter is supposed to look like. Looking at the fine_grain() reference is no help, due to lack of concept documentation. Following the samplers::sampler_base<> link was no help either, since it appears not to be documented other than listing its API. I finally found out what I need to do to write a sampler only by looking at the piecewise_upsample implementation.

The coarse_grain() function says that it requires the coarser grain to be a multiple of the finer grain. I assume that should read "integer multiple"; is that right?

After finding the above doc issues, I did a quick survey of the functions in the manual's Algorithms section. Most of them appear to have complete concept docs, but there are some deficiencies:

Partial concept docs:
- integrate (missing Series requirements)

Low-to-no concept docs:
- coarse_grain (missing all requirements except that the discretizations need to be integer multiples of one another, but doesn't use the word integer)
- fine_grain (missing all requirements except that the discretizations need to be integer multiples of one another, but doesn't use the word integer)
- invert_heaviside (no requirements at all)

The rest of the algorithm detailed docs have concept requirements, but it would be much easier to use them if the concepts were links to the relevant concept docs; as it is now, I have to do some bit of searching to find each one listed. This applies generally to all references to concepts throughout the docs -- even in the concepts docs, I find names of concepts that I must then look up by going back to the TOC, since they are not links.

The fact that "The only series type that does not support floating-point offsets is dense_series<>" should not just be mentioned in the manual, but also in the Discretization requirements in the dense_series<> reference docs.

There appears not to be a rationale section. Even listing the usage cases alluded to in the Acknowledgements section may help illuminate the choices that were made. Specific things I'd like to see in a rationale:
- Why was commit() chosen, instead of simpler self-contained insertions?
- Why were ordered_inserters chosen instead of more STL-like insertion via iterators and member functions?
- Why do ordered_inserters wipe out the previous contents of a time series, instead of appending/inserting?
- Why can't elements and/or ranges of elements be removed from or added to an existing time series?
- Why is the TimeSeries concept interface split between boost::sequence namespace and boost::range_run_storage namespace accessors?

* What is your evaluation of the potential usefulness of the library?

I think it is potentially quite useful. However, I think its usefulness is not primarily as a financial time series library, but as I mentioned earlier, its current docs make it sound as if it is mainly only useful for that. In addition, I am forced to ask how a time series library is more useful for signal processing than a std::vector and an extrinsic discretization value. The answer I came up with is that Boost.TimeSeries is really only advantageous when you have arbitrary spacing between elements, or when you want to use two representations of time series in an algorithm. That is, using Boost.TimeSeries' two-series for_each() is almost certainly better than a custom -- and probably complicated -- loop everywhere I need to operate on two time series. However, these cases are relatively rare in signal processing; it is much more common to simply loop over all the samples and do some operation on each element. This can be accomplished just as well with std::for_each or std::transform. The question then becomes, "Does using Boost.TimeSeries introduce clarifying abstractions, or conceptual noise?". The consensus among my colleagues is that the latter is the case.

Some specific signal-processing usability concerns:
- For many signal processing tasks, the time series used is too large to fit in memory. The solution is usually to use a circular buffer or similar structure to keep around just the part you need at the moment. The Boost.TimeSeries series types seem unable to accommodate this mode of operation.
- Two of the most potentially useful bits of Boost.TimeSeries for certain kinds of signal processing are the coarse_grain() and fine_grain(). These do not allow in the case of coarse grain, and make difficult in the case of fine grain, the use of an arbitrary functor to do downsampling/upsampling.
- It might be instructive to both the Boost.TimeSeries developers and some of its potential users if certain common signal-processing algorithms were implemented with the library, even if just in the documentation. For example, how might one implement a sliding-window normalizer over densely populated, millisecond resolution data? What if this normalization used more than two time series to do its work? It may well be possible with the current framework, but a) it's not really clear how to do it based on the documentation and b) the documentation almost seems to have a bias against that kind of processing.

If problems like that were looked at by the developers, it may well inform their design by helping them find new generalizations and so forth. Likewise, if methods for using Boost.TimeSeries to do this kind of work were present in the documentation, the library would have immediate appeal to a whole new range of folks in the scientific computing field.

* Did you try to use the library? With what compiler? Did you have any problems?

No.

* How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

In-depth reading of the docs, API, and some of the implementation. I ran out of time to evaluate much code before the review ended.

* Are you knowledgeable about the problem domain?

Somewhat. I work at a place that does a lot of signal processing using time series, and though I work with these time series from time to time, it's not my core area.

* Do you think the library should be accepted as a Boost library?

As it stands, no. If there were clearly-defined relationships between samples and their extents and offsets; better support for large and/or piecewise-mutable time series; a rolling-window algorithm; and better customizability of coarse_grain() and fine_grain(), I would probably change my vote.

Zach Laine

AMDG Zach Laine <whatwasthataddress <at> gmail.com> writes:
I didn't look too closely at the implementation, due to lack of time, but I would like to know why the functors in boost::range_run_storage are const references to singletons in an anonymous namespace. What is that supposed to accomplish that other techniques do not?
Just putting namespace { T t; } in a header causes ODR violations. The reference mechanism avoids that problem. In Christ, Steven Watanabe

Thanks for taking the time to review the library. Zach Laine wrote:
I want to say at the outset that I work with a lot of people that do signal processing, and so that is the main use case my colleagues and I would have for using Boost.TimeSeries. That has definitely colored my view of Boost.TimeSeries, for what that's worth.
* What is your evaluation of the design?
Why is a series with only one nonzero element called a delta series? I would expect a delta series to be the elementwise differences between two series or some such. In fact, a little Googling revealed many references to delta series that were the differences of two series, but none like delta_series<>. Is this just my ignorance of the financial uses of time series?
The name comes from the Dirac delta function: http://en.wikipedia.org/wiki/Dirac_delta_function
I'd like to see an overload of the coarse_grain() function allowing the specification of a DownSampler. I have a need for coarse resampling that preserves the largest value in the original set of finer samples, and I'm sure many others would like to use their favorite convolution-of-oversamples technique to move from high to low sample frequency. In fact, the ability to specify the method used when moving up or down in sample frequency should be a first-class feature of the coarse_grain() and fine_grain() functions. The default behavior is a gross simplification for many users' purposes. Please see my note elsewhere on the lack of UpSampler concept docs, making writing a custom one more difficult than it needs to be.
Agreed, there should be a general and extensible interface for up- and down-sampling.
The TimeSeries concept docs state that range_run_storage::pre_value( s ) returns "The value associated with the pre_run, if there is one." Does it return TimeSeries< S >::zero_type() otherwise? Is there any way for the user to tell the difference between a return value that happens to be equal to zero_type() and one that was never set? These same questions apply to post_value.
From the InfiniteRangeRunStorage concept requirements about pre_value(), the docs say, "Returns the value of the pre-run. If the pre-run is empty, the return value of pre_value() is undefined." http://boost-sandbox.sourceforge.net/libs/time_series/doc/html/InfiniteRange... Basically, you call pre_run() to see if the pre-run is empty. If so, you don't call pre_value().
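In other words, the intended usage is along these lines (is_empty() and use() below are placeholders for however the concept exposes the emptiness check and for the user's own code, not documented functions):

    namespace rrs = boost::range_run_storage;

    if (!is_empty(rrs::pre_run(s)))   // inspect the pre-run first ...
    {
        use(rrs::pre_value(s));       // ... only then is pre_value() meaningful
    }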
I'm not really sure why dense_series<> does not allow floating point offsets. I understand that it is supposed to resemble std::vector. However, there is implicitly a relationship between the indices of the underlying vector and the times represented by the indices. It is possible (and quite convenient) for dense_series<> to hold a discretization_type that represents the start offset, and an interval, and perform the mapping for me. The lack of this mapping means that for dense time series of arbitrary discretization (say 0.5), I have to multiply all my times by 0.5 in user code when using the time series. I feel the library should take care of this for me; the fact that the underlying storage is vector-like should not force me to treat dense_series<> discretizations differently from all other series' discretizations.
How does it force you to treat discretizations differently? Whether your discretization is 1 or 0.5 or whether your underlying storage is dense or sparse, it doesn't affect how you index into the series, does it? I'm afraid I've missed your point. As to whether a dense series can be indexed with a floating point value, and whether some on-the-fly time-dilating transformation can be applied to the index, those are all very interesting questions. There is a scaled view, a clipped view and a shifted view, but no view which modifies indices by a multiplicative factor.
I find it a little confusing that discretization is both a template parameter and a ctor parameter. It would perhaps be a bit more clear if template-parameter Discretization were DiscretizationType or something.
OK.
In general, I think that this library would be far more usable if there were more clarification of the relationships among discretization, offsets, and runs/intervals. What I'm getting at here is that the current structure makes it really unclear what DiscretizationType=double, discretization=2.1, offset=3.2 means for a run. Where is the run? At 3.2, or at 2.1 * 3.2? The fact that I have to think about this is probably simply a failure of documentation.
Possibly. Documentation always seems to be the hardest part.
Nonetheless, it would be best if it were possible to specify that a sample exists at offset X, where X is double, int, or units::seconds, without worrying about any other details, including discretization. That is, discretization seems useful to me only for regularly-spaced time series, and seems like noise for arbitrarily-spaced time series.
Discretizations are useful for coarse- and fine-graining operations that resample the data at different intervals. This can be useful even for time series that are initially arbitrarily-spaced. Sometimes you don't care to resample your data at a different discretization, or call the integrate() algorithm. In those cases, the discretization parameter can be completely ignored. It does tend to clutter up the docs, but no more than, say, the allocator parameter clutters up std::vector's docs.
In addition, a sample should be representable as a point like 3.14 or a run like [3.14, 4.2).
A zero-width point, like [3.14, 3.14)? What that would mean in the context of the time_series library is admittedly still an outstanding design issue.
* What is your evaluation of the implementation?
I didn't look too closely at the implementation, due to lack of time, but I would like to know why the functors in boost::range_run_storage are const references to singletons in an anonymous namespace. What is that supposed to accomplish that other techniques do not? One of the great things about boost is learning new advanced techniques and discovering best practices for sticky coding situations. Without the why, this technique is lost on me, and probably others. Note that I'm not raising any issue with this implementation choice -- I'm just looking to understand why this was done.
It's to avoid violations of the one-definition rule. If they were just global constants, then each translation unit would have its own copy of the globals (try it and see), and templates that refer to them would end up with different references when instantiated in different translation units. It's a nit-picky and rather unimportant implementation detail, but probably worth a mention in the (not there) rationale section.
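For the record, the idiom looks roughly like this (the functor name here is illustrative, and the library's actual code differs in detail):

    // The static data member of a class template is a single object for the
    // whole program, so every translation unit that includes this header binds
    // its (internal-linkage) reference to the *same* instance, and templates
    // that use the reference stay consistent across translation units.
    template<typename T>
    struct static_const
    {
        static T const value;
    };

    template<typename T>
    T const static_const<T>::value = T();

    struct pre_value_fn { /* function call operator omitted */ };

    namespace
    {
        pre_value_fn const& pre_value = static_const<pre_value_fn>::value;
    }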
* What is your evaluation of the documentation?
The docs could use a gentler intro. What I mean by this is that the docs appear to be somewhat inverted. Moving from foundations to things that build upon those foundations, and from the general to the specific would be better. For example, delta_series<> has the following description on the first page of the user manual: "A series which has exactly one run of unit length." I have no precise idea at this point what a run is, and moreover following the link gives a much more comprehensible definition in the reference section: "boost::time_series::delta_series -- A Mutable_TimeSeries that has a distinct value at some unique offset, and zero elsewhere". After reading the Range-Run Abstraction section, I think it would be appropriate to place that before mentions of runs, specific time series models, etc. It was only on the second pass that I fully understood the material presented up front in Series Containers.
This is very good feedback, thanks.
In general, the documentation is skewed towards financial use, but doesn't need to be -- the library is useful for other purposes as well. For instance, when the predefined resolution types are presented, it seems that these are somehow necessary, or that the lack of a "seconds" or "milliseconds" resolution typedef might be a concern. Further investigation indicates that in fact the Discretization can be virtually any numeric type (is that correct?).
Correct, and the consensus is that these should simply be removed from the library as not useful and actually misleading.
Placing these at the beginning of the documentation made several of my signal processing colleagues think this library would only be good for financial users. I suggest a more general emphasis, and a specific subsection that says something like "this was originally written for financial use, and here are some features that facilitate such use ...". For example, here is what I wrote on my first pass through the docs; I realize what the situation is now, but here is how the docs led me astray a bit: "Discretization is only done on days on up. What about seconds, minutes, hours, milliseconds (the one I need) discretization? Are series with different discretizations usable together without explicit user conversion? Also, if mpl::int_<360> is used as a conversion factor for yearly, it's wrong for many users' purposes. If it's an arbitrary number, how about using mpl::int_<0>, mpl::int_<1>, etc., and adding some other time increments?"
Another suggestion has been to make boost.units usable as the discretization parameter, and I think that's a promising direction.
In period_sums() description in the manual section, "summed" is misspelled "summer".
Thanks.
I wanted to figure out whether I could supply my own sampler functor to fine_grain(), but it is unclear from the documentation what an UpSampler template parameter is supposed to look like. Looking at the fine_grain() reference is no help, due to lack of concept documentation. Following the samplers::sampler_base<> link was no help either, since it appears not to be documented other than listing its API. I finally found out what I need to do to write a sampler only by looking at the piecewise_upsample implementation. The coarse_grain() function says that it requires the coarser grain to be a multiple of the finer grain. I assume that should read "integer multiple"; is that right?
Right. I could provide more general up-sample and down-sample interfaces.
After finding the above doc issues, I did a quick survey of the functions in the manual's Algorithms section. Most of them appear to have complete concept docs, but there are some deficiencies:

Partial concept docs:
- integrate (missing Series requirements)

Low-to-no concept docs:
- coarse_grain (missing all requirements except that the discretizations need to be integer multiples of one another, but doesn't use the word integer)
- fine_grain (missing all requirements except that the discretizations need to be integer multiples of one another, but doesn't use the word integer)
- invert_heaviside (no requirements at all)
Noted.
The rest of the algorithm detailed docs have concept requirements, but it would be much easier to use them if the concepts were links to the relevant concept docs; as it is now, I have to do some bit of searching to find each one listed. This applies generally to all references to concepts throughout the docs -- even in the concepts docs, I find names of concepts that I must then look up by going back to the TOC, since they are not links.
Yeah, it's a limitation of our BoostBook tool chain. Doxygen actually emits this documentation with cross-links, but our doxygen2boostbook XSL transform ignores them. Very frustrating.
The fact that "The only series type that does not support floating-point offsets is dense_series<>" should not just be mentioned in the manual, but also in the Discretization requirements in the dense_series<> reference docs.
OK.
There appears not to be a rationale section. Even listing the usage cases alluded to in the Acknowledgements section may help illuminate the choices that were made. Specific things I'd like to see in a rationale: - Why was commit() chosen, instead of simpler self-contained insertions?
I just answered this one in response to a msg by Steven Watanabe.
- Why were ordered_inserters chosen instead of more STL-like insertion via iterators and member functions?
Also answered in that msg.
- Why do ordered_inserters wipe out the previous contents of a time series, instead of appending/inserting?
Because I picked a bad name. Should be called ordered_builder or ordered_assign, or something.
- Why can't elements and/or ranges of elements be removed from or added to an existing time series?
They can. See range_run_storage::set_at(). To remove elements, you can call set_at() with zeros. I should mention this in the user docs though, because you're not the only person who thought they couldn't do it.
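Roughly like this (the exact argument shape of set_at() -- whether it takes an offset or a run -- is in the reference; this is just the gist):

    namespace rrs = boost::range_run_storage;

    rrs::set_at(s, 42, 1.5);   // add or overwrite a value at offset 42
    rrs::set_at(s, 42, 0.0);   // "remove" it again by writing the series' zero value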
- Why is the TimeSeries concept interface split between boost::sequence namespace and boost::range_run_storage namespace accessors?
Hmm. The idea is that the Sequence concepts can be reused -- there's nothing Time_series specific to it. Ditto for RangeRunStorage. Other libraries can use them and not have to drag in a bunch of time series-specific cruft. That's why they're in their own directories, and in their own namespaces.
* What is your evaluation of the potential usefulness of the library?
I think it is potentially quite useful. However, I think its usefulness is not primarily as a financial time series library, but as I mentioned earlier, its current docs make it sound as if it is mainly only useful for that. In addition, I am forced to ask how a time series library is more useful for signal processing than a std::vector and an extrinsic discretization value.
It's for when you want many options for the in-memory representation of a series, and efficient and reusable algorithms that work equally well on all those different representations.
The answer I came up with is that Boost.TimeSeries is really only advantageous when you have arbitrary spacing between elements, or when you want to use two representations of time series in an algorithm. That is, using Boost.TimeSeries' two-series for_each() is almost certainly better than a custom -- and probably complicated -- loop everywhere I need to operate on two time series. However, these cases are relatively rare in signal processing; it is much more common to simply loop over all the samples and do some operation on each element. This can be accomplished just as well with std::for_each or std::transform.
If std::vector and std::for_each meet your needs, then yes I agree Time_series is overkill for you. That's not the case for everyone.
The question then becomes, "Does using Boost.TimeSeries introduce clarifying abstractions, or conceptual noise?". The consensus among my colleagues is that the latter is the case.
Sorry you feel that way.
Some specific signal-processing usability concerns: - For many signal processing tasks, the time series used is too large to fit in memory. The solution is usually to use a circular buffer or similar structure to keep around just the part you need at the moment. The Boost.TimeSeries series types seem unable to accommodate this mode of operation.
Not "unable to accommodate" -- making a circular buffer model the time series concept would be fairly straightforward, and then all the existing algorithms would work for it. But no, there is no such type in the library at present.
- Two of the most potentially useful bits of Boost.TimeSeries for certain kinds of signal processing are the coarse_grain() and fine_grain(). These do not allow in the case of coarse grain, and make difficult in the case of fine grain, the use of an arbitrary functor to do downsampling/upsampling.
True.
- It might be instructive to both the Boost.TimeSeries developers and some of its potential users if certain common signal-processing algorithms were implemented with the library, even if just in the documentation. For example, how might one implement a sliding-window normalizer over densely populated, millisecond resolution data? What if this normalization used more than two time series to do its work? It may well be possible with the current framework, but a) it's not really clear how to do it based on the documentation and b) the documentation almost seems to have a bias against that kind of processing.
I wonder why you say that. The library provides a 2-series transform() algorithm that is for just this purpose. As for the rolling window calculations, I have code that does that, and sent it around on this list just a few weeks ago. I hope to add the rolling average algorithm soon. It uses a circular buffer, and would make a good example for the docs.
If problems like that were looked at by the developers, it may well inform their design by helping them find new generalizations and so forth. Likewise, if methods for using Boost.TimeSeries to do this kind of work were present in the documentation, the library would have immediate appeal to a whole new range of folks in the scientific computing field.
* Did you try to use the library? With what compiler? Did you have any problems?
No.
* How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
In-depth reading of the docs, API, and some of the implementation. I ran out of time to evaluate much code before the review ended.
* Are you knowledgeable about the problem domain?
Somewhat. I work at a place that does a lot of signal processing using time series, and though I work with these time series from time to time, it's not my core area.
* Do you think the library should be accepted as a Boost library?
As it stands, no. If there were clearly-defined relationships between samples and their extents and offsets; better support for large and/or piecewise-mutable time series; a rolling-window algorithm; and better customizability of coarse_grain() and fine_grain(), I would probably change my vote.
I'm still not clear on what you mean by "clearly-defined relationships between samples and their extents and offsets." The rest is all fair. Rolling-window is already implemented, but not yet included. Thanks for taking the time. -- Eric Niebler Boost Consulting www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

On 8/8/07, Eric Niebler <eric@boost-consulting.com> wrote:
I'm not really sure why dense_series<> does not allow floating point offsets. I understand that it is supposed to resemble std::vector. However, there is implicitly a relationship between the indices of the underlying vector and the times represented by the indices. It is possible (and quite convenient) for dense_series<> to hold a discretization_type that represents the start offset, and an interval, and perform the mapping for me. The lack of this mapping means that for dense time series of arbitrary discretization (say 0.5), I have to multiply all my times by 0.5 in user code when using the time series. I feel the library should take care of this for me; the fact that the underlying storage is vector-like should not force me to treat dense_series<> discretizations differently from all other series' discretizations.
How does it force you to treat discretizations differently? Whether your discretization is 1 or 0.5 or whether your underlying storage is dense or sparse, it doesn't affect how you index into the series, does it? I'm afraid I've missed your point.
I wrote "discretization" when I meant to say "offset". Perhaps it's even that I'm merely confused, but this is actually making my point -- I see a need for clarifying the relationships among disctretization, offset, and run. My whole point was that (as I understand it) in order to represent an offset of 3.14 I need to keep an extrinsic value somewhere that tells me how to convert between 3.14 and the integer offset used in dense_series<>' runs. Is that accurate? If so, isn't this at odds with the other series types, which let me specify double values for offsets directly?
Nonetheless, it would be best if it were possible to specify that a sample exists at offset X, where X is double, int, or units::seconds, without worrying about any other details, including discretization. That is, discretization seems useful to me only for regularly-spaced time series, and seems like noise for arbitrarily-spaced time series.
Discretizations are useful for coarse- and fine-graining operations that resample the data at different intervals. This can be useful even for time series that are initially arbitrarily-spaced.
Sometimes you don't care to resample your data at a different discretization, or call the integrate() algorithm. In those cases, the discretization parameter can be completely ignored. It does tend to clutter up the docs, but no more than, say, the allocator parameter clutters up std::vector's docs.
Is discretization then properly a property of the series itself? If the offsets of each sample are not related to the discretization, why have both in the same container? I find this very confusing. To accommodate the algorithms you mention above, would it be possible to simply say that I want to resample using a scale factor instead? What I'm getting at here is that discretization and offset seem to have a very muddy relationship. Doing everything in terms of offset seems clearer to me, and I don't yet see how this simplification loses anything useful.
In addition, a sample should be representable as a point like 3.14 or a run like [3.14, 4.2).
A zero-width point, like [3.14, 3.14)? What that would mean in the context of the time_series library is admittedly still an outstanding design issue.
Fair enough.
The rest of the algorithm detailed docs have concept requirements, but it would be much easier to use them if the concepts were links to the relevant concept docs; as it is now, I have to do some bit of searching to find each one listed. This applies generally to all references to concepts throughout the docs -- even in the concepts docs, I find names of concepts that I must then look up by going back to the TOC, since they are not links.
Yeah, it's a limitation of our BoostBook tool chain. Doxygen actually emits this documentation with cross-links, but our doxygen2boostbook XSL transform ignores them. Very frustrating.
That's too bad.
* What is your evaluation of the potential usefulness of the library?
I think it is potentially quite useful. However, I think its usefulness is not primarily as a financial time series library, but as I mentioned earlier, its current docs make it sound as if it is mainly only useful for that. In addition, I am forced to ask how a time series library is more useful for signal processing than a std::vector and an extrinsic discretization value.
It's for when you want many options for the in-memory representation of a series, and efficient and reusable algorithms that work equally well on all those different representations.
This is very true, and that's what I was alluding to below, if a bit unclearly.
The answer I came up with is that Boost.TimeSeries is really only advantageous when you have arbitrary spacing between elements, or when you want to use two representations of time series in an algorithm. That is, using Boost.TimeSeries' two-series for_each() is almost certainly better than a custom -- and probably complicated -- loop everywhere I need to operate on two time series. However, these cases are relatively rare in signal processing; it is much more common to simply loop over all the samples and do some operation on each element. This can be accomplished just as well with std::for_each or std::transform.
If std::vector and std::for_each meet your needs, then yes I agree Time_series is overkill for you. That's not the case for everyone.
The question then becomes, "Does using Boost.TimeSeries introduce clarifying abstractions, or conceptual noise?". The consensus among my colleagues is that the latter is the case.
Sorry you feel that way.
I think this feeling would change rapidly if there were more features directly applicable to signal processing, as mentioned below.
Some specific signal-processing usability concerns: - For many signal processing tasks, the time series used is too large to fit in memory. The solution is usually to use a circular buffer or similar structure to keep around just the part you need at the moment. The Boost.TimeSeries series types seem unable to accommodate this mode of operation.
Not "unable to accommodate" -- making a circular buffer model the time series concept would be fairly straightforward, and then all the existing algorithms would work for it. But no, there is no such type in the library at present.
I'm glad to hear that this would be straightforward to do, and I think it's a must-have for signal processing folks.
- It might be instructive to both the Boost.TimeSeries developers and some of its potential users if certain common signal-processing algorithms were implemented with the library, even if just in the documentation. For example, how might one implement a sliding-window normalizer over densely populated, millisecond resolution data? What if this normalization used more than two time series to do its work? It may well be possible with the current framework, but a) it's not really clear how to do it based on the documentation and b) the documentation almost seems to have a bias against that kind of processing.
I wonder why you say that. The library provides a 2-series transform() algorithm that is for just this purpose.
That's why I asked about "more than two time series". Such convolutions of multiple time series can be done in one pass, and Boost.TimeSeries does this admirably for N=2, but rewriting transform() for N>2 is a lot for most users to bite off.
As for the rolling window calculations, I have code that does that, and sent it around on this list just a few weeks ago. I hope to add the rolling average algorithm soon. It uses a circular buffer, and would make a good example for the docs.
I agree. This would be a great addition to the docs.
As it stands, no. If there were clearly-defined relationships between samples and their extents and offsets; better support for large and/or piecewise-mutable time series; a rolling-window algorithm; and better customizability of coarse_grain() and fine_grain(), I would probably change my vote.
I'm still not clear on what you mean by "clearly-defined relationships between samples and their extents and offsets." The rest is all fair. Rolling-window is already implemented, but not yet included.
I was alluding to my issue with the relationships among discretization, offset, and run that I mentioned earlier. Zach Laine

Zach Laine wrote:
On 8/8/07, Eric Niebler <eric@boost-consulting.com> wrote:
I'm not really sure why dense_series<> does not allow floating point offsets. I understand that it is supposed to resemble std::vector. However, there is implicitly a relationship between the indices of the underlying vector and the times represented by the indices. It is possible (and quite convenient) for dense_series<> to hold a discretization_type that represents the start offset, and an interval, and perform the mapping for me. The lack of this mapping means that for dense time series of arbitrary discretization (say 0.5), I have to multiply all my times by 0.5 in user code when using the time series. I feel the library should take care of this for me; the fact that the underlying storage is vector-like should not force me to treat dense_series<> discretizations differently from all other series' discretizations.
How does it force you to treat discretizations differently? Whether your discretization is 1 or 0.5 or whether your underlying storage is dense or sparse, it doesn't affect how you index into the series, does it? I'm afraid I've missed your point.
I wrote "discretization" when I meant to say "offset". Perhaps it's even that I'm merely confused, but this is actually making my point -- I see a need for clarifying the relationships among disctretization, offset, and run.
OK, I understand. Yes, the docs can certainly be improved.
My whole point was that (as I understand it) in order to represent an offset of 3.14 I need to keep an extrinsic value somewhere that tells me how to convert between 3.14 and the integer offset used in dense_series<>' runs. Is that accurate?
Yes, that's right because as it stands, dense_series is the only one that doesn't allow FP offsets. That was a conservative design choice because I wasn't sure at the time that I understood what it meant. It's clear now that I need to have a rethink about FP offsets in general.
If so, isn't this at odds with the other series types, which let me specify double values for offsets directly?
Correct.
Nonetheless, it would be best if it were possible to specify that a sample exists at offset X, where X is double, int, or units::seconds, without worrying about any other details, including discretization. That is, discretization seems useful to me only for regularly-spaced time series, and seems like noise for arbitrarily-spaced time series.
Discretizations are useful for coarse- and fine-graining operations that resample the data at different intervals. This can be useful even for time series that are initially arbitrarily-spaced.
Sometimes you don't care to resample your data at a different discretization, or call the integrate() algorithm. In those cases, the discretization parameter can be completely ignored. It does tend to clutter up the docs, but no more than, say, the allocator parameter clutters up std::vector's docs.
Is discretization then properly a property of the series itself?
You can think of discretization as an "SI unit" for series offsets. The analogy isn't perfect because the discretization is more than just type information -- it's a multiplicative factor that is logically applied to offsets. More below.
If the offsets of each sample are not related to the discretization, why have both in the same container? I find this very confusing.
Think of a dense series D with integral offsets and values representing a quantity polled at 5ms intervals. D[0] represents the value of the quantity at T=0ms, D[1] represents the value at 5ms, etc.... In this case, the discretization is 5ms. In series types that are not dense, having a non-unit discretization is not as compelling. But it's useful for consistency. If I replace a dense series with a sparse series, I don't want to change how I index it. And if I am given two series -- one dense and one sparse -- as long as their discretizations are the same, I can traverse the two in parallel, confident that their offsets represent the same position in the series.
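In code, that example reads roughly as follows (header and namespace details omitted):

    // A quantity polled every 5ms, stored densely:
    dense_series<int> d( discretization = 5 );

    // d[0] is the sample at T = 0ms, d[1] at T = 5ms, d[2] at T = 10ms, ...
    // The offsets themselves stay 0, 1, 2, ...; the 5ms is carried by the
    // discretization, which is what the resampling algorithms and the
    // compatibility checks on parallel traversal look at.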
To accommodate the algorithms you mention above, would it be possible to simply say that I want to resample using a scale factor instead? What I'm getting at here is that discretization and offset seem to have a very muddy relationship. Doing everything in terms of offset seems clearer to me, and I don't yet see how this simplification loses anything useful.
Would you agree that, although you can do arithmetic on untyped numbers, it's often a bad idea? Yes, you can resample an untyped series with an untyped scale factor. But guarding against Mars-lander type units mash-ups is just one use for discretizations. See above. If discretization really doesn't matter for your application, you can just not specify it. It will default to int(1). Or you can use the lower-level range_run_storage classes. I have suggested elsewhere that they can be pulled out into their own sub-library. They lack any notion of discretization. I'm still convinced discretizations are useful for a time series library. This whole discussion should be cleaned up and put in the docs. <snip>
Some specific signal-processing usability concerns: - For many signal processing tasks, the time series used is too large to fit in memory. The solution is usually to use a circular buffer or similar structure to keep around just the part you need at the moment. The Boost.TimeSeries series types seem unable to accommodate this mode of operation.
Not "unable to accommodate" -- making a circular buffer model the time series concept would be fairly straightforward, and then all the existing algorithms would work for it. But no, there is no such type in the library at present.
I'm glad to hear that this would be straightforward to do, and I think it's a must-have for signal processing folks.
If I read into your request a bit, I think a series implemented as a circular buffer isn't actually the right approach for you. Rather, I envision a single-pass series type -- one that spits out data points and offsets as they are pulled from some stream. The circular buffer rightfully belongs in a time series algorithm, because only the algorithm knows how much history it needs to do its job. This is how the rolling average algorithm I wrote works. After all, the series may be a terabyte, but only N samples are needed at any time to compute the rolling average. I think the circular-buffer-in-an-algorithm approach is very important. I could imagine generalizing the rolling average algorithm so that it's useful for all kinds of computation.
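To sketch the shape of that idea (this is not the rolling average code I mentioned, just a generic illustration over plain (offset, value) pairs of how the algorithm, not the container, owns the N-sample window; assumes window > 0):

    #include <boost/circular_buffer.hpp>
    #include <cstddef>
    #include <numeric>
    #include <utility>
    #include <vector>

    typedef std::pair<double, double> sample;   // (offset, value)

    // Rolling mean over the last `window` samples; the circular buffer lives
    // inside the algorithm, so the full series never has to be in memory.
    std::vector<sample>
    rolling_mean(std::vector<sample> const& in, std::size_t window)
    {
        boost::circular_buffer<double> buf(window);
        std::vector<sample> out;
        out.reserve(in.size());
        for (std::size_t i = 0; i != in.size(); ++i)
        {
            buf.push_back(in[i].second);   // once full, the oldest value drops off
            double sum = std::accumulate(buf.begin(), buf.end(), 0.0);
            out.push_back(sample(in[i].first, sum / buf.size()));
        }
        return out;
    }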
- It might be instructive to both the Boost.TimeSeries developers and some of its potential users if certain common signal-processing algorithms were implemented with the library, even if just in the documentation. For example, how might one implement a sliding-window normalizer over densely populated, millisecond resolution data? What if this normalization used more than two time series to do its work? It may well be possible with the current framework, but a) it's not really clear how to do it based on the documentation and b) the documentation almost seems to have a bias against that kind of processing. I wonder why you say that. The library provides a 2-series transform() algorithm that is for just this purpose.
That's why I asked about "more than two time series". Such convolutions of multiple time series can be done in one pass, and Boost.TimeSeries does this admirably for N=2, but rewriting transform() for N>2 is a lot for most users to bite off.
Oh, yes. It gets pretty hairy. If we restrict ourselves to non-infinite series (ones without pre- and post-runs) it is straightforward to traverse N series in parallel. Even for infinite series it's doable with a thin abstraction layer. The trouble comes when you want the extreme performance that comes from taking advantage of algorithm specialization for things like denseness or unit runs. Choosing an optimal parallel series traversal becomes a combinatorial explosion. In these cases, I think picking a traversal strategy that is merely Good instead of The Best is probably the way forward.
As for the rolling window calculations, I have code that does that, and sent it around on this list just a few weeks ago. I hope to add the rolling average algorithm soon. It uses a circular buffer, and would make a good example for the docs.
I agree. This would be a great addition to the docs.
Not just for the docs. It should be a reusable algorithm in the library.
As it stands, no. If there were clearly-defined relationships between samples and their extents and offsets; better support for large and/or piecewise-mutable time series; a rolling-window algorithm; and better customizability of coarse_grain() and fine_grain(), I would probably change my vote.
I'm still not clear on what you mean by "clearly-defined relationships between samples and their extents and offsets." The rest is all fair. Rolling-window is already implemented, but not yet included.
I was alluding to my issue with the relationships among discretization, offset, and run that I mentioned earlier.
Is it any clearer now? -- Eric Niebler Boost Consulting www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

On 8/9/07, Eric Niebler <eric@boost-consulting.com> wrote:
Nonetheless, it would be best if it were possible to specify that a sample exists at offset X, where X is double, int, or units::seconds, without worrying about any other details, including discretization. That is, discretization seems useful to me only for regularly-spaced time series, and seems like noise for arbitrarily-spaced time series.
Discretizations are useful for coarse- and fine-graining operations that resample the data at different intervals. This can be useful even for time series that are initially arbitrarily-spaced.
Sometimes you don't care to resample your data at a different discretization, or call the integrate() algorithm. In those cases, the discretization parameter can be completely ignored. It does tend to clutter up the docs, but no more than, say, the allocator parameter clutters up std::vector's docs.
Is discretization then properly a property of the series itself?
You can think of discretization as an "SI unit" for series offsets. The analogy isn't perfect because the discretization is more than just type information -- it's a multiplicative factor that is logically applied to offsets. More below.
It's the second part I have a problem with. More below.
If the offsets of each sample are not related to the discretization, why have both in the same container? I find this very confusing.
Think of a dense series D with integral offsets and values representing a quantity polled at 5ms intervals. D[0] represents the value of the quantity at T=0ms, D[1] represents the value at 5ms, etc.... In this case, the discretization is 5ms.
In series types that are not dense, having a non-unit discretization is not as compelling. But its useful for consistency. If I replace a dense series with a sparse series, I don't want to change how I index it. And if I am given two series -- one dense and one sparse -- as long as their discretizations are the same, I can traverse the two in parallel, confident that their offsets represent the same position in the series.
To accommodate the algorithms you mention above, would it be possible to simply say that I want to resample using a scale factor instead? What I'm getting at here is that discretization and offset seem to have a very muddy relationship. Doing everything in terms of offset seems clearer to me, and I don't yet see how this simplification loses anything useful.
Would you agree that, although you can do arithmetic on untyped numbers, it's often a bad idea? Yes, you can resample an untyped series with an untyped scale factor. But guarding against Mars-lander type units mash-ups is just one use for discretizations. See above.
If discretization really doesn't matter for your application, you can just not specify it. It will default to int(1). Or you can use the lower-level range_run_storage classes. I have suggested elsewhere that they can be pulled out into their own sub-library. They lack any notion of discretization.
You misunderstand me. I'm all for DiscretizationType (the template parameter -- yay typesafety), and I'm all for discretization (the value and associated functions) for dense series. What I'm against is using discretization (the value) for non-dense series. (Note that we have run into the Discretization/discretization naming ambiguity again here.) I find it a confusing value to keep around, especially since it can be simply ignored, as you pointed out in a previous email. A data value that you can access and that is put forward as a first-class element of the design -- that is also ignored -- suggests a better design is possible. Here's what I suggest:

The Discretization template parameter becomes OffsetType, which is IMO a more accurate name, and follows the RangeRun concepts. I will be using that name below. The "OffsetType discretization" ctor parameter, the "OffsetType discretization()" accessors, and "void discretization(OffsetType d)" mutators should only be applied to the dense_series<> type. The user should be able to access any data in any type exclusively by using offsets. This means that dense_series<> seamlessly handles the mapping between (possibly floating point) offset and sample index I requested initially.

This has these advantages:
- The notion of discretization is not introduced into types for which it has questionable meaning, e.g. piecewise_constant_series<>.
- Offsets can be used exclusively, both on input (as before) and output (as before, but now including dense_series<> with floating point offsets).
- Floating point and integral offsets are now treated much more uniformly.

Is this reasonable? Have I perhaps missed something fundamental?
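To illustrate what I mean with hypothetical syntax (the start/interval parameter names are made up; this is not the current API):

    // Proposed: dense_series<> owns the offset <-> index mapping itself.
    dense_series<double> d( start = 0.0, interval = 0.5 );

    d[3.0];   // the series maps offset 3.0 to underlying index 6 internally,
              // so user code never multiplies or divides by the discretization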
- It might be instructive to both the Boost.TimeSeries developers and some of its potential users if certain common signal-processing algorithms were implemented with the library, even if just in the documentation. For example, how might one implement a sliding-window normalizer over densely populated, millisecond resolution data? What if this normalization used more than two time series to do its work? It may well be possible with the current framework, but a) it's not really clear how to do it based on the documentation and b) the documentation almost seems to have a bias against that kind of processing. I wonder why you say that. The library provides a 2-series transform() algorithm that is for just this purpose.
That's why I asked about "more than two time series". Such convolutions of multiple time series can be done in one pass, and Boost.TimeSeries does this admirably for N=2, but rewriting transform() for N>2 is a lot for most users to bite off.
Oh, yes. It gets pretty hairy. If we restrict ourselves to non-infinite series (ones without pre- and post-runs) it is straightforward to traverse N series in parallel. Even for infinite series it's doable with a thin abstraction layer. The trouble comes when you want the extreme performance that comes from taking advantage of algorithm specialization for things like denseness or unit runs. Choosing an optimal parallel series traversal becomes a combinatorial explosion. In these cases, I think picking a traversal strategy that is merely Good instead of The Best is probably the way forward.
Does this mean that an N-series transform may be in the offing?
As for the rolling window calculations, I have code that does that, and sent it around on this list just a few weeks ago. I hope to add the rolling average algorithm soon. It uses a circular buffer, and would make a good example for the docs.
I agree. This would be a great addition to the docs.
Not just for the docs. It should be a reusable algorithm in the library.
Good to hear.
As it stands, no. If there were clearly-defined relationships between samples and their extents and offsets; better support for large and/or piecewise-mutable time series; a rolling-window algorithm; and better customizability of coarse_grain() and fine_grain(), I would probably change my vote.
I'm still not clear on what you mean by "clearly-defined relationships between samples and their extents and offsets." The rest is all fair. Rolling-window is already implemented, but not yet included.
I was alluding to my issue with the relationships among discretization, offset, and run that I mentioned earlier.
Is it any clearer now?
It was always clear to me; I just disagreed with the design. I hope my objections are clearer now. Zach Laine

Zach Laine wrote:
You misunderstand me. I'm all for DiscretizationType (the template parameter -- yay typesafety), and I'm all for discretization (the value and associated functions) for dense series. What I'm against is using discretization (the value) for non-dense series. (Note that we have run into the Discretization/discretization naming ambiguity again here.) I find it a confusing value to keep around, especially since it can be simply ignored, as you pointed out in a previous email. A data value that you can access and that is put forward as a first-class element of the design -- but that is also ignored -- suggests a better design is possible. Here's what I suggest:
The Discretization template parameter becomes OffsetType, which is IMO a more accurate name, and follows the RangeRun concepts. I will be using that name below. The "OffsetType discretization" ctor parameter, the "OffsetType discretization()" accessors, and "void discretization(OffsetType d)" mutators should only be applied to the dense_series<> type. The user should be able to access any data in any type exclusively by using offsets. This means that dense_series<> seamlessly handles the mapping between (possibly floating point) offset and sample index that I requested initially.
This has these advantages: The notion of discretization is not introduced into types for which it has questionable meaning, e.g. piecewise_constant_series<>. Offsets can be used exclusively, both on input (as before) and output (as before, but now including dense_series<> with floating point offsets). Floating point and integral offsets are now treated much more uniformly.
Is this reasonable? Have I perhaps missed something fundamental?
Reasonable, but problematic for reasons of unclear semantics and sub-par performance. A similar design was considered early on.

Semantically, it makes it unclear what indexing into a dense series means. Consider:

dense_series<int> d( discretization = 5 );

In your design as I understand it, d[0], d[1], d[2], d[3] and d[4] all refer to the same element in the dense series. To me, this interface changes the meaning of a dense series. It is no longer a dense sampling of points at regular intervals. Now it looks piecewise constant. To illustrate the sorts of confusion this can cause, what happens now if I use set_at() to try to change the value at offset 2? It's nonsensical.

As another example, say I have a dense series with a discretization of 5ms and samples at 0ms, 5ms, and 10ms. Now I also have a sparse series (no discretization) with samples at 0, 5 and 10. Now I add the two. What does that mean? Does the dense series have a value at offset 3? If I understand your design, yes. Does the sparse series? No. Will the result of adding the two? It seems the answer is yes, but it should be no.

Discretization for the other series types is not useless. For sparse, for example, it enforces that samples may only exist at offsets that are a multiple of the discretization.

There are performance implications too. Indexing into a dense series is currently not much more expensive than indexing into an array. With this change, every access to the dense series would incur an integer division, and traversing a dense series would incur an integer multiplication at each iteration. But the real performance problems creep in when a dense series has a discretization and other series do not. In the current design, algorithms check the discretizations of two series once for compatibility, and then they can be safely ignored for the rest of the computation. Traversing two series in parallel involves conditionally bumping one or both cursors depending on if and how the current runs overlap. Dense and sparse both currently have "indivisible" runs. If they start at the same offset, I know they overlap perfectly and can bump both cursors. If I don't know that both series are using the same discretization, I can't do this, because the notion of indivisibility is intricately tied to discretization. What constitutes an indivisible run for the dense series [0ms,5ms) is different from an indivisible run for the sparse series [0ms,1ms).

The only way to fix these problems would be to add back a discretization to all the series types. Then the indexing question (whether the indices are dense and logically multiplied by the discretization, or sparse and actually multiplied) becomes a matter of convention, convenience and efficiency. And on those counts I think the current design is a winner. -- Eric Niebler Boost Consulting www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com
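The access-cost difference being described is easy to see in isolation. The two functions below are illustrative stand-ins, not library internals: the first shows indexing as it works today, the second shows the per-access division the proposal would require:

    #include <cstddef>
    #include <vector>

    // Current design: offsets index the storage directly.
    double current_design(const std::vector<double>& samples, std::size_t offset)
    {
        return samples[offset];                 // plain array indexing
    }

    // Proposed design: every access first divides the time by the discretization.
    double proposed_design(const std::vector<double>& samples,
                           long time, long discretization)
    {
        return samples[static_cast<std::size_t>(time / discretization)];
    }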

On 8/10/07, Eric Niebler <eric@boost-consulting.com> wrote:
Discretization for the other series types is not useless. For sparse, for example, it enforces that samples may only exist at offsets that are a multiple of the discretization.
Does this apply to floating point series as well? If so, then it makes the discretized floating point case behave similarly to the integer offset case, i.e. the continuous range run is / can be reduced to a discrete set of points. If so, great. Best regards, Stjepan

Stjepan Rajko wrote:
On 8/10/07, Eric Niebler <eric@boost-consulting.com> wrote:
Discretization for the other series types is not useless. For sparse, for example, it enforces that samples may only exist at offsets that are a multiple of the discretization.
Does this apply to floating point series as well? If so, then it makes the discretized floating point case behave similarly to the integer offset case, i.e. the continuous range run is / can be reduced to a discrete set of points. If so, great.
Sorry, no. I was referring to series types with integral offsets, which are by their nature discrete. Floating point offsets are not discrete. For representing discrete data, a series with integral offsets is the way to go, perhaps with an interpolating facade for floating point indexed access. -- Eric Niebler Boost Consulting www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com
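Such a facade could be as simple as linear interpolation between the two nearest integral offsets. A rough sketch, with the function name invented here for illustration:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Read a dense, integrally-offset series at a floating point offset by
    // interpolating linearly between the two surrounding samples.
    // Assumes 0 <= offset <= samples.size() - 1.
    double interpolated_at(const std::vector<double>& samples, double offset)
    {
        std::size_t lo = static_cast<std::size_t>(std::floor(offset));
        std::size_t hi = lo + 1 < samples.size() ? lo + 1 : lo;
        double t = offset - static_cast<double>(lo);
        return (1.0 - t) * samples[lo] + t * samples[hi];
    }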

On 8/10/07, Eric Niebler <eric@boost-consulting.com> wrote:
Discretization for the other series types is not useless. For sparse, for example, it enforces that samples may only exist at offsets that are a multiple of the discretization.
I was reading the rest of your email and having a hard time understanding how we were talking past each other so completely. Here is where I could first figure out how we'd gotten out of phase. I was operating under the impression that sparse series could have values at arbitrary offsets (not integer multiples of the discretization). It was this misunderstanding on my part that led to my suggestion to remove discretization from some types. Consider it withdrawn. Though you're probably sick of hearing this, the use of discretization should be spelled out more explicitly in the documentation to make this clear. Zach Laine

Zach Laine wrote:
On 8/10/07, Eric Niebler <eric@boost-consulting.com> wrote:
Discretization for the other series types is not useless. For sparse, for example, it enforces that samples may only exist at offsets that are a multiple of the discretization.
I was reading the rest of your email and having a hard time understanding how we were talking past each other so completely. Here is where I could first figure out how we'd gotten out of phase. I was operating under the impression that sparse series could have values at arbitrary offsets (not integer multiples of the discretization). It was this misunderstanding on my part that led to my suggestion to remove discretization from some types. Consider it withdrawn. Though you're probably sick of hearing this, the use of discretization should be spelled out more explicitly in the documentation to make this clear.
OK, I think we're on the same page now, but just to clarify the situation, the following two series are identical:

sparse_series<int> s( discretization = 5 ); dense_series<int> d( discretization = 5 );

make_ordered_inserter(s)(1,1)(2,2)(3,3).commit(); make_ordered_inserter(d)(1,1)(2,2)(3,3).commit();

These both have a 1 at offset 1, a 2 at offset 2 and a 3 at offset 3. The offsets represent T=0,5,10 respectively due to the discretization. Because the discretization is a logical multiplicative factor of the offsets, the sparse series is guaranteed to have values only at points that are multiples of the discretization; that is, T=0,5,10.

The docs have a ways to go to make this clear, I agree. But now that it's clear and your suggestion is withdrawn, is your objection to the time series library also withdrawn? -- Eric Niebler Boost Consulting www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

On 8/12/07, Eric Niebler <eric@boost-consulting.com> wrote:
Zach Laine wrote:
On 8/10/07, Eric Niebler <eric@boost-consulting.com> wrote:
Discretization for the other series types is not useless. For sparse, for example, it enforces that samples may only exist at offsets that are a multiple of the discretization.
I was reading the rest of your email and having a hard time understanding how we were talking past each other so completely. Here is where I could first figure out how we'd gotten out of phase. I was operating under the impression that sparse series could have values at arbitrary offsets (not integer multiples of the discretization). It was this misunderstanding on my part that led to my suggestion to remove discretization from some types. Consider it withdrawn. Though you're probably sick of hearing this, the use of discretization should be spelled out more explicitly in the documentation to make this clear.
OK, I think we're on the same page now, but just to clarify the situation, the following two series are identical:
sparse_series<int> s( discretization = 5 ); dense_series<int> d( discretization = 5 );
make_ordered_inserter(s)(1,1)(2,2)(3,3).commit(); make_ordered_inserter(d)(1,1)(2,2)(3,3).commit();
These both have a 1 at offset 1, a 2 at offset 2 and a 3 at offset 3. The offsets represent T=0,5,10 respectively due to the discretization. Because the discretization is a logical multiplicative factor of the offsets, the sparse series is guaranteed to have values only at points that are multiples of the discretization; that is, T=0,5,10.
I think I've got it; thanks for clearing this up.
The docs have a ways to go to make this clear, I agree. But now that it's clear and your suggestion is withdrawn, is your objection to the time series library also withdrawn?
I guess my vote is now a provisional yes. I like the library overall, but there are too many details that appear not to be solidified for me to vote yes without seeing them addressed:

- The issues Steven has raised wrt the use of commit(). If there really are no efficiency gains from the commit() technique, it is unidiomatic enough that I think it should go away.
- The documentation is pretty far from where I think it should be. I used xpressive for the first time over the weekend, so I now know you know how to write great docs. :)
- The rolling-window algorithm should be added.
- I'm still hazy on whether adding data to the high end and dropping data off the low end is a reasonable and efficient usage of a series, or, if not, whether there are methods users can employ to efficiently process series that are too large to keep in memory all at once. If this is easy to do with the library, I think it's important enough as a use case that it deserves its own example.
- coarse_grain() and fine_grain() need to be customizable. I would still like to see an int/floating point mapping from sample space to index space for dense_series<> that you alluded to in a previous email.

So, with the understanding that these issues will be addressed post-review, I vote yes. I leave it up to the review manager to determine whether this volume of provisions means that my vote should count as "yes, but please address this", or "no, it needs another pass." Zach Laine

Zach Laine wrote:
On 8/12/07, Eric Niebler <eric@boost-consulting.com> wrote:
The docs have a ways to go to make this clear, I agree. But now that its clear and your suggestion is withdrawn, is your objection to the time series library also withdrawn?
I guess my vote is now a provisional yes.
Woo-hoo! :-)
I like the library overall, but there are too many details that appear not to be solidified for me to vote yes without seeing them addressed:
OK.
The issues Steven has raised wrt the use of commit(). If there really are no efficiency gains from the commit() technique, it is unidiomatic enough that I think it should go away.
Have you followed recent messages in that thread? I was able to explain the rationale for the inserters in a way that made sense to Steven. This is the key message: http://lists.boost.org/Archives/boost/2007/08/125963.php
The documentation is pretty far from where I think it should be. I used xpressive for the first time over the weekend, so I now know you know how to write great docs. :)
Thanks!
The rolling-window algorithm should be added.
Consider it done.
I'm still hazy on whether adding data to the high end and dropping data off the low end is a reasonable and efficient usage of a series, or, if not, whether there are methods users can employ to efficiently process series that are too large to keep in memory all at once. If this is easy to do with the library, I think it's important enough as a use case that it deserves its own example.
I think there should be an example of a "single-pass" time series. (This would be like an istream_iterator, but it could be modeled by a time series that memory maps segments from a huge file, for instance.) Then, I can show how such a series can be used with the rolling average algorithm (which internally uses a circular buffer).
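As a rough illustration of the single-pass shape of such a computation (a much cruder stand-in than the memory-mapped series described above, with a hypothetical file name, format, and window size), the whole calculation can be driven from a stream without ever materializing the series in memory:

    #include <deque>
    #include <fstream>
    #include <iostream>

    int main()
    {
        std::ifstream in("huge_series.txt");   // hypothetical "offset value" pairs, one per line
        std::deque<double> window;
        double sum = 0.0;
        long offset;
        double value;
        while (in >> offset >> value)          // single pass: only the window is kept in memory
        {
            window.push_back(value);
            sum += value;
            if (window.size() > 1000) { sum -= window.front(); window.pop_front(); }
            std::cout << offset << ' ' << sum / window.size() << '\n';
        }
    }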
coarse_grain() and fine_grain() need to be customizable. I would still like to see an int/floating point mapping from sample space to index space for dense_series<> that you alluded to in a previous email.
Are you referring to an interpolating facade? Yes, that is a must have.
So, with the understanding that these issues will be addressed post-review, I vote yes. I leave it up to the review manager to determine whether this volume of provisions means that my vote should count as "yes, but please address this", or "no, it needs another pass."
These are all on the ToDo list, or are already done. Thanks for all your feedback. -- Eric Niebler Boost Consulting www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

On 8/13/07, Eric Niebler <eric@boost-consulting.com> wrote:
The issues Steven has raised wrt the use of commit(). If there really are no efficiency gains from the commit() technique, it is unidiomatic enough that I think it should go away.
Have you followed recent messages in that thread? I was able to explain the rationale for the inserters in a way that made sense to Steven. This is the key message:
Yes, but rereading that one indicates that I may have misunderstood some things. Is it the case that the ordered_inserter/commit() design is an optimization for the general case of sequence insertion? If so, I question whether the user should be forced to use this when there is an efficient iterator-based technique that works as well. In fact, many people will choose an iterator-based insertion technique even if it's less efficient, when the efficiency gains are insignificant, simply for the readability and maintenance benefits.
coarse_grain() and fine_grain() need to be customizable. I would still like to see an int/floating point mapping from sample space to index space for dense_series<> that you alluded to in a previous email.
Are you referring to an interpolating facade? Yes, that is a must have.
That would be great to have too. I was referring to the use of arbitrary up-sample and down-sample functors in these functions. Zach Laine

Zach Laine wrote:
On 8/13/07, Eric Niebler <eric@boost-consulting.com> wrote:
The issues Steven has raised wrt the use of commit(). If there really are no efficiency gains from the commit() technique, it is unidiomatic enough that I think it should go away. Have you followed recent messages in that thread? I was able to explain the rationale for the inserters in a way that made sense to Steven. This is the key message:
Yes, but rereading that one indicates that I may have misunderstood some things. Is it the case that the ordered_inserter/commit() design is an optimization for the general case of sequence insertion? If so, I question whether the user should be forced to use this when there is an efficient iterator-based technique that works as well.
No, they shouldn't. That message presented the rationale for basing the low-level interface on inserters. There is no reason why there shouldn't also be a higher-level interface that is easier to use.
In fact, many people will choose an iterator-based insertion technique even if it's less efficient, when the efficiency gains are insignificant, simply for the readability and maintenance benefits.
My current thinking is to provide an ordered appender and then add a push_back(), implemented in terms of the ordered appender, to the time series types which can efficiently support it. Then std::back_inserter() would Just Work. I wouldn't add push_back() to any type which couldn't efficiently support it, though, for the same reasons that std::list doesn't have operator[].
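From the user's side, that would look roughly like the following. The series type here is a deliberately dumbed-down stand-in (it just appends to a vector), not a proposed interface; the point is only that exposing push_back() is enough for std::back_inserter to work:

    #include <algorithm>
    #include <iterator>
    #include <utility>
    #include <vector>

    struct appendable_series
    {
        using value_type = std::pair<long, double>;   // (offset, value)

        // A real series would forward to an ordered appender; here we just store it.
        void push_back(value_type sample) { samples.push_back(sample); }

        std::vector<value_type> samples;
    };

    int main()
    {
        std::vector<std::pair<long, double>> input = {{0, 1.0}, {5, 2.0}, {10, 3.0}};
        appendable_series s;
        std::copy(input.begin(), input.end(), std::back_inserter(s));  // Just Works
    }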
coarse_grain() and fine_grain() need to be customizable. I would still like to see an int/floating point mapping from sample space to index space for dense_series<> that you alluded to in a previous email. Are you referring to an interpolating facade? Yes, that is a must have.
That would be great to have too. I was referring to the use of arbitrary up-sample and down-sample functors in these functions.
Ah, right. Also on the ToDo list. -- Eric Niebler Boost Consulting www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com
participants (4): Eric Niebler, Steven Watanabe, Stjepan Rajko, Zach Laine