
With all the work for the release, there haven't been many reviews for a bit. Here is everyone's chance to get back into the reviewing swing. Next Monday (7/30/07) the review begins for the Time Series library from Eric Niebler. It is available for download now: go to

http://boost-consulting.com/vault/index.php?directory=Math%20-%20Numerics

and scroll to the Time Series library.

It is worth noting that the library uses components not available in the 1.34 release, and so should be tested against CVS Head. This is in the README file, but can be missed if you are just fiddling around. There is also a backport file to modify the 1.34 release if preferred.

The review starts next week, but thinking about it is welcome any time. Thanks for your time and effort.

John Phillips
Review Manager

John Phillips wrote:
With all the work for the release, there haven't been many reviews for a bit. Here is everyone's chance to get back into the reviewing swing. Next Monday (7/30/07) the review begins for the Time Series library from Eric Niebler. It is available for download now: go to
http://boost-consulting.com/vault/index.php?directory=Math%20-%20Numerics
and scroll to the Time Series library.
It is worth noting that the library uses components not available in the 1.34 release, and so should be tested against CVS Head. This is in the README file, but can be missed if you are just fiddling around. There is also a backport file to modify the 1.34 release if preferred.
The review starts next week, but thinking about it is welcome any time. Thanks for your time and effort.
And if anybody would like to peruse the documentation online, you can find it here:

http://boost-sandbox.sourceforge.net/libs/time_series/doc/html/

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

Hi,
And if anybody would like to peruse the documentation online, you can find it here:
http://boost-sandbox.sourceforge.net/libs/time_series/doc/html/
Since I am currently doing some time series work, I thought I would have a look. First glance shows an impressive amount of clear thinking and extensibility. Below are some use cases that are important to me, and which I cannot immediately see as possible within the framework.

I see no mention of how to handle large time series that need processing incrementally. There are a lot of concepts in there, so I may well have missed something. At the moment I handle these by reading the data into a circular buffer with enough capacity to provide as much time history as I need for the algorithms to run at each time point. Using some imagination, I can see using the library to do this in either of two ways. The first would be to use the existing series types and update the series values for each new data point, but I don't see a way of discarding data points. The second would be to define a new series type that accepted new values, discarded old values, and generally did all the bookkeeping. Is that possible within the confines of the framework?

I also have data that has non-constant sampling periods. At the moment I handle these by piecewise sampling to another (constant period) timebase. Can this be fitted into the framework?

Finally, I don't see a convolution algorithm for applying filters. It is probably easy enough to implement, and is in my view important enough to include in the core algorithms.

Hugo

Hugo Duncan wrote:
Hi,
And if anybody would like to peruse the documentation online, you can find it here:
http://boost-sandbox.sourceforge.net/libs/time_series/doc/html/
Since I am currently doing some time series work, I thought I would have a look. First glance shows an impressive amount of clear thinking and extensibility.
Thanks.
Below are some use cases that are important to me, and which I cannot immediately see as possible within the framework.
I see no mention of how to handle large time series that need processing incrementally. There are a lot of concepts in there, so I may well have missed something. At the moment I handle these by reading the data into a circular buffer with enough capacity to provide as much time history as I need for the algorithms to run at each time point. Using some imagination, I can see using the library to do this in either of two ways.
The first would be to use the existing series types and update the series values for each new data point, but I don't see a way of discarding data points.
The second would be to define a new series type that accepted new values, discarded old values, and generally did all the bookkeeping. Is that possible within the confines of the framework?
I don't think either of these is the right approach. It shouldn't be the series' job to keep a circular buffer of data for the algorithm to use. Rather, if the algorithm requires a buffer of previously seen data, it should cache the data itself, as in the rolling average implementation I sent around a few days ago.

The Sequence concept on which all the time series are built requires readable and incrementable cursors. That means the time series algorithms *should* all work with an "input" or single-pass series type -- that is, one with a destructive read. That would be the way to go, IMO. I could see a time series type implemented in terms of std::istream that reads runs from std::cin, for instance. Or, more practically, one that memory-maps parts of a huge file and traverses it with single-pass cursors. This would be a very interesting time series! The algorithms haven't been tested with such a single-pass series, but I don't see a fundamental problem with it.
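To make the caching idea concrete, here is a minimal sketch in plain standard C++ (not the Time Series library's API) of an algorithm that owns its own cache of recent values while consuming a single-pass, destructive-read source:

    #include <cstddef>
    #include <deque>
    #include <iostream>
    #include <numeric>
    #include <sstream>

    // The algorithm, not the series, caches previously seen data.
    class rolling_average
    {
    public:
        explicit rolling_average(std::size_t window) : window_(window) {}

        // Push one sample; return the mean of the last `window_` samples.
        double operator()(double value)
        {
            buffer_.push_back(value);
            if (buffer_.size() > window_)
                buffer_.pop_front();       // discard the oldest sample
            return std::accumulate(buffer_.begin(), buffer_.end(), 0.0)
                 / buffer_.size();
        }

    private:
        std::size_t window_;
        std::deque<double> buffer_;        // the algorithm's own cache
    };

    int main()
    {
        // A single-pass, destructive-read source: an istream.
        std::istringstream input("1 2 3 4 5 6");
        rolling_average avg(3);
        for (double x; input >> x; )
            std::cout << avg(x) << '\n';   // 1 1.5 2 3 4 5
    }

The point is that the window lives in the algorithm object; the source is read exactly once, front to back.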
I also have data that has non-constant sampling periods. At the moment I handle these by piecewise sampling to another (constant period) timebase. Can this be fitted into the framework?
I'm not 100% sure I understand your use case. But most of the series types and algorithms allow non-discrete sequences. That is, the offsets can be floating point. Could that help?
Finally, I don't see a convolution algorithm for applying filters. It is probably easy enough to implement, and is in my view important enough to include in the core algorithms.
Yup, no convolution yet. Sure would be nice. Patches welcome! :-)

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

Eric,
It shouldn't be the series' job to keep a circular buffer of data for the algorithm to use. Rather, if the algorithm requires a buffer of previously seen data, it should cache the data itself,
Might it be worth a comment to the above effect in the documentation? It seems to be a fundamental principle of the library, which, at least to me, wasn't clear.
as in the rolling average implementation I sent around a few days ago.
I missed that until after I had posted.
The Sequence concept on which all the time series are built requires readable and incrementable cursors. That means the time series algorithms *should* all work with an "input" or single-pass series type -- that is, one with a destructive read. That would be the way to go, IMO. I could see a time series type implemented in terms of std::istream that reads runs from std::cin, for instance. Or, more practically, one that memory-maps parts of a huge file and traverses it with single-pass cursors. This would be a very interesting time series! The algorithms haven't been tested with such a single-pass series, but I don't see a fundamental problem with it.
Excellent. Files normally contain multivariate data, though, so presumably it would require multiple series backed by a common object to do the memory mapping?
I'm not 100% sure I understand your use case. But most of the series types and algorithms allow non-discrete sequences. That is, the offsets can be floating point. Could that help?
Yes, I had seen that, but wasn't sure how it worked for sampled data. In my case I have multiple time series with a (common) sample time that varies stochastically between 40-60ms. It wasn't clear to me that the offsets could have a non-constant stride (whether integer or floating point). Even the sparse series seems to require a constant discretisation.
Yup, no convolution yet. Sure would be nice. Patches welcome! :-)
:-)

Hugo

The Sequence concept on which all the time series are built requires readable and incrementable cursors. That means the time series algorithms *should* all work with an "input" or single-pass series type -- that is, one with a destructive read. That would be the way to go, IMO. I could see a time series type implemented in terms of std::istream that reads runs from std::cin, for instance. Or, more practically, one that memory-maps parts of a huge file and traverses it with single-pass cursors. This would be a very interesting time series! The algorithms haven't been tested with such a single-pass series, but I don't see a fundamental problem with it.
Excellent. Files normally contain multivariate data, though, so presumably it would require multiple series backed by a common object to do the memory mapping?
Reading further into the documentation, I see another misconception on my part. If I now understand correctly, all the algorithms pull data from series. In the above case, with a destructive read, I think that means I could only run a single algorithm on each of the series? Would it be worth considering exposing an interface for pushing data to the algorithms? It seems, from your comments at the start of "Defining New TimeSeries algorithms" and the display_run example, that most of the algorithms are implemented that way anyway.

Hugo Duncan wrote:
Reading further into the documentation, I see another misconception on my part. If I now understand correctly, all the algorithms pull data from series. In the above case, with a destructive read, I think that means I could only run a single algorithm on each of the series? Would it be worth considering exposing an interface for pushing data to the algorithms?
My comment about "destructive" read shouldn't be taken to imply that it actually obliterates the data after it's read. E.g., if you are using a memory-mapped file as a back end, you can run a different algorithm on the same data by "resetting" the series by remapping the file.

If you want to push the data, and calculate many things in one pass, the library you want to use is the Accumulators library:

http://boost-sandbox.sourceforge.net/libs/accumulators/doc/html/

It's been accepted into Boost, but I haven't committed it to CVS yet.
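For reference, the Accumulators push interface looks like this (a minimal sketch using the library's documented accumulator_set; several statistics are computed in a single pass as samples are pushed in):

    #include <boost/accumulators/accumulators.hpp>
    #include <boost/accumulators/statistics/stats.hpp>
    #include <boost/accumulators/statistics/mean.hpp>
    #include <boost/accumulators/statistics/variance.hpp>
    #include <iostream>

    using namespace boost::accumulators;

    int main()
    {
        // One accumulator set, several statistics, one pass over the data.
        accumulator_set<double, stats<tag::mean, tag::variance> > acc;

        // Push interface: data is fed in sample by sample.
        acc(1.0); acc(2.0); acc(3.0); acc(4.0);

        std::cout << mean(acc) << '\n';      // 2.5
        std::cout << variance(acc) << '\n';  // 1.25
    }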
It seems, from your comments at the start of "Defining New TimeSeries algorithms" and the display_run example, that most of the algorithms are implemented that way anyway.
I don't follow you. What do you mean?

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

My comment about "destructive" read shouldn't be taken to imply that it actually obliterates the data after it's read. E.g., if you are using a memory-mapped file as a back end, you can run a different algorithm on the same data by "resetting" the series by remapping the file.
But doesn't it imply (possible) multiple physical reads of the data?
If you want to push the data, and calculate many things in one pass, the library you want to use is the Accumulators library.
Already using it! And very well it works, too. I suppose I was thinking this was more for descriptive statistics. Care to define the difference in scope? When I saw "time series" I thought "signal processing" -- is that wrong?

My use for this is essentially this:

- Read data from multiple data sources, each with possibly different sampling periods.
- Resample the data to a common discretisation.
- Feed the latest value of multiple series as inputs into a model.
- Derive a series as the difference between model outputs and other input series.
- Derive filtered versions of some of the signals.
- Write everything out to disk.

I am doing this either after the fact, or in "real time" as the data is being logged. I suppose I could summarise my requirements as "labview or simulink for c++" :-) Maybe this is not what you are aiming for.
It seems, from your comments at the start of "Defining New TimeSeries algorithms" and the display_run example, that most of the algorithms are implemented that way anyway.
I don't follow you. What do you mean?
Consider your example:

    struct display_run
    {
        template< class Value, class Offset >
        void operator()( Value const & value, Offset start, Offset stop ) const;
    };

or your adjacent_difference_functor, which has the same method. Data is pushed to these functors (if I have understood correctly). So your top-level algorithms, print_series or adjacent_difference, pull data from the series and push it to the functors. I was therefore concluding that the "push interfaces" already existed, but were being viewed as undocumented implementation details rather than constructs for library users.

Hugo

Hugo Duncan wrote:
Eric,
It shouldn't be the series' job to keep a circular buffer of data for the algorithm to use. Rather, if the algorithm requires a buffer of previously seen data, it should cache the data itself,
Might it be worth a comment to the above effect in the documentation? It seems to be a fundamental principle of the library, which, at least to me, wasn't clear.
Not a bad suggestion.
as in the rolling average implementation I sent around a few days ago.
I missed that until after I had posted.
The Sequence concept on which all the time series are built requires readable and incrementable cursors. That means the time series algorithms *should* all work with an "input" or single-pass series type -- that is, one with a destructive read. That would be the way to go, IMO. I could see a time series type implemented in terms of std::istream that reads runs from std::cin, for instance. Or, more practically, one that memory-maps parts of a huge file and traverses it with single-pass cursors. This would be a very interesting time series! The algorithms haven't been tested with such a single-pass series, but I don't see a fundamental problem with it.
Excellent. Files normally contain multivariate data, though, so presumably it would require multiple series backed by a common object to do the memory mapping?
Probably.
I'm not 100% sure I understand your use case. But most of the series types and algorithms allow non-discrete sequences. That is, the offsets can be floating point. Could that help?
Yes, I had seen that, but wasn't sure how it worked for sampled data. In my case I have multiple time series with a (common) sample time that varies stochastically between 40-60ms. It wasn't clear to me that the offsets could have a non-constant stride (whether integer or floating point). Even the sparse series seems to require a constant discretisation.
So, you have a discrete series (i.e., values at offsets), but the offsets map to discretizations according to some piecewise function? Did I get that right? That's interesting.

Most of the time series algorithms don't actually use the discretization for anything besides type-checking. The only exception is integrate(), which multiplies runs by the discretization. How do you use the discretization? Perhaps your usage can be accommodated with an extra function that maps from cursor to discretization. I think this is a refinement to the TimeSeries concept, but one which most of the algorithms (besides integrate) won't care about.
Yup, no convolution yet. Sure would be nice. Patches welcome! :-)
:-)
--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

Eric,
I'm not 100% sure I understand your use case. But most of the series types and algorithms allow non-discrete sequences. That is, the offsets can be floating point. Could that help?
Yes, I had seen that, but wasn't sure how it worked for sampled data. In my case I have multiple time series with a (common) sample time that varies stochastically between 40-60ms. It wasn't clear to me that the offsets could have a non-constant stride (whether integer or floating point). Even the sparse series seems to require a constant discretisation.
So, you have a discrete series (i.e., values at offsets), but the offsets map to discretizations according to some piecewise function? Did I get that right? That's interesting.
That would be interesting indeed, but no. These are logged data that are being collected from a bus as they arrive. The arrival time period is not a constant.
Most of the time series algorithms don't actually use the discretization for anything besides type-checking. The only exception is integrate(), which multiplies runs by the discretization. How do you use the discretization?
Mostly to resample, to fit the data in with other time series that have a fixed discretisation.
Perhaps your usage can be accommodated with an extra function that maps from cursor to discretization. I think this is a refinement to the TimeSeries concept, but one which most of the algorithms (besides integrate) won't care about.
Sounds possible. Thanks for taking the time to consider these use cases.

Hugo

Hugo Duncan wrote:
Eric,
I'm not 100% sure I understand your use case. But most of the series types and algorithms allow non-discrete sequences. That is, the offsets can be floating point. Could that help?

Yes, I had seen that, but wasn't sure how it worked for sampled data. In my case I have multiple time series with a (common) sample time that varies stochastically between 40-60ms. It wasn't clear to me that the offsets could have a non-constant stride (whether integer or floating point). Even the sparse series seems to require a constant discretisation.

So, you have a discrete series (i.e., values at offsets), but the offsets map to discretizations according to some piecewise function? Did I get that right? That's interesting.
That would be interesting indeed, but no. These are logged data that are being collected from a bus as they arrive. The arrival time period is not a constant.
OK, the time period isn't a constant, but what is it measured in? Milliseconds? Then you can make milliseconds your discretization, and then resample the data at a coarser discretization with the coarse_grain() algorithm. Am I still not understanding your problem? Perhaps it would help if you were more explicit about what data is being pulled from the bus, exactly. Values and time? Time measured how?

I'll have to get back to you about the rest. I'm leaving tomorrow for a long weekend.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com
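Purely to illustrate the resampling semantics (this sketch is plain standard C++, not the library's coarse_grain() API), irregularly timed (value, millisecond) samples can be regularized onto a fixed grid by carrying the last seen value forward:

    #include <cstddef>
    #include <iostream>
    #include <utility>
    #include <vector>

    typedef std::pair<double, int> sample;  // (value, time in ms)

    // Resample irregular samples onto a regular grid (period in ms) by
    // carrying the most recent value forward (a piecewise-constant view).
    std::vector<double> regularize(std::vector<sample> const & in,
                                   int period, int end)
    {
        std::vector<double> out;
        std::size_t i = 0;
        double current = 0.0;               // value before the first sample
        for (int t = 0; t < end; t += period)
        {
            while (i < in.size() && in[i].second <= t)
                current = in[i++].first;    // latest sample at or before t
            out.push_back(current);
        }
        return out;
    }

    int main()
    {
        // Samples arrive every 40-60 ms, stamped in milliseconds.
        std::vector<sample> bus;
        bus.push_back(sample(42.0, 0));
        bus.push_back(sample(43.0, 55));
        bus.push_back(sample(44.0, 98));

        std::vector<double> grid = regularize(bus, 50, 150);
        for (std::size_t i = 0; i < grid.size(); ++i)
            std::cout << grid[i] << '\n';   // 42 42 44
    }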

Eric,
OK, the time period isn't a constant, but what is it measured in? Milliseconds? Then you can make milliseconds your discretization, and then resample the data at a coarser discretization with the coarse_grain() algorithm. Am I still not understanding your problem? Perhaps it would help if you were more explicit about what data is being pulled from the bus, exactly. Values and time? Time measured how?
Light finally goes on! I agree that this is a solution. To do this I use a sparse_series, and each point becomes a run, I believe. (Just for the record, pulling time and values.) Thanks for your patient explanation - I think it was the name "sparse" that was giving me problems, as I was thinking of it as dense with variable discretisation (40-60ms) rather than sparse at constant millisecond discretisation.
I'll have to get back to you about the rest. I'm leaving tomorrow for a long weekend.
Have a good weekend.

Hugo

Hugo Duncan wrote:
Eric,
OK, the time period isn't a constant, but what is it measured in? Milliseconds? Then you can make milliseconds your discretization, and then resample the data at a coarser discretization with the coarse_grain() algorithm. Am I still not understanding your problem? Perhaps it would help if you were more explicit about what data is being pulled from the bus, exactly. Values and time? Time measured how?
Light finally goes on! I agree that this is a solution. To do this I use a sparse_series, and each point becomes a run, I believe. (Just for the record, pulling time and values.) Thanks for your patient explanation - I think it was the name "sparse" that was giving me problems, as I was thinking of it as dense with variable discretisation (40-60ms) rather than sparse at constant millisecond discretisation.
I don't think the sparse series is the right one for you. A sparse series of (value, offset) pairs such as the following:

(42, 0) (43, 9) (44, 21) ...

would have the value 42 at offset 0, 43 at offset 9, 44 at offset 21, *and zero everywhere else*. So at offset 1, the value would be zero. You probably want it to be 42 there also, right?

I think what you want is a piecewise constant series of (value, offset, end offset) tuples like this:

(42, 0, 9) (43, 9, 21) (44, 21, 30) ...

The offsets are the (irregularly spaced) intervals at which you are sampling your bus. Then you can regularize the series by resampling to a coarser discretization; e.g., by taking the value at every 10th offset.
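The difference between the two semantics can be spelled out in a few lines of plain C++ (a sketch of the lookup semantics only, not the library's actual storage):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct run { double value; int start; int stop; };  // [start, stop)

    // Piecewise-constant semantics: the value of the run covering t.
    double pc_at(std::vector<run> const & runs, int t)
    {
        for (std::size_t i = 0; i < runs.size(); ++i)
            if (runs[i].start <= t && t < runs[i].stop)
                return runs[i].value;
        return 0.0;                                     // outside all runs
    }

    // Sparse semantics: non-zero only exactly at the stored offsets.
    double sparse_at(std::vector<run> const & runs, int t)
    {
        for (std::size_t i = 0; i < runs.size(); ++i)
            if (runs[i].start == t)
                return runs[i].value;
        return 0.0;                                     // zero everywhere else
    }

    int main()
    {
        run data[] = { {42.0, 0, 9}, {43.0, 9, 21}, {44.0, 21, 30} };
        std::vector<run> runs(data, data + 3);

        std::cout << pc_at(runs, 1)     << '\n';  // 42: inside the first run
        std::cout << sparse_at(runs, 1) << '\n';  // 0: no sample exactly at 1
    }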
I'll have to get back to you about the rest. I'm leaving tomorrow for a long weekend.
Have a good weekend.
Thanks! I'm off.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

I think what you want is a piecewise constant series of (value, offset, end offset) tuples like this:
(42, 0, 9) (43, 9, 21) (44, 21, 30) ...
Yes, that is what I am after. Two observations:

One inconvenience of this is that I assume it will always be necessary to have the next sample before inserting the current sample (so the end of the run can be specified), though this is not clear to me from the documentation (do I call inserter(42,0) or inserter(42,0,9)?).

And as an inefficiency, each timestamp gets stored twice (again, if I understand correctly).

Hugo

Hugo Duncan wrote:
I think what you want is a piecewise constant series of (value, offset, end offset) tuples like this:
(42, 0, 9) (43, 9, 21) (44, 21, 30) ...
Yes, that is what I am after. Two observations:
One inconvenience of this is that I assume it will always be necessary to have the next sample before inserting the current sample (so the end of the run can be specified), though this is not clear to me from the documentation (do I call inserter(42,0) or inserter(42,0,9)?).
The documentation is pretty explicit about this. inserter(42,0) inserts a 42 at offset 0 and only at offset 0. See the section "Populating a Series": http://tinyurl.com/gbg8u
And as an inefficiency, each timestamp gets stored twice (again, if I understand correctly).
That is true. It might be useful to define a series type where the end of one run is implicitly the beginning of another. Such a series would easily fit within the framework. Any ideas what such a thing might be called? For the purposes of genericity, the inserter would still need to accept runs in (value, offset, end offset) tuples, though.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com
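A sketch of what such a series might store (hypothetical names, plain C++): each run boundary is kept exactly once, with the end of run i doubling as the start of run i+1, while the inserter still accepts generic (value, start, stop) runs:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct dense_runs
    {
        std::vector<double> values;   // values[i] holds on [offsets[i], offsets[i+1])
        std::vector<int>    offsets;  // offsets.size() == values.size() + 1

        // Generic (value, start, stop) insert; runs are assumed contiguous,
        // so only one new boundary is stored per run.
        void push(double value, int start, int stop)
        {
            if (offsets.empty())
                offsets.push_back(start);
            values.push_back(value);
            offsets.push_back(stop);
        }

        double at(int t) const
        {
            for (std::size_t i = 0; i < values.size(); ++i)
                if (offsets[i] <= t && t < offsets[i + 1])
                    return values[i];
            return 0.0;               // outside all runs
        }
    };

    int main()
    {
        dense_runs s;
        s.push(42.0, 0, 9);
        s.push(43.0, 9, 21);
        s.push(44.0, 21, 30);

        // Three runs, but only four stored offsets instead of six.
        std::cout << s.at(10) << '\n';          // 43
        std::cout << s.offsets.size() << '\n';  // 4
    }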

The documentation is pretty explicit about this. inserter(42,0) inserts a 42 at offset 0 and only at offset 0. See the section "Populating a Series": http://tinyurl.com/gbg8u
I read this pretty carefully before, and couldn't reach a definitive conclusion.

If I assumed that I could use both insert(value,start) and insert(value,start,stop) for piecewise_constant_series, and could insert runs non-contiguously (as in the examples in the section you reference), then I could not decide if that resulted in a piecewise_constant series that was densely specified between the lowest and highest inserted ordinals, or if it resulted in a sparsely specified series with zeros, in which case I had problems seeing much difference between the piecewise_constant and the sparse series.

I tend to be a little obtuse when it comes to inferring semantics from examples. Your statement above is more explicit than the documentation and all is now clear :-)
And as an inefficiency, each timestamp gets stored twice (again, if I understand correctly).
That is true. It might be useful to define a series type where the end of one run is implicitly the beginning of another. Such a series would easily fit within the framework. Any ideas what such a thing might be called?
piecewise_constant_dense_series? (as opposed to the current "piecewise_constant_sparse_series"...)
For the purposes of genericity, the inserter would still need to accept runs in (value, offset, end offset) tuples, though.
Of course...

Hugo Duncan wrote:
The documentation is pretty explicit about this. inserter(42,0) inserts a 42 at offset 0 and only at offset 0. See the section "Populating a Series": http://tinyurl.com/gbg8u
I read this pretty carefully before, and couldn't reach a definitive conclusion.
If I assumed that I could use both insert(value,start) and insert(value,start,stop) for piecewise_constant_series, and could insert runs non-contiguously (as in the examples in the section you reference), then I could not decide if that resulted in a piecewise_constant series that was densely specified between the lowest and highest inserted ordinals, or if it resulted in a sparsely specified series with zeros, in which case I had problems seeing much difference between the piecewise_constant and the sparse series.
I tend to be a little obtuse when it comes to inferring semantics from examples. Your statement above is more explicit than the documentation and all is now clear :-)
OK, thanks for the feedback. I'll add an unambiguous statement to this effect to make it perfectly clear.
And as an inefficiency, each timestamp gets stored twice (again, if I understand correctly).

That is true. It might be useful to define a series type where the end of one run is implicitly the beginning of another. Such a series would easily fit within the framework. Any ideas what such a thing might be called?
piecewise_constant_dense_series? (as opposed to the current "piecewise_constant_sparse_series"...)
Interesting observation. Yes, or perhaps dense_piecewise_constant_series. Thanks.

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

And if anybody would like to peruse the documentation online, you can find it here:
http://boost-sandbox.sourceforge.net/libs/time_series/doc/html/
I'm curious about the property maps reference in the time series documentation. Boost already has a property maps library (actually just a header file). It's used heavily by Boost.Graph. I'm curious how your interpretation differs and whether or not it's worth trying to merge them or if they're just fundamentally different.

Andrew Sutton
asutton@cs.kent.edu

Andrew Sutton wrote:
And if anybody would like to peruse the documentation online, you can find it here:
http://boost-sandbox.sourceforge.net/libs/time_series/doc/html/
I'm curious about the property maps reference in the time series documentation. Boost already has a property maps library (actually just a header file). It's used heavily by Boost.Graph. I'm curious how your interpretation differs and whether or not it's worth trying to merge them or if they're just fundamentally different.
The new property map code is actually from Dave A., so perhaps he could say a few words about it. I don't think it's fundamentally different. I know that one of the benefits of the new property map code is that it uses the new concept check library (and the concepts defined therein).

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

on Wed Aug 01 2007, Eric Niebler <eric-AT-boost-consulting.com> wrote:
Andrew Sutton wrote:
And if anybody would like to peruse the documentation online, you can find it here:
http://boost-sandbox.sourceforge.net/libs/time_series/doc/html/
I'm curious about the property maps reference in the time series documentation. Boost already has a property maps library (actually just a header file). It's used heavily by Boost.Graph. I'm curious how your interpretation differs and whether or not it's worth trying to merge them or if they're just fundamentally different.
The new property map code is actually from Dave A., so perhaps he could say a few words about it. I don't think it's fundamentally different.
It was an attempt to evolve the Boost property map concepts, mostly by using function call notation instead of get() and set() functions (or operator[]) as the current property map does. This change was Doug Gregor's idea, and I think it's a good one in principle. I was also trying to build a multi-type concept that bundled the function with the iterators.

However, Jeremy Siek pointed out that once you have function call notation, you don't really need creative names like "cursor", because it's just an iterator and a function, for which we already have concepts... and he objected strongly to the reuse of the concept name PropertyMap, which already has a different meaning (currently just referring to the role played by the function part of the bundle, but using get/set).

I agree with Jeremy, and eventually dropped the use of that name and started to rethink the bundling, but Eric had already picked up the concept checking/traits code from me. I think the concepts worked for Eric, so he never felt it was important to rethink them, but given that Boost.PropertyMap already exists, some re-examination would probably be a good idea if TimeSeries is accepted.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com
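For readers comparing the two styles: Boost.PropertyMap's free get()/put() functions are the existing, documented interface and are shown below; the function-call alternative is only sketched with an ordinary callable, since the evolved concepts were never finalized:

    #include <boost/property_map/property_map.hpp>
    #include <iostream>
    #include <map>

    int main()
    {
        std::map<int, double> storage;

        // Existing Boost.PropertyMap style: free get()/put() functions.
        boost::associative_property_map< std::map<int, double> > pm(storage);
        put(pm, 3, 1.5);
        std::cout << get(pm, 3) << '\n';   // 1.5

        // Function-call notation: reading a property is just applying a
        // function to a key, so any callable models the concept (sketch).
        auto read = [&storage](int k) { return storage[k]; };
        std::cout << read(3) << '\n';      // 1.5
    }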

Hi,

On the Boost document page there is an entry for this library:

C++/Tcl - C++/Tcl is a library that allows the easy integration of C++ and Tcl.

If I follow the link for the library, it directs me to http://cpptcl.sourceforge.net/, and I can download the source code of the library. But one problem I found is that the header file "tcl.h", which the source code includes, is missing. So I suppose this C++/Tcl library requires a Tcl package from a third party, e.g. ActiveState, but the package from ActiveState is not free.

I am using C++ in MS VC++ 2005 Express to integrate with Tcl, and I am looking for a Tcl library for this. I am not sure which Tcl library the Boost C++/Tcl developers and users are using. If somebody could give some comments or suggestions on what Tcl package can be used with the C++/Tcl library, it would be appreciated. Thank you!

Fanzhe
participants (6)

- Andrew Sutton
- David Abrahams
- Eric Niebler
- Fanzhe Cui
- Hugo Duncan
- John Phillips