

Steven Watanabe wrote:
AMDG
I still need to think a bit more before I vote, but here are all the comments I have so far:
In the Series Interface section of the user's guide, shouldn't series-type( A0 const &, A1 const &, ... ); be prefaced by template<class A0, class A1, ...>?
True, thanks.
"It may seem odd that even the unit series have a value parameter..." Wouldn't it be better to use a traits template to determine what "one" should be and require that this template be specialized? If there is no unambiguous unit value for a type then isn't it a little odd to use a unit series?
I don't think it's odd. The example I gave was when value_type is std::vector<int>. What is the unit value for this type? It could be std::vector<int>( N, 1 ) for any value of N.
Discretization | Equivalent To
daily          | mpl::int_<1>
...

These are guaranteed not to be just typedefs, I hope?
Correct, they're their own types. But it's come up a lot, and I think it's best to just drop them.
You should state that the runtime discretization compatibility check is an assertion, as opposed to throwing an exception. You also need to specify exactly what it means for discretizations to be the same: that the discretizations must be the same type (otherwise I might be inclined to consider 1l and 1 equal), and that operator== must be defined (else I would likely try struct my_discretization {}; typedef sparse_series<double, my_discretization> series_type;).
The requirement that discretizations are equality-comparable is specified as part of the TimeSeries concept: http://boost-sandbox.sourceforge.net/libs/time_series/doc/html/TimeSeries.ht... It would be nice to be more explicit about all this in the users' doc, though.
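To spell out what satisfying that looks like, a minimal sketch (the sparse_series parameter order follows the examples later in this thread, and may not match the actual defaults):

#include <boost/time_series/sparse_series.hpp>

// A custom discretization must be EqualityComparable per the TimeSeries
// concept; "same discretization" additionally means "same type", which the
// template parameter enforces by itself.
struct my_discretization
{
    friend bool operator ==(my_discretization, my_discretization) { return true; }
    friend bool operator !=(my_discretization, my_discretization) { return false; }
};

typedef boost::time_series::sparse_series<double, my_discretization> series_type;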
Shouldn't ordered_inserter use the named parameters start, stop, and value?
A good idea.
The promotion rules are complicated. Is there a metafunction that computes the result types, or should I just use Boost.Typeof?
There is, but there isn't user documentation for it. In the time_series::traits namespace there are templates plus, minus, multiplies and divides which take and return storage tag types. And given a result storage tag type, you can generate a time series using traits::generate_series. It's admittedly a bit complicated, and all the storage tag machinery is an undocumented detail. Do you need this functionality? I could consider simpler ways of getting at this information and documenting it.
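As a rough illustration of what that might look like in use (a sketch only -- everything except traits::plus and traits::generate_series is an assumed spelling, since this machinery is an undocumented detail):

namespace ts = boost::time_series;

// Assumed accessor for a series' storage tag; the real name may differ.
typedef storage_tag_of<series_a_type>::type tag_a;
typedef storage_tag_of<series_b_type>::type tag_b;

// Compute the storage tag of a sum, then generate a series type from it.
typedef ts::traits::plus<tag_a, tag_b>::type sum_tag;
typedef ts::traits::generate_series<sum_tag, double, ts::daily>::type sum_series_type; // parameter list assumed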
Is this specialization missing in traits/conversion.hpp?

template<typename From, typename To>
struct is_storage_convertible<scaled_storage_tag<From>, To>
  : is_storage_convertible<From, To>
{};
No. "scaled_storage_tag" is a very bad name---it is unrelated to the scaled_series, which is probably where you got your idea. It is used to differentiate between normal series and their unit series variants. Those conversions only go one way.
serialize should use nvp so that it will work with xml archives.
Noted.
The following fails to compile because serialization is not defined for daily:
#include <iostream>
#include <boost/time_series/sparse_series.hpp>
#include <boost/archive/text_oarchive.hpp>

typedef boost::time_series::sparse_series<double, boost::time_series::daily> series_1;

int main()
{
    series_1 s1;
    boost::archive::text_oarchive ar(std::cout);
    ar << s1;
}
Good catch.
The comparison operators for time_series_base are defined in time_series_facade. Shouldn't they be in utility/time_series_facade.hpp?
Not really, but IMO they shouldn't be defined as they are. These functions require the series types to be derived from time_series_base. Rather, this should be a generalized template, conditionally enabled with enable_if<mpl::and_<is_time_series<Left>, is_time_series<Right>, ...> > or something.
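Something like this, presumably (a declaration-only sketch; is_time_series<> is assumed to exist as a trait):

#include <boost/utility/enable_if.hpp>
#include <boost/mpl/and.hpp>

// Enabled only when both operands model TimeSeries, so ordinary types
// are unaffected and derivation from time_series_base is not required.
template<typename Left, typename Right>
typename boost::enable_if<
    boost::mpl::and_<is_time_series<Left>, is_time_series<Right> >,
    bool
>::type
operator ==(Left const &left, Right const &right);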
time_series_facade line 256:

template<typename Left, typename Right>
bool operator !=(time_series_base<Left> const &left, time_series_base<Right> const &right)
{
    return ! operator ==(left.cast(), right.cast());
}

This rules out a member operator==. Since you're explicitly downcasting and invoking ADL I would expect any definition of operator== to work.
Sure.
time_series_facade.hpp line 280:

this->sout_ << " value = " << value << ", "
            << " offset = " << start << ", "
            << " end offset = " << stop << '\n';

"value" shouldn't have a space before it, should it?
Probably not.
time_series_facade.hpp line 395:

template<typename Derived, typename Storage, typename Discretization>
struct tag<time_series::time_series_facade<Derived, Storage, Discretization> >
{
    typedef time_series_tag type;
};

Should this be typedef tag<Derived>::type type;?
I don't think so.
numeric/numeric.hpp line 3:

/// Defined the mathematical operator s on time series
->
/// Defines the mathematical operator's on time series
Should be "operators". Plural, not possessive. Noted.
The description of series_stl_iterator in the reference has no direct indication of what kind of iterator it is.
Drat, Doxygen stripped out the inheritance from iterator_facade that would have shown it to be a forward iterator. But this guy certainly needs some better docs.
Is a time series required to inherit from time_series_base? If it is then this should be specified in the concepts somehow. If not then the reference for time_series_base should not say that it is a base for all time_series.
It's not required to satisfy the concept, but---as you've noticed---some of the function templates are currently over-constrained to only work with types derived from time_series_base. This needs to be fixed.
The TimeSeries concepts section lists only dense series as models. Should it list the other time series too?
No, that's just an example. I could list every model of the concept in the library, but I don't think that would add much.
The RangeRunStorage concept page has an empty See Also section.
BoostBook bug. Patches welcome. :-)
This is probably a boostbook issue but there are lots of empty namespaces in the reference section.
BoostBook.
ordered_inserter.hpp line 205:

range_run_storage::set_at(this->inserter_, std::make_pair(offset, endoff), value);

Should this use run_type instead of std::pair?
Possibly. I seem to recall doing this on purpose, but now I can't remember why. :-P I can change it, run the tests, and see what breaks.
I strongly dislike ordered_inserter. Its name indicates to me that it adds elements to a time_series--not at all that it deletes all pre-existing elements. I would expect it to overwrite the old elements only where a new element is specified.
That's a valid criticism. Can you suggest a better name?
This program compiles but fails at runtime. It should fail at compile time.
#include <boost/time_series/sparse_series.hpp>
#include <boost/time_series/ordered_inserter.hpp>

using namespace boost::time_series;

typedef sparse_series<double, int, double> series_1;

int main()
{
    series_1 s1;
    make_ordered_inserter(s1)(1.0, 0.0, 1.0).commit();
}
Floating-point offsets strike again! I'll fix.
time_series_facade.hpp line 262: #ifdef _MSC_VER. Should this be #ifdef BOOST_MSVC? Other places too.
I don't know the policy here. I'm disabling msvc-specific warnings. Is there a case when BOOST_MSVC is defined but _MSC_VER is not? And in that case, can I still disable msvc-specific warnings?
Something somewhere is suppressing "warning C4244: 'initializing' : conversion from 'double' to 'int', possible loss of data" and not restoring it but I'm not sure whether it's in your library yet.
I'm usually pretty good about pushing and popping the warning levels. If I missed one, it's a bug and I'll fix it.
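For reference, the pattern in question (the standard MSVC idiom, not quoted from the library):

#if defined(_MSC_VER)
# pragma warning(push)
# pragma warning(disable : 4244) // conversion from 'double' to 'int', possible loss of data
#endif

// ... code that would otherwise trigger C4244 ...

#if defined(_MSC_VER)
# pragma warning(pop) // restore the user's warning state
#endif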
My IDE (MSVC 8) keeps freezing--Has anyone else noticed this? (It could be caused by code in trunk, too)
Me too. <sigh>
I ran a quick test of sparse_series<quantity<SI::length> > and it worked fine.
What does discretization mean for floating point offsets?
Same as for integral offsets. It's used to guard against combining series with, e.g., data sampled daily vs. weekly, and also as a multiplicative factor for the integrate() algorithm.
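Roughly, its role in integrate() is something like the following sketch (not the library's implementation; the discretization() accessor and the run traversal are assumed spellings):

// Sum each run's value times its length, then scale by the discretization.
template<typename Series>
double integrate_sketch(Series const &s)
{
    double sum = 0;
    // for each run r in s:
    //     sum += value(r) * (end_offset(r) - offset(r));
    return sum * s.discretization();
}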
For sparse/dense series with floating point offsets indexing doesn't make sense. Probably it should be made illegal somehow....
It is illegal for dense series already. Sparse series and delta series with floating point offsets are currently allowed, but I'm thinking I'll disallow those as well so that I can have half-open floating-point ranges.
I would partly agree with Stjepan about the dichotomy between sparse/dense series and piecewise constant series. As I see it there are three cases: a) sparse/dense and floating point,
On the chopping block.
b) piecewise constant and floating point, c) integral. Because of the limits on integers, piecewise constant series and point series behave the same way for them. Multiplication makes no sense for point series. Likewise adjacent_difference, integrate, piecewise_sample, piecewise_surface_sample, and index.
I tend to agree, which is why they're not long for this world. Thanks for your very detailed feedback.

-- Eric Niebler

Eric Niebler wrote:
time_series_facade.hpp line 262: #ifdef _MSC_VER. Should this be #ifdef BOOST_MSVC? Other places too.
I don't know the policy here. I'm disabling msvc-specific warnings. Is there a case when BOOST_MSVC is defined but _MSC_VER is not? And in that case, can I still disable msvc-specific warnings?
Something somewhere is suppressing "warning C4244: 'initializing' : conversion from 'double' to 'int', possible loss of data" and not restoring it but I'm not sure whether it's in your library yet.
I'm usually pretty good about pushing and popping the warning levels. If I missed one, it's a bug and I'll fix it.
Mea culpa! I've had trouble with both of these. Does anyone have any ideas on better ways of avoiding:

1) Using #ifdef BOOST_MSVC when no Boost module has yet been #included, and so it is not defined. So you don't get the effect you wanted and need to #ifdef _MSC_VER instead. Easy mistake to make. Is there still a reason why we need to use BOOST_MSVC (rather than always using _MSC_VER)?

2) Failure to push'n'pop warning suppression? I've spotted several mistakes - and worry about others confusing users, especially C4244. How can we check for this automatically?

Paul

Paul A Bristow wrote:
Is there still a reason why we need to use BOOST_MSVC? (rather than always using _MSC_VER?)
Yes: Some compilers (e.g. Intel/Win32) pretend to be MSVC by defining _MSC_VER. Boost.Config defines BOOST_MSVC for the real MSVC only. Regards, Tobias
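In guard form, the distinction looks like this (illustration only):

#ifdef _MSC_VER   // true for MSVC *and* pretenders such as Intel/Win32
// ...
#endif

#include <boost/config.hpp> // without some Boost header, BOOST_MSVC is never defined

#ifdef BOOST_MSVC // true for the real MSVC only
// ...
#endif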

Tobias Schwinger wrote:
Paul A Bristow wrote:
Is there still a reason why we need to use BOOST_MSVC? (rather than always using _MSC_VER?)
Yes: Some compilers (e.g. Intel/Win32) pretend to be MSVC by defining _MSC_VER. Boost.Config defines BOOST_MSVC for the real MSVC only.
I thought as much... But if we are suppressing unwanted warnings, do the pretender MSVCs behave in exactly the same way, and do they all need the same warning suppression? If this is really true, we need a way of ensuring that BOOST_MSVC is *always defined*, perhaps by pulling in Boost.Config - though this seems a bit OTT?

Paul

AMDG Paul A Bristow <pbristow <at> hetp.u-net.com> writes:
2) Failure to push'n'pop warning suppression? I've spotted several mistakes - and worry about others confusing users, especially C4244. How can we check for this automatically?
We can scan each file for the pragmas and check whether every #pragma warning(suppress) is between a #pragma warning push and a pop. I'll look into writing the script and possibly adding it to inspect. What pragmas do other compilers use for suppressing warnings?

In Christ,
Steven Watanabe

Steven Watanabe wrote:
AMDG
Paul A Bristow <pbristow <at> hetp.u-net.com> writes:
2) Failure to push'n'pop warning suppression? I've spotted several mistakes - and worry about others confusing users, especially C4244. How can we check for this automatically?
We can scan each file for the pragmas and check whether every #pragma warning(suppress) is between a #pragma warning push and a pop.
I'll look into writing the script and possibly adding it to inspect.
Good - I'm sure this will catch some mistakes that could really confuse users.
What pragmas do other compilers use for suppressing warnings?
I don't know - and hope others can advise, especially the ones cunningly pretending to be msvc ;-)

Paul

On 8/8/07, Eric Niebler <eric@boost-consulting.com> wrote:
Steven Watanabe wrote:
time_series_facade.hpp line 280:

this->sout_ << " value = " << value << ", "
            << " offset = " << start << ", "
            << " end offset = " << stop << '\n';

"value" shouldn't have a space before it, should it?
Probably not.
Nit-picky, but "offset" and "end offset" probably shouldn't have spaces before them either since the space is with the comma in the previous line, right? --Michael Fawcett

Michael Fawcett wrote:
On 8/8/07, Eric Niebler <eric@boost-consulting.com> wrote:
Steven Watanabe wrote:
time_series_facade.hpp line 280:

this->sout_ << " value = " << value << ", "
            << " offset = " << start << ", "
            << " end offset = " << stop << '\n';

"value" shouldn't have a space before it, should it?
Probably not.
Nit-picky, but "offset" and "end offset" probably shouldn't have spaces before them either since the space is with the comma in the previous line, right?
You're right, of course, but this is really just a debug aid. A general ostream inserter for time series should be more configurable than this. I'm not sure what it should look like exactly, and I'm also not sure if this is a problem the time series library really needs to be solving.

-- Eric Niebler

On 8/8/07, Eric Niebler <eric@boost-consulting.com> wrote:
Michael Fawcett wrote:
On 8/8/07, Eric Niebler <eric@boost-consulting.com> wrote:
Steven Watanabe wrote:
time_series_facade.hpp line 280:

this->sout_ << " value = " << value << ", "
            << " offset = " << start << ", "
            << " end offset = " << stop << '\n';

"value" shouldn't have a space before it, should it?
Probably not.
Nit-picky, but "offset" and "end offset" probably shouldn't have spaces before them either since the space is with the comma in the previous line, right?
You're right, of course, but this is really just a debug aid. A general ostream inserter for time series should be more configurable than this. I'm not sure what it should look like exactly, and I'm also not sure if this is a problem the time series library really needs to be solving.
Very reminiscent of the Quan/Units reviews. IMHO, you did the right thing going the Units route by not trying to include any fancy IO mechanisms. --Michael Fawcett

AMDG Eric Niebler <eric <at> boost-consulting.com> writes:
I would partly agree with Stjepan about the dichotomy between sparse/dense series and piecewise constant series. As I see it there are three cases: a) sparse/dense and floating point,
On the chopping block.
b) piecewise constant and floating point, c) integral. Because of the limits on integers, piecewise constant series and point series behave the same way for them. Multiplication makes no sense for point series. Likewise adjacent_difference, integrate, piecewise_sample, piecewise_surface_sample, and index.
I tend to agree, which is why they're not long for this world.
They are still needed for the result of adjacent_difference on piecewise constant floating point series. What do you think of allowing runs to specify what kind of range they use: open, closed, or half-open?

struct half_open_range : mpl::int_<0> {};
struct rhalf_open_range : mpl::int_<1> {}; // (]
struct open_range : mpl::int_<2> {};
struct closed_range : mpl::int_<3> {};

I think that this would allow sparse floating point series to make sense without destroying consistency within the library.

In Christ,
Steven Watanabe
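For concreteness, the kind tag might be carried on the run type itself (a hypothetical sketch, not something the library provides):

// Hypothetical: a run that records which kind of interval it denotes.
template<typename Offset, typename Kind>
struct tagged_run
{
    typedef Kind range_kind;
    Offset offset, end_offset;
};

typedef tagged_run<double, half_open_range> half_open_run; // [offset, end_offset)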

Steven Watanabe wrote:
AMDG
Eric Niebler <eric <at> boost-consulting.com> writes:
I would partly agree with Stjepan about the dichotomy between sparse/dense series and piecewise constant series. As I see it there are three cases: a) sparse/dense and floating point.

On the chopping block.
b) piecewise constant and floating point, c) integral. Because of the limits on integers, piecewise constant series and point series behave the same way for them. Multiplication makes no sense for point series. Likewise adjacent_difference, integrate, piecewise_sample, piecewise_surface_sample, and index.

I tend to agree, which is why they're not long for this world.
They are still needed for the result of adjacent_difference on piecewise constant floating point series. What do you think of allowing runs to specify what kind of range they use: open, closed, or half-open?
struct half_open_range : mpl::int_<0> {};
struct rhalf_open_range : mpl::int_<1> {}; // (]
struct open_range : mpl::int_<2> {};
struct closed_range : mpl::int_<3> {};
I think that this would allow sparse floating point series to make sense without destroying consistency within the library.
I would hesitate to add that level of complexity to the library. It would make it 4x harder to write a correct time series algorithm. I'm sorry that I don't have an answer for you yet about this issue. I agree with the feeling that what is there right now is not quite right. It'll take some rethinking, but I haven't yet heard a solution that appeals to me.

-- Eric Niebler

On 8 Aug 2007, at 16:07, Steven Watanabe wrote:
They are still needed for the result of adjacent_difference on piecewise constant floating point series. What do you think of allowing runs to specify what kind of range they use: open, closed, or half-open?
struct half_open_range : mpl::int_<0> {};
struct rhalf_open_range : mpl::int_<1> {}; // (]
struct open_range : mpl::int_<2> {};
struct closed_range : mpl::int_<3> {};
I think that this would allow sparse floating point series to make sense without destroying consistency within the library.
Why do you need intervals for sparse series? Sparse series define values only at a finite number of points, and not in intervals. Do you mean piecewise constant series?

AMDG Eric Niebler <eric <at> boost-consulting.com> writes:
I strongly dislike ordered_inserter. Its name indicates to me that it adds elements to a time_series--not at all that it deletes all pre-existing elements. I would expect it to overwrite the old elements only where a new element is specified.
That's a valid criticism. Can you suggest a better name?
I can't say that I'm really comfortable with the way ordered_inserter works, regardless of the name. I would like it to behave something along the lines of:

- The inserter puts elements directly in the series, eliminating the need for commit.
- It is an error when the series has elements after the offset of the inserter.
- An inserter is invalidated when the container is modified without using the iterator.

Is there some reason why this doesn't work? If I need the other behavior I can always create a new series, fill it, and swap myself.

In Christ,
Steven Watanabe

Steven Watanabe wrote:
AMDG
Eric Niebler <eric <at> boost-consulting.com> writes:
I strongly dislike ordered_inserter. Its name indicates to me that it adds elements to a time_series--not at all that it deletes all pre-existing elements. I would expect it to overwrite the old elements only where a new element is specified. That's a valid criticism. Can you suggest a better name?
I can't say that I'm really comfortable with the way ordered_inserter works regardless of the name. I would like it to behave something along the lines of:
- The inserter puts elements directly in the series, eliminating the need for commit.
- It is an error when the series has elements after the offset of the inserter.
- An inserter is invalidated when the container is modified without using the iterator.
It's not exactly what you're asking for, but if you want to modify a time_series in-place, you can do that with range_run_storage::set_at(). It's a terrible way to build a series though, because you need to find the place in the series where this element should be inserted (could be O(log N)) and then insert it (could be O(N)). You seem to be asking for an ordered_append feature. That might be useful to add.
Is there some reason why this doesn't work? If I need the other behavior I can always create a new series fill it and swap myself.
Fill it how? That's exactly what ordered_inserter is for! So you can see, ordered_inserter is necessary. But the name isn't right.

The reason for ordered_inserter's existence is that it is the fastest way to build a time_series from scratch, regardless of the internal representation. When you know elements are sorted, you can just bang them into a dense vector, sparse vector, map, or what-have-you -- very efficient. Once you have a time series, there are other ways to manipulate it.

The reason for .commit() is that, since you're pushing in one datum at a time, there may be a more efficient intermediate representation than the final time series data structure. For the scenarios I've encountered, being able to efficiently build a series by pushing in one datum at a time is important. The alternative would be to use a range-based interface like the STL uses. E.g.:

series.assign( from, to );

Here, "from" and "to" can't really be iterators though, or, if they are, they should yield a 3-tuple of (value, offset, end offset). An interface like this is very unwieldy in signal processing, where you often collect data by polling a value at an interval:

for(;;)
    series.push_back(poll_some_value(), last_time, current_time);

Of course, push_back() may not be the most efficient way to build up some kinds of data structures. What the ordered_inserter gives is a way to have something as efficient as a range-based assign() while retaining the poll-and-push-back convenience. Does that make sense?

-- Eric Niebler

AMDG Eric Niebler <eric <at> boost-consulting.com> writes:
It's not exactly what you're asking for, but if you want to modify a time_series in-place, you can do that with range_run_storage::set_at(). It's a terrible way to build a series though, because you need to find the place in the series where this element should be inserted (could be O(log N)) and then insert it (could be O(N)).
You seem to be asking for an ordered_append feature. That might be useful to add.
Is there some reason why this doesn't work? If I need the other behavior I can always create a new series fill it and swap myself.
Fill it how? That's exactly what ordered_inserter is for! So you can see, ordered_inserter is necessary. But the name isn't right.
I would fill it using ordered_append.
The reason for ordered_inserter's existence is that it is the fastest way to build a time_series from scratch, regardless of the internal representation. When you know elements are sorted, you can just bang them into a dense vector, sparse vector, map, or what-have-you -- very efficient. Once you have a time series, there are other ways to manipulate it.
The reason for .commit() is that, since you're pushing in one datum at a time, there may be a more efficient intermediate representation than the final time series data structure. For the scenarios I've encountered, being able to efficiently build a series by pushing in one datum at a time is important.
Do you have an example where pushing them using a different intermediate representation is significantly more efficient? Even if there is such a case, should we really use an interface for insertion that departs from established practice even when it isn't needed? Would it be possible to have a range assign function template, in addition to the inserter, that takes a sequence of tuples? Could this be as efficient as the current ordered inserter?
The alternative would be use a range-based interface like the STL uses. E.g.:
series.assign( from, to );
Here, "from" and "to" can't really be iterators though, or, if they are, they should yield a 3-tuple of (value, offset, end offset). An interface like this is very unwieldy in signal processing, where you often collect data by polling a value at an interval.
for(;;) series.push_back(poll_some_value(), last_time, current_time);
Of course, push_back() may not be the most efficient way to build up some kinds of data structures. What the ordered_inserter gives is a way to have something as efficient as a range-based assign() while retaining the poll-and-push-back convenience.
Does that make sense?
No.

std::vector<tuple<...> > elements;
for(;;)
    elements.push_back(make_tuple(poll_some_value(), last_time, current_time));
series.assign(elements);

vs.

ordered_inserter inserter(series);
for(;;)
    *inserter++ = make_tuple(poll_some_value(), last_time, current_time);
inserter.commit();

In Christ,
Steven Watanabe

Steven Watanabe wrote:
AMDG
Eric Niebler <eric <at> boost-consulting.com> writes:
It's not exactly what you're asking for, but if you want to modify a time_series in-place, you can do that with range_run_storage::set_at(). It's a terrible way to build a series though, because you need to find the place in the series where this element should be inserted (could be O(log N)) and then insert it (could be O(N)).
You seem to be asking for an ordered_append feature. That might be useful to add.
Is there some reason why this doesn't work? If I need the other behavior I can always create a new series fill it and swap myself. Fill it how? That's exactly what ordered_inserter is for! So you can see, ordered_inserter is necessary. But the name isn't right.
I would fill it using ordered_append.
How would you suggest ordered_append be implemented?
The reason for ordered_inserter's existence is that it is the fastest way to build a time_series from scratch, regardless of the internal representation. When you know elements are sorted, you can just bang them into a dense vector, sparse vector, map, or what-have-you -- very efficient. Once you have a time series, there are other ways to manipulate it.
The reason for .commit() is that, since you're pushing in one datum at a time, there may be a more efficient intermediate representation than the final time series data structure. For the scenarios I've encountered, being able to efficiently build a series by pushing in one datum at a time is important.
Do you have an example where pushing them using a different intermediate representation is significantly more efficient?
Imagine the series storage is std::map<offset, value>. Inserting one value with insert(value) is O(log N). Inserting N values this way is O(N log N). But if you use insert(begin, end), where begin and end are iterators to sorted data, the insert is faster: O(N). So, if you're collecting data one sample at a time (common in signal processing), a good strategy might be to stuff pair<int, value> into a vector or deque first, then do one O(N) insert into your map at the end. That works out to be two O(N) operations, rather than one O(N log N) operation. Better.

That's a made up example, though. No series types use std::map as a backing store. Rather, they generally use a sorted vector or something. Except delta, characteristic, heaviside and inverse heaviside, which don't need growable storage at all. And some series types might not have any storage at all, but just write values to a stream or to disk or something. The point is, you should be able to write ordered data to a series efficiently without any concern for how that data is being stored, or even if it's being stored.
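A standard-library-only illustration of that two-stage strategy (no time_series code involved):

#include <map>
#include <utility>
#include <vector>

int main()
{
    // Stage one: collect samples one at a time -- O(1) amortized per push_back.
    std::vector<std::pair<int, double> > buf;
    for(int t = 0; t < 1000; ++t)
        buf.push_back(std::make_pair(t, 0.5 * t));

    // Stage two: one range insert of already-sorted data -- linear per the
    // Standard's complexity requirement for sorted input.
    std::map<int, double> storage;
    storage.insert(buf.begin(), buf.end());
}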
Even if there is such a case should we really use an interface for insertion that departs from established practice even when it isn't needed? Would it be possible to have a range assign function template in addition to the inserter that takes a sequence of tuples? Could this be as efficient as the current ordered inserter?
Yes, and I think this would be good to have. When you have a sorted range of data, it would be nice to have a clean high-level ordered_assign(series, begin, end) that does the series assignment. It can use the lower-level stuff under the covers. But the lower-level stuff is still needed so that series types can be efficiently built in generic code (like the time series algorithms, for instance), which can't make any assumptions.
The alternative would be use a range-based interface like the STL uses. E.g.:
series.assign( from, to );
Here, "from" and "to" can't really be iterators though, or, if they are, they should yield a 3-tuple of (value, offset, end offset). An interface like this is very unwieldy in signal processing, where you often collect data by polling a value at an interval.
for(;;) series.push_back(poll_some_value(), last_time, current_time);
Of course, push_back() may not be the most efficient way to build up some kinds of data structures. What the ordered_inserter gives is a way have something as efficient as a range-based assign() while retaining the poll-and-push-back convenience.
Does that make sense?
No.
std::vector<tuple<...> > elements;
for(;;)
    elements.push_back(make_tuple(poll_some_value(), last_time, current_time));
series.assign(elements);
Every element is copied twice.
vs.
ordered_inserter inserter(series);
for(;;)
    *inserter++ = make_tuple(poll_some_value(), last_time, current_time);
inserter.commit();
Every element is copied once. Have I missed your point, or did you just make mine for me? ;-)

-- Eric Niebler

AMDG Eric Niebler <eric <at> boost-consulting.com> writes:
How would you suggest ordered_append be implemented?
Mostly copied from dense_ordered_inserter:

template<typename Value>
struct dense_ordered_append_iterator
{
    typedef Value value_type;

    explicit dense_ordered_append_iterator(dense_array<Value> &that)
      : new_(&that)
    {}

    template<typename R, typename V>
    void set_at(R &run, V &value)
    {
        dense_array<Value> &that = *new_;
        typename Run<R>::offset_type off = rrs::offset(run);
        typename Run<R>::offset_type endoff = rrs::end_offset(run);
        if(off == -inf)
        {
            that.pre_.set(run, value);
        }
        else if(endoff == inf)
        {
            that.post_.set(run, value);
        }
        else
        {
            that.offset_ = std::min<std::ptrdiff_t>(that.offset_, off);
            that.reserve(endoff - that.offset_);
            that.resize(off - that.offset_, implicit_cast<Value const &>(rrs::zero(that)));
            that.resize(endoff - that.offset_, value);
        }
    }
private:
    dense_array<Value>* new_;
};
Imagine the series storage is std::map<offset, value>. Inserting one value with insert(value) is O(log N). Inserting N values this way is O(N log N). But if you use insert(begin, end), where begin and end are iterators to sorted data, the insert is faster. O(N).
It is in fact possible to fill a binary tree by pushing sorted data in linear time.

struct node
{
    node* right;
    node* left;
    node* parent;
    int value;
};

struct map
{
    node* root;
};

struct node_inserter
{
    void insert_to_right(int i)
    {
        node* result = new node;
        result->left = result->right = 0;
        result->parent = current;
        result->value = i;
        current->right = result;
        current = result;
    }
    void insert_after_complete_subtree(int i)
    {
        node* result = new node;
        result->parent = current->parent;
        current->parent->right = result;
        result->right = 0;
        result->left = current;
        result->value = i;
        current->parent = result;
        current = result;
    }
    void insert_at_root(int i)
    {
        node* result = new node;
        result->left = m->root;
        result->right = result->parent = 0;
        result->value = i;
        current = m->root = result;
    }
    void insert(int i)
    {
        unsigned long count = size;
        if(size == 0)
        {
            insert_at_root(i);
        }
        else if(count % 2 == 0)
        {
            insert_to_right(i);
        }
        else
        {
            count /= 2;
            while(count % 2 == 1)
            {
                count /= 2;
                current = current->parent;
            }
            if(count == 0)
            {
                insert_at_root(i);
            }
            else
            {
                insert_after_complete_subtree(i);
            }
        }
        ++size;
    }
    node* current;
    map* m;
    unsigned long size;
};
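Driving that inserter might look like this (a usage sketch; fields are initialized by hand since the structs above have no constructors):

map m = { 0 };
node_inserter ins = { 0, &m, 0 }; // current, m, size
for(int i = 0; i < 100; ++i)
    ins.insert(i);                // sorted input; linear total work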
The point is, you should be able to efficiently be able to write ordered data to a series without any concern for how that data is being stored, or even if it's being stored.
I don't think you should have *only* a range insert. I think an output iterator that behaves more like std::back_inserter is necessary too.
But the lower-level stuff is still needed so that series types can be efficiently built in generic code (like the time series algorithms, for instance), which can't make any assumptions.
I agree that we need the lower level stuff. I just disagree about *what* lower level stuff.
std::vector<tuple<...> > elements;
for(;;)
    elements.push_back(make_tuple(poll_some_value(), last_time, current_time));
series.assign(elements);
Every element is copied twice.
vs.
ordered_inserter inserter(series);
for(;;)
    *inserter++ = make_tuple(poll_some_value(), last_time, current_time);
inserter.commit();
Every element is copied once.
Unless you're using a different intermediate representation.
Have I missed your point, or did you just make mine for me?
Actually what I would like most is:

ordered_inserter inserter(series);
for(;;)
    *inserter++ = make_tuple(poll_some_value(), last_time, current_time);

With the additional guarantee that you can process the partial results at any time.

In Christ,
Steven Watanabe

Steven Watanabe wrote:
AMDG
Eric Niebler <eric <at> boost-consulting.com> writes:
How would you suggest ordered_append be implemented?
Mostly copied from dense_ordered_inserter:
<snip>

Hmm, yes, an in-place append would be a good addition. Agreed.
Imagine the series storage is std::map<offset, value>. Inserting one value with insert(value) is O(log N). Inserting N values this way is O(N log N). But if you use insert(begin, end), where begin and end are iterators to sorted data, the insert is faster. O(N).
It is in fact possible to fill a binary tree by pushing sorted data in linear time.
<snip>

I don't doubt it. std::map does it for its range-based insert, so it must be possible. And this may seem like splitting hairs to you, but what I said is true -- you can't efficiently build a std::map by pushing in sorted data one element at a time, because the interface doesn't allow for it. You have to pass it a range of sorted elements. You may say, "Well, then don't use a std::map to build a time series!" Fair enough. But when a perfectly satisfactory data structure like std::map cannot be adapted to satisfy a concept, it suggests a faulty concept.
The point is, you should be able to efficiently be able to write ordered data to a series without any concern for how that data is being stored, or even if it's being stored.
I don't think you should have *only* a range insert. I think an output iterator that behaves more like std::back_inserter is necessary too.
An in-place append, yes.
But the lower-level stuff is still needed so that series types can be efficiently built in generic code (like the time series algorithms, for instance), which can't make any assumptions.
I agree that we need the lower level stuff. I just disagree about *what* lower level stuff.
You're suggesting replacing the ordered_inserter (actually an ordered_assign) with an ordered_append? And perhaps also a clear() member? Or keeping ordered_assign? And (crucially) you would do away with commit() on both ordered_assign and ordered_append? Or would you keep them on the low-level interface and add a higher level interface that called commit for you and handled the 90% case?

The thoughts I've heard that I like so far are:
- add an ordered_append
- rename ordered_inserter to ordered_assign
- add a higher level interface to make them both easier to use

The thought I'm not sold on is:
- get rid of commit() on the lower level interface.

<snip>
Actually what I would like most is
ordered_inserter inserter(series);
for(;;)
    *inserter++ = make_tuple(poll_some_value(), last_time, current_time);
With the additional guarantee that you can process the partial results at any time.
OK. It seems to me that such a thing could be built on top of an ordered_append that has a commit(), simply by calling commit() after every appended element. Would you agree?

Thought: it occurs to me that commit() is a bad name. It is evocative of transactional semantics, which is not necessarily accurate. What I'm looking for is a way to signal the end of an input sequence, such that *if* the series has decided to buffer the input, it now knows it has received it all. It doesn't *have* to buffer the input, though. Maybe done() or end() or even eof().

Anyway, the STL-ish assign(begin, end) doesn't have this problem because the end is specified explicitly, and it has extra benefits (the size may be known), but this doesn't mesh well with the signal processing domain. In assign(begin, end), the container *pulls* the data. In signal processing, data is often read by polling, and then *pushed* into a container, hence the desire for something like a back_inserter.

Time passes ...

If we ditched the assigners and the appenders and went with the range-based STL-ish interface, what happens? Any convenient push-based interface we build on top would have to resort to sticking in one value at a time, just like std::back_inserter. Do you really feel that's an improvement? It's an honest question.

-- Eric Niebler

Eric Niebler wrote:
Thought: it occurs to me that commit() is a bad name. It is evocative of transactional semantics, which is not necessarily accurate. What I'm looking for is a way to signal the end of an input sequence, such that *if* the series has decided to buffer the input, it now knows it has received it all. It doesn't *have* to buffer the input, though. Maybe done() or end() or even eof().
flush()? - Michael Marcin

AMDG Eric Niebler <eric <at> boost-consulting.com> writes:
I don't doubt it. std::map does it for its range-based insert, so it must be possible. And this may seem like splitting hairs to you, but what I said is true -- you can't efficiently build a std::map by pushing in sorted data one element at a time, because the interface doesn't allow for it. You have to pass it a range of sorted elements. You may say, "Well, then don't use a std::map to build a time series!" Fair enough. But when a perfectly satisfactory data structure like std::map cannot be adapted to satisfy a concept, it suggests a faulty concept.
Or that the container is not perfectly satisfactory for the particular task. std::map is simply not designed for inserting many elements at the end. Buffering the input still increases the overhead significantly because std::map is not allowed to assume that the input range is sorted. It must check. So using std::map is worse than a custom container for two reasons--it needs temporary storage and it must check the range before inserting it.
And (crucially) you would do away with commit() on both ordered_assign and ordered_append? Or would you keep them on the low-level interface and add a higher level interface that called commit for you and handled the 90% case?
I would remove commit and require buffering to be explicit.
The thought I'm not sold on is:
- get rid of commit() on the lower level interface.
I consider immediate insertion lower level than buffering.
OK. It seems to me that such a thing could be built on top of an ordered_append that has a commit(), simply by calling commit() after every appended element. Would you agree?
I would prefer to implement the "commit" version in terms of the plain version. See below.
Anyway, the STL-ish assign(begin,end) doesn't have this problem because the end is specified explicitly and it has extra benefits (the size may be known), but this doesn't mesh well with the signal processing domain. In assign(begin,end), the container *pulls* the data. In signal processing, data is often read by polling, and then *pushed* into a container, hence the desire for something like a back_inserter.
Just wondering... Isn't it possible to have a pull model from the container's side *and* a push model from the data's side with coroutines?
Time passes ...
If we ditched the assigners and the appenders and went with the range-based STL-ish interface, what happens? Any convenient push-based interface we build on top would have to resort to sticking in one value at a time, just like std::back_inserter. Do you really feel that's an improvement? It's an honest question.
Yes I do.

If you want buffering it doesn't have to be done behind the scenes. By adjusting the interface slightly it can be implemented in terms of concepts that already exist.

typedef get_efficient_collector<series_type>::type collector_type;
collector_type collector;
back_inserter<collector_type> inserter(collector);
for(;;)
    *inserter++ = make_tuple(poll_some_value(), last_time, current_time);
series.assign(std::move(collector));

In Christ,
Steven Watanabe

Steven Watanabe wrote:
AMDG
Eric Niebler <eric <at> boost-consulting.com> writes:
I don't doubt it. std::map does it for its range-based insert, so it must be possible. And this may seem like splitting hairs to you, but what I said is true -- you can't efficiently build a std::map by pushing in sorted data one element at a time, because the interface doesn't allow for it. You have to pass it a range of sorted elements. You may say, "Well, then don't use a std::map to build a time series!" Fair enough. But when a perfectly satisfactory data structure like std::map cannot be adapted to satisfy a concept, it suggests a faulty concept.
Or that the container is not perfectly satisfactory for the particular task. std::map is simply not designed for inserting many elements at the end. Buffering the input still increases the overhead significantly because std::map is not allowed to assume that the input range is sorted. It must check. So using std::map is worse than a custom container for two reasons--it needs temporary storage and it must check the range before inserting it.
I agree that with a push-based insertion interface, std::map will be worse than a custom type that is designed with a push-based interface in mind. The question is whether models of the concept are *required* to have a push-based interface, or can they expose an opaque "collector" (as you refer to it below) that gives them a more flexible collection strategy.
And (crucially) you would do away with commit() on both ordered_assign and ordered_append? Or would you keep them on the low-level interface and add a higher level interface that called commit for you and handled the 90% case?
I would remove commit and require buffering to be explicit.
The thought I'm not sold on is:
- get rid of commit() on the lower level interface.
I consider immediate insertion lower level than buffering.
This is misrepresenting my position. Buffering is not inherent in what I've been endorsing. The low-level interface I've been suggesting is:

begin transmission
back_insert (v1, from1, to1)
back_insert (v2, from2, to2)
back_insert (v3, from3, to3)
end transmission

Whether or not buffering happens during this operation is entirely up to the container. This is a very general form for a push-based interface. It can accommodate immediate insertion *or* buffering.
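To make that concrete, a hedged sketch (hypothetical types, not library code) of two containers satisfying the same protocol -- one inserting immediately, one buffering until the end-of-input signal:

#include <cstddef>
#include <map>
#include <utility>
#include <vector>
#include <boost/tuple/tuple.hpp>

// Inserts each run as it arrives; begin()/end() are no-ops.
struct immediate_builder
{
    std::map<int, double> storage;
    void begin() {}
    void back_insert(double v, int from, int /*to*/)
    { storage.insert(storage.end(), std::make_pair(from, v)); } // hinted insert
    void end() {}
};

// Buffers runs and does one bulk pass when told the input is complete.
struct buffered_builder
{
    std::vector<boost::tuple<double, int, int> > buf;
    std::map<int, double> storage;
    void begin() { buf.clear(); }
    void back_insert(double v, int from, int to)
    { buf.push_back(boost::make_tuple(v, from, to)); }
    void end()
    {
        for(std::size_t i = 0; i < buf.size(); ++i)
            storage.insert(storage.end(),
                std::make_pair(boost::get<1>(buf[i]), boost::get<0>(buf[i])));
    }
};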
OK. It seems to me that such a thing could be built on top of an ordered_append that has a commit(), simply by calling commit() after every appended element. Would you agree?
I would prefer to implement the "commit" version in terms of the plain version. See below.
Anyway, the STL-ish assign(begin,end) doesn't have this problem because the end is specified explicitly and it has extra benefits (the size may be known), but this doesn't mesh well with the signal processing domain. In assign(begin,end), the container *pulls* the data. In signal processing, data is often read by polling, and then *pushed* into a container, hence the desire for something like a back_inserter.
Just wondering... Isn't it possible to have a pull model from the container's side *and* a push model from the data's side with coroutines?
Yes if we had them. And we do in some sense (see last year's Google Summer of Code).
Time passes ...
If we ditched the assigners and the appenders and went with the range-based STL-ish interface, what happens? Any convenient push-based interface we build on top would have to resort to sticking in one value at a time, just like std::back_inserter. Do you really feel that's an improvement? It's an honest question.
Yes I do.
If you want buffering it doesn't have to be done behind the scenes. By adjusting the interface slightly it can be implemented in terms of concepts that already exist.
typedef get_efficient_collector<series_type>::type collector_type;
collector_type collector;
back_inserter<collector_type> inserter(collector);
for(;;)
    *inserter++ = make_tuple(poll_some_value(), last_time, current_time);
series.assign(std::move(collector));
Precisely. This is how a buffered ordered inserter works today. Now consider the case of a generic time series algorithm that needs to construct a new series. It doesn't know the optimal way to build the series, so it has to ask the series for its "efficient collector". In short, the efficient collector is the de-facto standard way to build a series generically.

My point is, the phrase "if you want buffering" doesn't make any sense. The user (or generic algorithm) doesn't know and shouldn't care what the optimal insertion strategy for a given series type is. Only the series type knows that. This is why I made the "efficient collector" interface the default. Everything else can be built on top.

Let's back up. Why do you believe that

back_insert (v1, from1, to1)
back_insert (v2, from2, to2)
back_insert (v3, from3, to3)

should be lower-level than

begin transmission
back_insert (v1, from1, to1)
back_insert (v2, from2, to2)
back_insert (v3, from3, to3)
end transmission

?

-- Eric Niebler

AMDG Eric Niebler <eric <at> boost-consulting.com> writes:
My point is, the phrase "if you want buffering" doesn't make any sense. The user (or generic algorithm) doesn't know and shouldn't care what the optimal insertion strategy for a given series type is.
Is there any data structure for which buffering actually helps?
Let's back up. Why do you believe that
back_insert (v1, from1, to1)
back_insert (v2, from2, to2)
back_insert (v3, from3, to3)
should be lower-level than
begin transmission
back_insert (v1, from1, to1)
back_insert (v2, from2, to2)
back_insert (v3, from3, to3)
end transmission
?
It is possible to implement either in terms of the other, so in a sense neither one is intrinsically "lower-level." The first is simpler. I do not think the second gains enough flexibility to justify the added complexity.

I've just been running some tests with binary trees and the results are surprising. Neither msvc 8.0 nor gcc 3.4 implements the range insertion in linear time for sorted ranges (in spite of the standard). Both just do an element-by-element insert. Second, I discovered that the map_back_inserter I posted is as fast as a buffered version (the buffering is insignificant, BTW).

In Christ,
Steven Watanabe

Steven Watanabe wrote:
AMDG
Eric Niebler <eric <at> boost-consulting.com> writes:
My point is, the phrase "if you want buffering" doesn't make any sense. The user (or generic algorithm) doesn't know and shouldn't care what the optimal insertion strategy for a given series type is.
Is there any data structure for which buffering actually helps?
If we're talking about buffering of sorted data being appended to the end of a sequence, then I think I agree with you. No *useful* data structure comes to mind. But IMO the problem is larger than "buffering sorted data being appended to the end of a sequence." More below.
Let's back up. Why do you believe that
back_insert (v1, from1, to1)
back_insert (v2, from2, to2)
back_insert (v3, from3, to3)
should be lower-level than
begin transmission
back_insert (v1, from1, to1)
back_insert (v2, from2, to2)
back_insert (v3, from3, to3)
end transmission
?
It is possible to implement either in terms of the other, so in a sense neither one is intrinsically "lower-level."
I disagree with this. If anything non-trivial is going on in commit(), then the second version cannot be *efficiently* implemented in terms of the first, which means the second is necessarily the more primitive of the two.
The first is simpler.
Agreed.
I do not think the second gains enough flexibility to justify the added complexity.
This is a judgment call, and I have done a poor job justifying the added complexity.
I've just been running some tests with binary trees and the results are surprising. Neither msvc 8.0 nor gcc 3.4 implements the range insertion in linear time for sorted ranges (in spite of the standard). Both just do an element-by-element insert.
Gah! That's terrible. :-(
Second, I discovered that the map_back_inserter I posted is as fast as a buffered version (the buffering is insignificant, BTW).
Very interesting data points. Thanks for the tests.

The inserter design (with commit()) came out of a design discussion with Dave A., which I took the time to review this morning. He was coming at the problem from the perspective of MTL4 -- matrices and vectors. For MTL4, he wanted generic ways of representing sparse data and of building sequences. We wanted a mechanism that was general. He imagined a desire to temporarily violate the invariants of a sequence while building it, and then using commit() to "make it good". Some examples of invariant-breaking insertions might be inserting data that is unsorted, or sorted in reverse order, or inserting at the front of a sequence (not necessarily invariant-breaking, but possibly inefficient otherwise). For these purposes, there would be a collection of inserters, not just the "ordered_inserter" in Time_series, which is a rather uninteresting example.

This perhaps explains why we've been talking past each other. I've had this desire for extreme generality in the back of my mind. The Sequence and RangeRunStorage concepts are general and can be reused in other contexts, where alternate representations may be more common and pre-sorted data may be less so.

Where does this leave inserters and the time series library? Perhaps we decide that within the time series domain, inserting pre-sorted data at the end of the sequence is the only scenario that matters, and everything else is unnecessary complication. For the purpose of generality, though, the inserter interface should remain on the lower-level RangeRunStorage interface. Perhaps for the TimeSeries concepts, the inserters are hidden, and push_back is How It's Done.

How does this sound to you? Personally, I wasn't ready to say that appending pre-sorted data is the only scenario that matters; hence, I exposed the more general interface. Perhaps someone more steeped in the time series and signal processing domains could weigh in.

-- Eric Niebler

On 8/12/07, Eric Niebler <eric@boost-consulting.com> wrote:
How does this sound to you? Personally, I wasn't ready to say that appending pre-sorted data is the only scenario that matters; hence, I exposed the more general interface. Perhaps someone more steeped in the time series and signal processing domains could weigh in.
If you are collecting data from a number of sources over a network, their time-stamped data can come in out of order. If you know exactly what sort of data to expect, you can wait to get enough to pre-sort and commit to the series, but in some cases either you don't know what sort of data to expect, or things get lost in the network, so waiting for something to come is risky. So, appending pre-sorted data may not be the only scenario that matters, if I'm understanding your discussion correctly.

Stjepan

on Sun Aug 12 2007, "Stjepan Rajko" <stipe-AT-asu.edu> wrote:
On 8/12/07, Eric Niebler <eric@boost-consulting.com> wrote:
How does this sound to you? Personally, I wasn't ready to say that appending pre-sorted data is the only scenario that matters; hence, I exposed the more general interface. Perhaps someone more steeped in the time series and signal processing domains could weigh in.
If you are collecting data from a number of sources over a network, their time-stamped data can come in out of order. If you know exactly what sort of data to expect, you can wait to get enough to pre-sort and commit to the series, but in some cases either you don't know what sort of data to expect, or things get lost in the network, so waiting for something to come is risky. So, appending pre-sorted data may not be the only scenario that matters, if I'm understanding your discussion correctly.
Also, if your data is large, there may not be enough memory in the machine to store two copies, in which case presorting isn't an option.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com

AMDG Eric Niebler <eric <at> boost-consulting.com> writes:
I disagree with this. If anything non-trivial is going on in commit(), then the second version cannot be *efficiently* implemented in terms of the first, which means the second is necessarily the more primitive of the two.
If anything interesting is going on in commit() then neither one can be implemented efficiently in terms of the other. I was a bit unclear. I thought that buffering was the only reason for using the second form. Thus, I envisioned

template<class T>
struct get_buffer_type { typedef no_buffer type; };

template<bool> struct ordered_assign_impl;

template<class T>
struct ordered_assign
  : ordered_assign_impl<
        is_same<typename get_buffer_type<T>::type, no_buffer>::value
    >::template apply<T>
{};

template<> struct ordered_assign_impl<true>
{
    template<class T> struct apply { /* use push_back */ };
};

template<> struct ordered_assign_impl<false>
{
    template<class T> struct apply { /* use a range assignment */ };
};
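(The idea: ordered_assign dispatches at compile time on whether get_buffer_type has been specialized for T -- when no buffer type exists, the true branch appends element by element with push_back; otherwise the false branch stages elements into the buffer and commits them as one range assignment.)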
The inserter design (with commit()) came out of a design discussion with Dave A., which I took the time to review this morning. He was coming at the problem from the perspective of MTL4 -- matrices and vectors. For MTL4, he wanted generic ways of representing sparse data and of building sequences. We wanted a mechanism that was general. He imagined a desire to temporarily violate the invariants of a sequence while building it, and then using commit() to "make it good".
<snip>
I see. The inserters make a lot more sense in a larger context.

In Christ,
Steven Watanabe

Steven Watanabe wrote:
The inserter design (with commit()) came out of a design discussion with Dave A., which I took the time to review this morning. He was coming at the problem from the perspective of MTL4 -- matrices and vectors. For MTL4, he wanted generic ways of representing sparse data and of building sequences. We wanted a mechanism that was general. He imagined a desire to temporarily violate the invariants of a sequence while building it, and then using commit() to "make it good".
<snip>
I see. The inserters make a lot more sense in a larger context.
Your vote to reject the time series library was based in part on your feeling that the ordered inserter interface was wrong. Now that I've explained the reasons for the interface, does it change your vote?

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com

Eric Niebler wrote:
Steven Watanabe wrote:
The inserter design (with commit()) came out of a design discussion with Dave A., which I took the time to review this morning. He was coming at the problem from the perspective of MTL4 -- matrices and vectors. For MTL4, he wanted generic ways of representing sparse data and of building sequences. We wanted a mechanism that was general. He imagined a desire to temporarily violate the invariants of a sequence while building it, and then using commit() to "make it good".
<snip>
I see. The inserters make a lot more sense in a larger context.
Your vote to reject the time series library was based in part on your feeling that the ordered inserter interface was wrong. Now that I've explained the reasons for the interface, does it change your vote?
I'm also interested in your answer to this.

John

AMDG John Phillips <phillips <at> delos.mps.ohio-state.edu> writes:
Your vote to reject the time series library was based in part on your feeling that the ordered inserter interface was wrong. Now that I've explained the reasons for the interface, does it change your vote?
I'm also interested in your answer to this.
Yes. It changes my vote.

In Christ,
Steven Watanabe

On 8/8/07, Eric Niebler <eric@boost-consulting.com> wrote:
Here, "from" and "to" can't really be iterators though, or, if they are, they should yield a 3-tuple of (value, offset, end offset). An interface like this is very unwieldy in signal processing, where you often collect data by polling a value at an interval.
for(;;)
    series.push_back(poll_some_value(), last_time, current_time);
<nit-pick> With the current convention of half-open intervals, wouldn't that be [last_time, current_time), and therefore series[current_time] not be affected by the above insertion? </nit-pick>

On a less nit-picky note though, I still can't find a single outside reference in which something that assigns a value to a whole real interval is called a time series. Eric, you indicated that your choice for using range runs (as opposed to just points, I assume) was that this yielded superior generic algorithms. But in the floating point case, this causes most of your structures to represent something that is not really a time series. In rethinking the floating point case, do any of the strategies you are considering look to bring all of your structures in line with the mathematical notion of what a time series is?

In one of your posts, you mentioned something along the lines of making the Point concept a first-class citizen of the library - IIUTC, that would be a good approach. Furthermore, I think that the RangeRun should be rethought, so that a RangeRun is in effect equivalent to a countable set of Points even in the floating point case (where by "a countable set", I mean "a countable set significantly smaller than the one including every Point indexable by a floating point number between the start offset and end offset"). If not, then I see this as a Time_series+something else library, which is fine. But with time series, I think a continuous interval is much less useful than a way to specify a number of discrete points in an interval.

Best regards,
Stjepan

Stjepan Rajko wrote:
On 8/8/07, Eric Niebler <eric@boost-consulting.com> wrote:
Here, "from" and "to" can't really be iterators though, or, if they are, they should yield a 3-tuple of (value, offset, end offset). An interface like this is very unwieldy in signal processing, where you often collect data by polling a value at an interval.
for(;;)
    series.push_back(poll_some_value(), last_time, current_time);
<nit-pick> With the current convention of half-open intervals, wouldn't that be [last_time, current_time), and therefore series[current_time] not be affected by the above insertion? </nit-pick>
Picky, picky. ;-) Yes, you're right.
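Concretely (a hypothetical fragment reusing the push_back interface from above): with half-open runs, consecutive polls tile the timeline without overlap.

series.push_back(v0, t0, t1); // covers [t0, t1); series[t1] is untouched
series.push_back(v1, t1, t2); // covers [t1, t2); begins exactly at t1, no overlap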
On a less nit-picky note though, I still can't find a single outside reference in which something that assigns a value to a whole real interval is called a time series. Eric, you indicated that your choice for using range runs (as opposed to just points, I assume) was that this yielded superior generic algorithms. But in the floating point case, this causes most of your structures to represent something that is not really a time series. In rethinking the floating point case, do any of the strategies you are considering look to bring all of your structures in line with the mathematical notion of what a time series is?
Would you agree that the time series types that use integral offsets are isomorphic to what a time series is, in the mathematical sense? Are there any time series (in the math sense) that are not representable using integral offsets?
In one of your posts, you mentioned something along the lines of making the Point concept a first-class citizen of the library - IIUTC, that would be a good approach. Furthermore, I think that the RangeRun should be rethought, so that a RangeRun is in effect equivalent to a countable set of Points even in the floating point case (where by "a countable set", I mean "a countable set significantly smaller than the one including every Point indexable by a floating point number between the start offset and end offset"). If not, then I see this as a Time_series+something else library, which is fine. But with time series, I think a continuous interval is much less useful than a way to specify a number of discrete points in an interval.
I'm having a hard time seeing how this is any different from using a series with integral offsets and a floating point discretization. The time series library provides this functionality already. Can you clarify what you're suggesting?

I agree that the time series types that use floating point offsets are not very time series-ish in the math sense. But some have expressed the strong opinion during this review that the functionality they provide is useful.

I guess what I'm missing is a use case for non-integral offsets. Any reanalysis of what floating point offsets mean has to start there. Is it simply the desire to index into a sequence of points and interpolate between them in some way? If that's the case, then support for floating point offsets can be dropped in favor of a flexible interpolating facade. (IMO, something like that is needed anyway.)

If someone really needs a way to say, "This signal really has the value of X in the time interval [Y,Z)" where Y and Z are floating point values, then continuous floating point runs are the way to go. That seems like a reasonable thing to want, even if it doesn't fit the mathematical definition of "time series".

--
Eric Niebler
Boost Consulting
www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com
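A sketch of that last idiom. The header name and the template parameter order (value, discretization, offset) are assumptions, extrapolated from the sparse_series example earlier in this review:

#include <boost/time_series/piecewise_constant_series.hpp>
#include <boost/time_series/ordered_inserter.hpp>

using namespace boost::time_series;

int main()
{
    // "the signal has the value 42.0 on the interval [1.5, 2.5)"
    piecewise_constant_series<double, int, double> s;
    make_ordered_inserter(s)(42.0, 1.5, 2.5).commit();
}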

On 8/12/07, Eric Niebler <eric@boost-consulting.com> wrote:
Stjepan Rajko wrote:
On a less nit-picky note though, I still can't find a single outside reference in which something that assigns a value to a whole real interval is called a time series. Eric, you indicated that your choice for using range runs (as opposed to just points, I assume) was that this yielded superior generic algorithms. But in the floating point case, this causes most of your structures to represent something that is not really a time series. In rethinking the floating point case, do any of the strategies you are considering look to bring all of your structures in line with the mathematical notion of what a time series is?
Would you agree that the time series types that use integral offsets are isomorphic to what a time series is, in the mathematical sense?

Yes

Are there any time series (in the math sense) that are not representable using integral offsets?
I think that most time series, and especially those time series used in practice, could be approximated fairly well using an integral-offset series with the appropriate discretization. The problems are:

1) if you don't know much about the series a priori, you might not know what to set the discretization to initially, and might never get a good idea of what the discretization really should be.

2) say you have a series coming at you, and the time intervals between the samples keep getting smaller and smaller, repeatedly beating your discretization no matter how small it is. You would have to keep fine_graining, which doesn't seem efficient.

3) a user simply might not want to use discretization. But this is not really a problem when it comes to your lib, because the sparse series with floating point offsets would do the trick.
In one of your posts, you mentioned something along the lines of making the Point concept a first-class citizen of the library - IIUTC, that would be a good approach. Furthermore, I think that the RangeRun should be rethought, so that a RangeRun is in effect equivalent to a countable set of Points even in the floating point case (where by "a countable set", I mean "a countable set significantly smaller than the one including every Point indexable by a floating point number between the start offset and end offset"). If not, then I see this as a Time_series+something else library, which is fine. But with time series, I think a continuous interval is much less useful than a way to specify a number of discrete points in an interval.
I'm having a hard time seeing how this is any different from using a series with integral offsets and a floating point discretization. The time series library provides this functionality already. Can you clarify what you're suggesting?
I don't think that the integral offsets + floating point discretization approach always works (mostly given the problems I list above). But again, sparse_series with floating point offsets can be used instead. I gave a slightly more specific example of what I am suggesting in http://tinyurl.com/ywu53v, but also see below.
I agree that the time series types that use floating point offsets are not very time series-ish in the math sense. But some have expressed the strong opinion during this review that the functionality they provide is useful.
Please don't get me wrong - I also think that the floating point offset series are useful as they are. For example, I think it's really useful to be able to multiply a sparse_series with a piecewise_constant_series, to accomplish something like "multiply all the samples in [0, 100) by 10, and all samples in [100, 200) by 20".

Also, I agree with Steven in that the floating point offset series can be divided into two categories - sparse/dense (and delta), which are pretty consistent with the mathematical concept of a time series as they are, with the exception of their pre-runs and post-runs, and the rest, which are closer to modeling a piecewise constant function.

What keeps nagging me is that all these "others" are not time series. I wouldn't even call them "series", although they are a series of tuples, because they so much better reflect a piecewise constant function. What I do see coming out of the RangeRun concept is a potentially wonderful foundation for Boost.MathFunction - but in order to get there, it would need to grow (for example, somehow supporting all flavors of open/closed/half-open intervals). So I see all these floating point "others" in a limbo - they are not time series, but they are useful to have with time series, and they are almost really nice implementations of piecewise constant functions (with the potential to implement any function, I think, using the RangeRun concept) but not quite there either.

So what I'm mostly suggesting is:

* whatever is supposed to be a time series - make it a true time series. At the end of the day, anything that is a time series should be convertible to a sequence of discrete time points with values attached, and nothing more. Integral offset versions of the series are there. Floating point versions of dense, sparse, and delta series are also there, except for their pre-runs and post-runs.

* whatever is not a time series - call it something else, or make it clear in the documentation that it behaves as something else in certain circumstances (like floating point offsets). I am not disputing the fact that they are useful, and not suggesting they be removed from the library - they are definitely useful in conjunction with time series.

* alternatively - make everything a time series, which would require you to revisit the RangeRun concept so that it is always convertible to a countable set of discrete (value, time) pairs. The utility I see here is the following: from a time series perspective, I can't just say "all samples in [0, 100) have value 10"; I have to specify exactly where all those samples lie. Allowing me to do this concisely using a modified RangeRun (a sketch of one possible shape follows below) would be very useful, since I wouldn't have to specify each of the possibly numerous samples separately, nor would they have to be stored separately.
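A minimal sketch of what such a countable RangeRun might look like; every name here is invented for illustration and is not part of the library:

// Hypothetical: a run denoting 'count' evenly spaced samples in
// [start, stop), rather than one value over the whole continuum.
struct sampled_run
{
    double start, stop;
    int count;     // number of discrete samples in [start, stop)
    double value;  // the value attached to each sample

    // time point of the i-th sample, 0 <= i < count
    double offset(int i) const
    {
        return start + i * (stop - start) / count;
    }
};

A run like this stays O(1) in storage yet remains convertible to the discrete (value, time) pairs that a time series, in the mathematical sense, requires.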
I guess what I'm missing is a use case for non-integral offsets. Any reanalysis of what floating point offsets mean has to start there. Is it
I hope the above makes a case for some of that. I think in a lot of cases, users will not want to deal with discretization or any involved transforms and just want to use their (value, time) pairs as they are.
simply the desire to index into a sequence of points and interpolate between them in some way? If that's the case, then support for floating point offsets can be dropped in favor of a flexible interpolating facade. (IMO, something like that is needed anyway.)
I have to think about that... I wasn't thinking about that case.
If someone really needs a way to say, "This signal really has the value of X in the time interval [Y,Z)" where Y and Z are floating point values, then continuous floating point runs are the way to go. That seems like a reasonable thing to want, even if it doesn't fit the mathematical definition of "time series".
It is a *very* reasonable thing to want. And it fits the mathematical definition of a function with a domain in the real numbers very well ;-)

Best regards,
Stjepan

AMDG Steven Watanabe <watanabesj <at> gmail.com> writes:
I still need to think a bit more before I vote
Ok. I vote to reject. ordered_inserter is wrong. Floating point offsets need to be fixed. If these are fixed, or if I can be convinced that they are not problems, then I would change my vote to a yes. I have made suggestions for both of these, but I am not sure that they work as I haven't tried them yet.

A little more about floating point offsets: I do not think that floating point offsets can just be eliminated. The piecewise constant forms behave much like integral offsets. As I mentioned in another post, sparse series are needed if the piecewise constant series are available. Also, I gave a whole list of algorithms that don't work for floating point sparse series, but on the other side of the coin, partial_sum makes sense for them but not for piecewise constant series. If point series cannot be made to fit properly within the concepts then the concepts need to be adjusted.

Finally, I am not knowledgeable at all about the problem domain. Therefore, I am not qualified to express an opinion about the usefulness of the library.

In Christ,
Steven Watanabe
participants (11)
- David Abrahams
- Eric Niebler
- John Phillips
- Matthias Troyer
- Michael Fawcett
- Michael Marcin
- Paul A Bristow
- Steven Watanabe
- Steven Watanabe
- Stjepan Rajko
- Tobias Schwinger