Re: [boost] [time_series] Review

12 Aug 2007

      Steven Watanabe wrote:
...
AMDG
Eric Niebler <eric <at> boost-consulting.com> writes:
...
My point is, the phrase "if you want buffering" doesn't make any sense. 
The user (or generic algorithm) doesn't know and shouldn't care what the 
optimal insertion strategy for a given series type is.
Is there any data structure for which buffering actually helps?
If we're talking about buffering of sorted data being appended to the 
end of a sequence, then I think I agree with you. No *useful* data 
structure comes to mind.

But IMO the problem is larger than "buffering sorted data being appended 
to the end of a sequence." More below.
...
...
Let's back up. Why do you believe that
back_insert (v1, from1, to1)
   back_insert (v2, from2, to2)
   back_insert (v3, from3, to3)
should be lower-level than
begin transmission
   back_insert (v1, from1, to1)
   back_insert (v2, from2, to2)
   back_insert (v3, from3, to3)
   end transmission
?
It is possible to implement either
in terms of the other so in a sense neither
one is intrinsically "lower-lever."
I disagree with this. If anything non-trivial is going on in commit(), 
then the second version cannot be *efficiently* implemented in terms of 
the first, which means the second is necessarily the more primitive of 
the two.
...
The first is simpler.
Agreed.
...
I do not think the
second gains enough flexibility to justify
the added complexity.
This is a judgment call, and I have done a poor job justifying the added 
complexity.
...
I've just been running
some tests with binary trees and the results
are surprising.  Neither msvc 8.0 nor gcc 3.4
implement the range insertion in linear time
for sorted ranges (In spite of the standard).
Both just do an element by element insert.
Gah! That's terrible. :-(
...
Second, I discovered that the map_back_inserter
I posted is as fast as a buffered version (
the buffering is insignificant, BTW).
Very interesting data points. Thanks for the tests.

The inserter design (with commit()) came out of a design discussion with 
Dave A., which I took the time to review this morning. He was coming at 
the problem from the perspective of MTL4 -- matrices and vectors. For 
MTL4, he wanted generic ways of representing sparse data and of building 
sequences. We wanted a mechanism that was general. He imagined a desire 
to temporarily violate the invariants of a sequence while building it, 
and then using commit() to "make it good".

Some examples of invariant breaking insertions might be inserting data 
that is unsorted, or sorted in reverse order, or inserting at the front 
of a sequence (not necessarily invariant-breaking, but possibly 
inefficient otherwise). For these purposes, there would be a collection 
of inserters, not just the "ordered_inserter" in Time_series, which is a 
rather uninteresting example.

This perhaps explains why we've been talking past each other. I've had 
this desire for extreme generality in the back of my mind. The Sequence 
and RangeRunStorage concepts are general and can be reused in other 
contexts, where alternate representations may be more common and 
pre-sorted data may be less.

Where does this leave inserters and the time series library? Perhaps we 
decide that within the time series domain, inserting pre-sorted data at 
the end of the sequence is the only scenario that matters, and 
everything else is unnecessary complication. For the purpose of 
generality though, the inserter interface should remain on the 
lower-level RangeRunStorage interface. Perhaps for the TimeSeries 
concepts, the inserters are hidden, and push_back How It's Done.

How does this sound to you? Personally, I wasn't ready to say that 
appending pre-sorted data is the only scenario that matters; hence, I 
exposed the more general interface. Perhaps someone more steeped in the 
time series and signal processing domains could weight in.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com