
Steven Watanabe wrote:
AMDG
Eric Niebler <eric <at> boost-consulting.com> writes:
I don't doubt it. std::map does it for its range-based insert so it must be possible. And this may seem like splitting hairs to you, but what I said is true -- you can't efficiently build a std::map by pushing in sorted data one element at a time, because the interface doesn't allow for it. You have to pass it a range of sorted elements. You may say, "Well, then don't use a std::map to build a time series!" Fair enough. But when a perfectly satisfactory data structure like std::map cannot be adapted to satisfy a concept, it suggests a faulty concept.
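For concreteness, a minimal sketch of the difference, using only what std::map already guarantees: the range constructor is required to run in linear time when the input range is sorted, while plain element-at-a-time insertion pays an independent O(log n) lookup per element.

    #include <map>
    #include <utility>
    #include <vector>

    int main()
    {
        // Samples already sorted by key, as a time series producer would emit them.
        std::vector<std::pair<int, double> > sorted;
        for (int i = 0; i != 1000; ++i)
            sorted.push_back(std::make_pair(i, i * 0.5));

        // Range-based: one call; linear because the input is sorted.
        std::map<int, double> fast(sorted.begin(), sorted.end());

        // Element-at-a-time: each insert does its own O(log n) search.
        std::map<int, double> slow;
        for (std::size_t i = 0; i != sorted.size(); ++i)
            slow.insert(sorted[i]);
    }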
Or that the container is not perfectly satisfactory for the particular task. std::map is simply not designed for inserting many elements at the end. Buffering the input still increases the overhead significantly because std::map is not allowed to assume that the input range is sorted. It must check. So using std::map is worse than a custom container for two reasons--it needs temporary storage and it must check the range before inserting it.
I agree that with a push-based insertion interface, std::map will be worse than a custom type that is designed with a push-based interface in mind. The question is whether models of the concept are *required* to have a push-based interface, or whether they can expose an opaque "collector" (as you refer to it below) that gives them a more flexible collection strategy.
And (crucially) you would do away with commit() on both ordered_assign and ordered_append? Or would you keep them on the low-level interface and add a higher level interface that called commit for you and handled the 90% case?
I would remove commit and require buffering to be explicit.
The thought I'm not sold on is:
- get rid of commit() on the lower level interface.
I consider immediate insertion lower level than buffering.
This is misrepresenting my position. Buffering is not inherent in what I've been endorsing. The low-level interface I've been suggesting is:

    begin transmission
    back_insert (v1, from1, to1)
    back_insert (v2, from2, to2)
    back_insert (v3, from3, to3)
    end transmission

Whether or not buffering happens during this operation is entirely up to the container. This is a very general form for a push-based interface. It can accommodate immediate insertion *or* buffering.
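For concreteness, a minimal sketch of that form in C++; the names (series_transmission, push_back) are invented for illustration, and the comments mark where a container could choose immediate insertion or buffering:

    // Invented sketch: begin/end transmission bracket the pushes; whether
    // anything is buffered in between is the container's private decision.
    template<typename Series>
    class series_transmission
    {
    public:
        explicit series_transmission(Series &s)
          : series_(s)
        {
            // "begin transmission": the container may allocate a buffer here, or not.
        }

        template<typename Value, typename Offset>
        void back_insert(Value const &v, Offset from, Offset to)
        {
            // The container's choice: insert now, or stash until the end.
            series_.push_back(v, from, to);
        }

        ~series_transmission()
        {
            // "end transmission": a buffering container would flush here.
        }

    private:
        Series &series_;
    };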
OK. It seems to me that such a thing could be built on top of an ordered_append that has a commit(), simply by calling commit() after every appended element. Would you agree?
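For concreteness, a minimal sketch of that adaptor; the inserter type and its interface (operator() plus commit()) are assumptions for illustration:

    // Invented sketch: wrap a commit-style ordered_append and flush after
    // every element, yielding the plain (commit-free) interface on top.
    template<typename OrderedAppend>
    struct eager_appender
    {
        explicit eager_appender(OrderedAppend a)
          : appender_(a)
        {}

        template<typename Value, typename Offset>
        void operator()(Value const &v, Offset from, Offset to)
        {
            appender_(v, from, to);
            appender_.commit();   // commit after every appended element
        }

    private:
        OrderedAppend appender_;
    };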
I would prefer to implement the "commit" version in terms of the plain version. See below.
Anyway, the STL-ish assign(begin,end) doesn't have this problem because the end is specified explicitly and it has extra benefits (the size may be known), but this doesn't mesh well with the signal processing domain. In assign(begin,end), the container *pulls* the data. In signal processing, data is often read by polling, and then *pushed* into a container, hence the desire for something like a back_inserter.
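For concreteness, a small sketch of the two directions with plain std::vector; poll_sample() is a stand-in for whatever polling source the application has:

    #include <iterator>
    #include <vector>

    double poll_sample();  // hypothetical polling source

    // Pull: the container walks a range that must already exist in full.
    void pull_style(std::vector<double> &v, std::vector<double> const &src)
    {
        v.assign(src.begin(), src.end());
    }

    // Push: the producer hands over one element at a time as it arrives.
    void push_style(std::vector<double> &v, int n)
    {
        std::back_insert_iterator<std::vector<double> > out(v);
        for (int i = 0; i != n; ++i)
            *out++ = poll_sample();
    }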
Just wondering... Isn't it possible to have a pull model from the container's side *and* a push model from the data's side with coroutines?
Yes, if we had them. And we do, in some sense (see last year's Google Summer of Code).
Time passes ...
If we ditched the assigners and the appenders and went with the range-based STL-ish interface, what happens? Any convenient push-based interface we build on top would have to resort to sticking in one value at a time, just like std::back_inserter. Do you really feel that's an improvement? It's an honest question.
Yes I do.
If you want buffering, it doesn't have to be done behind the scenes. By adjusting the interface slightly, it can be implemented in terms of concepts that already exist.
    typedef get_efficient_collector<series_type>::type collector_type;
    collector_type collector;
    back_inserter<collector_type> inserter(collector);
    for(;;)
        *inserter++ = make_tuple(poll_some_value(), last_time, current_time);
    series.assign(std::move(collector));
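For concreteness, a minimal sketch of what the get_efficient_collector trait might look like; none of this is existing library interface, and the nested value_type/offset_type names are assumptions. The primary template falls back to buffering into a std::vector of (value, from, to) tuples, and a series that can insert immediately would specialize it with a write-through proxy instead:

    #include <tuple>
    #include <vector>

    // Invented sketch of the trait: by default, collect into a vector of
    // (value, from, to) tuples that series.assign() can consume in one pass.
    template<typename Series>
    struct get_efficient_collector
    {
        typedef std::vector<
            std::tuple<
                typename Series::value_type,
                typename Series::offset_type,
                typename Series::offset_type
            >
        > type;
    };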
Precisely. This is how a buffered ordered inserter works today. Now consider the case of a generic time series algorithm that needs to construct a new series. It doesn't know the optimal way to build the series, so it has to ask the series for its "efficient collector". In short, the efficient collector is the de-facto standard way to build a series generically.

My point is, the phrase "if you want buffering" doesn't make any sense. The user (or generic algorithm) doesn't know and shouldn't care what the optimal insertion strategy for a given series type is. Only the series type knows that. This is why I made the "efficient collector" interface the default. Everything else can be built on top.

Let's back up. Why do you believe that

    back_insert (v1, from1, to1)
    back_insert (v2, from2, to2)
    back_insert (v3, from3, to3)

should be lower-level than

    begin transmission
    back_insert (v1, from1, to1)
    back_insert (v2, from2, to2)
    back_insert (v3, from3, to3)
    end transmission

?

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com