
Zach Laine wrote:
You misunderstand me. I'm all for DiscretizationType (the template parameter -- yay typesafety), and I'm all for discretization (the value and associated fuctions) for dense series. What I'm against is using discretization (the value) for non-dense series. (Note that we have run into the Discretization/discretization naming ambiguity again here.) I find it a confusing value to keep around, especially since it can be simply ignored, as you pointed out in a previous email. A data value that you can access and that is put forward as a first-class element of the design -- that is also ignored -- suggests a better design is possible. Here's what I suggest:
The Discretization template parameter becomes OffsetType, which is IMO a more accurate name, and follows the RangeRun concepts. I will be using that name below. The "OffsetType discretization" ctor parameter, the "OffsetType discretization()" accessors, and "void discretization(OffsetType d)" mutators should only be applied to the dense_series<> type. The user should be able to access any data in any type exclusively by using offsets. This means that dense_series<> seamlessly handles the mapping between (possibly floating point) offset and sample index I requested initially.
This has these advantages: The notion of discretization is not introduced into types for which it has questionable meaning, e.g. piecewise_constant_series<>. Offsets can be used exclusively, both on input (as before) and output (as before, but now including dense_series<> with floating point offsets). Floating point and integral offsets are now treated much more uniformly.
Is this reasonable? Have I perhaps missed something fundamental?
Reasonable, but problematic for reasons of unclear semantics and sub-par performance. A similar design was considered early on. Semantically, it makes it unclear what indexing into a dense series means. Consider: dense_series<int> d( discretization = 5 ); In your design as I understand, d[0], d[1], d[2], d[3] and d[4] all refer to the same element in the dense series. To me, this interface changes the meaning of a dense series. It is no longer a dense sampling of points at regular intervals. Now it looks piecewise constant. To illustrate the sorts of confusion this can cause, what happens now if I use set_at() to try to change the value at offset 2? It's nonsensical. As another example, say I have a dense series with discretization of 5ms and samples at 0ms, 5ms, and 10ms. Now I also have a sparse series (no discretization) with samples at 0, 5 and 10. Now I add the two. What does that mean? Does the dense series have a value at offset 3? If I understand your design, yes. Does the sparse series? No. Will the result of adding the two? It seems the answer is yes, but it should be no. Discretization for the other series types is not useless. For sparse, for example, it enforces that samples may only exist at offsets that are a multiple of the discretization. There are performance implications too. Indexing into a dense series is currently not much more expensive than indexing into an array. With this change, every access to the dense series would incur an integer division. And traversing a dense series would incur an integer multiplication at each iteration. But the real performance problems creep in when a dense series has a discretization, but other series do not. In the current design, algorithms check the discretization of two series once for compatibility, and then it can be safely ignored for the rest of the computation. Traversing two series in parallel involves conditionally bumping one or both cursors depending on if and how the current runs overlap. Dense and sparse both currently have "indivisible" runs. If they start at the same offset, I know they overlap perfectly and can bump both cursors. If I don't know that both series are using the same discretization, I can't do this, because the notion of indivisibility is intricately tied to discretization. What constitutes an indivisible run for the dense series [0ms,5ms) is different than an indivisible run for sparse series [0ms,1ms). The only way to fix these problems would be to add back a discretization to all the series types. Then the indexing thing (whether it indices are dense and logically multiplied by the discretization, or sparse and actually multiplied) becomes a matter of convention, convenience and efficiency. And on those counts I think the current design is a winner. -- Eric Niebler Boost Consulting www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com