
2008/5/12, Simonson, Lucanus J <lucanus.j.simonson@intel.com>:
Joachim Faulhaber wrote:
... interval_set: itl::interval_set implements all functionality of std::set as a set of intervals and in addition +=, -= and *= for set union, difference and intersection. On insertion of neighbouring or overlapping intervals inserted intervals are joined.
split_interval_set: split_interval_set works like interval_set but borders of the inserted intervals are preserved, so it can be used as time grid. Using itl::split_interval_set you can partition other interval containers using the *= operator to perform intersections.
split_interval_map: A split_interval_map implements a map as a map of intervals and associated values. For dealing with the insertion of overlapping intervals the principle of "aggregation on overlap" has proved to be very useful: It is shown by example:
typedef set<string> guests; split_interval_map<time, guests> party; guests mary; mary.insert("Mary"); guests harry; harry.insert("Harry"); party.insert(make_pair(rightopen_interval(20:00, 22:00), mary)); party.insert(make_pair(rightopen_interval(21:00, 23:00), harry)); // party now contains [20:00, 21:00)->{"Mary"} [21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap [22:00, 23:00)->{"Harry"}
As can be seen from the example a split_interval_map has both a decompositional behavior (on the time dimension) as well as a accumulative one (on the associated values).
Quality, validation, portability:
The library code of the itl has been validated with a law based test automaton (LaBatea;) which is included in the current release. It includes a testcase generator and law tester that checks laws like e.g. commutativity<interval_set<double>, +=>: the commutativity of interval_set<double> w.r.t. the operation +=.
The code has been compiled and tested with three compilers (gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three implementations of the stl (linux/gnu, stlport, msvc8) under suse linux 10.1. and ms-windows xp.
I'd be happy if you found interval conainers as presented in this post an interesting contribution to look at.
If so, I am looking forward to an inspiring discussion.
I found this in your implementation: /// Container type for the implementation typedef typename itl::set<interval_type,exclusive_less,Alloc> ImplSetT; as the underlying data structure for the base type of your interval maps and sets.
This is a coincidence, because I happen to have implemented the same underlying split interval map data structure for use as the scanline data structure for my algorithm that computes the connectivity graph of overlapping planar geometries. This data structure results in O(n) map entries for O(n) overlapping intervals, where each entry has O(n) elements in a set (in the worst case.) That turns out to be pretty suboptimal data structure because it has a storage space complexity O(n^2). A superior data structure for the same would be akin to a 1D R*Tree, center-cutline implementation of a quad-tree or similar. Such a data structure which would store only O(n) elements of constant size and provide log(n) + m (where m is n in the worst case) amortized lookup time instead of worst case log(n) * m1 * m2 lookup time (m1 and m2 are n in the worst case.) Imagine everyone at your example party has overlapping intervals and different arrival and departure times. Looking up who was at the party while a given person was present would require looking at each set of names for each sub-interval where the set of people was different and results in reading the same name many times because it appears in many adjacent sub intervals.
For my application the target use case is two layer connectivity extraction, where the maximum number of polygons (people) in the scanline (party) is bounded at two (pretty dead party) and the complexity of the algorithm based on the split interval map type data structure is optimal. For n overlapping polygons I know the algorithm to be suboptimal, and I know what data structure to apply to fix it. I would be more interested in the optimal data structure than a generic wrapper around this usage of std containers. I'm not sure how having your interval set code would have helped me write my connectivity extraction algorithm using this data structure vs. what I was able to do quite readily with the stl alone.
By the way, a map of sets has poor performance when rebalancing itself. Getting rid of this data structure in my code is on my todo list. Once I have a good replacement (and I have several quad-tree and R*Tree implementations internally and in the open source community to choose from) I would make it generic and provide it as part of my library (in addition to using it where appropriate in my algorithms.) There is also a generic RTree implementation that I have added to the library since submitting it to the vault, but that hasn't been cleared for external publication yet.
I suggest you research R*Trees and related data structure for spatial query and think about how your problem resembles 1D spatial query.
Thanks, Luke
Dear Luke, When I returned yesterday from a very mild Berlin spring evening with some extra grappas consumed I was so happy to find a long answer to my first boost posting that I joyfully conceded my interval_container classes to be suboptimal. Well this was pretty suboptimal with respect to my intention to demonstrate their benefits. Next day I realized that this was an exaggeration. So please let me correct a little;) Seen from the perspective of a special application interval_containers appear to be inferior to classes and algorithms that are tailored for that field. Yet interval_containers are not designed to be optimized to a special sort of application. Interval_sets and interval_maps are intended to be much more general in scope: As stated in my original posting the basic idea is that an interval_set implements a set as a set of intervals. An interval_map implements a map as a map of intervals with associated values. So the character of interval_sets/maps is twofold. You can use them with the same interface as stl::sets/maps. And you can use its interval structure by iterating over them. Or by performing intersection operations. The additional task in implementing interval_sets/maps, compared to common sets/maps, is the handling of overlaps: 1. interval_set: overlapping/bordering intervals are joined. 2. split_interval_set: overlapping/bordering intervals are preserved. simple so far! 3. split_interval_map: overlapping/bordering intervals are also preserved. Then we must decide what to do with the associated values on the overlapping interval ranges. It unfolds quite naturally that the associated values are 'added' on the overlap parts. So their type has to conform concept Addable, more specifically I chose InplaceAddable (+= has to exist) for efficiency reasons.
From this design follows, that a split_interval_map is a map that that does not overwrite values on additional insertion but it accumulates them. All kinds of data_types are accumulated according to their 'InplaceAddability'.
(a) So in the most simple case where data_type is int and we insert only value_pair([a,b),1). The resulting object is an overlap counter. (b) For numeric data_types and arbitrary associated values you are getting a quantity counter. E. g. number of grappas consumed in time intervals by party guests. In the cases (a) and (b) the problem of inefficient worst case storage complexity is not present. (c) Only if data_type is set<T> or some other 'InplaceAddable' container<T> the worst case O(n^2) storage problem can arise. In this case one may resort to a more efficient datastructure or accept that behavior, if e.g. the overlap structure is not excessive. Finally: The problem of inefficient search on the associated values: Since a split_interval_map is basically a map, it has the downside of it's stl brother. Lookup on the data_values is inefficient. So, in the current design efficient search on associated values is not provided. If this is crucial for an application a more specialized class has to be chosen instead of split_interval_map. In other words, like with stl::maps not every type instance of a split_interval_map has optimal efficiency properties for a given context of usage, but others have. The developer has to decide then whether to use a general class or a specialized one. The design of my interval_containers is more oriented on structure and orthogonality than on optimizing for specific applications. Nevertheless your points were very valuable and I will consider them to improve the code. Thank you again Joachim