proposal: interval containers

Dear boost developers, I would like to contribute some library code that covers the field of interval containers, specifically interval_set and interval_map. My design of interval sets is based on the fact that an interval<T> is a set<T>, which implies that a set<T> t can be represented as a set<interval<T>> s, where t is the union over all intervals contained in s. The implementation of a set<T> as a set<interval<T>> can be beneficial whenever in a given problem domain the elements of sets appear in contiguous chunks: intervals. This is obviously the case in many problem domains, namely in fields that deal with problems related to date and time. The benefits of interval sets and maps are 1. The implementation of sets/maps via interval_set/map is very space and also time efficient. 2. All problems concerning overlaps of intervals and their handling can be encapsulated in the implementation and thus ... 3. A higher abstraction level dealing with interval and specifically time interval related problems can be obtained. I wrote a library that provides generic classes for interval_sets and interval_maps and a few more things called "Interval Template Library (itl)". I'd like to give a very brief description of the library in this post. For more detailed information and the current source code you might want to download the itl from http://sourceforge.net/projects/itl. The code contained in the library was originally developed at Cortex Software GmbH Berlin (Clinic Information Systems), where interval containers already proved to be very useful for quite some years in almost all of our applications. In 2006 Cortex Software GmbH allowed the code to be released as open source and I provided a preliminary version at sourceforge. Meanwhile I have refactored the code to enhance it's fitness for generic designs, portable usage and a possible integration in the boost libraries. Since I have not (yet;) found the functionality of the 'itl' in the boost libraries nor a discussion about the rejection of such topics in the boost archives I would like to offer parts of the library here. interval_set: itl::interval_set implements all functionality of std::set as a set of intervals and in addition +=, -= and *= for set union, difference and intersection. On insertion of neighbouring or overlapping intervals inserted intervals are joined. split_interval_set: split_interval_set works like interval_set but borders of the inserted intervals are preserved, so it can be used as time grid. Using itl::split_interval_set you can partition other interval containers using the *= operator to perform intersections. split_interval_map: A split_interval_map implements a map as a map of intervals and associated values. For dealing with the insertion of overlapping intervals the principle of "aggregation on overlap" has proved to be very useful: It is shown by example: typedef set<string> guests; split_interval_map<time, guests> party; guests mary; mary.insert("Mary"); guests harry; harry.insert("Harry"); party.insert(make_pair(rightopen_interval(20:00, 22:00), mary)); party.insert(make_pair(rightopen_interval(21:00, 23:00), harry)); // party now contains [20:00, 21:00)->{"Mary"} [21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap [22:00, 23:00)->{"Harry"} As can be seen from the example a split_interval_map has both a decompositional behavior (on the time dimension) as well as a accumulative one (on the associated values). Quality, validation, portability: The library code of the itl has been validated with a law based test automaton (LaBatea;) which is included in the current release. It includes a testcase generator and law tester that checks laws like e.g. commutativity<interval_set<double>, +=>: the commutativity of interval_set<double> w.r.t. the operation +=. The code has been compiled and tested with three compilers (gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three implementations of the stl (linux/gnu, stlport, msvc8) under suse linux 10.1. and ms-windows xp. I'd be happy if you found interval conainers as presented in this post an interesting contribution to look at. If so, I am looking forward to an inspiring discussion. Best regards Joachim Faulhaber

Joachim Faulhaber wrote:
... interval_set: itl::interval_set implements all functionality of std::set as a set of intervals and in addition +=, -= and *= for set union, difference and intersection. On insertion of neighbouring or overlapping intervals inserted intervals are joined.
split_interval_set: split_interval_set works like interval_set but borders of the inserted intervals are preserved, so it can be used as time grid. Using itl::split_interval_set you can partition other interval containers using the *= operator to perform intersections.
split_interval_map: A split_interval_map implements a map as a map of intervals and associated values. For dealing with the insertion of overlapping intervals the principle of "aggregation on overlap" has proved to be very useful: It is shown by example:
typedef set<string> guests; split_interval_map<time, guests> party; guests mary; mary.insert("Mary"); guests harry; harry.insert("Harry"); party.insert(make_pair(rightopen_interval(20:00, 22:00), mary)); party.insert(make_pair(rightopen_interval(21:00, 23:00), harry)); // party now contains [20:00, 21:00)->{"Mary"} [21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap [22:00, 23:00)->{"Harry"}
As can be seen from the example a split_interval_map has both a decompositional behavior (on the time dimension) as well as a accumulative one (on the associated values).
Quality, validation, portability:
The library code of the itl has been validated with a law based test automaton (LaBatea;) which is included in the current release. It includes a testcase generator and law tester that checks laws like e.g. commutativity<interval_set<double>, +=>: the commutativity of interval_set<double> w.r.t. the operation +=.
The code has been compiled and tested with three compilers (gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three implementations of the stl (linux/gnu, stlport, msvc8) under suse linux 10.1. and ms-windows xp.
I'd be happy if you found interval conainers as presented in this post an interesting contribution to look at.
If so, I am looking forward to an inspiring discussion.
I found this in your implementation: /// Container type for the implementation typedef typename itl::set<interval_type,exclusive_less,Alloc> ImplSetT; as the underlying data structure for the base type of your interval maps and sets. This is a coincidence, because I happen to have implemented the same underlying split interval map data structure for use as the scanline data structure for my algorithm that computes the connectivity graph of overlapping planar geometries. This data structure results in O(n) map entries for O(n) overlapping intervals, where each entry has O(n) elements in a set (in the worst case.) That turns out to be pretty suboptimal data structure because it has a storage space complexity O(n^2). A superior data structure for the same would be akin to a 1D R*Tree, center-cutline implementation of a quad-tree or similar. Such a data structure which would store only O(n) elements of constant size and provide log(n) + m (where m is n in the worst case) amortized lookup time instead of worst case log(n) * m1 * m2 lookup time (m1 and m2 are n in the worst case.) Imagine everyone at your example party has overlapping intervals and different arrival and departure times. Looking up who was at the party while a given person was present would require looking at each set of names for each sub-interval where the set of people was different and results in reading the same name many times because it appears in many adjacent sub intervals. For my application the target use case is two layer connectivity extraction, where the maximum number of polygons (people) in the scanline (party) is bounded at two (pretty dead party) and the complexity of the algorithm based on the split interval map type data structure is optimal. For n overlapping polygons I know the algorithm to be suboptimal, and I know what data structure to apply to fix it. I would be more interested in the optimal data structure than a generic wrapper around this usage of std containers. I'm not sure how having your interval set code would have helped me write my connectivity extraction algorithm using this data structure vs. what I was able to do quite readily with the stl alone. By the way, a map of sets has poor performance when rebalancing itself. Getting rid of this data structure in my code is on my todo list. Once I have a good replacement (and I have several quad-tree and R*Tree implementations internally and in the open source community to choose from) I would make it generic and provide it as part of my library (in addition to using it where appropriate in my algorithms.) There is also a generic RTree implementation that I have added to the library since submitting it to the vault, but that hasn't been cleared for external publication yet. I suggest you research R*Trees and related data structure for spatial query and think about how your problem resembles 1D spatial query. Thanks, Luke

2008/5/12, Simonson, Lucanus J <lucanus.j.simonson@intel.com>:
Joachim Faulhaber wrote:
... interval_set: itl::interval_set implements all functionality of std::set as a set of intervals and in addition +=, -= and *= for set union, difference and intersection. On insertion of neighbouring or overlapping intervals inserted intervals are joined.
split_interval_set: split_interval_set works like interval_set but borders of the inserted intervals are preserved, so it can be used as time grid. Using itl::split_interval_set you can partition other interval containers using the *= operator to perform intersections.
split_interval_map: A split_interval_map implements a map as a map of intervals and associated values. For dealing with the insertion of overlapping intervals the principle of "aggregation on overlap" has proved to be very useful: It is shown by example:
typedef set<string> guests; split_interval_map<time, guests> party; guests mary; mary.insert("Mary"); guests harry; harry.insert("Harry"); party.insert(make_pair(rightopen_interval(20:00, 22:00), mary)); party.insert(make_pair(rightopen_interval(21:00, 23:00), harry)); // party now contains [20:00, 21:00)->{"Mary"} [21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap [22:00, 23:00)->{"Harry"}
As can be seen from the example a split_interval_map has both a decompositional behavior (on the time dimension) as well as a accumulative one (on the associated values).
Quality, validation, portability:
The library code of the itl has been validated with a law based test automaton (LaBatea;) which is included in the current release. It includes a testcase generator and law tester that checks laws like e.g. commutativity<interval_set<double>, +=>: the commutativity of interval_set<double> w.r.t. the operation +=.
The code has been compiled and tested with three compilers (gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three implementations of the stl (linux/gnu, stlport, msvc8) under suse linux 10.1. and ms-windows xp.
I'd be happy if you found interval conainers as presented in this post an interesting contribution to look at.
If so, I am looking forward to an inspiring discussion.
dear Luke, thank you for looking at my code. Since this was my first posting in boost it this is kind of adventurous for me.
I found this in your implementation: /// Container type for the implementation typedef typename itl::set<interval_type,exclusive_less,Alloc> ImplSetT; as the underlying data structure for the base type of your interval maps and sets.
This is a coincidence, because I happen to have implemented the same underlying split interval map data structure for use as the scanline data structure for my algorithm that computes the connectivity graph of overlapping planar geometries. This data structure results in O(n) map entries for O(n) overlapping intervals, where each entry has O(n) elements in a set (in the worst case.) That turns out to be pretty suboptimal data structure because it has a storage space complexity O(n^2).
Yes you are right. The implementation of my split_interval_maps is suboptimal. That is where the 'boost' of boost comes in: It brings in broader views, contexts and new requirements. The problem domains in which my interval containers have been used so far (processes related to patient management in hospitals) were all characterized by small numbers of overlaps. Split_interval_maps have been very useful to decompose different histories into a single product history like in the 'history' example included in my library code. In many instances the associated split_interval_map<Time, set<T>> were sets or other containers, but searching was not performed on the sets. In other instances split_interval_map<Time, Quantity> were used as a quantity on overlap aggregator. In all those problem domains the worst case situation that you are addressing was not relevant. For other problem domains the current implementation might not be sufficient.
A superior data structure for the same would be akin to a 1D R*Tree, center-cutline implementation of a quad-tree or similar. Such a data structure which would store only O(n) elements of constant size and provide log(n) + m (where m is n in the worst case) amortized lookup time instead of worst case log(n) * m1 * m2 lookup time (m1 and m2 are n in the worst case.) Imagine everyone at your example party has overlapping intervals and different arrival and departure times. Looking up who was at the party while a given person was present would require looking at each set of names for each sub-interval where the set of people was different and results in reading the same name many times because it appears in many adjacent sub intervals.
For my application the target use case is two layer connectivity extraction, where the maximum number of polygons (people) in the scanline (party) is bounded at two (pretty dead party) and the complexity of the algorithm based on the split interval map type data structure is optimal. For n overlapping polygons I know the algorithm to be suboptimal, and I know what data structure to apply to fix it. I would be more interested in the optimal data structure than a generic wrapper around this usage of std containers. I'm not sure how having your interval set code would have helped me write my connectivity extraction algorithm using this data structure vs. what I was able to do quite readily with the stl alone.
By the way, a map of sets has poor performance when rebalancing itself.
Yes, what I've been thinking about here is that every map<T1, set<T2>> can be represented by an equivalent set<pair<T1,T2>> or multi_map<T1,T2>. In other words maps of sets can be flattened.
Getting rid of this data structure in my code is on my todo list. Once I have a good replacement (and I have several quad-tree and R*Tree implementations internally and in the open source community to choose from) I would make it generic and provide it as part of my library (in addition to using it where appropriate in my algorithms.) There is also a generic RTree implementation that I have added to the library since submitting it to the vault, but that hasn't been cleared for external publication yet.
I suggest you research R*Trees and related data structure for spatial query and think about how your problem resembles 1D spatial query.
I will follow this advice
Thanks, Luke
Thanks, Joachim

2008/5/12, Simonson, Lucanus J <lucanus.j.simonson@intel.com>:
Joachim Faulhaber wrote:
... interval_set: itl::interval_set implements all functionality of std::set as a set of intervals and in addition +=, -= and *= for set union, difference and intersection. On insertion of neighbouring or overlapping intervals inserted intervals are joined.
split_interval_set: split_interval_set works like interval_set but borders of the inserted intervals are preserved, so it can be used as time grid. Using itl::split_interval_set you can partition other interval containers using the *= operator to perform intersections.
split_interval_map: A split_interval_map implements a map as a map of intervals and associated values. For dealing with the insertion of overlapping intervals the principle of "aggregation on overlap" has proved to be very useful: It is shown by example:
typedef set<string> guests; split_interval_map<time, guests> party; guests mary; mary.insert("Mary"); guests harry; harry.insert("Harry"); party.insert(make_pair(rightopen_interval(20:00, 22:00), mary)); party.insert(make_pair(rightopen_interval(21:00, 23:00), harry)); // party now contains [20:00, 21:00)->{"Mary"} [21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap [22:00, 23:00)->{"Harry"}
As can be seen from the example a split_interval_map has both a decompositional behavior (on the time dimension) as well as a accumulative one (on the associated values).
Quality, validation, portability:
The library code of the itl has been validated with a law based test automaton (LaBatea;) which is included in the current release. It includes a testcase generator and law tester that checks laws like e.g. commutativity<interval_set<double>, +=>: the commutativity of interval_set<double> w.r.t. the operation +=.
The code has been compiled and tested with three compilers (gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three implementations of the stl (linux/gnu, stlport, msvc8) under suse linux 10.1. and ms-windows xp.
I'd be happy if you found interval conainers as presented in this post an interesting contribution to look at.
If so, I am looking forward to an inspiring discussion.
I found this in your implementation: /// Container type for the implementation typedef typename itl::set<interval_type,exclusive_less,Alloc> ImplSetT; as the underlying data structure for the base type of your interval maps and sets.
This is a coincidence, because I happen to have implemented the same underlying split interval map data structure for use as the scanline data structure for my algorithm that computes the connectivity graph of overlapping planar geometries. This data structure results in O(n) map entries for O(n) overlapping intervals, where each entry has O(n) elements in a set (in the worst case.) That turns out to be pretty suboptimal data structure because it has a storage space complexity O(n^2). A superior data structure for the same would be akin to a 1D R*Tree, center-cutline implementation of a quad-tree or similar. Such a data structure which would store only O(n) elements of constant size and provide log(n) + m (where m is n in the worst case) amortized lookup time instead of worst case log(n) * m1 * m2 lookup time (m1 and m2 are n in the worst case.) Imagine everyone at your example party has overlapping intervals and different arrival and departure times. Looking up who was at the party while a given person was present would require looking at each set of names for each sub-interval where the set of people was different and results in reading the same name many times because it appears in many adjacent sub intervals.
For my application the target use case is two layer connectivity extraction, where the maximum number of polygons (people) in the scanline (party) is bounded at two (pretty dead party) and the complexity of the algorithm based on the split interval map type data structure is optimal. For n overlapping polygons I know the algorithm to be suboptimal, and I know what data structure to apply to fix it. I would be more interested in the optimal data structure than a generic wrapper around this usage of std containers. I'm not sure how having your interval set code would have helped me write my connectivity extraction algorithm using this data structure vs. what I was able to do quite readily with the stl alone.
By the way, a map of sets has poor performance when rebalancing itself. Getting rid of this data structure in my code is on my todo list. Once I have a good replacement (and I have several quad-tree and R*Tree implementations internally and in the open source community to choose from) I would make it generic and provide it as part of my library (in addition to using it where appropriate in my algorithms.) There is also a generic RTree implementation that I have added to the library since submitting it to the vault, but that hasn't been cleared for external publication yet.
I suggest you research R*Trees and related data structure for spatial query and think about how your problem resembles 1D spatial query.
Thanks, Luke
Dear Luke, When I returned yesterday from a very mild Berlin spring evening with some extra grappas consumed I was so happy to find a long answer to my first boost posting that I joyfully conceded my interval_container classes to be suboptimal. Well this was pretty suboptimal with respect to my intention to demonstrate their benefits. Next day I realized that this was an exaggeration. So please let me correct a little;) Seen from the perspective of a special application interval_containers appear to be inferior to classes and algorithms that are tailored for that field. Yet interval_containers are not designed to be optimized to a special sort of application. Interval_sets and interval_maps are intended to be much more general in scope: As stated in my original posting the basic idea is that an interval_set implements a set as a set of intervals. An interval_map implements a map as a map of intervals with associated values. So the character of interval_sets/maps is twofold. You can use them with the same interface as stl::sets/maps. And you can use its interval structure by iterating over them. Or by performing intersection operations. The additional task in implementing interval_sets/maps, compared to common sets/maps, is the handling of overlaps: 1. interval_set: overlapping/bordering intervals are joined. 2. split_interval_set: overlapping/bordering intervals are preserved. simple so far! 3. split_interval_map: overlapping/bordering intervals are also preserved. Then we must decide what to do with the associated values on the overlapping interval ranges. It unfolds quite naturally that the associated values are 'added' on the overlap parts. So their type has to conform concept Addable, more specifically I chose InplaceAddable (+= has to exist) for efficiency reasons.
From this design follows, that a split_interval_map is a map that that does not overwrite values on additional insertion but it accumulates them. All kinds of data_types are accumulated according to their 'InplaceAddability'.
(a) So in the most simple case where data_type is int and we insert only value_pair([a,b),1). The resulting object is an overlap counter. (b) For numeric data_types and arbitrary associated values you are getting a quantity counter. E. g. number of grappas consumed in time intervals by party guests. In the cases (a) and (b) the problem of inefficient worst case storage complexity is not present. (c) Only if data_type is set<T> or some other 'InplaceAddable' container<T> the worst case O(n^2) storage problem can arise. In this case one may resort to a more efficient datastructure or accept that behavior, if e.g. the overlap structure is not excessive. Finally: The problem of inefficient search on the associated values: Since a split_interval_map is basically a map, it has the downside of it's stl brother. Lookup on the data_values is inefficient. So, in the current design efficient search on associated values is not provided. If this is crucial for an application a more specialized class has to be chosen instead of split_interval_map. In other words, like with stl::maps not every type instance of a split_interval_map has optimal efficiency properties for a given context of usage, but others have. The developer has to decide then whether to use a general class or a specialized one. The design of my interval_containers is more oriented on structure and orthogonality than on optimizing for specific applications. Nevertheless your points were very valuable and I will consider them to improve the code. Thank you again Joachim

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Joachim Faulhaber Sent: 12 May 2008 16:36 To: boost@lists.boost.org Subject: [boost] proposal: interval containers
I would like to contribute some library code that covers the field of interval containers, specifically interval_set and interval_map.
My design of interval sets is based on the fact that an interval<T> is a set<T>, which implies that a set<T> t can be represented as a set<interval<T>> s, where t is the union over all intervals contained in s.
The implementation of a set<T> as a set<interval<T>> can be beneficial whenever in a given problem domain the elements of sets appear in contiguous chunks: intervals. This is obviously the case in many problem domains, namely in fields that deal with problems related to date and time.
The benefits of interval sets and maps are 1. The implementation of sets/maps via interval_set/map is very space and also time efficient. 2. All problems concerning overlaps of intervals and their handling can be encapsulated in the implementation and thus ... 3. A higher abstraction level dealing with interval and specifically time interval related problems can be obtained.
I wrote a library that provides generic classes for interval_sets and interval_maps and a few more things called "Interval Template Library (itl)". I'd like to give a very brief description of the library in this post. For more detailed information and the current source code you might want to download the itl from http://sourceforge.net/projects/itl.
A very brief reading of this (html) shows the code and documentation to be in mature shape with some interesting examples, that I feel sure will find application elsewhere. I found interesting the most simple example of people attending a party, and the more complex, attending hospital. (The examples might be more convincing to Boosters if the examples used Boost.TimeDate - to prove that the two work together). As ever, there is a problem connecting with the people who could use this code. You have obviously not found a horde of people desparate to use this ;-) (but perhaps they have already downloaded it and are using it?) I recommend exposing these examples so people don't have download the zip and unpack it before you can read the docs. (Yes I know people are idle - mea culpa). Personally, I think this would be a useful addition to Boost. Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

2008/5/16, Paul A Bristow <pbristow@hetp.u-net.com>:
I wrote a library that provides generic classes for interval_sets and interval_maps and a few more things called "Interval Template Library (itl)". I'd like to give a very brief description of the library in this post. For more detailed information and the current source code you might want to download the itl from http://sourceforge.net/projects/itl.
A very brief reading of this (html) shows the code and documentation to be in mature shape with some interesting examples, that I feel sure will find application elsewhere.
I found interesting the most simple example of people attending a party, and the more complex, attending hospital.
(The examples might be more convincing to Boosters if the examples used Boost.TimeDate - to prove that the two work together).
As ever, there is a problem connecting with the people who could use this code. You have obviously not found a horde of people desparate to use this ;-) (but perhaps they have already downloaded it and are using it?)
I recommend exposing these examples so people don't have download the zip and unpack it before you can read the docs. (Yes I know people are idle - mea culpa).
Personally, I think this would be a useful addition to Boost.
Paul
Dear Paul, thank you for your positive feedback and for your advice to make my html documentation more easily accessible. I followed your tip and placed the documentation here: http://www.herold-faulhaber.de/itl/ Within the next days I am going to provide a few more examples that are better integrated with boost libraries specifically boost.date_time. cheers Joachim

----- Original Message ----- From: "Joachim Faulhaber" <afojgo@googlemail.com> To: <boost@lists.boost.org> Sent: Monday, May 12, 2008 5:35 PM Subject: [boost] proposal: interval containers
Dear boost developers,
I would like to contribute some library code that covers the field of interval containers, specifically interval_set and interval_map.
My design of interval sets is based on the fact that an interval<T> is a set<T>, which implies that a set<T> t can be represented as a set<interval<T>> s, where t is the union over all intervals contained in s.
The implementation of a set<T> as a set<interval<T>> can be beneficial whenever in a given problem domain the elements of sets appear in contiguous chunks: intervals. This is obviously the case in many problem domains, namely in fields that deal with problems related to date and time.
Hello, I'm mainly interested in an implementation of set<T> using set<interval<T> and on how it comapres to the stl::set in space and time, when the set are dense or scattered. I have just take a look to the documentation and the implementation, and I have some questions? * Why not follow the STL set and map interfaces as far as possible? * Why do you use virtual functions? Are the classes sets, and interval_base_set public? I think this can and should be avoided. * Does it works with boost::interval?
The benefits of interval sets and maps are 1. The implementation of sets/maps via interval_set/map is very space and also time efficient.
Have you done some messures, in space and time? In the worst case {1, 11, 21...}?
2. All problems concerning overlaps of intervals and their handling can be encapsulated in the implementation and thus ... 3. A higher abstraction level dealing with interval and specifically time interval related problems can be obtained.
I wrote a library that provides generic classes for interval_sets and interval_maps and a few more things called "Interval Template Library (itl)". I'd like to give a very brief description of the library in this post. For more detailed information and the current source code you might want to download the itl from http://sourceforge.net/projects/itl.
The code contained in the library was originally developed at Cortex Software GmbH Berlin (Clinic Information Systems), where interval containers already proved to be very useful for quite some years in almost all of our applications. In 2006 Cortex Software GmbH allowed the code to be released as open source and I provided a preliminary version at sourceforge.
Meanwhile I have refactored the code to enhance it's fitness for generic designs, portable usage and a possible integration in the boost libraries. Since I have not (yet;) found the functionality of the 'itl' in the boost libraries nor a discussion about the rejection of such topics in the boost archives I would like to offer parts of the library here.
Do you plan to boostify your code?
interval_set: itl::interval_set implements all functionality of std::set as a set of intervals and in addition +=, -= and *= for set union, difference and intersection. On insertion of neighbouring or overlapping intervals inserted intervals are joined.
I'm no sure interval_set is a model of Unique Associative Container. Given the following : interval_set<interval<int>> is; is.insert([1,3)); is.insert([3,5)); What is the result of assert(find([1,3))!= end); And of assert(find([1,5))== end); Which operator would you use for set_symmetric_difference? This is a detail but it will be better to provided the function version of this operators as well.
split_interval_set: split_interval_set works like interval_set but borders of the inserted intervals are preserved, so it can be used as time grid. Using itl::split_interval_set you can partition other interval containers using the *= operator to perform intersections.
Which are the difference between split_interval_set <interval<T> > and set<interval<T> >?
split_interval_map: A split_interval_map implements a map as a map of intervals and associated values. For dealing with the insertion of overlapping intervals the principle of "aggregation on overlap" has proved to be very useful: It is shown by example:
typedef set<string> guests; split_interval_map<time, guests> party; guests mary; mary.insert("Mary"); guests harry; harry.insert("Harry"); party.insert(make_pair(rightopen_interval(20:00, 22:00), mary)); party.insert(make_pair(rightopen_interval(21:00, 23:00), harry)); // party now contains [20:00, 21:00)->{"Mary"} [21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap [22:00, 23:00)->{"Harry"}
As can be seen from the example a split_interval_map has both a decompositional behavior (on the time dimension) as well as a accumulative one (on the associated values).
Quite interesting!
Quality, validation, portability:
The library code of the itl has been validated with a law based test automaton (LaBatea;) which is included in the current release. It includes a testcase generator and law tester that checks laws like e.g. commutativity<interval_set<double>, +=>: the commutativity of interval_set<double> w.r.t. the operation +=.
The code has been compiled and tested with three compilers (gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three implementations of the stl (linux/gnu, stlport, msvc8) under suse linux 10.1. and ms-windows xp.
I'd be happy if you found interval conainers as presented in this post an interesting contribution to look at.
I'm sure that this library will be a good candidate to boost.
If so, I am looking forward to an inspiring discussion.
Best regards Vicente Botet

2008/5/16, vicente.botet <vicente.botet@wanadoo.fr>:
Hello,
I'm mainly interested in an implementation of set<T> using set<interval<T> and on how it comapres to the stl::set in space and time, when the set are dense or scattered.
I have just take a look to the documentation and the implementation, and I have some questions?
Thank you for looking at my code. I am quite thrilled to get so many good responses. Let me note, before answering your questions, that I followed the boost advice from the web page to post suggestions *before* investing work on adjusting or producing library code to conform the boost standards and requirements, in order to avoid unnecessary work. I know the library might not conform some boost requirements yet, but I am of course happy to adjust the code and design if there is significant interest in my work. To keep the text simple I will talk mostly of sets and interval_sets, when I want to say things about sets/maps and interval_sets/interval_maps because most statements apply to both.
* Why not follow the STL set and map interfaces as far as possible?
I intended to follow the STL and there is a large set of functions and operators common to std::set/map and itl::interval_sets/maps. But there are differences also. (1) One very interesting point is that interval_sets in the design presented do have a twofold character. An interval_set is intended to implement set. Yet at the same time it is a set of intervals. So the question how a function like e.g. find should be implemented has no obvious answer. (2) In my design I am exploiting concepts (Inplace)Addable and (Inplace)Subtractable in many ways. That is I do rely on the existence of a += operator for almost all and -= for many datatypes in my library. So I have added +=, -= and also *= (intersection) to interval_set/map. A propagation of += and -= through container operations is a core element of my design and a base of the useful 'aggregate on overlap' (aggrovering;) mechanics of interval_maps. (see examples party and history from http://sourceforge.net/projects/itl.) I have defined o= operators as memberfunctions because this seemed natural to me. But I think the whole library can be easily refactored, making them global function templates.
* Why do you use virtual functions? Are the classes sets, and interval_base_set public? I think this can and should be avoided.
The original design was classical OO-design, defining an interface class that allows different implementations being used uniformly by a weak polymophic interfacepointer. Furthermore there is a body of baseclass functions that are common to interval_set and split_interval_set. They only vary in the handling of interval borders. This has been expressed by basecalss interval_base_set, interitence and a virtual function 'handle_neighbours'. All those virtual functions can be devirtualized without problems. The interface class templates can be omitted and handle_neighbours might be expressed via a policy template parameter.
* Does it works with boost::interval?
Unfortunately not. Interval containers rely on the notion of open interval bounds. This was necessary to express operations for flotingpoint numbers: interval_set<double> is; is.insert([1,pi)); is.insert([pi,4]); // is=={[1,4]} merging INTENDED is.insert([1,pi)); is.insert((pi,4]); // is=={[1,pi), (pi,4]} must not merge As I've read in the archives an extension for open interval bounds have been suggested for boost::interval in the past.
The benefits of interval sets and maps are 1. The implementation of sets/maps via interval_set/map is very space and also time efficient.
Have you done some messures, in space and time? In the worst case {1, 11, 21...}?
Not yet. Space can be reckoned easily. Wost case is that every interval inserted has exactly one value element x per interval [x,x]. In this case you have double data content and one byte extra for borderinformation so roughly 2 times the space of stl::set. The 'nice' case is that you can represent a set of uncountable values like {[0.0, pi)} quite compact ;)
Do you plan to boostify your code?
I would very much like to boostify the code. I admit thou that I am not yet familiar with all of the boostification ;) I hope guidance will be provided.
I'm no sure interval_set is a model of Unique Associative Container.
Given the following : interval_set<int> is; //not quite: interval_set<interval<int>> is; is.insert([1,3)); is.insert([3,5));
What is the result of assert(find([1,3))!= end); And of assert(find([1,5))== end);
I thought about find more than twice and decided not to provide it. You can say interval_set<int> found; is.intersect(found, rightopen_interval([1,5]); // Intersect to find contents As interval_set<T> implements set<T> if find is implemented it should deliver an iterator pointing to a value, but such iterators are not available. The closest one can provide is an iterator that points to the interval containing the value or end(), if such interval does not exist.
I'm no sure interval_set is a model of Unique Associative Container. No, I'm afraid it is not.
Which operator would you use for set_symmetric_difference? This is a detail but it will be better to provided the function version of this operators as well. don't know, probably function name. Are there boost suggestions for that?
split_interval_map: A split_interval_map implements a map as a map of intervals and associated values. For dealing with the insertion of overlapping intervals the principle of "aggregation on overlap" has proved to be very useful: It is shown by example:
typedef set<string> guests; split_interval_map<time, guests> party; guests mary; mary.insert("Mary"); guests harry; harry.insert("Harry"); party.insert(make_pair(rightopen_interval(20:00, 22:00), mary)); party.insert(make_pair(rightopen_interval(21:00, 23:00), harry)); // party now contains [20:00, 21:00)->{"Mary"} [21:00, 22:00)->{"Harry","Mary"} //guest sets aggregated on overlap [22:00, 23:00)->{"Harry"}
As can be seen from the example a split_interval_map has both a decompositional behavior (on the time dimension) as well as a accumulative one (on the associated values).
Quite interesting!
Quality, validation, portability:
The library code of the itl has been validated with a law based test automaton (LaBatea;) which is included in the current release. It includes a testcase generator and law tester that checks laws like e.g. commutativity<interval_set<double>, +=>: the commutativity of interval_set<double> w.r.t. the operation +=.
The code has been compiled and tested with three compilers (gcc 3.4.4, conceptgcc-boostcon, msvc 8.0 sp1) and with three implementations of the stl (linux/gnu, stlport, msvc8) under suse linux 10.1. and ms-windows xp.
I'd be happy if you found interval conainers as presented in this post an interesting contribution to look at.
I'm sure that this library will be a good candidate to boost.
If so, I am looking forward to an inspiring discussion.
Best regards
Vicente Botet
Thank you for your good questions and valuable feedback on my work. Best regards Joachim Faulhaber

Hello Joachim, ----- Original Message ----- From: "Joachim Faulhaber" <afojgo@googlemail.com> To: <boost@lists.boost.org> Sent: Saturday, May 17, 2008 4:16 AM Subject: Re: [boost] proposal: interval containers
2008/5/16, vicente.botet <vicente.botet@wanadoo.fr>:
Hello,
I'm mainly interested in an implementation of set<T> using set<interval<T> and on how it comapres to the stl::set in space and time, when the set are dense or scattered.
I have just take a look to the documentation and the implementation, and I have some questions?
Thank you for looking at my code. I am quite thrilled to get so many good responses.
You are welcome and thanks for all your explanations. I would like to use a set implemented with interval_set. This was on my todo list for a locally unique identifier library.
Let me note, before answering your questions, that I followed the boost advice from the web page to post suggestions *before* investing work on adjusting or producing library code to conform the boost standards and requirements, in order to avoid unnecessary work.
No problem.
I know the library might not conform some boost requirements yet, but I am of course happy to adjust the code and design if there is significant interest in my work.
To keep the text simple I will talk mostly of sets and interval_sets, when I want to say things about sets/maps and interval_sets/interval_maps because most statements apply to both.
* Why not follow the STL set and map interfaces as far as possible?
I intended to follow the STL and there is a large set of functions and operators common to std::set/map and itl::interval_sets/maps. But there are differences also.
Could you be exhaustive with the functions not implemented? In addition there are the template parameters for comparator and allocator that don't mach.
(1) One very interesting point is that interval_sets in the design presented do have a twofold character. An interval_set is intended to implement set. Yet at the same time it is a set of intervals. So the question how a function like e.g. find should be implemented has no obvious answer.
As the base classes are not public you have the choice between not implementing them, use policies, or even better use mixins, which will give you the impresion of virtual functions.
(2) In my design I am exploiting concepts (Inplace)Addable and (Inplace)Subtractable in many ways. That is I do rely on the existence of a += operator for almost all and -= for many datatypes in my library. So I have added +=, -= and also *= (intersection) to interval_set/map. A propagation of += and -= through container operations is a core element of my design and a base of the useful 'aggregate on overlap' (aggrovering;) mechanics of interval_maps. (see examples party and history from http://sourceforge.net/projects/itl.)
OK, I understand.
I have defined o= operators as memberfunctions because this seemed natural to me. But I think the whole library can be easily refactored, making them global function templates.
Yes, this should be possible.
* Why do you use virtual functions? Are the classes sets, and interval_base_set public? I think this can and should be avoided.
The original design was classical OO-design, defining an interface class that allows different implementations being used uniformly by a weak polymophic interfacepointer.
Do you need this polymorphism now?
Furthermore there is a body of baseclass functions that are common to interval_set and split_interval_set. They only vary in the handling of interval borders. This has been expressed by basecalss interval_base_set, interitence and a virtual function 'handle_neighbours'.
All those virtual functions can be devirtualized without problems. The interface class templates can be omitted and handle_neighbours might be expressed via a policy template parameter.
Please consider the possibility to use mixins.
* Does it works with boost::interval?
Unfortunately not. Interval containers rely on the notion of open interval bounds. This was necessary to express operations for flotingpoint numbers: interval_set<double> is; is.insert([1,pi)); is.insert([pi,4]); // is=={[1,4]} merging INTENDED is.insert([1,pi)); is.insert((pi,4]); // is=={[1,pi), (pi,4]} must not merge
This is not a reason your library should not work for boost::intervals. If interval_set works for any kind of intervals, why should not work for closed intervals?
As I've read in the archives an extension for open interval bounds have been suggested for boost::interval in the past.
It will be a good thing to see how to extend the interval library to manage with open boundaries and continuous semanstics. Why not make a separated proposal for intervals?
The benefits of interval sets and maps are 1. The implementation of sets/maps via interval_set/map is very space and also time efficient.
Have you done some messures, in space and time? In the worst case {1, 11, 21...}?
Not yet. Space can be reckoned easily. Wost case is that every interval inserted has exactly one value element x per interval [x,x]. In this case you have double data content and one byte extra for borderinformation so roughly 2 times the space of stl::set. The 'nice' case is that you can represent a set of uncountable values like {[0.0, pi)} quite compact ;)
You are right. This is the nice feature.
Do you plan to boostify your code?
I would very much like to boostify the code. I admit thou that I am not yet familiar with all of the boostification ;) I hope guidance will be provided.
If I can give you a first guide, follow the directory structure boost/itl for the public files boost/itl/detail for non public files libs/itl/src for the implementation libs/itl/example libs/itl/test This will help a lot on how to read your code.
I'm no sure interval_set is a model of Unique Associative Container.
Given the following : interval_set<int> is; //not quite: interval_set<interval<int>> is; is.insert([1,3)); is.insert([3,5));
What is the result of assert(find([1,3))!= end); And of assert(find([1,5))== end);
I thought about find more than twice and decided not to provide it. You can say interval_set<int> found; is.intersect(found, rightopen_interval([1,5]); // Intersect to find contents
This was a sage decision. I'm not sure which could be consequences to give to the find function a intersection semantic. I need to think more on that.
As interval_set<T> implements set<T> if find is implemented it should deliver an iterator pointing to a value, but such iterators are not available.
First of all I think that interval_set<interval<T>> must not provide the interface of set<T>. I can be used to implement the interface of set<T>.
The closest one can provide is an iterator that points to the interval containing the value or end(), if such interval does not exist.
You can use a use the same technique used by bitset. The iterator is a proxy that contains the iterator to the interval and the current value. Evidently this not work for continuous types.
I'm no sure interval_set is a model of Unique Associative Container. No, I'm afraid it is not.
Which operator would you use for set_symmetric_difference? This is a detail but it will be better to provided the function version of this operators as well. don't know, probably function name. Are there boost suggestions for that?
Maybe ^= could b an option. As is the case for stl::bitset and boost::dynamic_bitset a|b union a&b intersection a^b difference a^b symetric difference ~a not a|=b inplace union a&=b inplace intersection a-=b inplace difference a^=b inplace symetric difference Otherwise you can use /,and /= for symetric difference
split_interval_set: split_interval_set works like interval_set but borders of the inserted intervals are preserved, so it can be used as time grid. Using itl::split_interval_set you can partition other interval containers using the *= operator to perform intersections.
Please could you clarify me. Which are the difference between split_interval_set <interval<T> > and set<interval<T> >? Best regards Vicente Botet

2008/5/17, vicente.botet <vicente.botet@wanadoo.fr>: Hi Vicente, thank you again for your vivid interest in my stuff. Your questions are great, yet there are a lot explanations to give. I am going to answer only one/few question at a time so the postings won't get indigestibly fat.
* Why not follow the STL set and map interfaces as far as possible?
I intended to follow the STL and there is a large set of functions and operators common to std::set/map and itl::interval_sets/maps. But there are differences also.
Could you be exhaustive with the functions not implemented? In addition there are the template parameters for comparator and allocator that don't mach.
I resorted to template template parameters for comparator and allocator when performed code validation with the law based test automaton LaBatea (which might be worth an extra look ;-) One law that I wanted to validate was this: itl::interval_set<T> is homomorphic to itl::set<T>, which is to say both implementations show identical behavior. I implemented such homomorphism laws via a law 'BinaryPushout' see itl\src\validate\laws\pushouts.h in my source code that implements a commuting diagram: (a,b) ---o---> c | | |f |f V V (a',b')---o---> c' One instance of this BinaryPushout is a, b, c: objects of type itl::interval_set<T> a', b', c': objects of type itl::set<T> f: a functor Atomize: itl::interval_set<T> -> itl::set<T> that transforms an interval_set into a set. o: A binary operation (in it's inplace incarnation +=, -=, *=). What all the fuzz is about is simply to say that performing first an operaton += on interval sets and then atomize the result to set, yields the same results as atomizing the the arguments first and then performing += on the atomized sets: Atomize(a o b) == Atomize(a) o Atomize(b) The instance of that BinaryPushout Law looks like that: BinaryPushout < Type, typename Type::atomized_type, Interval::Atomize, inplace_plus
see \itl\src\validate\typevalidater.h line 505 Now, for the instance type Type, which requires to be an interval_container (or at least something that is Atomizable), we do need a derived or associated type typename Type::atomized_type !! interval_set provides that derived type: typedef typename itl::set<DomainT,Compare,Alloc> atomized_type; It passes the Compare and Alloc template tempate right thru to the itl::set and finally to the implementing std::set where it is finally instantiated. I at least had severe difficulties to manage those derived types as long as I worked with instantiated Compare<T> and Alloc<T>. Sorry I can't remember the details. But from the moment I consequently used template template parameters the problems ceased. IMO this is also much more elegant and appropriate. I suspect template template parameters were not available at the time of STL's original design, otherwise they would have been used then ;-) cheers Joachim

----- Original Message ----- From: "Joachim Faulhaber" <afojgo@googlemail.com> To: <boost@lists.boost.org> Sent: Saturday, May 17, 2008 12:08 PM Subject: Re: [boost] proposal: interval containers
2008/5/17, vicente.botet <vicente.botet@wanadoo.fr>:
Hi Vicente,
thank you again for your vivid interest in my stuff. Your questions are great, yet there are a lot explanations to give. I am going to answer only one/few question at a time so the postings won't get indigestibly fat.
* Why not follow the STL set and map interfaces as far as possible?
I intended to follow the STL and there is a large set of functions and operators common to std::set/map and itl::interval_sets/maps. But there are differences also.
Could you be exhaustive with the functions not implemented? In addition there are the template parameters for comparator and allocator that don't mach.
I resorted to template template parameters for comparator and allocator when performed code validation with the law based test automaton LaBatea (which might be worth an extra look ;-)
I will do
One law that I wanted to validate was this: itl::interval_set<T> is homomorphic to itl::set<T>, which is to say both implementations show identical behavior. I implemented such homomorphism laws via a law 'BinaryPushout' see itl\src\validate\laws\pushouts.h in my source code that implements a commuting diagram:
(a,b) ---o---> c | | |f |f V V (a',b')---o---> c'
One instance of this BinaryPushout is a, b, c: objects of type itl::interval_set<T> a', b', c': objects of type itl::set<T> f: a functor Atomize: itl::interval_set<T> -> itl::set<T> that transforms an interval_set into a set. o: A binary operation (in it's inplace incarnation +=, -=, *=).
What all the fuzz is about is simply to say that performing first an operaton += on interval sets and then atomize the result to set, yields the same results as atomizing the the arguments first and then performing += on the atomized sets:
Atomize(a o b) == Atomize(a) o Atomize(b)
The instance of that BinaryPushout Law looks like that:
BinaryPushout < Type, typename Type::atomized_type, Interval::Atomize, inplace_plus
see \itl\src\validate\typevalidater.h line 505
Now, for the instance type Type, which requires to be an interval_container (or at least something that is Atomizable), we do need a derived or associated type typename Type::atomized_type !!
interval_set provides that derived type: typedef typename itl::set<DomainT,Compare,Alloc> atomized_type;
It passes the Compare and Alloc template tempate right thru to the itl::set and finally to the implementing std::set where it is finally instantiated.
This is a good thing. Given an interval set we have an associaated set<T> via its atomized_type. Is the atomized_type a model of UniqueAssociativeContainer?
I at least had severe difficulties to manage those derived types as long as I worked with instantiated Compare<T> and Alloc<T>. Sorry I can't remember the details. But from the moment I consequently used template template parameters the problems ceased. IMO this is also much more elegant and appropriate. I suspect template template parameters were not available at the time of STL's original design, otherwise they would have been used then ;-)
Maybe you are right and it should be like that. In addition I'm not sure allocator are part of the Container concept. Can some one help on this? Which are the allocator related constraints for the Container concept? I suspect that I will need to revise my Locally Unique Identifiers library which has a parameter UniqueAssociativeContainer and use the allocator constructor. Cheers Vicente

2008/5/17, vicente.botet <vicente.botet@wanadoo.fr>:
* Why do you use virtual functions? Are the classes sets, and interval_base_set public? I think this can and should be avoided.
The original design was classical OO-design, defining an interface class that allows different implementations being used uniformly by a weak polymophic interfacepointer.
Do you need this polymorphism now?
No I don't. While it was a good thing to design those basic interfaces as a way of getting aware what the minimal abstract essence of sets and maps are (now done via c++0x-concepts), we did never really use them in the various applications built with interval_containers at Cortex Software. Also within the library such base pointer polymorphism is not used. So it is no problem to remove the interface class templates 'sets' and 'maps' from the design.
* Does it works with boost::interval?
Unfortunately not. Interval containers rely on the notion of open interval bounds. This was necessary to express operations for flotingpoint numbers: interval_set<double> is; is.insert([1,pi)); is.insert([pi,4]); // is=={[1,4]} merging INTENDED is.insert([1,pi)); is.insert((pi,4]); // is=={[1,pi), (pi,4]} must not merge
This is not a reason your library should not work for boost::intervals. If interval_set works for any kind of intervals, why should not work for closed intervals?
The reason is this: interval_set<double> is; is.insert(closed_interval(0.0, 4.0)); // {[0.0, 4.0]} is.subtract(closed_interval(1.0, 3.0)); // {[0.0, 1.0), (3.0, 4.0]} Elements 1.0 thru 3.0 are being subtracted from is. How can this be expressed otherwise? The current implementation uses a borderflipping semantics for interval subtraction or intersection. This works fine for all interval instance types. A special semantics for discrete interval types that are used with closed intervals only can be implemented. Yet the code would suffer from conciseness.
As I've read in the archives an extension for open interval bounds have been suggested for boost::interval in the past.
It will be a good thing to see how to extend the interval library to manage with open boundaries and continuous semanstics. Why not make a separated proposal for intervals?
well ... as a boost newbee I'm not keen on stomping into other peoples gardens. I'd rather offer them a few flowers from mine ;-) cheers Joachim interval container code and documentation see: http://sourceforge.net/projects/itl
participants (5)
-
Joachim Faulhaber
-
Paul A Bristow
-
Simonson, Lucanus J
-
Vicente Botet
-
vicente.botet