Interval Trees & ICL

Hi, I have been working on a problem for which a suitable solution is an efficient container of intervals. The two main candidates would be something like to boost::split_interval_set or a centred interval tree (along the lines of http://en.wikipedia.org/wiki/Interval_tree). It seems to me that the characteristics of an interval tree differ sufficiently from the containers defined in ICL that they might make a useful addition. For the case where there is a potentially large number of overlapping intervals a split_interval_set will be less efficient than an interval tree. In the worst case intervals would be split down to unit size. Searching the archives I see there was some discussion of interval trees in the context of ICL but the additional of an interval tree container did not, as far as I can tell, come up (perhaps because it should first appear in a boost release before there is any consideration of adding to it?). I have a few questions: 1. Is there a specific name for the data structures provided by the ICL? I have searched the literature and cannot find any mention of sets that split or combine intervals this way. It is either a genuinely novel concept, which I find difficult to believe, or I haven't used the correct search terms (interval, extent or segment). Either way, I would like to update wikipedia appropriately to help other lost souls. 2. Are there any compelling reasons why interval trees are less useful than the containers provided by the ICL? 3. Perhaps based on the answer to 2. Would interval trees be a sensible addition to the ICL? 4. Are there any other classes of interval container worthy of mention? Regards, Bruce.

Hi Bruce 2010/12/14 Bruce Adams <tortoise_74@yahoo.co.uk>:
Hi, I have been working on a problem for which a suitable solution is an efficient container of intervals.
The two main candidates would be something like to boost::split_interval_set or a centred interval tree (along the lines of http://en.wikipedia.org/wiki/Interval_tree). It seems to me that the characteristics of an interval tree differ sufficiently from the containers defined in ICL that they might make a useful addition.
No doubt, interval_trees would be a useful addition to the library. For certain use cases they perform better than the interval container currently available in the ICL.
For the case where there is a potentially large number of overlapping intervals a split_interval_set will be less efficient than an interval tree. In the worst case intervals would be split down to unit size.
That depends what you intend to do with your interval container. One of the fundamental ideas for containers of the ICL was that of a compact and minimal representation of sets or maps using intervals. An icl::interval_set for example instantly joins inserted intervals if they overlap and thus always stays in a minimal form. Contrary to an interval tree, it forgets the intervals that have been inserted into it and it can not answer stabbing queries anymore.
Searching the archives I see there was some discussion of interval trees in the context of ICL but the additional of an interval tree container did not, as far as I can tell, come up (perhaps because it should first appear in a boost release before there is any consideration of adding to it?).
There was an agreement among the reviewers that the library in the current from is very useful, and efficient in many use cases, so it should be accepted into boost. I have agreed to integrate an interval tree implementation in the further evolution of the library. But it might also be contributed by others.
I have a few questions:
1. Is there a specific name for the data structures provided by the ICL?
As already mentioned, a basic idea for the library was to implement just sets and maps in a compact form exploiting the the fact, that elements occur in contiguous chunks: intervals. So you could say the ICL implements sets and maps. They are called interval_sets and interval_maps just as with bitsets the prefix bit... points to the way or technique the sets are implemented. I am not aware of a more specific name for this class of data structures other than interval_set and interval_map which I find quite natural.
I have searched the literature and cannot find any mention of sets that split or combine intervals this way. It is either a genuinely novel concept, which I find difficult to believe, or I haven't used the correct search terms (interval, extent or segment). Either way, I would like to update wikipedia appropriately to help other lost souls.
I admit, I never did an in-depth literature search to find out the "right" name for what I am doing. If you find out what it is, let me know ;-)
2. Are there any compelling reasons why interval trees are less useful than the containers provided by the ICL?
I don't say interval trees are less useful than icl containers. They are different data structures with different strenghts and weaknesses.
3. Perhaps based on the answer to 2. Would interval trees be a sensible addition to the ICL?
Yes.
4. Are there any other classes of interval container worthy of mention?
Roughly... (1) A flat_interval_set/map that is implemented using a vector can be more efficient, if there are mainly lookups and less inserts/deletes on the object, after is has been created. (2) Another interesting implementation of interval_sets/maps is a tree that only stores the beginning points of intervals. The interval_set/map is always infinite starting with (-inf, +inf) and gets only split by the insertion of elements. Cheers Joachim -- Interval Container Library [Boost.Icl] http://www.joachim-faulhaber.de/

On Wed, Dec 15, 2010 at 12:49 PM, Joachim Faulhaber <afojgo@googlemail.com> wrote:
The two main candidates would be something like to boost::split_interval_set or a centred interval tree (along the lines of http://en.wikipedia.org/wiki/Interval_tree). It seems to me that the characteristics of an interval tree differ sufficiently from the containers defined in ICL that they might make a useful addition.
No doubt, interval_trees would be a useful addition to the library. For certain use cases they perform better than the interval container currently available in the ICL.
In the cases where I've needed them, only a container that preserved the inserted intervals (and information about overlaps) would do (I was storing musical notes, FWIW) -- Dave Abrahams BoostPro Computing http://www.boostpro.com

2010/12/15 Dave Abrahams <dave@boostpro.com>:
On Wed, Dec 15, 2010 at 12:49 PM, Joachim Faulhaber <afojgo@googlemail.com> wrote:
The two main candidates would be something like to boost::split_interval_set or a centred interval tree (along the lines of http://en.wikipedia.org/wiki/Interval_tree). It seems to me that the characteristics of an interval tree differ sufficiently from the containers defined in ICL that they might make a useful addition.
No doubt, interval_trees would be a useful addition to the library. For certain use cases they perform better than the interval container currently available in the ICL.
In the cases where I've needed them, only a container that preserved the inserted intervals
interesting
(and information about overlaps)
isn't this the hairy part?
would do (I was storing musical notes, FWIW)
Cheers, Joachim

At Wed, 15 Dec 2010 21:34:44 +0100, Joachim Faulhaber wrote:
2010/12/15 Dave Abrahams <dave@boostpro.com>:
On Wed, Dec 15, 2010 at 12:49 PM, Joachim Faulhaber <afojgo@googlemail.com> wrote:
The two main candidates would be something like to boost::split_interval_set or a centred interval tree (along the lines of http://en.wikipedia.org/wiki/Interval_tree). It seems to me that the characteristics of an interval tree differ sufficiently from the containers defined in ICL that they might make a useful addition.
No doubt, interval_trees would be a useful addition to the library. For certain use cases they perform better than the interval container currently available in the ICL.
In the cases where I've needed them, only a container that preserved the inserted intervals
interesting
(and information about overlaps)
isn't this the hairy part?
Not too hairy; an interval tree I made by modifying my STL's rbtree implementation worked just fine. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

2010/12/15 Dave Abrahams <dave@boostpro.com>:
At Wed, 15 Dec 2010 21:34:44 +0100, Joachim Faulhaber wrote:
2010/12/15 Dave Abrahams <dave@boostpro.com>:
On Wed, Dec 15, 2010 at 12:49 PM, Joachim Faulhaber <afojgo@googlemail.com> wrote:
The two main candidates would be something like to boost::split_interval_set or a centred interval tree (along the lines of http://en.wikipedia.org/wiki/Interval_tree). It seems to me that the characteristics of an interval tree differ sufficiently from the containers defined in ICL that they might make a useful addition.
No doubt, interval_trees would be a useful addition to the library. For certain use cases they perform better than the interval container currently available in the ICL.
In the cases where I've needed them, only a container that preserved the inserted intervals
interesting
(and information about overlaps)
isn't this the hairy part?
Not too hairy; an interval tree I made by modifying my STL's rbtree implementation worked just fine.
so why not add your interval tree implementation to Interval Containers, if it is readily available? Joachim -- Interval Container Library [Boost.Icl] http://www.joachim-faulhaber.de/

At Thu, 16 Dec 2010 00:25:37 +0100, Joachim Faulhaber wrote:
2010/12/15 Dave Abrahams <dave@boostpro.com>:
At Wed, 15 Dec 2010 21:34:44 +0100, Joachim Faulhaber wrote:
Not too hairy; an interval tree I made by modifying my STL's rbtree implementation worked just fine.
so why not add your interval tree implementation to Interval Containers, if it is readily available?
That code was written before Boost even existed, I think, and is owned by my employer-at-the-time. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

2010/12/16 Dave Abrahams <dave@boostpro.com>:
At Thu, 16 Dec 2010 00:25:37 +0100, Joachim Faulhaber wrote:
2010/12/15 Dave Abrahams <dave@boostpro.com>:
At Wed, 15 Dec 2010 21:34:44 +0100, Joachim Faulhaber wrote:
Not too hairy; an interval tree I made by modifying my STL's rbtree implementation worked just fine.
so why not add your interval tree implementation to Interval Containers, if it is readily available?
That code was written before Boost even existed, I think, and is owned by my employer-at-the-time.
:( So, what would be a good way of doing this? Instead of implementing a specific augmentation modifying an existing rbtree, I think it'd be more fun, to write an augmented_rbtree on the basis of a good rbtree implementation, and then write the interval_tree as an instance of augmented_rbtree. The augmented_rbtree would have a template parameter typename A, for the sythesized attribute, the augmentation and a functor F to compute the augmented node attribute a = F()(left, x, right). This way it might be possible to write a template for a class of augmented trees, that contains the interval_tree as a special case but can be used for other problems as well. Joachim -- Interval Container Library [Boost.Icl] http://www.joachim-faulhaber.de

On Thu, Dec 16, 2010 at 6:20 AM, Joachim Faulhaber <afojgo@googlemail.com> wrote:
I think it'd be more fun, to write an augmented_rbtree on the basis of a good rbtree implementation, and then write the interval_tree as an instance of augmented_rbtree
Go for it, brother. Fun is what it's all about here at Boost! -- Dave Abrahams BoostPro Computing http://www.boostpro.com

2010/12/16 Dave Abrahams <dave@boostpro.com>:
On Thu, Dec 16, 2010 at 6:20 AM, Joachim Faulhaber <afojgo@googlemail.com> wrote:
I think it'd be more fun, to write an augmented_rbtree on the basis of a good rbtree implementation, and then write the interval_tree as an instance of augmented_rbtree
Go for it, brother. Fun is what it's all about here at Boost!
you're boosting my motivation :D -- Interval Container Library [Boost.Icl] http://www.joachim-faulhaber.de

Bruce Adams wrote:
Hi, I have been working on a problem for which a suitable solution is an efficient container of intervals.
Hi everyone, Bruce's post got me thinking about this problem again. Of course an interval tree, such as one based on a red-black tree as Dave suggests, would be a fantastic thing to have in Boost [1]. However something I think would be even better would be some sort of "interval adaptor" that would allow some existing non-interval container to be efficiently adapted to store intervals. This could be used on top of a std::map, but the reason why I'd like it is that it could also be used with something like Beman's proposed b-tree[2]. But how to do it? I've used a few of my insomniac brain-cycles thinking about this, and I may have finally come up with a solution. My idea is to consider an interval (low,high) as a point in a 2D plain. Then, a query to find all of the intervals overlapping (a,b) is the same as the query to find all the points in the quarter-plane where low<b and high>a. Any 2D data structure could be used to store these (e.g. quadtrees etc), but the aim here is to adapt a 1D data structure. I have used the Z-curve method [3] to do this efficiently for some years now and it works well. The idea is that you bitwise-interleave the two values and use this as the key. You can then iterate over all points in a rectangle by iterating over all points in the 1D range between the interleaved bottom left and top right corner values. A restriction is that the types must be suitable for this bitwise interleaving, i.e. floating point types don't work. It's unlikely that I am the first to think of this, but I've not found yet any references. Perhaps I'm searching for the wrong things. Any thoughts anyone? Regards, Phil. [1] For the record, there is a C++ red-black interval tree here: http://cs.montana.edu/~bohannan Boost people probably wouldn't find this useful in its present form, but it could be useful for study. [2] http://mysite.verizon.net/beman/btree/index.html [3] http://en.wikipedia.org/wiki/Z-order_(curve)

2010/12/16 Phil Endecott <spam_from_boost_dev@chezphil.org>:
My idea is to consider an interval (low,high) as a point in a 2D plain. Then, a query to find all of the intervals overlapping (a,b) is the same as the query to find all the points in the quarter-plane where low<b and high>a.
Any 2D data structure could be used to store these (e.g. quadtrees etc), but the aim here is to adapt a 1D data structure. I have used the Z-curve method [3] to do this efficiently for some years now and it works well. The idea is that you bitwise-interleave the two values and use this as the key. You can then iterate over all points in a rectangle by iterating over all points in the 1D range between the interleaved bottom left and top right corner values.
Hi Phil, I find your idea quite nifty and fascinating. The transformation from intervals to 2D plane to z-curve ordering is quite powerful and opens up new frames of thinking on the problem. I guess I need a little more thinking time and intend to post another answer tomorrow. Best regards, Joachim -- Interval Container Library [Boost.Icl] http://www.joachim-faulhaber.de

----- Original Message ----
From: Joachim Faulhaber <afojgo@googlemail.com> To: boost@lists.boost.org Sent: Fri, December 17, 2010 12:32:48 AM Subject: Re: [boost] Interval Trees & ICL
2010/12/16 Phil Endecott <spam_from_boost_dev@chezphil.org>:
My idea is to consider an interval (low,high) as a point in a 2D plain. Then, a query to find all of the intervals overlapping (a,b) is the same as the query to find all the points in the quarter-plane where low<b and high>a.
Any 2D data structure could be used to store these (e.g. quadtrees etc), but the aim here is to adapt a 1D data structure. I have used the Z-curve method [3] to do this efficiently for some years now and it works well. The idea is that you bitwise-interleave the two values and use this as the key. You can then iterate over all points in a rectangle by iterating over all points in the 1D range between the interleaved bottom left and top right corner values.
Hi Phil,
I find your idea quite nifty and fascinating. The transformation from intervals to 2D plane to z-curve ordering is quite powerful and opens up new frames of thinking on the problem. I guess I need a little more thinking time and intend to post another answer tomorrow.
Best regards, Joachim
-- Interval Container Library [Boost.Icl] http://www.joachim-faulhaber.de
That is an interesting idea. However, I see the method (or at least some implementations of it) are covered by a patent. Presumably that could affect its adoption into boost? I'm broadly against software patents but things get hairy if you give the lawyers a hook to fish with. Actually digging a bit further it could be that the patent is to prevent evil lawyers doing this as the claim is by an academic rather than say Oracle (unless he works for Oracle!) Looks like enough of it is prior art that something could be used. I note that the current interval containers could also be considered / are implemented as adapters fitting over sets or maps so this would be a good fit. It might thus be quicker to integrate. Regards, Bruce.

2010/12/17 Bruce Adams <tortoise_74@yahoo.co.uk>:
----- Original Message ----
From: Joachim Faulhaber <afojgo@googlemail.com> To: boost@lists.boost.org Sent: Fri, December 17, 2010 12:32:48 AM Subject: Re: [boost] Interval Trees & ICL
2010/12/16 Phil Endecott <spam_from_boost_dev@chezphil.org>:
My idea is to consider an interval (low,high) as a point in a 2D plain. Then, a query to find all of the intervals overlapping (a,b) is the same as the query to find all the points in the quarter-plane where low<b and high>a.
Any 2D data structure could be used to store these (e.g. quadtrees etc), but the aim here is to adapt a 1D data structure. I have used the Z-curve method [3] to do this efficiently for some years now and it works well. The idea is that you bitwise-interleave the two values and use this as the key. You can then iterate over all points in a rectangle by iterating over all points in the 1D range between the interleaved bottom left and top right corner values.
Hi Phil,
I find your idea quite nifty and fascinating. The transformation from intervals to 2D plane to z-curve ordering is quite powerful and opens up new frames of thinking on the problem. I guess I need a little more thinking time and intend to post another answer tomorrow.
After I have spent some more time studying your proposal, I still think it is a very smart method and a good alternative to the "classical" interval tree implementation suggested by Dave. The restriction imposed by bit-interleaving on the applicable element types is justified if the data structure gains efficiency. Using meta programming we can choose the best implementation dependent on the element type. More unpleasant is the fact that an important part of the method, the efficient search of the z-ordered values within the quarter-plane defined by the query interval seems to be protected by a software patent as Bruce pointed out. But may be you have a different implementation that circumvents this obstacle.
That is an interesting idea. However, I see the method (or at least some implementations of it) are covered by a patent. Presumably that could affect its adoption into boost?
I'm broadly against software patents
+1 Best regards, Joachim -- Interval Container Library [Boost.Icl] http://www.joachim-faulhaber.de

On Fri, Dec 17, 2010 at 12:31 PM, Joachim Faulhaber <afojgo@googlemail.com> wrote:
The restriction imposed by bit-interleaving on the applicable element types is justified if the data structure gains efficiency. Using meta programming we can choose the best implementation dependent on the element type.
I think you mean the key type? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

2010/12/18 Dave Abrahams <dave@boostpro.com>:
On Fri, Dec 17, 2010 at 12:31 PM, Joachim Faulhaber <afojgo@googlemail.com> wrote:
The restriction imposed by bit-interleaving on the applicable element types is justified if the data structure gains efficiency. Using meta programming we can choose the best implementation dependent on the element type.
I think you mean the key type?
In terms of intervals as a set of elements: element_type. In terms of a 2D plane: coordinate_type. In terms of a sorted container: key_type. Since Phil started with intervals I chose "element_type" (of the interval). The key values of Phils interval_adaper would be z-values which differ from the element_type of the intervals as they have double length due to bit-interleaving. -- Interval Container Library [Boost.Icl] http://www.joachim-faulhaber.de

Hi Bruce, Bruce Adams wrote:
That is an interesting idea. However, I see the method (or at least some implementations of it) are covered by a patent.
I don't look at patents. Everything I've described is either my own (re-)invention or taken from the 1981 paper "Multidimensional Range Search in Dynamically Balanced Trees" by Tropf and Herzog that is linked from the Z-curve Wikipedia page.
Presumably that could affect its adoption into boost?
I doubt it. Regards, Phil.

2010/12/18 Phil Endecott <spam_from_boost_dev@chezphil.org>:
Hi Bruce,
Bruce Adams wrote:
That is an interesting idea. However, I see the method (or at least some implementations of it) are covered by a patent.
I don't look at patents. Everything I've described is either my own (re-)invention or taken from the 1981 paper "Multidimensional Range Search in Dynamically Balanced Trees" by Tropf and Herzog that is linked from the Z-curve Wikipedia page.
Can something that is published by Tropf 1981 be patented by Tropf 2004? -- Interval Container Library [Boost.Icl] http://www.joachim-faulhaber.de

Joachim Faulhaber wrote:
Can something that is published by Tropf 1981 be patented by Tropf 2004?
IANAL How about this scenario (purely hypothetical): Tropf applied for a patent in 1980. The patent application took longer than expected, and Tropf published something more or less closely related in 1981. The patent was finally granted in 2004. So it looks to me as if under these circumstances, the patent would be valid. The publication from 1981 would not qualify as prior art, not because it is written by the same author, but because it was published later. But I have to admit that it occurs unlikely to me that it took 20 years until a patent was granted. Regards, Thomas

On Mon, 2010-12-20 at 17:24 +0100, Thomas Klimpel wrote:
Joachim Faulhaber wrote:
Can something that is published by Tropf 1981 be patented by Tropf 2004?
IANAL
How about this scenario (purely hypothetical): Tropf applied for a patent in 1980. The patent application took longer than expected, and Tropf published something more or less closely related in 1981. The patent was finally granted in 2004.
If we're talking about patent number 7321890 (which was referenced on the aforementioned Wikipedia page), Google says: Filing date: Feb 18, 2004 Issue date: Jan 22, 2008 Application number: 10/781,488 The patent has only 6 claims, you'd probably have to read them and the earlier paper to figure out what the differences are between the earlier paper and the later patent. Or, you could write Tropf and ask him ;) -Hal
So it looks to me as if under these circumstances, the patent would be valid. The publication from 1981 would not qualify as prior art, not because it is written by the same author, but because it was published later. But I have to admit that it occurs unlikely to me that it took 20 years until a patent was granted.
Regards, Thomas _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (6)
-
Bruce Adams
-
Dave Abrahams
-
Hal Finkel
-
Joachim Faulhaber
-
Phil Endecott
-
Thomas Klimpel