Geometry and spatial indexes, my opinion

Hi,

Wanting to code a simple 3D virtual world, I looked into the recent spatial indexes SoC work, which depends on the proposed geometry library.

I have to admit I was fairly disappointed. So I'm going to offer some criticisms, some maybe quite harsh, along with some ramblings about what I would have liked to have. Note that I am no geometry or spatial indexing expert, so what I say may be perfectly silly.

For starters, I think those libraries should be dimension-agnostic and generic. Geometry not only has a strong 2D bias, but the documentation also says geometry::box only works with point_xy (and point_ll). Since it is used everywhere in the spatial index, that means it's unusable in 3D.

The geometry documentation doesn't clearly state what a geometry::box is. Is it (assuming 2D) any rectangle, or only axis-aligned rectangles? In case it is just any rectangle, it might be interesting to introduce a new type for axis-aligned ones, which can always be defined from two points whatever the dimension (well, at least that seems enough for 3D; I'm not good at projecting into other dimensions to see whether that would work). I suppose the spatial index interface expects axis-aligned boxes, so it's fairly important to clarify what we're talking about.

Apart from that, there is also a lot of point_xy-dependent code in the spatial indexes. The quadtree implementation in particular isn't dimension-agnostic (it should have 2^n child nodes, not 4). I don't really get why it uses shared_ptr everywhere either (is it supposed to support COW or something?).

I don't really get what boost::spatial_index::spatial_index is for. It does nothing but forward everything to the last template parameter. Instead of writing

    boost::spatial_index::spatial_index<
        point_type, value_type,
        boost::spatial_index::rtree<point_type, value_type>
    > index(args...);

I might as well write

    boost::spatial_index::rtree<point_type, value_type> index(args...);

Also, I think the spatial indexes do not provide enough operations. The only things they allow are putting point => value pairs, retrieving the value associated with a point in O(log n) time (I assume; the complexity is not documented), and finding all values within a certain box (in O(log n) too?). Many things one typically wants to do with spatial indexes are not possible:

- Iterate over all (point, value) pairs in the index
- Find the element closest (or k-closest) to some point in space or to another element
- Find all elements within a certain range of some point in space or of another element
- Find the elements that intersect a given object (ray tracing, for example, searches for the element that intersects a ray and is closest to its source)
- Consider only a part of the indexed space (e.g. an area delimited by a range/sphere or box)
- Index arbitrary objects and not just points
- Traverse the spatial index as a tree

I believe there is no need to clutter the interface with a lot of functions; most algorithms can be implemented generically as free functions. Giving a read-only tree interface to all spatial indexes ought to be enough to implement all the algorithms. A spatial index, after all (correct me if I'm wrong), is nothing but a tree that partitions space, where every node defines a partition of space that includes all of its children (which may or may not overlap). Subtrees should be full-fledged spatial indexes (so that I can perform a search in a subtree), albeit read-only. Apart from that, spatial indexes should allow insertion and removal of object => value pairs.

Quadtrees and r-trees should be fairly easy to work with: they both partition space into axis-aligned boxes (which are even cubes in quadtrees).

As for allowing all kinds of objects to be stored instead of just points, that should be possible by using generic distance and intersection functions. The real difference is that the same object could end up in multiple places in the tree, because it spans multiple space partitions. Whenever you put an object in the tree or split a partition, it becomes necessary to calculate which partitions to put the object in, whereas with points you only have to find the one partition. Of course, some cases can be specialized and optimized, like axis-aligned boxes in an R-tree, which can directly be used as partitions and thus never span multiple ones.

I find it odd that R-trees and quadtrees have different interfaces: one can only put points in a quadtree, while in an R-tree you can put boxes as well. A generic approach that allows putting in any object, with boxes specialized, seems better to me.

About geometry, I think there should be more basic objects. While it is true that polygons and multipolygons (polyhedra) are enough to represent everything in 3D space, having rays, segments, triangles, quads, circles, spheres, squares, rectangles, cubes, cuboids (boxes), planes, half-planes and more, with the ability to compute all intersections and distances, would be the most practical thing for 3D development. I do not know how to organize this in a really dimension-agnostic manner, however.

In a nutshell, I think the spatial indexes' design, and how geometry can help it, ought to be discussed.

-------------------------------------------------- From: "Mathias Gaunard" <mathias.gaunard@ens-lyon.org> Sent: Tuesday, October 07, 2008 5:52 PM To: <boost@lists.boost.org> Subject: [boost] Geometry and spatial indexes, my opinion
Hi,
Wanting to code a simple 3D virtual world, I looked into the recent spatial indexes SOC work, which depends on the proposed geometry library.
I have to admit I was fairly disappointed. So I'm going to offer some criticisms, some maybe quite harsh, along with some ramblings about what I would have liked to have. Note that I am no geometry or spatial indexing expert, so what I say may be perfectly silly.
For starters, I think those libraries should be dimension agnostic and generic. Geometry not only has a strong 2D bias, but the documentation also says geometry::box only works with point_xy (and point_ll). Since it is used everywhere in spatial index, that means it's unusable on 3D.
Hi Mathias,

I just wanted to say something here about dimensional agnosticism. I don't think this is necessarily the way to go, for a couple of reasons. The more trivial point is that geometry in more than 3 dimensions isn't likely to be supported and isn't of much practical use anyway.

The second point concerns coordinate frameworks. While I agree that a library shouldn't necessarily be limited to 2D, I don't think you should lose the notion of what type of coordinate frame you are working in. While you may be able to use 0, 1, or 2 to describe the dimensions agnostically in whatever framework you want, the description of what those dimensions mean is lost. For example, you may have an algorithm designed to work in Cartesian coordinates in 2D (or 3D). You may have another algorithm which requires points in polar (spherical) coordinates. Another might need cylindrical coordinates to be optimal. Then there are algorithms which require translating points into parametric coordinates to perform some logic optimally. If you have points which are agnostic to dimension, how can you distinguish 0, 1, 2 from x, y, z or r, theta, phi?

The constraint imposed by these concepts is rather that, when given a point of some dimensionality, you need to be able to determine the type of coordinate frame the point is describing. I've been working on a generative geometry library which includes support for these distinctions via conceptually enforced interfaces and traits.
The geometry documentation doesn't clearly state what a geometry::box is. Is it (assuming 2D) any rectangle, or only axis-aligned rectangles? In case it is just any rectangle, it might be interesting to introduce a new type for axis-aligned ones, which can always be defined from two points whatever the dimension (well, at least that seems enough for 3D; I'm not good at projecting into other dimensions to see whether that would work). I suppose the spatial index interface expects axis-aligned boxes, so it's fairly important to clarify what we're talking about.
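To make the point concrete: an axis-aligned box generalizes to any dimension as just a pair of corner points, while "any rectangle" does not. A hypothetical sketch, not the library's actual geometry::box:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// An axis-aligned box in N dimensions: two opposite corners suffice,
// whatever the dimension. Names here are illustrative only.
template <std::size_t N>
struct aabb {
    std::array<double, N> min_corner, max_corner;

    bool contains(const std::array<double, N>& p) const {
        for (std::size_t i = 0; i < N; ++i)
            if (p[i] < min_corner[i] || p[i] > max_corner[i]) return false;
        return true;
    }
};
```

An arbitrarily oriented rectangle, by contrast, needs a representation that depends on the dimension (an angle in 2D, a full orientation in 3D), which is one reason to give the axis-aligned case its own type.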
Apart from that, there is also a lot of point_xy-dependent code in the spatial indexes. The quadtree implementation in particular isn't dimension-agnostic (it should have 2^n child nodes, not 4). I don't really get why it uses shared_ptr everywhere either (is it supposed to support COW or something?).
I don't really get what boost::spatial_index::spatial_index is for. It does nothing but forward everything to the last template parameter.
Instead of writing

    boost::spatial_index::spatial_index<
        point_type, value_type,
        boost::spatial_index::rtree<point_type, value_type>
    > index(args...);

I might as well write

    boost::spatial_index::rtree<point_type, value_type> index(args...);
Also, I think the spatial indexes do not provide enough operations. The only things they allow are putting point => value pairs, retrieving the value associated with a point in O(log n) time (I assume; the complexity is not documented), and finding all values within a certain box (in O(log n) too?). Many things one typically wants to do with spatial indexes are not possible:

- Iterate over all (point, value) pairs in the index
- Find the element closest (or k-closest) to some point in space or to another element
- Find all elements within a certain range of some point in space or of another element
- Find the elements that intersect a given object (ray tracing, for example, searches for the element that intersects a ray and is closest to its source)
- Consider only a part of the indexed space (e.g. an area delimited by a range/sphere or box)
- Index arbitrary objects and not just points
- Traverse the spatial index as a tree
I believe there is no need to clutter the interface with a lot of functions; most algorithms can be implemented generically as free functions. Giving a read-only tree interface to all spatial indexes ought to be enough to implement all the algorithms. A spatial index, after all (correct me if I'm wrong), is nothing but a tree that partitions space, where every node defines a partition of space that includes all of its children (which may or may not overlap). Subtrees should be full-fledged spatial indexes (so that I can perform a search in a subtree), albeit read-only. Apart from that, spatial indexes should allow insertion and removal of object => value pairs.
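The read-only tree idea can be made concrete with a small sketch. All names here (bounds, values, children, query_box) are hypothetical, not an existing interface: given a node's bounding region, its payload and its children, a box query is one free function for every index type.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

struct box2 {
    double lo[2], hi[2];
};

inline bool intersects(const box2& a, const box2& b) {
    for (int d = 0; d < 2; ++d)
        if (a.hi[d] < b.lo[d] || b.hi[d] < a.lo[d]) return false;
    return true;
}

// Works for any Node type providing bounds(), values() and children().
template <typename Node, typename OutIt>
void query_box(const Node& n, const box2& range, OutIt out) {
    if (!intersects(n.bounds(), range)) return;          // prune whole subtree
    for (std::size_t i = 0; i < n.values().size(); ++i)  // report this node's payload
        *out++ = n.values()[i];  // (a real query would also test each entry against the range)
    for (std::size_t i = 0; i < n.children().size(); ++i)
        query_box(n.children()[i], range, out);
}

// Toy node implementing the interface, for demonstration only.
struct toy_node {
    box2 bb;
    std::vector<int> vals;
    std::vector<toy_node> kids;
    const box2& bounds() const { return bb; }
    const std::vector<int>& values() const { return vals; }
    const std::vector<toy_node>& children() const { return kids; }
};
```

Nearest-neighbor, ray queries and the rest can be written the same way, against the same three-function interface.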
Quadtrees and r-trees should be fairly easy to work with: they both partition space into axis-aligned boxes (which are even cubes in quadtrees).
As for allowing all kinds of objects to be stored instead of just points, that should be possible by using generic distance and intersection functions. The real difference is that the same object could end up in multiple places in the tree, because it spans multiple space partitions. Whenever you put an object in the tree or split a partition, it becomes necessary to calculate which partitions to put the object in, whereas with points you only have to find the one partition.
Of course, some cases can be specialized and optimized, like axis-aligned boxes in an R-tree, which can directly be used as partitions and thus never span multiple ones. I find it odd that R-trees and quadtrees have different interfaces: one can only put points in a quadtree, while in an R-tree you can put boxes as well. A generic approach that allows putting in any object, with boxes specialized, seems better to me.
About geometry, I think there should be more basic objects. While it is true that polygons and multipolygons (polyhedra) are enough to represent everything in 3D space, having rays, segments, triangles, quads, circles, spheres, squares, rectangles, cubes, cuboids (boxes), planes, half-planes and more, with the ability to compute all intersections and distances, would be the most practical thing for 3D development. I do not know how to organize this in a really dimension-agnostic manner, however.
This certainly makes sense, though I think starting with points, segments and polylines/polygons gives sufficient primitives for most basic geometry operations. These could of course be specialized into the classes you have described.
In a nutshell, I think the spatial indexes design and how geometry can help it ought to be discussed.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hi Brandon, Hi Mathias,
I just wanted to say something here about dimensional agnosticism. I don't think this is necessarily the way to go. There are a couple of reasons that I say this. The more trivial point is that higher than 3D geometry isn't likely to be supported and isn't of much practical use anyway.
Strongly disagree! Spatial index structures are very commonly used to accelerate nearest neighbor search, an important operation in numerous fields. There is a nice listing at http://en.wikipedia.org/wiki/Nearest_neighbor_search#Applications. The search space in such applications may have hundreds of dimensions. In computer vision we use nearest neighbor search for keypoint matching, which in turn is fundamental to content-based image retrieval, visual odometry, wide-baseline stereo, simultaneous localization and mapping, panorama stitching, and the list goes on... Computational geometry has many, many applications beyond the obvious ones (e.g. collision detection), and restricting yourself to 3 dimensions or fewer is rather crippling.
In my opinion, a Boost geometry library must have at least basic support for n-dimensional computational geometry. Let's start with being able to handle points of arbitrary dimension and calculating distances between them (given some metric). This is easy to do and already sufficient to implement many useful spatial indices. I have not looked at Barend's geometry library in detail, but on the surface it looks generic enough to support this.
The second point concerns coordinate frameworks. While I agree that a library shouldn't necessarily be limited to 2D, I don't think you should lose the notion of what type of coordinate frame you are working in. While you may be able to use 0, 1, or 2 to describe the dimensions agnostically in whatever framework you want, the description of what those dimensions mean is lost. For example, you may have an algorithm designed to work in Cartesian coordinates in 2D (or 3D). You may have another algorithm which requires points in polar (spherical) coordinates. Another might need cylindrical coordinates to be optimal. Then there are algorithms which require translating points into parametric coordinates to perform some logic optimally. If you have points which are agnostic to dimension, how can you distinguish 0, 1, 2 from x, y, z or r, theta, phi?
Surely this can be handled by the type system? Simply have distinct Point3D (Cartesian), SphericalPoint3D and CylindricalPoint3D classes with appropriate functionality. In the vision library I used to work with, there was a similar problem of supporting images with different numbers of channels and different color representations (RGB, HSV, XYZ...). We implemented them as distinct pixel types (IIRC Boost.GIL has more or less the same architecture), and even gave them conversion operators to each other so that we could use them interchangeably with correct behavior.
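A minimal sketch of this type-system approach, with hypothetical names: distinct types per coordinate frame, explicit conversions between them, and algorithms whose signatures state which frame they expect.

```cpp
#include <cassert>
#include <cmath>

struct cartesian2 { double x, y; };
struct polar2     { double r, theta; };

inline cartesian2 to_cartesian(const polar2& p) {
    cartesian2 c = { p.r * std::cos(p.theta), p.r * std::sin(p.theta) };
    return c;
}

inline polar2 to_polar(const cartesian2& c) {
    polar2 p = { std::sqrt(c.x * c.x + c.y * c.y), std::atan2(c.y, c.x) };
    return p;
}

// This algorithm documents, in its signature, that it wants Cartesian
// input; a caller holding polar points must convert explicitly.
inline double dot(const cartesian2& a, const cartesian2& b) {
    return a.x * b.x + a.y * b.y;
}
```

Conversion operators (as in the pixel-type example) would let the two types be used interchangeably; free conversion functions, as here, keep the conversions visible at the call site.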
The constraints imposed by these concepts are rather that when given a point of some dimensionality you need to be able to determine the type of coordinate frame that the point is describing. I've been working on a generative geometry library which includes support for these distinctions via conceptually enforced interfaces and traits.
So it sounds like you are already doing something of the sort?

Back to Mathias' criticisms - although I realize it is an alpha, I was similarly a bit disappointed when looking at the Spatial Indexes library. It is good code (I hope that Federico will not be discouraged by criticism!), but it is very limited in what it can do currently. It has no nearest neighbor search, nor even a traversal interface that would allow the user to implement it himself. It only provides 2D structures, when it is very useful to support spaces of arbitrary dimension. I would like the library to be generic in the dimension, not just the (2D) point type. Perhaps a good place to start is the quadtree - I do not see why the difference between a quadtree and an octree should be more than an int template parameter.

I'm admittedly opening up a bit of a can of worms here - nearest neighbor search in high dimensions is a Hard Problem. However, there are straightforward approximate nearest neighbor algorithms that are efficient and plenty good enough for many applications.

Let me summarize with a wish list:

* tree traversal interface
* generic in the dimension for any structure for which this makes sense
* (k-)nearest neighbor and other standard query methods, as mentioned by Mathias
* kd-tree implementation
* metric tree implementation (splits need not be axis-aligned)
* one or more good bulk-loading algorithms for each structure (repeated insertion is not acceptable)
* approximate (k-)nearest neighbor search

I could add some more items, but this is already a daunting amount of work. So perhaps Federico and Barend can clarify, what are their plans for development of the Spatial Index library? Do they include any of the items above? It seems to me that Federico's work is quite a good start, but there is a great deal of room for future development.

Cheers,
Patrick
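To illustrate the kd-tree and nearest-neighbor items on the wish list, here is a rough dimension-agnostic sketch of exact 1-nearest-neighbor search. It is illustrative only (hypothetical names, no tree construction, balancing or bulk loading):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// A k-d tree node in N dimensions; the split axis is depth % N.
template <std::size_t N>
struct kd_node {
    std::array<double, N> pt;
    kd_node* left  = nullptr;
    kd_node* right = nullptr;
};

template <std::size_t N>
double sqdist(const std::array<double, N>& a, const std::array<double, N>& b) {
    double s = 0;
    for (std::size_t i = 0; i < N; ++i) { double d = a[i] - b[i]; s += d * d; }
    return s;
}

// Classic branch-and-bound descent: search the near side first, then
// the far side only if the current best hypersphere crosses the split.
template <std::size_t N>
void nearest(const kd_node<N>* node, const std::array<double, N>& q,
             std::size_t depth, const kd_node<N>*& best, double& best_sq) {
    if (!node) return;
    double d = sqdist(node->pt, q);
    if (d < best_sq) { best_sq = d; best = node; }
    std::size_t axis = depth % N;
    double diff = q[axis] - node->pt[axis];
    const kd_node<N>* near_side = diff < 0 ? node->left : node->right;
    const kd_node<N>* far_side  = diff < 0 ? node->right : node->left;
    nearest(near_side, q, depth + 1, best, best_sq);
    if (diff * diff < best_sq)  // best hypersphere crosses the splitting plane
        nearest(far_side, q, depth + 1, best, best_sq);
}
```

The k-nearest and approximate variants replace the single `best` with a bounded priority queue and a relaxed pruning test, respectively, on top of the same traversal.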

Hello
In my opinion, a Boost geometry library must have at least basic support for n-dimensional computational geometry. Let's start with being able to handle points of arbitrary dimension and calculating distances between them (given some metric). This is easy to do and already sufficient to implement many useful spatial indices. I have not looked at Barend's geometry library in detail, but on the surface it looks generic enough to support this.
This is already done. Making the distance algorithm dimension-agnostic is the first thing I did with Barend after preview 2, when I joined the project. And the iteration over dimensions is, of course, done at compile time. Looks like we *really* have to show preview 3 as soon as possible :-)

Regards
Bruno
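For readers unfamiliar with the technique, compile-time iteration over dimensions can look like the following sketch. A recursive template unrolls the per-coordinate loop, so the generated code for a 3D distance is straight-line arithmetic; this illustrates the idea, not the actual preview-3 code.

```cpp
#include <cassert>
#include <cmath>

// Recursive case: handle coordinate D-1, then recurse on the rest.
// The recursion is resolved entirely at compile time.
template <int D>
struct sum_sq_diff {
    template <typename P>
    static double apply(const P& a, const P& b) {
        double d = a[D - 1] - b[D - 1];
        return d * d + sum_sq_diff<D - 1>::apply(a, b);
    }
};

// Base case: no coordinates left.
template <>
struct sum_sq_diff<0> {
    template <typename P>
    static double apply(const P&, const P&) { return 0.0; }
};

// Works for anything indexable: raw arrays, std::array, point classes.
template <int D, typename P>
double distance(const P& a, const P& b) {
    return std::sqrt(sum_sq_diff<D>::apply(a, b));
}
```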

On Wed, Oct 8, 2008 at 2:37 AM, Patrick Mihelich <patrick.mihelich@gmail.com> wrote:
In my opinion, a Boost geometry library must have at least basic support for n-dimensional computational geometry. Let's start with being able to handle points of arbitrary dimension and calculating distances between them (given some metric). This is easy to do and already sufficient to implement many useful spatial indices. I have not looked at Barend's geometry library in detail, but on the surface it looks generic enough to support this.
I very much need this for the generic DM/KDD algorithms that I am working on. We started work on a clustering algorithms library at boostcon this year, and support for n-dimensional spatial indexes/queries is still needed to improve the performance of some of the algorithms. WRT the clustering library, I dropped the ball over the summer when things heated up at work and school, but will be picking it up again Real Soon Now. If you have any interest in clustering or classification algorithms, then please contact me! </shameless_plug>. Jon
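The basic n-dimensional machinery these use cases start from is small: points of arbitrary (compile-time) dimension plus a distance parameterized on a metric. A hypothetical sketch, not any existing library's API:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// A point of compile-time dimension N.
template <std::size_t N>
using point = std::array<double, N>;

// Metrics are function objects; swapping one for another changes
// neither the point type nor the index built on top of it.
struct euclidean {
    template <std::size_t N>
    double operator()(const point<N>& a, const point<N>& b) const {
        double s = 0;
        for (std::size_t i = 0; i < N; ++i) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return std::sqrt(s);
    }
};

struct manhattan {
    template <std::size_t N>
    double operator()(const point<N>& a, const point<N>& b) const {
        double s = 0;
        for (std::size_t i = 0; i < N; ++i)
            s += std::fabs(a[i] - b[i]);
        return s;
    }
};

// distance(a, b, metric) works for any dimension and any metric.
template <std::size_t N, typename Metric>
double distance(const point<N>& a, const point<N>& b, Metric m) {
    return m(a, b);
}
```

This is already enough to write a k-means-style clustering loop or a metric tree generically over the dimension and the metric.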

On Wed, Oct 8, 2008 at 8:17 AM, Jonathan Franklin <franklin.jonathan@gmail.com> wrote:
I very much need this for the generic DM/KDD algorithms that I am working on. We started work on a clustering algorithms library at boostcon this year, and support for n-dimensional spatial indexes/queries is still needed to improve the performance of some of the algorithms.
WRT the clustering library, I dropped the ball over the summer when things heated up at work and school, but will be picking it up again Real Soon Now. If you have any interest in clustering or classification algorithms, then please contact me! </shameless_plug>.
This is very interesting, and relevant to my current work. I would love to see a nice generic clustering library in Boost. I can't promise much help at this point (I also have a project I've been meaning to pick up again Real Soon Now :-) but let me know when you start working on it again. Clustering and spatial indexes have some close connections - as you say, spatial indexes can be used to improve the performance of some clustering algorithms. Likewise, clustering is sometimes used to construct spatial indexes (agglomerative clustering, hierarchical k-means). So, it would be nice for the libraries to play nicely with each other... but more important to get the core functionality in place first, of course. Cheers, Patrick

On Wed, Oct 8, 2008 at 4:17 PM, Patrick Mihelich <patrick.mihelich@gmail.com> wrote:
I would love to see a nice generic clustering library in Boost. I can't promise much help at this point (I also have a project I've been meaning to pick up again Real Soon Now :-) but let me know when you start working on it again.
I will.
Clustering and spatial indexes have some close connections - as you say, spatial indexes can be used to improve the performance of some clustering algorithms. Likewise, clustering is sometimes used to construct spatial indexes (agglomerative clustering, hierarchical k-means). So, it would be nice for the libraries to play nicely with each other... but more important to get the core functionality in place first, of course.
I agree, on all counts. I'm also looking at it as a starting point for other DM/KDD algorithms. I'd love to build a generic SVM implementation, and there are plenty of others. Jon

On Wed, Oct 8, 2008 at 4:36 PM, Jonathan Franklin <franklin.jonathan@gmail.com> wrote:
other DM/KDD algorithms. I'd love to build a generic SVM implementation, and there are plenty of others.
The kernel-machine library has some genericity, uses boost and implements SVM: http://www.terborg.net/research/kml/ I've only used it briefly and a long time ago, but I remember liking it (it was actually what introduced me to boost). Might be worth looking at if you're thinking of implementing your own. Kind regards, Stjepan

Hi Mathias,

Thanks for your interest in the geometry / spatial index library. There were also some other postings about this library last week, but to be clear: the preview (on which the spatial index was built) was posted in March this year. There were many discussions about this preview, among others about dimension agnosticism and concepts. Bruno Lalande joined me. We agree with those criticisms and have reworked the library, which indeed took a while, but it will now appear very soon on this list. It is dimension-agnostic and uses concepts.
For starters, I think those libraries should be dimension agnostic and generic. Geometry not only has a strong 2D bias, but the documentation also says geometry::box only works with point_xy (and point_ll). Since it is used everywhere in spatial index, that means it's unusable on 3D.
The box in the library was just a small helper class. It is indeed more important in the spatial index library and should be generic. However, the restriction to just xy / ll points is gone; it can take 3D points now, for example. Points (point concepts) are now dimension-agnostic and generic in our library. I will not go into the details of your comments on the spatial index library, but keep in mind that it was a GSoC exercise and that the author, who did a great job, noted in his final report (pasted from his email):
The code is still beta (or maybe alpha) and not recommended for production use because it is still not tested enough, but if you want to check the actual status you can find it in the following location...
About geometry, I think there should be more basic objects. While it is true polygons and multipolygons (polyhedrons) are enough to represent everything in 3D space, having rays, segments, triangles, quads, circles, spheres, squares, rectangles, cubes, cuboids (boxes), planes, half-planes and more, with ability to compute all intersections and distances, would be the most practical thing for 3D development. I do not know how to organize this in a really dimension-agnostic manner, however. In a nutshell, I think the spatial indexes design and how geometry can help it ought to be discussed.
You're probably right that there should be more objects. Especially 3D/gaming geometry is quite demanding. We are still concentrating on the basics - "point", the concepts; most discussions on the list are about those basics. Besides objects there should also be more algorithms. A "complete" geometry library, designed for all purposes, will be huge. Building it in a generic way that every Boost user believes in and agrees with is challenging and will need much discussion. So agreed, it ought to be discussed. Within a week or so a new preview (Preview 3) from our side should be there.

Best regards,
Barend
Geodan, Amsterdam, the Netherlands

Hi Patrick, :) -------------------------------------------------- From: "Patrick Mihelich" <patrick.mihelich@gmail.com> Sent: Wednesday, October 08, 2008 3:37 AM To: <boost@lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion
Hi Brandon,
Hi Mathias,
I just wanted to say something here about dimensional agnosticism. I don't think this is necessarily the way to go. There are a couple of reasons that I say this. The more trivial point is that higher than 3D geometry isn't likely to be supported and isn't of much practical use anyway.
Strongly disagree! Spatial index structures are very commonly used to accelerate nearest neighbor search, an important operation in numerous fields. There is a nice listing at http://en.wikipedia.org/wiki/Nearest_neighbor_search#Applications. The search space in such applications may have hundreds of dimensions. In computer vision we use nearest neighbor search for keypoint matching, which in turn is fundamental to content-based image retrieval, visual odometry, wide-baseline stereo, simultaneous localization and mapping, panorama stitching, and the list goes on.... Computational geometry has many, many applications beyond the obvious ones (e.g. collision detection), and restricting yourself to 3 dimensions or fewer is rather crippling.
In my opinion, a Boost geometry library must have at least basic support for n-dimensional computational geometry. Let's start with being able to handle points of arbitrary dimension and to calculate distances between them (given some metric). This is easy to do and already sufficient to implement many useful spatial indices. I have not looked at Barend's geometry library in detail, but on the surface it looks generic enough to support this.
Ok, I take your point. I wasn't suggesting that a library not be able to handle more than 3 dimensions. Rather, I would suggest that the population of developers who have such requirements must be very small. Considering that generalization on dimension is not trivial for many algorithms, this leads me to the conclusion that the specific algorithms will decide the requirements on their inputs. This is why I chose a traits-based interface for the geometry library I've been implementing. There's nothing to stop one from creating a multi-dimensional Cartesian/polar/parametric access traits interface for any algorithm which requires it. So I suppose I agree, the geometry library should not constrain the number of dimensions. But I don't agree with full agnosticism (meaning agnosticism of the coordinate frame as well). There should be a way to differentiate points further, to align with the concrete traits of the dimensions they describe.
The second point is on the basis of coordinate frameworks. While I agree that any library shouldn't necessarily be limited to 2D, I don't think that you should lose the notion of what type of coordinate frame you are working in. So while you may be able to use 0, 1, or 2 to describe the dimensions agnostically in whatever framework you want, the description of what those dimensions are is lost. For example, you may have an algorithm which is designed to work in Cartesian coordinates in 2D (or 3D). You may have another algorithm which requires points in polar (spherical) coordinates. Another might want cylindrical coordinates to be optimal. Then there are algorithms which require translation of points into parametric coordinates to perform some logic optimally. If you have points which are agnostic to dimension, how can you distinguish 0,1,2 from x,y,z or r, theta, phi?
Surely this can be handled by the type system? Simply have distinct Point3D (Cartesian), SphericalPoint3D and CylindricalPoint3D classes with appropriate functionality. In the vision library I used to work with, there was a similar problem of supporting images with different numbers of channels and different color representations (RGB, HSV, XYZ...). We implemented them as distinct pixel types (IIRC Boost.GIL has more or less the same architecture), and even gave them conversion operators to each other so that we could use them interchangeably with correct behavior.
One certainly could if they chose to take the route of providing all these point classes.
The constraints imposed by these concepts are rather that when given a point of some dimensionality you need to be able to determine the type of coordinate frame that the point is describing. I've been working on a generative geometry library which includes support for these distinctions via conceptually enforced interfaces and traits.
So it sounds like you are already doing something of the sort?
Yes, I have taken a slightly different route than the other libs I have found in the vault. In my library the algorithms require that inputs conform to a specific access traits concept. I have been using get_x, get_y, get_z as the general interface for Cartesian coordinates (and would have used get_r, get_theta, get_phi for the others... if I had gotten to that), but I think now that I'm fairly well persuaded to investigate a compile time indexed mechanism like boost::tuple. Has anyone done a performance test on using such a mechanism in practical settings? While I'm sure it's constant, how big is that constant in practice? Is it worth paying in cases where an algorithm requires a constrained dimensionality? I'm guessing nobody has really addressed these questions... so perhaps I'll take a gander and see if it's worth providing both interfaces in my access traits.
Back to Mathias' criticisms - although I realize it is an alpha, I was similarly a bit disappointed when looking at the Spatial Indexes library. It is good code (I hope that Federico will not be discouraged by criticism!), but it is very limited in what it can do currently. It has no nearest neighbor search, nor even a traversal interface that would allow the user to implement it himself. It only provides 2D structures when it is very useful to support spaces of arbitrary dimension. I would like for the library to be generic in the dimension, not just the (2D) point type. Perhaps a good place to start is the quadtree - I do not see why the difference between a quadtree and an octree should be more than an int template parameter.
I'm admittedly opening up a bit of a can of worms here - nearest neighbor search in high dimensions is a Hard Problem. However, there are straightforward approximate nearest neighbor algorithms that are efficient and plenty good enough for many applications.
I can't say that I have seriously considered this. Just out of curiosity, how do kd-trees scale memory-wise on average in high dimension?
Let me summarize with a wish list:
* tree traversal interface
* generic in the dimension for any structure for which this makes sense
* (k-)nearest neighbor and other standard query methods, as mentioned by Mathias
* kd-tree implementation
* metric tree implementation (splits need not be axis-aligned)
* one or more good bulk-loading algorithms for each structure (repeated insertion is not acceptable)
* approximate (k-)nearest neighbor search
I could add some more items, but this is already a daunting amount of work. So perhaps Federico and Barend can clarify, what are their plans for development of the Spatial Index library? Do they include any of the items above? It seems to me that Federico's work is quite a good start, but there is a great deal of room for future development.
Cheers, Patrick
Cheers, Brandon

On Wed, Oct 8, 2008 at 7:51 AM, Brandon Kohn <blkohn@hotmail.com> wrote:
Ok, I take your point. I wasn't suggesting that a library not be able to handle more than 3 dimensions. [...] Considering that generalization on dimension is not trivial for many algorithms, this leads me to the conclusion that the specific algorithms will decide the requirements on their inputs. [...]
This sounds reasonable. Certainly there are many 2D/3D algorithms for which there is little point in generalizing to higher dimensions.
Yes, I have taken a slightly different route than the other libs I have found in the vault. [...] I think now that I'm fairly well persuaded to investigate a compile time indexed mechanism like boost::tuple. Has anyone done a performance test on using such a mechanism in practical settings? [...]
Cool, compile time indexing is very useful. But, I'm not sure I see the need for using tuples. I can't offhand think of a use case where the dimensions are of heterogeneous type, so wouldn't a simple array work?
I can't say that I have seriously considered this. Just out of curiosity, how do kd-trees scale memory-wise on average in high dimension?
They scale quite well memory-wise - each node splits on a single dimension, so the size of the internal data is independent of the number of dimensions. Of course the size of the data points themselves is linear in the number of dimensions. The real problem with using kd-trees in high dimensions (for nearest neighbor search) is the "Curse of Dimensionality." Nearest neighbor search using kd-trees consists of (very succinctly) a depth-first search to the node containing the query point, followed by a back-tracking branch-and-bound search to ensure that we find the data point closest to the query point (as they need not be in the same node). For low dimension this tends to be quite efficient. For d > 10 or so, we often end up looking at most of the nodes in the tree during the back-tracking search... which can be slower than a naive linear scan of the data points. This is why we resort to approximate methods in high dimensions. One popular way is to simply cut off the back-tracking search after some number of nodes, which we expand in order of their nearness to the query point. In the vision community this algorithm is known as Best Bin First. Cheers, Patrick

On Wed, Oct 8, 2008 at 5:33 PM, Patrick Mihelich <patrick.mihelich@gmail.com> wrote:
Cool, compile time indexing is very useful. But, I'm not sure I see the need for using tuples. I can't offhand think of a use case where the dimensions are of heterogeneous type, so wouldn't a simple array work?
Sure a simple array should work. The point of a generic library is that a simple array would work, so would a tuple, and so would your custom point class. As for the heterogeneous types, think of huge (GBs worth) digital elevation models in meter units where altitude is in a range from 0 - 20,000m. In that case, my point type would be something like vector3<int, short, int> instead of a vector3<int>. With a 1024x1024 file, that would result in a savings of only 2MB of memory, but extrapolate that out to a 16384x16384 file, and the savings is 500MB. --Michael Fawcett

On Wed, Oct 8, 2008 at 2:53 PM, Michael Fawcett <michael.fawcett@gmail.com>wrote:
On Wed, Oct 8, 2008 at 5:33 PM, Patrick Mihelich <patrick.mihelich@gmail.com> wrote:
Cool, compile time indexing is very useful. But, I'm not sure I see the need for using tuples. [...]
Sure a simple array should work. [...] In that case, my point type would be something like vector3<int, short, int> instead of a vector3<int>. [...]
OK, yes, I'm convinced. Support for heterogeneous types has its uses. But I will still ask, why use a tuple to store the data in the basic (Cartesian) point class? Barend and Bruno's library uses an array instead, which I think should be preferred for two reasons. (1) I would still think the homogeneous case to be much more common than the heterogeneous case. (2) Using a tuple makes it difficult or impossible to use the generic point class in high dimensions. Consider

typedef geometry::point<float, 128> sift_descriptor;

vs.

typedef geometry::point< fusion::vector<float, float, float, float, ... /* 128 of these */> > sift_descriptor;

which is probably beyond what Fusion supports, and would break the compiler anyway. For your use case, it would of course be possible to define your own heterogeneous point class. Now that I think about it, it could be very interesting to make the provided point classes Fusion-compatible sequence types via the extension mechanism. After all, we are already agreed on supporting compile-time access. And then, why not go all the way and write the default implementations of the core algorithms on top of Fusion? In that case

typedef fusion::vector3<int, short, int> dem_point;

would work out of the box. Patrick

Hi,
OK, yes, I'm convinced. Support for heterogeneous types has its uses. But I will still ask, why use a tuple to store the data in the basic (Cartesian) point class? [...]
Stop me if I'm wrong but what you're discussing now is how to implement the point class provided by an ideal geometry library, right? If so, I think it's not the most important concern since, as I've already said, the primary goal is not to provide a point class but to define how points are accessed and defined in general. I think that, by talking about tuples, Brandon was rather pointing out that points should be accessible the same way as tuples: get<N>(p). Which is quite a general consensus now, IMO. Bruno

On Thu, Oct 9, 2008 at 1:48 AM, Bruno Lalande <bruno.lalande@gmail.com>wrote:
Stop me if I'm wrong but what you're discussing now is how to implement the point class provided by an ideal geometry library, right? If so, I think it's not the most important concern since, as I've already said, the primary goal is not to provide a point class but to define how points are accessed and defined in general. I think that, by talking about tuples, Brandon was rather pointing out that points should be accessible the same way as tuples: get<N>(p). Which is quite a general consensus now, IMO.
OK, perhaps I misinterpreted Brandon's comment. I have been getting caught up on the March discussions, so apologies if I rehash old debates. If he was simply advocating compile-time indexing via get<N>(p) or similar syntax, then I completely agree. This should be required by the Point concept (or Coordinate concept? I do not quite understand the distinction). On the other hand, I'm worried to see that the geometry::point implementation (at least the version included with SpatialIndexes) supports only compile-time indexing. I think we must also have a RuntimeIndexable concept, to which a particular point type may or may not conform. RuntimeIndexable would require an indexing operator[], so it can only be modeled by homogeneous points. Iterators via begin() and end() could be useful as well. I have a couple of reasons for wanting this, as usual motivated by high dimensional spaces...

First, some algorithms require runtime indexing. kd-trees are an interesting example. When constructing a kd-tree over a data set, we need some heuristic to decide which dimension to split on at each tree node. In 2 or 3 dimensions, usually you simply cycle through the dimensions in order, for which in principle only compile-time indexing is necessary. But this heuristic is not so appropriate in high dimensions; then it is usual to split each tree node on the dimension with the greatest variance in the data, which of course is unknown at compile time.

Second, there are efficiency and code size concerns. Consider the Euclidean distance function. For 2D/3D points, it is nice to use compile-time indexing and metaprogramming to guarantee tight, loop-less code. For high dimensional points, however, beyond some dimension (BOOST_GEOMETRY_MAX_UNROLL_DIMENSION?) unrolling the loop is no longer desirable, and we would rather just do the runtime iteration. If the points are RuntimeIndexable, we can choose an appropriate distance algorithm at compile time depending on the points' dimensionality.
I do think you are on the right track, and I look forward to seeing preview 3. Cheers, Patrick

-- Patrick Mihelich wrote:
On the other hand, I'm worried to see that the geometry::point implementation (at least the version included with SpatialIndexes) supports only compile-time indexing. I think we must also have a RuntimeIndexable concept, to which a particular point type may or may not conform. RuntimeIndexable would require an indexing operator[], so it can only be modeled by homogeneous points. Iterators via begin() and end() could be useful as well. I have a couple of reasons for wanting this, as usual motivated by high dimensional spaces...
I personally believe that runtime indexable is preferable to compile time only. It is like defining an array that is only allowed to take compile time constants as indexes. How useful is that array? I don't believe that when a compile time constant is specified as the runtime index into a point that the compiler will be challenged to inline the accessor function and convert the index directly to an offset within the point data type when it is optimizing the code. I rely heavily on the compiler to inline and optimize in these cases to eliminate the runtime overhead of the abstractions I want to create, and the compiler generally doesn't disappoint me and is improving all the time. Luke

AMDG Simonson, Lucanus J wrote:
I personally believe that runtime indexable is preferable to compile time only. It is like defining an array that is only allowed to take compile time constants as indexes. How useful is that array?
Here's my take on the matter: An algorithm that uses only compile-time indexing is preferable because it can work with an arbitrary point class more easily. A point class that supports runtime indexing is preferable because it works easily with more algorithms. In Christ, Steven Watanabe

Steven Watanabe wrote:
An algorithm that uses only compile-time indexing is preferable because it can work with an arbitrary point class more easily.
A point class that supports runtime indexing is preferable because it works easily with more algorithms.
Most legacy point types don't provide runtime indexing, but that isn't much of an obstacle to providing runtime indexing accessors to them.

template <> coord get(point& p, int i) { if(i == 0) return p.x(); return p.y(); }

On the other hand, algorithms that don't take runtime parameters have to be instantiated for all possible values. As the number of dimensions in the system goes up it becomes less reasonable to enumerate dimensions and more likely that the data types are runtime indexable.

if(i == 0) { do_something_algorithmic<0>(point); } else { do_something_algorithmic<1>(point); }

Even in two dimensions, where it is just a matter of where you push the wrinkle in the carpet, the key difference is that the former goes in one place while the latter goes everywhere the algorithm is used when the parameter is a runtime value. I push the wrinkle under the couch. In my application domain, the parameter is typically a runtime value.

if(preferred_orientation(metal_layer) == HORIZONTAL) do_something<0>(); else do_something<1>();

vs:

do_something(preferred_orientation(metal_layer));

This is an important distinction to my users. Basically, I pushed the runtime index down to the lowest level and factored the if statements out of the high-level (user) code. That leads to better user code, and improving the quality of the code written in terms of my API by thoughtful interface design is something I put a lot of work into. Luke

Michael Fawcett wrote:
As for the heterogeneous types, think of huge (GBs worth) digital elevation models in meter units where altitude is in a range from 0 - 20,000m. In that case, my point type would be something like vector3<int, short, int> instead of a vector3<int>.
Phil Endecott wrote:
In GPS data, I store altitude to the nearest metre in an int16_t, and altitude and longitude in degrees in 32-bit fixed-point types. Although longitude needs one more bit than latitude I tend to use the same type for both, so my 2d algorithms are homogeneous. But anything that also involves altitude needs to be heterogeneous.
In VLSI layout we sometimes represent the layer as the z axis of a 3D coordinate system. Because there are only tens of layers we can use something smaller than int to store it. However, the compiler will often pad it back to word aligned addressing and insert padding bytes into a data structure, reducing the memory savings. Also, internally, all arithmetic is 32 bit or greater, so there is no advantage in using smaller data types for local operations. I think it is perfectly reasonable to allow point classes to have heterogeneous coordinate types, but require them to cast to and from a homogeneous coordinate type at the interface between that object and the geometry library. In all three examples, we would make the coordinate type of the interface between the point and the library int and allow the 16 bit value to cast up to 32 bits when it is read into an algorithm and back down to 16 bits when it is written out. Luke

On Wed, Oct 8, 2008 at 7:32 PM, Simonson, Lucanus J <lucanus.j.simonson@intel.com> wrote:
I think it is perfectly reasonable to allow point classes to have heterogeneous coordinate types, but require them to cast to and from a homogeneous coordinate type at the interface between that object and the geometry library. [...]
I don't understand why the interface or algorithm cares whether it's homogeneous or heterogeneous. --Michael Fawcett

In VLSI layout we sometimes represent the layer as the z axis of a 3D coordinate system. Because there are only tens of layers we can use something smaller than int to store it. However, the compiler will often pad it back to word aligned addressing and insert padding bytes into a data structure, reducing the memory savings. Also, internally, all arithmetic is 32 bit or greater, so there is no advantage in using smaller data types for local operations. I think it is perfectly reasonable to allow point classes to have heterogeneous coordinate types, but require them to cast to and from a homogeneous coordinate type at the interface between that object and the geometry library. In all three examples, we would make the coordinate type of the interface between the point and the library int and allow the 16 bit value to cast up to 32 bits when it is read into an algorithm and back down to 16 bits when it is written out.
--Michael Fawcett wrote:
I don't understand why the interface or algorithm cares whether it's homogeneous or heterogeneous.
The interface or the algorithm must necessarily be much more complex to accommodate heterogeneous coordinates. Moreover, you have a real problem with casting when you do arithmetic between heterogeneous coordinates in an algorithm.

x_coordinate_type x; y_coordinate_type y; ?_coordinate_type result = x + y;

You would probably have to register a type for casting to:

x_coordinate_type x; y_coordinate_type y; general_coordinate_type result = (general_coordinate_type)x + (general_coordinate_type)y;

Which is exactly what you do if you cast to the general coordinate type in the interface between the algorithm and the data type. Luke

On Thu, Oct 9, 2008 at 2:06 PM, Simonson, Lucanus J <lucanus.j.simonson@intel.com> wrote:
[...]
--Michael Fawcett wrote:
I don't understand why the interface or algorithm cares whether it's homogeneous or heterogeneous.
The interface or the algorithm must necessarily be much more complex to accommodate heterogeneous coordinates. Moreover, you have a real problem with casting when you do arithmetic between heterogeneous coordinates in an algorithm.
No, it's not necessary. Surely the interface is just taking a type that conforms to a concept, and both a homogeneous and heterogeneous point type will conform.
x_coordinate_type x; y_coordinate_type y; ?_coordinate_type result = x+y;
Boost.Typeof. Boost.Units also handles this gracefully, but I'm not sure how they ended up solving it. Regardless, this is not something the end-user cares how it works, just that it works. It's doable by the library writer, so should be done.
You would probably have to register a type for casting to:
x_coordinate_type x; y_coordinate_type y; general_coordinate_type result = (general_coordinate_type)x + (general_coordinate_type)y;
Which is exactly what you do if you cast to the general coordinate type in the interface between the algorithm and the data type.
This type of conversion is handled automatically by the compiler in much simpler expressions:

short s = 10; int i = 10; int result = s + i; // no warning, no casting necessary

It should be handled the same way (i.e. not at all) in the underlying algorithms.

vec3<int, short, int> isi(80000, 1000, 80000); vec3<int> i(80000, 80000, 80000); vec3<int> result = isi + i; // no warning, no casting necessary

--Michael Fawcett

--Michael Fawcett
Boost.Typeof. Boost.Units also handles this gracefully, but I'm not sure how they ended up solving it. Regardless, this is not something the end-user cares how it works, just that it works. It's doable by the library writer, so should be done. ... This type of conversion is handled automatically by the compiler in much simpler expressions:
First off, relying on auto-casting is a great way to write template code that only works when the template parameters are built-in types and doesn't even compile when user-defined data types are used instead. Moreover, the compiler always auto-casts built-in to user-defined types regardless of your intention, and believe me, the user does care. The compiler doesn't always do the right thing by default. The rationale that something is possible, therefore it should be done, is no rationale at all. Heterogeneous coordinates in the interfaces and internals of algorithms are incompatible with runtime accessors to coordinates and higher-order constructs. I don't see any value in carrying the individual coordinate data types through the interface into the algorithms. Why not cast them up front in the user-provided accessor functions and let the user, who knows and cares what casting they want, control that? Having one coordinate data type within the algorithms *does* make the implementation of those algorithms simpler, and simpler is usually something worth pursuing. Luke

On Thu, Oct 9, 2008 at 3:36 PM, Simonson, Lucanus J <lucanus.j.simonson@intel.com> wrote:
--Michael Fawcett wrote:
Boost.Typeof. Boost.Units also handles this gracefully, but I'm not sure how they ended up solving it. Regardless, this is not something the end-user cares how it works, just that it works. It's doable by the library writer, so should be done. ... This type of conversion is handled automatically by the compiler in much simpler expressions:
First off, relying on auto casting is a great way to write template code that only works when the template parameters are built-in types and doesn't even compile when user-defined data types are used instead. Moreover, the compiler always auto-casts built-in to user-defined types regardless of your intention, and believe me, the user does care. The compiler doesn't always do the right thing by default.
No, it doesn't only work with built-in types. It works with any type that has the correct operators defined. If the user does care, he can cast obviously. The algorithm shouldn't do the casting or require it though.
The rationale that something is possible, therefore it should be done is no rationale at all.
This is a gross misrepresentation of what I said. I said the end-user cares, *and* it's doable, so it should be done. You only took half of what I said.
Heterogeneous coordinates in the interfaces and internals of algorithms are incompatible with runtime accessors to coordinates and higher-order constructs.
Algorithms that need runtime indexing should require that concept.
I don't see any value in carrying the individual coordinate data types through the interface into the algorithms.
They're already present at compile-time. No extra work required.
Why not cast them up front in the user-provided accessor functions and let the user, who knows and cares what casting they want, control that? Having one coordinate data type within the algorithms *does* make the implementation of those algorithms simpler, and simpler is usually something worth pursuing.
Because the casting and the temporaries it creates are not free, in terms of either speed or storage. --Michael Fawcett

The rationale that something is possible, therefore it should be done is no rationale at all.
--Michael Fawcett wrote:
This is a gross misrepresentation of what I said. I said the end-user cares, *and* it's doable, so it should be done. You only took half of what I said.
You are right, I apologize.
Heterogeneous coordinates in the interfaces and internals of algorithms are incompatible with runtime accessors to coordinates and higher-order constructs.
Algorithms that need runtime indexing should require that concept.
All of my algorithms use runtime indexing, so that might slant my point of view slightly here.
I don't see any value in carrying the individual coordinate data types through the interface into the algorithms.
They're already present at compile-time. No extra work required.
Well, yes and no. If I want to have a

template <int axis> coordinate_type get(const point_type& point);

accessor function, I need to go through a lot of work to rewrite it as

template <int axis> typename coordinate_type<axis>::type get(const point_type& point);

and the user needs to go through more work to provide both

template <int axis> struct coordinate_type { typedef int type; };

and

struct general_coordinate_type { typedef int type; };

and will generally be puzzled about why the geometry library is so complicated to use.
Why not cast them up front in the user-provided accessor functions and let the user, who knows and cares what casting they want, control that? Having one coordinate data type within the algorithms *does* make the implementation of those algorithms simpler, and simpler is usually something worth pursuing.
Because the casting and the temporaries it creates are not free, in terms of either speed or storage.
You will have to cast at some point to use the value, so why is it more expensive to cast up front rather than at the last possible moment? Assuming the value is going to be used more than once by the algorithm, it will be cast more than once. Better to factor that to the front. A smart compiler would actually do that for you. Furthermore, there has to be a temporary at the interface because, as I've just discussed with Brandon, there is the clear need to make things like polar points conform to Cartesian interfaces, which is incompatible with direct access to the data members. Since we already have to make a temporary, we should cast at that point instead of having yet another temporary and cast later on. Luke

AMDG Simonson, Lucanus J wrote:
You will have to cast at some point to use the value, so why is it more expensive to cast up front rather than at the last possible moment?
This only applies to built in types. For user defined types, it may not be necessary to cast in order to operate on objects of two different types.
Assuming the value is going to be used more than once by the algorithm it will be cast more than once. Better to factor that to the front. A smart compiler would actually do that for you.
That's fine for built in types. What about user-defined types?
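One way Steven's point can play out, as a sketch: a hypothetical user-defined length<Rep> wrapper whose mixed-representation operator+ chooses the result representation itself, so no cast is needed at the call site and none is lost by casting up front. All names here are illustrative assumptions.

```cpp
#include <type_traits>

// Hypothetical quantity wrapper: a user-defined type can define
// mixed-type arithmetic directly, so no up-front cast to a common
// type is required by the algorithm.
template <typename Rep>
struct length {
    Rep value;
    explicit length(Rep v) : value(v) {}
};

// Mixed-Rep addition; the result representation is chosen by the
// operator itself, not by a cast at the call site.
template <typename R1, typename R2>
length<typename std::common_type<R1, R2>::type>
operator+(length<R1> a, length<R2> b)
{
    typedef typename std::common_type<R1, R2>::type rep;
    return length<rep>(a.value + b.value);
}
```

An algorithm written against such a type never sees a cast; forcing a conversion at the interface would discard the type's own promotion rules.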
Furthermore, there has to be a temporary at the interface because, as I've just discussed with Brandon, there is the clear need to make things like polar points conform to Cartesian interfaces, which is incompatible with direct access to the data members. Since we already have to make a temporary, we should cast at that point instead of having yet another temporary and cast later on.
Maybe I'm misunderstanding something, but it seems to me that an algorithm that works with Cartesian coordinates will tend to be very inefficient if it has to convert to and from polar coordinates on the fly at every access. In Christ, Steven Watanabe

Steven Watanabe wrote:
Simonson, Lucanus J wrote:
Furthermore, there has to be a temporary at the interface because, as I've just discussed with Brandon, there is the clear need to make things like polar points conform to Cartesian interfaces, which is incompatible with direct access to the data members. Since we already have to make a temporary, we should cast at that point instead of having yet another temporary and cast later on.
Maybe I'm misunderstanding something, but it seems to me that an algorithm that works with Cartesian coordinates will tend to be very inefficient if it has to convert to and from polar coordinates on the fly at every access. I agree, that is too inefficient. And it is not necessary.
We (Bruno and I) are now about to publish the new preview 3 of the "geometry library", of which preview 2 was last March. But for now, in short. There is a point concept. The algorithms can take any point type which fulfils the point concept. Per point type a strategy can be registered using a traits class. That strategy will do the primitive calculations, such as distance from point to point, or distance from point to segment. An algorithm (e.g. simplify) will use the registered strategy, will do no casts or conversions, will calculate distances where necessary, and result in a simplified line. Regards, Barend

Furthermore, there has to be a temporary at the interface because, as I've just discussed with Brandon, there is the clear need to make things like polar points conform to Cartesian interfaces, which is incompatible with direct access to the data members. Since we already have to make a temporary, we should cast at that point instead of having yet another temporary and cast later on.
Maybe I'm misunderstanding something, but it seems to me that an algorithm that works with Cartesian coordinates will tend to be very inefficient if it has to convert to and from polar coordinates on the fly at every access.
Just to further this point: if we want to do geometry in spherical coordinates, then the point type cannot be homogeneous, especially if we wanted, e.g., to enforce dimensional correctness a la Boost.Units:

cartesianPoint<si::length, si::length, si::length>
vs.
sphericalPoint<si::length, si::radians, si::radians>

I understand that for most of the applications that people are considering, Cartesian coordinates are dominant, but I can easily imagine a really cool game that accurately modeled relativistic effects to simulate space combat at near-light speed; to do that, you need a much more sophisticated and flexible system that is capable of dealing with Minkowski metrics rather than Euclidean... You could also have a game where players were constrained to motion on the surface of a 3-sphere... seems like there are a bunch of possibilities that thinking in a Cartesian straitjacket precludes. Matthias

Matthias wrote:
Just to further this point : if we want to do geometry in spherical coordinates, then the point type cannot be homogeneous, especially if we wanted, e.g., to enforce dimensional correctness a la Boost.Units :
cartesianPoint<si::length,si::length,si::length>
vs.
sphericalPoint<si::length,si::radians,si::radians>
You are right, obviously for a spherical coordinate or polar coordinate concept we would use different types. That is one reason why Cartesian point would be a different concept and use different accessors than polar or spherical point. There is no need to have one point concept and set of traits that applies to all 2D, 3D and ND coordinate systems. Luke

Simonson, Lucanus J wrote:
You will have to cast at some point to use the value, so why is it more expensive to cast up front rather than at the last possible moment?
Steven Watanabe wrote:
This only applies to built in types. For user defined types, it may not be necessary to cast in order to operate on objects of two different types. ... Maybe I'm misunderstanding something, but it seems to me that an algorithm that works with Cartesian coordinates will tend to be very inefficient if it has to convert to and from polar coordinates on the fly at every access.
User-defined mixed-type arithmetic operators seem like a fairly unlikely circumstance. I'm not sure introducing the complexity of keeping track of the different data types into the algorithms is warranted by such a use case.

The interface is more likely to be used between modules rather than internally. For instance, imagine one legacy module operates on and produces its output in the form of polar coordinate points. A generic geometry library module consumes Cartesian coordinate points. Rather than iterate over polar points and convert them to Cartesian points that are fed into the generic library, they can be converted on the fly through the generic interface. Rather than use the user point type for internal computation, I use my own point type because I know it is fast. If the output of the Cartesian algorithm needs to be fed back into the polar coordinate module, it can write out polar coordinates directly. The interface is providing the ability to refactor the application code and abstract away data type conversion.

There are other good reasons not to require direct access to a data type's member data, most notable among them that people typically don't provide such access in the public interfaces of their legacy classes because they learned in school that was a bad thing. For this reason, there has to be a temporary created by the generic interface, and it might as well cast it to whatever global coordinate type the user deems best at that time. Luke
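The on-the-fly conversion Luke describes can be sketched under an assumed accessor interface (polar_point and cartesian_access are illustrative names, not the actual library's API): a Cartesian coordinate of a polar point is necessarily computed into a temporary, so the interface is the natural place to convert to whatever global coordinate type is wanted.

```cpp
#include <cmath>

// Hypothetical polar point from a legacy module.
struct polar_point { double r, theta; };

// Hypothetical Cartesian access traits that convert on the fly. The
// returned coordinate is necessarily a temporary, which is Luke's
// point: since a temporary exists anyway, any cast to a chosen global
// coordinate type can happen here for free.
template <typename Point> struct cartesian_access;

template <> struct cartesian_access<polar_point> {
    typedef double coordinate_type;
    static coordinate_type get_x(const polar_point& p) { return p.r * std::cos(p.theta); }
    static coordinate_type get_y(const polar_point& p) { return p.r * std::sin(p.theta); }
};
```

A Cartesian algorithm written against `cartesian_access<Point>` can then consume polar points directly, though (as Steven notes) repeated trigonometric conversion at every access has a real cost.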

On Thu, Oct 9, 2008 at 4:17 PM, Simonson, Lucanus J <lucanus.j.simonson@intel.com> wrote:
I don't see any value in carrying the individual coordinate data types through the interface into the algorithms.
They're already present at compile-time. No extra work required.
Well, yes and no. If I want to have a
template <int axis> coordinate_type get(const point_type& point);
accessor function I need to go through a lot of work to rewrite it as
template <int axis> typename coordinate_type<axis>::type get(const point_type& point);
and the user needs to go through more work to provide both
template <int axis> struct coordinate_type { typedef int type; };
and
struct general_coordinate_type { typedef int type; };
and will generally be puzzled about why the geometry library is so complicated to use.
Well, get is already written for Boost.Tuple and Boost.Fusion sequences. This works with built-ins, arrays, and user-defined types. More to the point, if you accept a type that conforms to a Fusion sequence, you have the types at compile time, and I exaggerated a little when I said "no extra work" - for user defined types the user *will* need to adapt his struct using a Fusion macro, but this is a one-time thing, and I think it's a one-liner.
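For readers without Boost at hand, the non-intrusive adaptation Michael mentions can be sketched in plain C++ (all names are illustrative; with Boost.Fusion the equivalent is a single BOOST_FUSION_ADAPT_STRUCT invocation):

```cpp
#include <cstddef>

// A legacy point type we cannot modify.
struct legacy_point { double x, y; };

// Non-intrusive adaptation: specialize an accessor template per axis,
// the plain-C++ analogue of adapting a struct as a Fusion sequence
// (which Boost spells roughly as
//  BOOST_FUSION_ADAPT_STRUCT(legacy_point, (double, x)(double, y)) ).
template <std::size_t N> struct legacy_point_get;
template <> struct legacy_point_get<0> {
    static double apply(const legacy_point& p) { return p.x; }
};
template <> struct legacy_point_get<1> {
    static double apply(const legacy_point& p) { return p.y; }
};

// Uniform compile-time indexed access, usable by generic algorithms.
template <std::size_t N>
double get(const legacy_point& p) { return legacy_point_get<N>::apply(p); }
```

The adaptation is written once, outside the legacy class, and every generic algorithm then accesses coordinates through `get<N>`.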
You will have to cast at some point to use the value, why is it more expensive to cast up front rather than at the last possible moment. Assuming the value is going to be used more than once by the algorithm it will be cast more than once. Better to factor that to the front. A smart compiler would actually do that for you. Futhermore, there has to be a teomporary at the interface because as I've just discussed with Brandon, there is the clear need to make things like polar points conform to Cartesian interfaces, which is incompatible with direct access to the data members. Since we already have to make a temporary we should cast at that point instead of having yet another temoporary and cast later on.
The return value temporary is present in both cases, unless the compiler elides that. I was talking about parameters. What if the function takes 5 parameters? I have to make 5 unnecessary temporaries just to call it. --Michael Fawcett

Hello,
Ok, I take your point. I wasn't suggesting that a library not be able to handle more than 3 dimensions. Rather, I would suggest that the population of developers who have such requirements must be very small. Considering that generalization on dimension is not trivial for many algorithms, this leads me to the conclusion that the specific algorithms will decide the requirements on their inputs. This is why I chose a traits-based interface for the geometry library I've been implementing. There's nothing to stop one from creating a multi-dimensional Cartesian/polar/parametric access traits interface for any algorithm which requires it. So I suppose I agree, the geometry library should not constrain the number of dimensions. But I don't agree with full agnosticism (meaning agnosticism of coordinate frame as well). There should be a way to differentiate points further to align with the concrete traits of the dimension they describe.
This sounds reasonable. Certainly there are many 2D/3D algorithms for which there is little point in generalizing to higher dimensions.
But it's not a reason to limit the point concept to 2D/3D. Once you have your points defined dimension-agnostically (which was really not a big deal compared to some other problems we had when defining our points...), nothing prevents you from writing an algorithm that will only work with a precise dimension because it doesn't make sense for the other ones. Let's take the cross product. Every 3D programmer uses it, and it only makes sense in 3D, even if it can be generalized with the wedge product. Well, your cross_product() function will just have to do a BOOST_STATIC_ASSERT to ensure you're passing it a 3D point, and will let the rest of the algorithms be dimension agnostic. Bruno
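Bruno's suggestion can be sketched with a minimal dimension-agnostic point and a cross product that asserts its dimension at compile time (static_assert stands in for BOOST_STATIC_ASSERT here; the types are illustrative, not the proposed library's):

```cpp
#include <cstddef>

// Minimal dimension-agnostic point.
template <typename T, std::size_t N>
struct point { T coords[N]; };

// cross_product only makes sense in 3D, so it asserts the dimension at
// compile time; every other algorithm can stay dimension agnostic.
template <typename T, std::size_t N>
point<T, N> cross_product(const point<T, N>& a, const point<T, N>& b)
{
    static_assert(N == 3, "cross_product only makes sense for 3D points");
    point<T, N> r = {{ a.coords[1] * b.coords[2] - a.coords[2] * b.coords[1],
                       a.coords[2] * b.coords[0] - a.coords[0] * b.coords[2],
                       a.coords[0] * b.coords[1] - a.coords[1] * b.coords[0] }};
    return r;
}
```

Calling it with a `point<T, 2>` or `point<T, 4>` fails to compile with a clear message, exactly the behavior Bruno describes.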

Patrick Mihelich wrote:
I can't offhand think of a use case where the dimensions are of heterogeneous type
In GPS data, I store altitude to the nearest metre in an int16_t, and latitude and longitude in degrees in 32-bit fixed-point types. Although longitude needs one more bit than latitude, I tend to use the same type for both, so my 2D algorithms are homogeneous. But anything that also involves altitude needs to be heterogeneous. Phil.
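Phil's layout might look like the following sketch (the 1e-7-degree fixed-point scale is an assumed convention for illustration, not something Phil specified):

```cpp
#include <cstdint>

// Sketch of a GPS fix: altitude in metres in an int16_t, latitude and
// longitude as 32-bit fixed-point degrees (assumed scale: 1e-7 degrees
// per unit). A 2D algorithm over lat/lon stays homogeneous (int32_t);
// anything involving altitude must mix coordinate types.
struct gps_fix {
    std::int32_t latitude;   // degrees * 1e7
    std::int32_t longitude;  // degrees * 1e7
    std::int16_t altitude;   // metres
};

inline double latitude_degrees(const gps_fix& f) { return f.latitude * 1e-7; }
```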

Wanting to code a simple 3D virtual world, I looked into the recent spatial indexes SOC work, which depends on the proposed geometry library.
I have to admit I was fairly disappointed. So I'm going to put some critics, some quite harsh maybe, along with some ramblings about what I would have liked to have. Note that I am no geometry or space indexing expert, so what I say may be perfectly silly.
[...] Mathias, Thanks for your interest in the SpatialIndex GSoC project. I agree that the indexes lack a lot of features and are very limited (for example, they are only for 2D now). But remember that this is the work that we have done in a bit more than two months from the ground up (this time includes reading the papers about the indexes, writing a little documentation, writing unit tests, finding data to test, integrating with the geometry proposal, etc.), so it still has a long road to be usable (as I say in my email). Some remarks: - You are able to find all elements within a certain range, if you mean by range a box. Or you can use a box and then refine the result filtering out the elements that are not in your area. - About "Being able to index arbitrary objects and not just points". You can insert the envelopes of arbitrary geometries and a pointer to the real object as the data. Maybe it's not the best interface to do this, but a lot of libraries (e.g. GEOS) use this approach. Since GSoC ended I didn't have time to do much more than minor changes. I hope to have more time in the following weeks/months. Anyway, your criticisms are very good and they will be part of my ToDo list from now on. Of course, any contribution is welcome too. Thanks again! -- federico

Federico J. Fernández wrote:
- You are able to find all elements within a certain range, if you mean by range a box. Or you can use a box and then refine the result filtering out the elements that are not in your area.
I meant it in a more abstract way. Usually a range is defined by a circle (rather than a box), but it could also be an area defined arbitrarily, so why not make it work with any geometrical object, using geometry::intersect and geometry::within in place of your overlap function? Maybe it would be more practical if geometry provided an overlap (not-completely-within) test function. Maybe having a search where you search for elements that are completely within an area is interesting too? In that case you might as well parameterize the criteria.
- About "Being able to index arbitrary objects and not just points". You can insert the envelopes of arbitrary geometries and a pointer to the real object as the data. Maybe it's not the best interface to do this, but a lot of libraries (i.e. GEOS) use this approach.
Which I can only do with R-trees and not quadtrees, if I understand correctly. It will still lead to approximations when computing distances and the like though; the actual object may be farther away than its minimal axis-aligned bounding box, and thus the nearest neighbor search might return wrong results.
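The caveat can be made concrete with the standard point-to-box distance, which only lower-bounds the distance to the enclosed object: ranking candidates by box distance alone can return the wrong nearest neighbour, so a correct search uses the box distance to prune and then refines against the real geometry. The names here (box2, distance_to_box) are illustrative.

```cpp
#include <algorithm>
#include <cmath>

// Axis-aligned 2D box.
struct box2 { double min_x, min_y, max_x, max_y; };

// Distance from a query point to the nearest point of the box; zero
// when the query point is inside. This is a lower bound on the
// distance to any object the box encloses.
double distance_to_box(double px, double py, const box2& b)
{
    double dx = std::max(std::max(b.min_x - px, 0.0), px - b.max_x);
    double dy = std::max(std::max(b.min_y - py, 0.0), py - b.max_y);
    return std::sqrt(dx * dx + dy * dy);
}
```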

- You are able to find all elements within a certain range, if you mean by range a box. Or you can use a box and then refine the result filtering out the elements that are not in your area.
I meant it in a more abstract way. Usually a range is defined by a circle (rather than a box), but it could also be an area defined arbitrarily, so why not make it work with any geometrical object, using geometry::intersect and geometry::within in place of your overlap function? Maybe it would be more practical if geometry provided an overlap (not-completely-within) test function. Maybe having a search where you search for elements that are completely within an area is interesting too? In that case you might as well parameterize the criteria.
I understand. That's the plan, but it's not yet ready, I just wanted to point out that there is some basic functionality about this.

- About "Being able to index arbitrary objects and not just points". You can insert the envelopes of arbitrary geometries and a pointer to the real object as the data. Maybe it's not the best interface to do this, but a lot of libraries (i.e. GEOS) use this approach.
Which I can only do with R-trees and not Quadtrees, if I understand correctly.
Yes.
It will still lead to approximations when computing distances and the like though; the actual object may be farther away than its minimal axis-aligned bounding box and thus the nearest neighbor search might return wrong results.
Yes, but we should be careful about the performance penalty of having arbitrary geometries in the index and using different algorithms for distances and other operations. Cheers, -- federico

-------------------------------------------------- From: "Patrick Mihelich" <patrick.mihelich@gmail.com> Sent: Thursday, October 09, 2008 6:23 AM To: <boost@lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion
On Thu, Oct 9, 2008 at 1:48 AM, Bruno Lalande <bruno.lalande@gmail.com>wrote:
Stop me if I'm wrong but what you're discussing now is how to implement the point class provided by an ideal geometry library, right? If so, I think it's not the most important concern since, as I've already said, the primary goal is not to provide a point class but to define how points are accessed and defined in general. I think that, by talking about tuples, Brandon was rather pointing out that points should be accessible the same way as tuples: get<N>(p). Which is quite a general consensus now, IMO.
OK, perhaps I misinterpreted Brandon's comment. I have been getting caught up on the March discussions, so apologies if I rehash old debates. If he was simply advocating compile-time indexing via get<N>(p) or similar syntax, then I completely agree. This should be required by the Point concept (or Coordinate concept? I do not quite understand the distinction).
On the other hand, I'm worried to see that the geometry::point implementation (at least the version included with SpatialIndexes) supports only compile-time indexing. I think we must also have a RuntimeIndexable concept, to which a particular point type may or may not conform. RuntimeIndexable would require an indexing operator[], so it can only be modeled by homogeneous points. Iterators via begin() and end() could be useful as well. I have a couple of reasons for wanting this, as usual motivated by high dimensional spaces...
Actually, in the library I've been working on, the definition of the point type is of lesser importance. All of my algorithms are designed to access point coordinates via a traits specialization which details how the point's coordinates are accessed. It works like this (please excuse email formatting :) I saw someone mention a fusion vector so I'll use that. First I have a macro which makes for an easy way to define the basic traits of the point:

//! Macro for point type which does not have embedded traits - User Defined Points
#define BOOST_DEFINE_USER_POINT_TRAITS( Point, Coordinate, Dimension ) \
template <>                                                            \
struct point_traits< Point >                                           \
{                                                                      \
    BOOST_STATIC_ASSERT( Dimension > 0 );                              \
    typedef Point point_type;                                          \
    typedef dimension_traits<Dimension> dimension_type;                \
    typedef coordinate_traits<Coordinate>::coordinate_type coordinate_type; \
};

So we need:

BOOST_DEFINE_USER_POINT_TRAITS(fusion::vector2<double, double>, double, 2);
BOOST_DEFINE_USER_POINT_TRAITS(fusion::vector2<int, int>, int, 2);

Next you need to specialize the access traits for the coordinate framework which the point implicitly describes.

//! Access traits for points in Cartesian coordinates
template <typename Coordinate>
struct cartesian_access_traits< fusion::vector2<Coordinate, Coordinate> >
{
    typedef fusion::vector2<Coordinate, Coordinate> point_type;
    typedef typename point_traits<point_type>::coordinate_type coordinate_type;
    typedef typename point_traits<point_type>::dimension_type dimension_type;

    template <unsigned int D>
    static inline coordinate_type get(const point_type& p)
    {
        BOOST_MPL_ASSERT_MSG
        (
            ( dimension_traits< D >::value >= 0 &&
              dimension_traits< D >::value < dimension_type::value ),
            POINT_GET_CALLED_WITH_INDEX_OUT_OF_BOUNDS,
            ( dimension_traits< D > )
        );
        return fusion::at_c< D >( p );
    }

    //! factory traits.
    static inline point_type construct(const coordinate_type& x, const coordinate_type& y)
    {
        return point_type( x, y );
    }
};

Then the type may be passed directly into algorithms:

//! Function to find the cross product between two vectors formed by the
//! specified three points which share an endpoint at A.
template <typename Point>
typename point_traits<Point>::coordinate_type cross_product( const Point& A, const Point& B, const Point& C )
{
    typedef cartesian_access_traits< Point > access_traits;
    boost::function_requires< Point2DConcept<Point> >();
    boost::function_requires< CartesianCoordinateAccessorConcept<access_traits> >();
    typename point_traits< Point >::coordinate_type cross =
        ( access_traits::template get<0>( B ) - access_traits::template get<0>( A ) ) *
        ( access_traits::template get<1>( C ) - access_traits::template get<1>( A ) ) -
        ( access_traits::template get<0>( C ) - access_traits::template get<0>( A ) ) *
        ( access_traits::template get<1>( B ) - access_traits::template get<1>( A ) );
    return cross;
}

//! Orientation test to check if point C is left, collinear, or right of the line formed by A-B.
enum orientation_type { oriented_right = -1, oriented_collinear = 0, oriented_left = 1 };

template <typename PrecisionPolicy, typename Point>
orientation_type get_orientation( const Point& A, const Point& B, const Point& C, const PrecisionPolicy& compare )
{
    typedef typename point_traits<Point>::coordinate_type coordinate_type;
    coordinate_type cross = cross_product( A, B, C );
    coordinate_type zero = 0;
    if( compare.less_than( cross, zero ) )
        return oriented_right;
    else if( compare.greater_than( cross, zero ) )
        return oriented_left;
    else
        return oriented_collinear;
}

So you could do the following:

typedef fusion::vector2<double, double> point;
point p1( 0., 0. );
point p2( 1., 0. );
point p3( 2., 0. );
orientation_type oType = get_orientation( p1, p2, p3, fraction_tolerance_comparison_policy<double>(1e-10) );
assert( oType == oriented_collinear );

Or with ints:

typedef fusion::vector2<int, int> point;
point p1( 0, 0 );
point p2( 1, 1 );
point p3( 2, 2 );
orientation_type oType = get_orientation( p1, p2, p3, direct_comparison_policy() );
assert( oType == oriented_collinear );

When I was talking about access like tuple, I meant using compile time indexing via the access traits. One of the main goals of my library is to facilitate use with legacy geometry code (proprietary in my case) simply by specializing access traits. Using access traits results in little constraint on the interface of the underlying point type. This frees one to develop algorithms for a public library while maintaining the ability to use said algorithms with the legacy code (without worrying about translating into a boost::point type.) The same idea is used for segments and polylines/polygons.
First, some algorithms require runtime indexing. kd-trees are an interesting example. When constructing a kd-tree over a data set, we need some heuristic to decide which dimension to split on at each tree node. In 2 or 3 dimensions, usually you simply cycle through the dimensions in order, for which in principle only compile-time indexing is necessary. But this heuristic is not so appropriate in high dimensions; then it is usual to split each tree node on the dimension with the greatest variance in the data, which of course is unknown at compile time.
Second, there are efficiency and code size concerns. Consider the Euclidean distance function. For 2D/3D points, it is nice to use compile-time indexing and metaprogramming to guarantee tight, loop-less code. For high dimensional points, however, beyond some dimension (BOOST_GEOMETRY_MAX_UNROLL_DIMENSION?) unrolling the loop is no longer desirable, and we would rather just do the runtime iteration. If the points are RuntimeIndexable, we can choose an appropriate distance algorithm at compile time depending on the points' dimensionality.
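The variance heuristic from the kd-tree point above can be sketched with runtime indexing (plain vector-of-vector points for brevity; a real index would use its own point storage and accessors):

```cpp
#include <vector>
#include <cstddef>

// Pick the dimension with the greatest variance in the data, the usual
// kd-tree split heuristic in high dimensions. The chosen dimension is
// unknown at compile time, so coordinates must be indexable at runtime.
inline std::size_t best_split_dimension(const std::vector<std::vector<double> >& pts)
{
    std::size_t dims = pts.front().size();
    std::size_t best = 0;
    double best_var = -1.0;
    for (std::size_t d = 0; d < dims; ++d) {
        double mean = 0.0;
        for (std::size_t i = 0; i < pts.size(); ++i) mean += pts[i][d];
        mean /= pts.size();
        double var = 0.0;                 // sum of squared deviations
        for (std::size_t i = 0; i < pts.size(); ++i) {
            double diff = pts[i][d] - mean;
            var += diff * diff;
        }
        if (var > best_var) { best_var = var; best = d; }
    }
    return best;
}
```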
I presume the reason one would not want loop-less code at high dimensions is due to bloat? I do not understand why runtime looping would be faster in this case. It would seem that even in the largest cases the bloat would still be small compared to memory sizes today. Runtime indexing can certainly be accommodated using a tag dispatch on the access traits. I think more research needs to be done to quantify the differences between the two to see if it is worth the maintenance overhead. Cheers, Brandon
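The dimension-based choice Patrick and Brandon are discussing can be sketched as follows: below an assumed unroll threshold, use a metaprogrammed loop-less sum of squares; above it, a plain runtime loop. All names, including the threshold, are illustrative; a real library would tag-dispatch so the unrolled template is not even instantiated for large dimensions.

```cpp
#include <cmath>
#include <cstddef>

const std::size_t max_unroll_dimension = 4; // assumed threshold

// Metaprogrammed, loop-less sum of squared differences over [I, N).
template <std::size_t I, std::size_t N>
struct unrolled_sq_sum {
    static double apply(const double* a, const double* b) {
        double d = a[I] - b[I];
        return d * d + unrolled_sq_sum<I + 1, N>::apply(a, b);
    }
};
template <std::size_t N>
struct unrolled_sq_sum<N, N> {
    static double apply(const double*, const double*) { return 0.0; }
};

template <std::size_t N>
double euclidean_distance(const double (&a)[N], const double (&b)[N])
{
    if (N <= max_unroll_dimension)  // compile-time constant condition
        return std::sqrt(unrolled_sq_sum<0, N>::apply(a, b));
    double sum = 0.0;               // runtime loop for high dimensions
    for (std::size_t i = 0; i < N; ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}
```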
I do think you are on the right track, and I look forward to seeing preview 3.
Cheers, Patrick

Actually in the library I've been working on the definition of the point type is of lesser importance. All of my algorithms are designed to access point coordinates via a traits specialization which details how the point's coordinates are accessed. It works like this (please excuse email formatting :)
As said several times by several people, including me, this is exactly what everybody out there is doing. There's nothing fundamentally new in what you propose and in the code you just pasted. That's precisely why I've just said (like you) that the definition of a point type is not really important. Bruno

Brandon Kohn wrote:
//! Function to find the cross product between two vectors formed by the specified three points which share an endpoint at A. template <typename Point> point_traits<Point>::coordinate_type cross_product( const Point& A, const Point& B, const Point& C )
Isn't the result of the cross product a vector/point?
{ typedef cartesian_access_traits<Point > access_traits; boost::function_requires<Point2DConcept<Point> >();
Isn't the cross product 3D-only? Anyway, wouldn't it be better to use SFINAE so that you can overload the algorithm to make it work with other concepts?
When I was talking about access like tuple, I meant using compile time indexing via the access traits. One of the main goals of my library is to facilitate use with legacy geometry code (proprietary in my case) simply by specializing access traits. Using access traits results in little constraint on the interface of the underlying point type.
Are you aware of the Fusion sequence concepts? Any type can be made into a fusion sequence non-intrusively. So there is no need to recreate the get<N> machinery.

Isn't the cross product 3D-only?
The 3-dimensional cross product can be extended to n dimensions. The n-dimensional cross product takes (n-1) n-vectors as arguments and returns an n-vector. If the argument vectors are linearly dependent, then the return vector is 0. If the argument vectors are linearly independent, then the return vector is the unique n-vector such that 1. it is perpendicular to all the argument vectors, 2. its length is equal to the (n-1)-dimensional volume of the parallelepiped spanned by the argument vectors, and 3. the argument vectors and the return vector form a positively oriented system. --Johan
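Johan's construction can be sketched directly as a cofactor expansion: take the formal determinant of the matrix whose first row is the basis vectors and whose remaining rows are the (n-1) argument vectors. A naive O(n!) recursive determinant is used for clarity, not speed; all names are illustrative.

```cpp
#include <vector>
#include <cstddef>

// Naive cofactor-expansion determinant.
double det(std::vector<std::vector<double> > m)
{
    if (m.size() == 1) return m[0][0];
    double sum = 0.0;
    for (std::size_t c = 0; c < m.size(); ++c) {
        std::vector<std::vector<double> > minor_;
        for (std::size_t r = 1; r < m.size(); ++r) {
            std::vector<double> row;
            for (std::size_t k = 0; k < m.size(); ++k)
                if (k != c) row.push_back(m[r][k]);
            minor_.push_back(row);
        }
        sum += (c % 2 == 0 ? 1.0 : -1.0) * m[0][c] * det(minor_);
    }
    return sum;
}

// n-dimensional cross product of (n-1) vectors, each of dimension n:
// component i is the signed minor obtained by deleting column i.
std::vector<double> cross_product_nd(const std::vector<std::vector<double> >& args)
{
    std::size_t n = args.size() + 1;
    std::vector<double> result(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<std::vector<double> > minor_;
        for (std::size_t r = 0; r < args.size(); ++r) {
            std::vector<double> row;
            for (std::size_t k = 0; k < n; ++k)
                if (k != i) row.push_back(args[r][k]);
            minor_.push_back(row);
        }
        result[i] = (i % 2 == 0 ? 1.0 : -1.0) * det(minor_);
    }
    return result;
}
```

For n = 3 this reduces to the familiar cross product, e.g. (1,0,0) x (0,1,0) = (0,0,1).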

On Thu, Oct 9, 2008 at 7:25 AM, Brandon Kohn <blkohn@hotmail.com> wrote:
Second, there are efficiency and code size concerns. Consider the Euclidean
distance function. For 2D/3D points, it is nice to use compile-time indexing and metaprogramming to guarantee tight, loop-less code. For high dimensional points, however, beyond some dimension (BOOST_GEOMETRY_MAX_UNROLL_DIMENSION?) unrolling the loop is no longer desirable, and we would rather just do the runtime iteration. If the points are RuntimeIndexable, we can choose an appropriate distance algorithm at compile time depending on the points' dimensionality.
I presume the reason one would not want loop-less code at high dimensions is due to bloat? I do not understand why runtime looping would be faster in this case. It would seem that even in the largest cases the bloat would still be small compared to memory sizes today. Runtime indexing can certainly be accommodated using a tag dispatch on the access traits. I think more research needs to be done to quantify the differences between the two to see if it is worth the maintenance overhead.
Sure, I need to back up my claim. The fundamental problem with loop-less code at high dimensions is bloat, yes. Remember that bloat also affects efficiency. Suppose we're calculating distances from one point to a set of other points. If the size of the distance function makes the inner loop code larger than the L1 instruction cache, we take a performance hit. I'm attaching a simple benchmark that calculates the distance between two points some number of times, using both compile-time and runtime access. Note that this is awfully friendly to the compile-time access code, as we're doing nothing else in the inner loop. Here are some results on my machine (Intel Core Duo 2.4 GHz, gcc 4.2.3): $ g++ -DITERATIONS=1000000000 -DDIMENSIONS=2 -O3 runtime_test.cpp -o runtime_test $ ./runtime_test Distance = 2.000000 Compile-time access: 5.460000s Runtime access: 9.070000s $ g++ -DITERATIONS=1000000000 -DDIMENSIONS=3 -O3 runtime_test.cpp -o runtime_test $ ./runtime_test Distance = 3.000000 Compile-time access: 7.980000s Runtime access: 10.970000s So for 2D/3D we see the benefit to using compile-time access, although it wouldn't surprise me if the gap could be closed by better compile-flag twiddling. Let's see what happens at high dimensions. Let's also add in a partially unrolled runtime access version that operates 4 elements at a time (ideally the compiler would take care of this). $ g++ -DITERATIONS=10000000 -DDIMENSIONS=36 -O3 runtime_test.cpp -o runtime_test mihelich@adt:~/projects/stuff$ ./runtime_test Distance = 36.000000 Compile-time access: 1.620000s Runtime access: 1.280000s Runtime access (unrolled): 0.740000s $ g++ -I /u/mihelich/packages/boost-spacial-index -I /u/mihelich/packages/boost/include -DITERATIONS=10000000 -DDIMENSIONS=128 -O3 runtime_test.cpp -o runtime_test $ ./runtime_test Distance = 128.000000 Compile-time access: 5.960000s Runtime access: 4.900000s Runtime access (unrolled): 3.160000s Here runtime access is faster. 
128 dimensions may sound excessive, but I did not choose these numbers arbitrarily. In vision applications it is very common to represent image patches by SIFT descriptors, which are points in 128-dimensional space. PCA-SIFT reduces the dimensionality to only 36, but runtime access is still faster. In either case partial unrolling is a big win. In other applications we may go to even higher dimensions...

$ g++ -DITERATIONS=1000000 -DDIMENSIONS=1000 -O3 -ftemplate-depth-1024 runtime_test.cpp -o runtime_test
$ ./runtime_test
Distance = 1000.000000
Compile-time access: 8.150000s
Runtime access: 4.090000s
Runtime access (unrolled): 2.460000s

Now runtime access is much faster, and once we get up this far I have to crank up the template depth just to calculate a distance. 1000 dimensions is high but not ridiculous: spectra of galaxies from the Sloan Digital Sky Survey have 4000 dimensions, for example. I understand if the authors do not want to concern themselves much with optimizing for non-2D/3D cases, as users who really care can always substitute their own highly optimized distance function using SSE, etc. But it would be nice to have something decent out of the box, and there is plenty of reason for a RuntimeIndexable concept. If the geometry library is to be used as a base for spatial index and perhaps clustering libraries, some consideration has to be given to high dimensions. Cheers, Patrick

Mathias Gaunard a écrit :
Isn't GCC supposed to implement loop vectorization? Did you add in there SIMD instructions or not?
"Supposed" is the keyword. gcc 4.4 has -Ox optimizations that try to vectorize code, but when the basic block starts to get too long or too complex, it gives up. As for loop unrolling: it should be done automagically if the loop bounds are statically known; if not, do it yourself.

-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

Mathias Gaunard a écrit :
If the loops are fully unrolled, I don't see the difference between "static" and "dynamic" indexing.
What I mean is that experimental results suggest that

  for (int i = 0; i < 16; ++i) // code

is more often unrolled than

  for (int i = foo(); i < bar(); ++i) // code

Concerning your remark, "While access by index is theoretically slower than pointer increments, I'm pretty sure any compiler optimizes the former to the latter" -- you're perfectly right.

-- ___________________________________________ Joel Falcou - Assistant Professor PARALL Team - LRI - Universite Paris Sud XI Tel : (+33)1 69 15 66 35

-------------------------------------------------- From: "Bruno Lalande" <bruno.lalande@gmail.com> Sent: Thursday, October 09, 2008 9:51 AM To: <boost@lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion
Actually in the library I've been working on the definition of the point type is of lesser importance. All of my algorithms are designed to access point coordinates via a traits specialization which details how the point's coordinates are accessed. It works like this (please excuse email formatting :)
As said several times by several people including me, it's exactly what everybody out there is doing. There's nothing fundamentally new in what you propose and in the code you just pasted.
The library I've written is the only one in the vault which uses this mechanism. Your library in the vault directly accesses the interfaces of the underlying points, segments and polygons.

Point (from intersection_linestring.hpp):

bool clip_segment(const box<PB>& b, segment<PS>& s,
                  bool& sp1_clipped, bool& sp2_clipped) const
{
    typedef typename select_type_traits
        <
            typename PB::coordinate_type,
            typename PS::coordinate_type
        >::type T;
    double t1 = 0;
    double t2 = 1;
    T dx = s.second.x() - s.first.x(); // Segment type directly accessed, followed by calls to the point interface .x()/.y()
    T dy = s.second.y() - s.first.y();
    T p1 = -dx; T p2 = dx;
    T p3 = -dy; T p4 = dy;
    T q1 = s.first.x() - b.min().x();
    T q2 = b.max().x() - s.first.x();
    T q3 = s.first.y() - b.min().y();
    T q4 = b.max().y() - s.first.y();
    ...
Polygon:

/*!
    \brief Centroid of a polygon.
    \note Because the outer ring is clockwise and the inners are counter-clockwise, the triangle approach is OK and works for polygons with rings.
*/
template<typename P, typename Y, typename S>
inline void centroid_polygon(const Y& poly, P& c, const S& strategy)
{
    if (ring_ok(poly.outer(), c))
    {
        typename S::state_type state;
        loop(poly.outer(), strategy, state); // These calls directly access the specified type in Y.
        for (typename Y::inner_container_type::const_iterator it = poly.inners().begin();
             it != poly.inners().end(); it++)
        {
            loop(*it, strategy, state);
        }
        state.centroid(c);
    }
}

This is very different from what I have put up in the vault, and I do not understand why you don't see this. This code requires the user either to wrap their types in an adaptor or to translate them into a type which supports the interface required by your algorithm implementations. I have looked through a lot of this code and it just doesn't do what I am describing. Your algorithms here require points to have x() and y() defined, the segment must have first/second for point access, and the polygon has to have outer()/inner().
That's precisely why I've just said (like you) that the definition of a point type in not really important.
I agree that it isn't important, but I don't see how your algorithms will work with the fusion example I have put up.
Bruno
Sincerely, Brandon
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hi Brandon, Brandon Kohn wrote:
The library I've written is the only one in the vault which uses this mechanism. Your library in the vault directly accesses the interfaces for the underlying points, segments and polygons:
Our library, as it is currently in the boost vault folder, is uploaded by Federico for his GSoC project (to create a compiling source code set). It is not managed by Bruno and me. You're looking at a random piece of code of an intermediary stage, something soon after preview 2. The new version will be published very soon. It generally refers to points as Bruno has mentioned, in the tuple way. However, there is also support for xy-points and latlong-points. Some algorithms might (still) refer to those types of points. Regards, Barend

The library I've written is the only one in the vault which uses this mechanism. Your library in the vault directly accesses the interfaces for the underlying points, segments and polygons:
But as we said, this version of the library is deprecated by the incoming one. You'll have a very good idea of the work made in the meanwhile by browsing the archives, as Luke suggested. Bruno

-------------------------------------------------- From: "Mathias Gaunard" <mathias.gaunard@ens-lyon.org> Sent: Thursday, October 09, 2008 11:19 AM To: <boost@lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion
Brandon Kohn wrote:
//! Function to find the cross product of the two vectors formed by the specified three points, which share an endpoint at A.
template <typename Point>
typename point_traits<Point>::coordinate_type cross_product( const Point& A,
                                                             const Point& B,
                                                             const Point& C )
Isn't the result of the cross product a vector/point?
{
    typedef cartesian_access_traits<Point> access_traits;
    boost::function_requires< Point2DConcept<Point> >();
Isn't the cross product 3D-only?
No, the cross product is not 3D-only. It can be used to calculate the signed area of the parallelogram formed by two 2D vectors (the length of the vector which *is* normal to the 2D plane has this value). This is how it is used here to calculate the orientation using the right-hand-rule convention. This particular implementation is just a 2D-specialized version; I have another version for 3D, and higher dimensions could be written using an indexed access scheme. This is all beside the point, however. The point is that the mechanisms allow one to write the algorithms as they see fit for whatever underlying point type. Perhaps this notion is now a dead horse that has been repeatedly beaten.
Anyway, wouldn't it be better to use SFINAE so that you can overload the algorithm to make it work with other concepts?
This was done for clarity of my point. The actual version is implemented like this:

template <typename Point>
typename boost::enable_if
<
    boost::is_same
    <
        typename point_traits<Point>::dimension_type,
        dimension_traits<2>
    >,
    typename point_traits<Point>::coordinate_type
>::type
cross_product( const Point& A, const Point& B, const Point& C )
{
    typedef cartesian_access_traits<Point> access_traits;
    boost::function_requires< Point2DConcept<Point> >();
    boost::function_requires< CartesianCoordinateAccessorConcept<access_traits> >();
    typename point_traits<Point>::coordinate_type cross =
        ( access_traits::get_x( B ) - access_traits::get_x( A ) ) *
        ( access_traits::get_y( C ) - access_traits::get_y( A ) ) -
        ( access_traits::get_x( C ) - access_traits::get_x( A ) ) *
        ( access_traits::get_y( B ) - access_traits::get_y( A ) );
    return cross;
}
When I was talking about access like tuple, I meant using compile time indexing via the access traits. One of the main goals of my library is to facilitate use with legacy geometry code (proprietary in my case) simply by specializing access traits. Using access traits results in little constraint on the interface of the underlying point type.
Are you aware of the Fusion sequence concepts? Any type can be made into a fusion sequence non-intrusively.
So there is no need to recreate the get<N> machinery.
There is a need, because the access_traits are what define the interface and the coordinate framework, not the point type. This is just an example of how it would be done for that fusion vector type; the access traits would be specialized differently for other types. A type like

struct point { double x; double y; };

for example, would be specialized differently (and the access traits class would be more complex). If you are saying that fusion can be used to make a struct like this one into a sequence with get<I>, then it seems there is nothing stopping one from implementing the access traits in terms of that. To reiterate, the point isn't to say that fusion::vector3 makes a great concept for how one should define points, or that one should constrain point types to the get<Index>() interface. It is rather to say that the access_traits mechanism allows algorithm writers to accommodate both formats simply by specializing the access traits for whatever point type is used.

-------------------------------------------------- From: "Barend Gehrels" <barend@geodan.nl> Sent: Thursday, October 09, 2008 10:39 AM To: <boost@lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion
Hi Brandon,
Brandon Kohn wrote:
The library I've written is the only one in the vault which uses this mechanism. Your library in the vault directly accesses the interfaces for the underlying points, segments and polygons:
Our library, as it is currently in the boost vault folder, is uploaded by Federico for his GSoC project (to create a compiling source code set). It is not managed by Bruno and me. You're looking at a random piece of code of an intermediary stage, something soon after preview 2. The new version will be published very soon.
It generally refers to points as Bruno has mentioned, in the tuple way. However, there is also support for xy-points and latlong-points. Some algorithms might (still) refer to those types of points.
Regards, Barend
-------------------------------------------------- From: "Bruno Lalande" <bruno.lalande@gmail.com> Sent: Thursday, October 09, 2008 10:41 AM To: <boost@lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion
The library I've written is the only one in the vault which uses this mechanism. Your library in the vault directly accesses the interfaces for the underlying points, segments and polygons:
But as we said, this version of the library is deprecated by the incoming one. You'll have a very good idea of the work made in the meanwhile by browsing the archives, as Luke suggested.
Bruno
Bruno, Barend, I posted this to clarify how I implemented the ideas for Patrick or anyone else who was confused by my post. I think it is relevant because a) there are many long threads on geometry in the archive which are difficult to follow (so the points may be hard to uncover), and b) I couldn't find anywhere that someone has committed to this exact mechanism. Cheers, Brandon

Brandon wrote:
> I posted this to clarify how I implemented the ideas for Patrick or anyone else who was confused by my post. I think it is relevant because a) there are many long threads in the archive on geometry which are difficult to follow (so the points may be hard to uncover) and b) I couldn't find anywhere that someone has committed to this exact mechanism.
You have access to the boost svn sandbox, which has my recent code. My library in the vault is fairly out of date. Below I am registering legacy point and polygon classes with my geometry library so that I know their conceptual type for tag-dispatching purposes:

template <>
struct geometry_concept<CCGCoordPoint> { typedef point_concept type; };

template <>
struct geometry_concept<CPolygonQuery> { typedef polygon_concept type; };

And here I specify the traits of the legacy point type so that I know what coordinate type it uses and how to access its data. I have a runtime accessor instead of a compile-time one because we like to use orientation as runtime data rather than a compile-time constant in our code. construct is not strictly necessary, because I could use default construction and set x and y.

template <>
struct point_traits<CCGCoordPoint> {
    typedef intDC coordinate_type;

    static inline coordinate_type get(const CCGCoordPoint& point, orientation_2d orient)
    { return point.Coord()[orient.to_int()]; }

    static inline void set(CCGCoordPoint& point, orientation_2d orient, coordinate_type value)
    { point.Coord()[orient.to_int()] = value; }

    static inline CCGCoordPoint construct(coordinate_type x_value, coordinate_type y_value)
    { return CCGCoordPoint(x_value, y_value); }
};

And here I'm defining the traits for the legacy polygon, which uses the legacy point as part of its interface. Specifically, a pointer to the legacy point type models an iterator over objects that are conceptually points.

template <>
struct polygon_traits<CPolygonQuery> {
    typedef intDC coordinate_type;
    typedef const CCGCoordPoint* iterator_type;

    /// Get the begin iterator
    static inline iterator_type begin(const CPolygonQuery& t) { return t.Data(); }

    /// Get the end iterator
    static inline iterator_type end(const CPolygonQuery& t) { return t.Data() + t.GetSize(); }

    /// Set the data of a polygon with the unique coordinates in an iterator, starting with an x
    template <typename iT>
    static inline CPolygonQuery& set(CPolygonQuery& t, iT input_begin, iT input_end) {
        std::vector<intDC> pts;
        for (; input_begin != input_end; ++input_begin) {
            pts.push_back(point_concept::get(*input_begin, gtl::HORIZONTAL));
            pts.push_back(point_concept::get(*input_begin, gtl::VERTICAL));
        }
        t.SetPolygon(t.GetLayer(), pts.size() / 2, &(pts[0]));
        return t;
    }

    /// Get the number of sides of the polygon
    static inline unsigned int size(const CPolygonQuery& t) { return t.GetSize(); }

    /// Get the winding direction of the polygon
    static inline winding_direction winding(const CPolygonQuery& t) { return unknown_winding; }
};

I did the interface this way for exactly the same reason that you did: I have to interact with a large number of legacy geometry type systems in proprietary code. By the way, I have every respect for what you have done; if there is anyone in the world who can really appreciate your library, you have found them here. Luke

-------------------------------------------------- From: "Simonson, Lucanus J" <lucanus.j.simonson@intel.com> Sent: Thursday, October 09, 2008 1:00 PM To: <boost@lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion
Brandon wrote:
> I posted this to clarify how I implemented the ideas for Patrick or anyone else who was confused by my post. I think it is relevant because a) there are many long threads in the archive on geometry which are difficult to follow (so the points may be hard to uncover) and b) I couldn't find anywhere that someone has committed to this exact mechanism.
You have access to the boost svn sandbox, which has my recent code. My library in the vault is fairly out of date. Below I am registering legacy point and polygon classes with my geometry library so that I know their conceptual type for tag dispatching purposes:
template <> struct geometry_concept<CCGCoordPoint> { typedef point_concept type; };
template <> struct geometry_concept<CPolygonQuery> { typedef polygon_concept type; };
And here I specify the traits of the legacy point type such that I know what coordinate type it uses and how to access its data. I have runtime accessor instead of compile time because we like to use orientation as runtime data instead of compile time constant in our code. Construct is not strictly necessary because I could use default construction and set x and y.
template <> struct point_traits<CCGCoordPoint> { typedef intDC coordinate_type;
static inline coordinate_type get(const CCGCoordPoint& point, orientation_2d orient) { return point.Coord()[orient.to_int()]; } static inline void set(CCGCoordPoint& point, orientation_2d orient, coordinate_type value) { point.Coord()[orient.to_int()] = value; } static inline CCGCoordPoint construct(coordinate_type x_value, coordinate_type y_value) { return CCGCoordPoint(x_value, y_value); } };
And here I'm defining the traits for the legacy polygon which uses the legacy point as part of its interface. Specifically, pointer to legacy point type models an iterator semantic of objects that are conceptually points.
template <> struct polygon_traits<CPolygonQuery> { typedef intDC coordinate_type; typedef const CCGCoordPoint* iterator_type;
/// Get the begin iterator static inline iterator_type begin(const CPolygonQuery& t) { return t.Data(); }
/// Get the end iterator static inline iterator_type end(const CPolygonQuery& t) { return t.Data() + t.GetSize(); }
/// Set the data of a polygon with the unique coordinates in an iterator, starting with an x template <typename iT> static inline CPolygonQuery& set(CPolygonQuery& t, iT input_begin, iT input_end) { std::vector<intDC> pts; for(; input_begin != input_end; ++input_begin) { pts.push_back(point_concept::get((*input_begin), gtl::HORIZONTAL)); pts.push_back(point_concept::get((*input_begin), gtl::VERTICAL)); } t.SetPolygon(t.GetLayer(), pts.size()/2, &(pts[0])); return t; }
/// Get the number of sides of the polygon static inline unsigned int size(const CPolygonQuery& t) { return t.GetSize(); }
/// Get the winding direction of the polygon static inline winding_direction winding(const CPolygonQuery& t) { return unknown_winding; } };
I did the interface this way for exactly the same reason that you did. I have to interact with a large number of legacy geometry type systems in proprietary code. By the way, I have every respect for what you have done, if there is anyone in the world who can really appreciate your library you have found them here.
Luke
Hi Luke, I agree that this is certainly similar to what I have, and that it is very good. There is however a distinction, in that the access traits in my design are separate from the general point traits. I think this is an important distinction. When you strip a basic point in N dimensions of any nomenclature, you essentially have a data structure that can describe a location in any coordinate frame of dimension N. It is only by choosing a coordinate frame that the dimensions of the point take on meaning (well, strictly speaking this isn't true... I have a text that speaks quite highly of coordinate-free geometry). This is why I have created access traits which are aligned with a particular coordinate frame. It gives me the means to have a legacy point type (which uses a particular framework, say polar coordinates) which can then automatically work in an algorithm which requires Cartesian coordinates. By specializing an access traits type for the legacy point, I can make all the required transformations from r, theta to x, y without doing any explicit translations of the underlying point type. The result is that I can put my polar points in and still use all these great algorithms which require Cartesian coordinates. Also, any outputs or modifications to inputs are automatically written back in polar coordinates through the same interface (via the construct method of the access traits). Cheers, Brandon

Brandon wrote:
> I agree that this is certainly similar to what I have and that it is very good. There is however a distinction in that the access traits in my design are separate from the general point traits. I think that this is an important distinction. When you strip a basic point in N dimensions from any nomenclature, you essentially have a data structure that can be describing a location in any coordinate frame of dimension N. It is only through choosing a coordinate frame that the dimensions of the point takes on meaning (well, strictly speaking this isn't true... I have a text that speaks quite highly of coordinate free geometry..). This is why I have created access traits which are aligned with a particular coordinate frame. It gives me the means to have a legacy point type (which uses a particular framework, polar coordinates) which can then automatically work in an algorithm which requires Cartesian coordinates. By specializing an access traits type for the legacy point, I can make all required transformations from r, theta to x,y without doing any explicit translations to the underlying point type. The result is that I can put my polar points in and still use all these great algorithms which require Cartesian coordinates. Also any outputs or modifications to inputs are automatically written back in polar coordinates due to the same interface (via the construct method of the access traits.)
My point traits above are implicitly for 2D Cartesian points. You could specialize it to access a 2D polar coordinate point data type and it would work similar to what you describe. I don't provide direct access to the underlying x and y values, so they can be converted to and from whatever the point does store in the accessor functions. I don't have an N dimensional point concept in my library because I didn't see a need for such. If I did, I might very well have come up with a design similar to yours or what Barend is going to show us soon. I'm not opposed to n-dimensional points, I just don't have any use for them, personally. Luke

-------------------------------------------------- From: "Simonson, Lucanus J" <lucanus.j.simonson@intel.com> Sent: Thursday, October 09, 2008 2:36 PM To: <boost@lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion
Michael Fawcett wrote:
> Boost.Typeof. Boost.Units also handles this gracefully, but I'm not sure how they ended up solving it. Regardless, this is not something the end-user should care how it works, just that it works. It's doable by the library writer, so it should be done. ... This type of conversion is handled automatically by the compiler in much simpler expressions:
First off, relying on auto-casting is a great way to write template code that only works when the template parameters are built-in types and doesn't even compile when user-defined data types are used instead. Moreover, the compiler always auto-casts built-in to user-defined regardless of your intention, and believe me, the user does care. The compiler doesn't always do the right thing by default.

The rationale that something is possible, therefore it should be done, is no rationale at all. Heterogeneous coordinates in the interfaces and internals of algorithms are incompatible with runtime accessors to coordinates and higher-order constructs. I don't see any value in carrying the individual coordinate data types through the interface into the algorithms. Why not cast them up front in the user-provided accessor functions, and let the user, who knows and cares what casting they want, control that? Having one coordinate data type within the algorithms *does* make the implementation of those algorithms simpler, and simpler is usually something worth pursuing.
Luke
Definitely. This is the solution I would use to support this. I believe the complexity of having N-dimensional heterogeneity in any algorithm isn't worth pursuing for a general-case library. Also, there is nothing to stop a sub-community from defining traits-based access in a similar vein to the general case which does support a separate type for each coordinate dimension (and the algorithms to support it). Efforts like these could certainly coexist side by side without interfering with one another, and it's likely that the general case can be reused as you say. As for the general case, there is already enough on the plate in having algorithms which support both floating-point and integral coordinate types :). Cheers, Brandon

Brandon wrote:
As for the general case, there is already enough on the plate to have algorithms which support both floating point and integral coordinate types :).
That's for darn sure. That is the one thing about your library that really intrigues me. I'm pretty much integer only, Barend is pretty much floating point exclusively. Joel suggested early on that generic programming could resolve the issue easily, but you are the first person to take that issue on. How do you do it? Luke

-------------------------------------------------- From: "Simonson, Lucanus J" <lucanus.j.simonson@intel.com> Sent: Thursday, October 09, 2008 2:59 PM To: <boost@lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion
Brandon wrote:
As for the general case, there is already enough on the plate to have algorithms which support both floating point and integral coordinate types :).
That's for darn sure. That is the one thing about your library that really intrigues me. I'm pretty much integer only, Barend is pretty much floating point exclusively. Joel suggested early on that generic programming could resolve the issue easily, but you are the first person to take that issue on. How do you do it?
Luke
Well, the jury is still out as to how effective the *real* solution will be. I start by adding the ability to perform type dispatch based on the number type of the coordinate. I manage that by again using a traits class (for the coordinates):

//! Default coordinate traits struct.
//! NOTE: must be specialized for user types.
template <typename Coordinate>
struct coordinate_traits
{
    typedef Coordinate coordinate_type;
    BOOST_MPL_ASSERT_MSG( ( false ), COORDINATE_TRAITS_NOT_DEFINED, (Coordinate) );
};

//! Macro for a coordinate type with integral traits
#define BOOST_DEFINE_INTEGRAL_COORDINATE_TRAITS( Coordinate ) \
template <>                                                   \
struct coordinate_traits< Coordinate >                        \
{                                                             \
    typedef Coordinate coordinate_type;                       \
    typedef boost::false_type is_float_t;                     \
    typedef boost::true_type  is_integral_t;                  \
};

//! Macro for a coordinate type with floating-point traits
#define BOOST_DEFINE_FLOATING_POINT_COORDINATE_TRAITS( Coordinate ) \
template <>                                                         \
struct coordinate_traits< Coordinate >                              \
{                                                                   \
    typedef Coordinate coordinate_type;                             \
    typedef boost::true_type  is_float_t;                           \
    typedef boost::false_type is_integral_t;                        \
};

Then I can dispatch on the traits. The only issue I've tackled so far is comparing segments on a sweepline. I did this using boost::rational (so overflow may still be a problem in some cases). The other issue I have yet to tackle is intersection between two segments for integers. I have an implementation that uses O'Rourke's method from Computational Geometry in C. In his method the segments are first converted into parametric coordinates and then compared to see whether the resulting system of equations yields a point contained in each segment. As you well know, you can't do this with segments in integral coordinates, as the intersection point often has fractional coordinates. I suppose the only thing you can do with that is resort to rationals.
This will be necessary, as you must track all event points during the sweep (including the intersections), and reporting intersections requires that all segments intersecting in the same point be associated with that point in the sweepline. This leads me to think that the easiest solution is to make all the coordinates in the event structure rational (for sorting lexicographically) and then, when actually reporting, to make the rationals available to the visitor class (which is how the reporting is done) along with the offending segments. This way users who are able to use the fractional intersections can still use them, and those who cannot still have the segments which intersect (and I assume they can glean whatever information they need from them). In cases like yours where you constrain the slopes to be 0, 1 or infinity, you can no doubt do a lot better on the intersection, but a general rational solution for intersection would probably still work. I still need to think about it more to see how I can mitigate issues with overflow. Cheers, Brandon

Brandon wrote:
> Well, the jury is still out as to how effective the *real* solution will be. I start by adding the ability to perform type dispatch based on the number type of the coordinate. I manage that by again using a traits class (for the coordinates):
Interesting, I have coordinate_traits and register the coordinate type for tag dispatching purposes. My coordinate traits register related data types such as area data type. For 32 bit integer coordinates I use a 64 bit integer to compute and return area values, and I register it through the 32 bit integer coordinate traits.
Then I can do a dispatch on the traits. The only issue I've tackled so far is for comparing segments on a sweepline. I did this by using boost::rational ( so overflow may still be a problem in some cases ).
The other issue I have yet to tackle is intersection between two segments for integers. I have an implementation that uses O'Rourke's method from Computational Geometry in C. In his method the segments are first converted into parametric coordinates and then compared to see if the resulting system of equations yields a point contained in each segment. As you well know, you can't do this with segments in integral coordinates, as the intersection point often has fractional coordinates. I suppose the only thing you can do with that is resort to rationals.

I use mpq_class because "in some cases" is way too many cases for me. I compute the y as an mpq_class at a given x, which allows me an exact, overflow-proof comparison, but I don't store the mpq_class value in the sweepline data structure; I compute it on the fly as needed.
Which is what I'm doing. I simplified the algebra so that there is only one equation each for x and y:

x = (x11 * dy1 * dx2 - x21 * dy2 * dx1 + y21 * dx1 * dx2 - y11 * dx1 * dx2)
    / (dy1 * dx2 - dy2 * dx1);

Transpose x and y and it gives you the y value as well. When evaluated with gmp rationals it gives the exact intersection. I also have a predicate that determines whether two line segments intersect without computing the intersection point. I do that by comparing slope cross products, looking for both points of one segment being on the same side of the other segment; it needs no more than a 64-bit integer to deal with 32-bit coordinate segments.
This will be necessary, as you must track all event points during the sweep (including the intersections), and reporting of intersections requires that all segments intersecting in the same point be associated with that point in the sweepline. This leads me to think that the easiest solution is to make all the coordinates rational in the event structure (for sorting lexicographically) and then, when actually reporting, make the rationals available to the visitor class (which is how the reporting is done) along with the offending segments. This way users who are able to use the fractional intersections can still use them, and those who cannot still have the segments which intersect (and I assume they can glean whatever information they need from them). In cases like yours, where you constrain the slopes to be 0, 1 or infinity, you can no doubt do a lot better on the intersection, but a general rational solution for intersection would probably still work. I still need to think about it more to see how I can mitigate issues with overflow.
Don't forget -1. Actually, I've been extending the library to general polygons recently. You have pretty much no option other than gmp to avoid overflow, as far as I can tell; in general you need 96 bits of integer precision to compute the numerator of the x value in my expression above. Apart from not being able to include the gmp header file in Boost-licensed source, it is no problem to depend on it, because it is open source and more than portable in that it explicitly targets almost every platform. My arbitrary line segment intersection algorithm snaps the intersection points to the integer grid and deals with the problems that causes when line segments move and collide with nearby vertices. This would be quite a bit harder to do when snapping to the floating-point grid, I would imagine. The output of my integer line intersection algorithm is the input segments broken into smaller segments at the intersection points and snapped to the integer grid. The invariant it guarantees is that no pair of output segments intersects at a non-vertex position. Once you have that, you can scan it quite simply and build a graph, do Booleans, or whatever else you need to. I'm hoping to get the algorithms completed some time this quarter and released to boost. The intention is to have 100% robust integer line-scan and Boolean operations. I really haven't thought about how to accomplish the same with floating-point coordinates. Luke

Mathias Gaunard wrote:
Wanting to code a simple 3D virtual world, I looked into the recent spatial indexes SOC work, which depends on the proposed geometry library.
Your message has prompted me to finally have a look at Federico's spatial index work.
I don't really get what boost::spatial_index::spatial_index is for. It does nothing but forward everything to the last template parameter.
Instead of writing boost::spatial_index::spatial_index< point_type, value_type, boost::spatial_index::rtree<point_type, value_type>
index(args...);
I might as well write boost::spatial_index::rtree<point_type, value_type> index(args...);
Also, I think the spatial indexes do not provide enough operations. The only things they allow are putting point => value pairs, retrieving the value associated with a point in O(log n) time (I assume; complexity is not documented), and finding all values within a certain box (in O(log n) too?). Many things one typically wants to do with spatial indexes are not possible:
- Iterate over all (point, value) pairs in the index
- Find the element closest (or k-closest) to some point in space or other element
- Find all elements within a certain range of some point in space or other element
- Find the elements that intersect with a given object (ray tracing, for example, searches for the element that intersects a ray and is closest to its source)
- Consider only a part of the indexed space (e.g. an area delimited by a range/sphere or box)
- Index arbitrary objects and not just points
- Traverse the spatial index as a tree
I certainly agree that all or most of those operations should be provided. My own observations:
- insert should take a std::pair for consistency with the std:: associative containers.
- Iteration over a (rectangular) range should involve iterators, not return a std::deque.
- The current find(box) results return only the value_type, not the point.
- When I use integer coordinates I get lots of warnings because of /2.0 in the code.
- Defaults should be provided (or values should be chosen at runtime) for the "fiddle factors".
- Performance is poor; worse than 10x slower than the code that I'm currently using.
- Rectangle iteration for rtrees is much slower than that; I suspect there's either a complexity problem or a bug.
- There is no need to use shared_ptr to point to child nodes; this is probably a major reason for the performance problem.
- In the quadtree, the bounding boxes of parent and child nodes have common edges; this could be exploited to reduce the size of the node.
I understand that this is "only" a GSoC project, but it's sad that Federico didn't keep in touch with this list while he was working on this code. We could have quickly pointed out problems like the above long before the "final report" stage. (I see no messages here between "Congratulations to the Successful Summer of Code Students!" on 2008-04-22 and "Final report of GSOC project 'Spatial Indexes'" on 2008-09-11.) I'm curious to know whether Federico's GSoC mentor reviewed at least the interface of the proposed class. Regards, Phil.

--------------------------------------------------
From: "Patrick Mihelich" <patrick.mihelich@gmail.com>
Sent: Friday, October 10, 2008 10:55 PM
To: <boost@lists.boost.org>
Subject: Re: [boost] Geometry and spatial indexes, my opinion
On Thu, Oct 9, 2008 at 7:25 AM, Brandon Kohn <blkohn@hotmail.com> wrote:
Second, there are efficiency and code size concerns. Consider the Euclidean
distance function. For 2D/3D points, it is nice to use compile-time indexing and metaprogramming to guarantee tight, loop-less code. For high dimensional points, however, beyond some dimension (BOOST_GEOMETRY_MAX_UNROLL_DIMENSION?) unrolling the loop is no longer desirable, and we would rather just do the runtime iteration. If the points are RuntimeIndexable, we can choose an appropriate distance algorithm at compile time depending on the points' dimensionality.
I presume the reason one would not want loop-less code at high dimensions is due to bloat? I do not understand why runtime looping would be faster in this case. It would seem that even in the largest cases the bloat would still be small compared to memory sizes today. Runtime indexing can certainly be accommodated using a tag dispatch on the access traits. I think more research needs to be done to quantify the differences between the two to see if it is worth the maintenance overhead.
Sure, I need to back up my claim. The fundamental problem with loop-less code at high dimensions is bloat, yes. Remember that bloat also affects efficiency. Suppose we're calculating distances from one point to a set of other points. If the size of the distance function makes the inner loop code larger than the L1 instruction cache, we take a performance hit.
I'm attaching a simple benchmark that calculates the distance between two points some number of times, using both compile-time and runtime access. Note that this is awfully friendly to the compile-time access code, as we're doing nothing else in the inner loop. Here are some results on my machine (Intel Core Duo 2.4 GHz, gcc 4.2.3):
$ g++ -DITERATIONS=1000000000 -DDIMENSIONS=2 -O3 runtime_test.cpp -o runtime_test
$ ./runtime_test
Distance = 2.000000
Compile-time access: 5.460000s
Runtime access: 9.070000s
$ g++ -DITERATIONS=1000000000 -DDIMENSIONS=3 -O3 runtime_test.cpp -o runtime_test
$ ./runtime_test
Distance = 3.000000
Compile-time access: 7.980000s
Runtime access: 10.970000s
So for 2D/3D we see the benefit to using compile-time access, although it wouldn't surprise me if the gap could be closed by better compile-flag twiddling. Let's see what happens at high dimensions. Let's also add in a partially unrolled runtime access version that operates 4 elements at a time (ideally the compiler would take care of this).
$ g++ -DITERATIONS=10000000 -DDIMENSIONS=36 -O3 runtime_test.cpp -o runtime_test
$ ./runtime_test
Distance = 36.000000
Compile-time access: 1.620000s
Runtime access: 1.280000s
Runtime access (unrolled): 0.740000s
$ g++ -I /u/mihelich/packages/boost-spacial-index -I /u/mihelich/packages/boost/include -DITERATIONS=10000000 -DDIMENSIONS=128 -O3 runtime_test.cpp -o runtime_test
$ ./runtime_test
Distance = 128.000000
Compile-time access: 5.960000s
Runtime access: 4.900000s
Runtime access (unrolled): 3.160000s
Here runtime access is faster. 128 dimensions may sound excessive, but I did not choose these numbers arbitrarily. In vision applications it is very common to represent image patches by SIFT descriptors, which are points in 128-dimensional space. PCA-SIFT reduces the dimensionality to only 36, but runtime access is still faster. In either case partial unrolling is a big win. In other applications we may go to even higher dimensions...
$ g++ -DITERATIONS=1000000 -DDIMENSIONS=1000 -O3 -ftemplate-depth-1024 runtime_test.cpp -o runtime_test
$ ./runtime_test
Distance = 1000.000000
Compile-time access: 8.150000s
Runtime access: 4.090000s
Runtime access (unrolled): 2.460000s
Now runtime access is much faster. And once we get up this far I have to crank up the template depth just to calculate a distance. 1000 dimensions is high but not ridiculous. Spectra of galaxies from the Sloan Digital Sky Survey have 4000 dimensions, for example.
I understand if the authors do not want to concern themselves much with optimizing for non 2D/3D cases, as users who really care can always substitute their own highly optimized distance function using SSE, etc. But it would be nice to have something decent out of the box, and there is plenty of reason for a RuntimeIndexable concept. If the geometry library is to be used as a base for spatial index and perhaps clustering libraries, some consideration has to be taken for high dimensions.
Thank you for running some numbers, Patrick; this certainly helps to clarify the issues. I'm going to investigate using Fusion as the underlying mechanism for access, as I hear it supports both models. Out of curiosity: are these runtime accesses generally done iteratively? If so, how does direct iteration compare with index-operator accesses? I've noticed in my own tests measuring std::vector accesses that iteration is significantly faster than operator[] when iterating over the container (presumably due to bounds checking). Cheers, Brandon
Cheers, Patrick
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Brandon Kohn wrote:
If so, how does direct iteration compare with index operator accesses? I've noticed in my own tests on measuring std::vector accesses that iteration is significantly faster than operator [ ] when you are iterating over the container (presumably due to bounds checking).
operator[] doesn't do bounds checking, unless you have the debug/secure mode of your standard library enabled (which is on by default on MSVC). While access by index is theoretically slower than pointer increments, I'm pretty sure any compiler optimizes the former to the latter.

Mathias Gaunard wrote:
Brandon Kohn wrote:
If so, how does direct iteration compare with index operator accesses? I've noticed in my own tests on measuring std::vector accesses that iteration is significantly faster than operator [ ] when you are iterating over the container (presumably due to bounds checking).
operator[] doesn't do bounds checking, unless you have the debug/secure mode of your standard library enabled (which is on by default on MSVC).
While access by index is theoretically slower than pointer increments, I'm pretty sure any compiler optimizes the former to the latter.
At least with MSVC9 they aren't exactly equivalent. http://lists.boost.org/Archives/boost/2008/09/142657.php -- Michael Marcin
participants (16)
- Barend Gehrels
- Brandon Kohn
- Bruno Lalande
- Federico J. Fernández
- Joel Falcou
- Johan Råde
- Jonathan Franklin
- Mathias Gaunard
- Matthias Schabel
- Michael Fawcett
- Michael Marcin
- Patrick Mihelich
- Phil Endecott
- Simonson, Lucanus J
- Steven Watanabe
- Stjepan Rajko