
On Wed, Oct 8, 2008 at 7:51 AM, Brandon Kohn <blkohn@hotmail.com> wrote:
OK, I take your point. I wasn't suggesting that a library not be able to handle more than 3 dimensions. Rather, I would suggest that the population of developers who have such requirements must be very small. Considering that generalization on dimension is not trivial for many algorithms, this leads me to the conclusion that the specific algorithms will decide the requirements on their inputs. This is why I chose a traits-based interface for the geometry library I've been implementing. There's nothing to stop one from creating a multi-dimensional Cartesian/polar/parametric access traits interface for any algorithm which requires it. So I suppose I agree: the geometry library should not constrain the number of dimensions. But I don't agree with full agnosticism (meaning agnosticism of the coordinate frame as well). There should be a way to differentiate points further, to align with the concrete traits of the space they describe.
This sounds reasonable. Certainly there are many 2D/3D algorithms for which there is little point in generalizing to higher dimensions.
Yes, I have taken a slightly different route than the other libs I have found in the vault. In my library the algorithms require that inputs conform to a specific access traits concept. I have been using get_x, get_y, get_z as the general interface for Cartesian (and would have used get_r, get_theta, get_phi for the others, if I had gotten to that), but I think now that I'm fairly well persuaded to investigate a compile-time indexed mechanism like boost::tuple. Has anyone done a performance test on using such a mechanism in practical settings? While I'm sure it's constant, how big is that constant in practice? Is it worth paying in cases where an algorithm requires a constrained dimensionality? I'm guessing nobody has really addressed these questions... so perhaps I'll take a gander and see if it's worth providing both interfaces in my access traits.
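For readers following along, the named-accessor style described above might look like the following minimal sketch. All names here (my_point, cartesian_access_traits, the free distance function) are illustrative assumptions, not Brandon's actual library interface; the point is that algorithms are written against the traits, so any user point type can be adapted by specializing them:

```cpp
#include <cmath>

struct my_point { double x, y; };        // some user-defined 2D type

// Primary template, specialized per adapted point type (hypothetical name).
template <typename Point>
struct cartesian_access_traits;

template <>
struct cartesian_access_traits<my_point> {
    static double get_x(const my_point& p) { return p.x; }
    static double get_y(const my_point& p) { return p.y; }
};

// An algorithm constrained to 2D Cartesian inputs via the traits:
template <typename Point>
double distance(const Point& a, const Point& b) {
    using T = cartesian_access_traits<Point>;
    double dx = T::get_x(a) - T::get_x(b);
    double dy = T::get_y(a) - T::get_y(b);
    return std::sqrt(dx * dx + dy * dy);
}
```

Since the traits calls are static and trivially inlined, the cost of this indirection is normally zero at runtime; the question raised above about the compile-time indexed variant is essentially the same one.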
Cool, compile-time indexing is very useful. But I'm not sure I see the need for using tuples. I can't offhand think of a use case where the dimensions are of heterogeneous type, so wouldn't a simple array work?
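A minimal sketch of what Patrick suggests, assuming homogeneous coordinates and using modern C++ for brevity (the names point, get, and dot are illustrative, not from either library). Storage is just an array, access is compile-time indexed in the style of boost::tuples::get<I>, and algorithms can then be generic in the dimension:

```cpp
#include <array>
#include <cstddef>

// Homogeneous coordinates: a simple array-backed point suffices.
template <typename T, std::size_t D>
struct point { std::array<T, D> coords; };

// Compile-time indexed access, boost::tuple style; constant time and
// typically inlined away entirely.
template <std::size_t I, typename T, std::size_t D>
T get(const point<T, D>& p) { return p.coords[I]; }

// A dimension-generic algorithm written against the indexed interface:
template <typename T, std::size_t D, std::size_t I = 0>
T dot(const point<T, D>& a, const point<T, D>& b) {
    if constexpr (I == D) return T(0);
    else return get<I>(a) * get<I>(b) + dot<T, D, I + 1>(a, b);
}
```

Since the index is a template parameter, the "constant" Brandon asks about is, on any optimizing compiler, the same as a direct member access; the unrolled recursion in dot compiles down to straight-line code.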
Back to Mathias' criticisms - although I realize it is an alpha, I was
similarly a bit disappointed when looking at the Spatial Indexes library. It is good code (I hope that Federico will not be discouraged by criticism!), but it is currently very limited in what it can do. It has no nearest neighbor search, nor even a traversal interface that would allow the user to implement one himself. It only provides 2D structures, when it is very useful to support spaces of arbitrary dimension. I would like the library to be generic in the dimension, not just in the (2D) point type. Perhaps a good place to start is the quadtree - I do not see why the difference between a quadtree and an octree should be more than an int template parameter.
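To make the quadtree/octree remark concrete, here is a hypothetical sketch (not Federico's actual code) of a region node in a 2^D-ary tree. With D = 2 it is a quadtree node, with D = 3 an octree node; the child holding a point is found by building an index with one bit per dimension:

```cpp
#include <array>
#include <cstddef>
#include <memory>

// A region node of a 2^D-ary spatial tree: the dimension is just a
// template parameter, exactly as suggested above.
template <std::size_t D>
struct region_node {
    static constexpr std::size_t child_count = std::size_t(1) << D;

    std::array<double, D> center;  // split point of this cell
    std::array<std::unique_ptr<region_node>, child_count> children;

    // Which child cell contains p: bit i is set iff p is on the upper
    // side of the split plane in dimension i.
    std::size_t child_index(const std::array<double, D>& p) const {
        std::size_t idx = 0;
        for (std::size_t i = 0; i < D; ++i)
            if (p[i] >= center[i]) idx |= std::size_t(1) << i;
        return idx;
    }
};
```

Everything else (insertion, subdivision, traversal) can be written once against child_count and child_index, so supporting arbitrary dimension costs little beyond this parameterization.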
I'm admittedly opening up a bit of a can of worms here - nearest neighbor search in high dimensions is a Hard Problem. However, there are straightforward approximate nearest neighbor algorithms that are efficient and plenty good enough for many applications.
I can't say that I have seriously considered this. Just out of curiosity, how do kd-trees scale memory-wise on average in high dimension?
They scale quite well memory-wise: each node splits on a single dimension, so the size of the internal data is independent of the number of dimensions. Of course, the size of the data points themselves is linear in the number of dimensions.

The real problem with using kd-trees in high dimensions (for nearest neighbor search) is the "Curse of Dimensionality." Nearest neighbor search using kd-trees consists of (very succinctly) a depth-first search to the node containing the query point, followed by a back-tracking branch-and-bound search to ensure that we find the data point closest to the query point (as they need not be in the same node). For low dimension this tends to be quite efficient. For d > 10 or so, we often end up looking at most of the nodes in the tree during the back-tracking search... which can be slower than a naive linear scan of the data points.

This is why we resort to approximate methods in high dimensions. One popular way is to simply cut off the back-tracking search after some number of nodes, which we expand in order of their nearness to the query point. In the vision community this algorithm is known as Best Bin First.

Cheers,
Patrick
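The capped back-tracking search described above can be sketched as follows. This is a simplified illustration under stated assumptions (fixed dimension K, one point per node, a best-first queue ordered by a lower bound on distance, names all hypothetical), not the Best Bin First implementation from the vision literature; with a large enough node budget it degenerates to an exact search:

```cpp
#include <algorithm>
#include <array>
#include <functional>
#include <limits>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

constexpr std::size_t K = 2;             // dimension
using Point = std::array<double, K>;

struct Node {
    Point pt;                            // point stored at this node
    std::size_t axis;                    // splitting dimension
    std::unique_ptr<Node> left, right;
};

double dist2(const Point& a, const Point& b) {
    double d = 0;
    for (std::size_t i = 0; i < K; ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// Build a kd-tree by median split, cycling through the dimensions.
std::unique_ptr<Node> build(std::vector<Point> pts, std::size_t depth = 0) {
    if (pts.empty()) return nullptr;
    std::size_t axis = depth % K;
    std::size_t mid = pts.size() / 2;
    std::nth_element(pts.begin(), pts.begin() + mid, pts.end(),
        [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
    auto node = std::make_unique<Node>();
    node->pt = pts[mid];
    node->axis = axis;
    std::vector<Point> l(pts.begin(), pts.begin() + mid);
    std::vector<Point> r(pts.begin() + mid + 1, pts.end());
    node->left = build(std::move(l), depth + 1);
    node->right = build(std::move(r), depth + 1);
    return node;
}

// Best-first search: expand nodes in order of a lower bound on their
// distance to the query, and stop after `max_nodes` expansions.  A small
// budget gives the approximate (Best-Bin-First-style) behavior; a large
// budget visits everything that cannot be pruned, i.e. exact search.
Point nearest(const Node* root, const Point& q, std::size_t max_nodes = 200) {
    using Entry = std::pair<double, const Node*>;   // (bound, node)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    pq.push({0.0, root});
    Point best{};
    double best_d = std::numeric_limits<double>::infinity();
    std::size_t visited = 0;
    while (!pq.empty() && visited < max_nodes) {
        auto [bound, n] = pq.top();
        pq.pop();
        if (!n || bound >= best_d) continue;        // prune: cannot improve
        ++visited;
        double d = dist2(n->pt, q);
        if (d < best_d) { best_d = d; best = n->pt; }
        double diff = q[n->axis] - n->pt[n->axis];
        const Node* near_side = diff < 0 ? n->left.get() : n->right.get();
        const Node* far_side  = diff < 0 ? n->right.get() : n->left.get();
        pq.push({0.0, near_side});       // same side: no extra bound known
        pq.push({diff * diff, far_side});// far side: at least diff^2 away
    }
    return best;
}
```

The curse of dimensionality shows up here as the far-side bound diff^2 becoming uninformative when d is large: almost every subtree's bound stays below best_d, so without the max_nodes cutoff nearly the whole tree gets expanded.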