
On Wed, Oct 8, 2008 at 7:51 AM, Brandon Kohn <blkohn@hotmail.com> wrote:
OK, I take your point. I wasn't suggesting that a library not be able to handle more than 3 dimensions. Rather, I would suggest that the population of developers who have such requirements must be very small. Considering that generalization on dimension is not trivial for many algorithms, this leads me to the conclusion that the specific algorithms will decide the requirements on their inputs. This is why I chose a traits-based interface for the geometry library I've been implementing. There's nothing to stop one from creating a multi-dimensional Cartesian/polar/parametric access traits interface for any algorithm which requires it. So I suppose I agree: the geometry library should not constrain the number of dimensions. But I don't agree with full agnosticism (meaning agnosticism of the coordinate frame as well). There should be a way to differentiate points further, to align with the concrete traits of the space they describe.
This sounds reasonable. Certainly there are many 2D/3D algorithms for which there is little point in generalizing to higher dimensions.
Yes, I have taken a slightly different route than the other libs I have found in the vault. In my library the algorithms require that inputs conform to a specific access traits concept. I have been using get_x, get_y, get_z as the general interface for Cartesian (and would have used get_r, get_theta, get_phi for the others, if I had gotten to that), but I think now that I'm fairly well persuaded to investigate a compile-time indexed mechanism like boost::tuple. Has anyone done a performance test on using such a mechanism in practical settings? While I'm sure it's constant, how big is that constant in practice? Is it worth paying in cases where an algorithm requires a constrained dimensionality? I'm guessing nobody has really addressed these questions... so perhaps I'll take a gander and see if it's worth providing both interfaces in my access traits.
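For readers following along, the named-accessor style described above might look like the following minimal sketch. All names here (my_point, cartesian_access_traits, the free distance function) are illustrative assumptions, not Brandon's actual library interface; the point is that algorithms are written against the traits, so any user point type can be adapted by specializing them:

```cpp
#include <cmath>

struct my_point { double x, y; };        // some user-defined 2D type

// Primary template, specialized per adapted point type (hypothetical name).
template <typename Point>
struct cartesian_access_traits;

template <>
struct cartesian_access_traits<my_point> {
    static double get_x(const my_point& p) { return p.x; }
    static double get_y(const my_point& p) { return p.y; }
};

// An algorithm constrained to 2D Cartesian inputs via the traits:
template <typename Point>
double distance(const Point& a, const Point& b) {
    using T = cartesian_access_traits<Point>;
    double dx = T::get_x(a) - T::get_x(b);
    double dy = T::get_y(a) - T::get_y(b);
    return std::sqrt(dx * dx + dy * dy);
}
```

Since the traits calls are static and trivially inlined, the cost of this indirection is normally zero at runtime; the question raised above about the compile-time indexed variant is essentially the same one.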
Cool, compile-time indexing is very useful. But I'm not sure I see the need for using tuples. I can't offhand think of a use case where the dimensions are of heterogeneous type, so wouldn't a simple array work?
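A minimal sketch of what Patrick suggests, assuming homogeneous coordinates and using modern C++ for brevity (the names point, get, and dot are illustrative, not from either library). Storage is just an array, access is compile-time indexed in the style of boost::tuples::get<I>, and algorithms can then be generic in the dimension:

```cpp
#include <array>
#include <cstddef>

// Homogeneous coordinates: a simple array-backed point suffices.
template <typename T, std::size_t D>
struct point { std::array<T, D> coords; };

// Compile-time indexed access, boost::tuple style; constant time and
// typically inlined away entirely.
template <std::size_t I, typename T, std::size_t D>
T get(const point<T, D>& p) { return p.coords[I]; }

// A dimension-generic algorithm written against the indexed interface:
template <typename T, std::size_t D, std::size_t I = 0>
T dot(const point<T, D>& a, const point<T, D>& b) {
    if constexpr (I == D) return T(0);
    else return get<I>(a) * get<I>(b) + dot<T, D, I + 1>(a, b);
}
```

Since the index is a template parameter, the "constant" Brandon asks about is, on any optimizing compiler, the same as a direct member access; the unrolled recursion in dot compiles down to straight-line code.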
Back to Mathias' criticisms - although I realize it is an alpha, I was
similarly a bit disappointed when looking at the Spatial Indexes library. It is good code (I hope that Federico will not be discouraged by criticism!), but it is currently very limited in what it can do. It has no nearest neighbor search, nor even a traversal interface that would allow the user to implement one himself. It only provides 2D structures, when it is very useful to support spaces of arbitrary dimension. I would like the library to be generic in the dimension, not just in the (2D) point type. Perhaps a good place to start is the quadtree - I do not see why the difference between a quadtree and an octree should be more than an int template parameter.
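To make the quadtree/octree remark concrete, here is a hypothetical sketch (not Federico's actual code) of a region node in a 2^D-ary tree. With D = 2 it is a quadtree node, with D = 3 an octree node; the child holding a point is found by building an index with one bit per dimension:

```cpp
#include <array>
#include <cstddef>
#include <memory>

// A region node of a 2^D-ary spatial tree: the dimension is just a
// template parameter, exactly as suggested above.
template <std::size_t D>
struct region_node {
    static constexpr std::size_t child_count = std::size_t(1) << D;

    std::array<double, D> center;  // split point of this cell
    std::array<std::unique_ptr<region_node>, child_count> children;

    // Which child cell contains p: bit i is set iff p is on the upper
    // side of the split plane in dimension i.
    std::size_t child_index(const std::array<double, D>& p) const {
        std::size_t idx = 0;
        for (std::size_t i = 0; i < D; ++i)
            if (p[i] >= center[i]) idx |= std::size_t(1) << i;
        return idx;
    }
};
```

Everything else (insertion, subdivision, traversal) can be written once against child_count and child_index, so supporting arbitrary dimension costs little beyond this parameterization.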
I'm admittedly opening up a bit of a can of worms here - nearest neighbor search in high dimensions is a Hard Problem. However, there are straightforward approximate nearest neighbor algorithms that are efficient and plenty good enough for many applications.
I can't say that I have seriously considered this. Just out of curiosity, how do kd-trees scale memory-wise on average in high dimension?
They scale quite well memory-wise: each node splits on a single dimension, so the size of the internal data is independent of the number of dimensions. Of course, the size of the data points themselves is linear in the number of dimensions.

The real problem with using kd-trees in high dimensions (for nearest neighbor search) is the "Curse of Dimensionality." Nearest neighbor search using kd-trees consists of (very succinctly) a depth-first search to the node containing the query point, followed by a back-tracking branch-and-bound search to ensure that we find the data point closest to the query point (as they need not be in the same node). For low dimension this tends to be quite efficient. For d > 10 or so, we often end up looking at most of the nodes in the tree during the back-tracking search... which can be slower than a naive linear scan of the data points.

This is why we resort to approximate methods in high dimensions. One popular way is to simply cut off the back-tracking search after some number of nodes, which we expand in order of their nearness to the query point. In the vision community this algorithm is known as Best Bin First.

Cheers,
Patrick
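The capped back-tracking search described above can be sketched as follows. This is a simplified illustration under stated assumptions (fixed dimension K, one point per node, a best-first queue ordered by a lower bound on distance, names all hypothetical), not the Best Bin First implementation from the vision literature; with a large enough node budget it degenerates to an exact search:

```cpp
#include <algorithm>
#include <array>
#include <functional>
#include <limits>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

constexpr std::size_t K = 2;             // dimension
using Point = std::array<double, K>;

struct Node {
    Point pt;                            // point stored at this node
    std::size_t axis;                    // splitting dimension
    std::unique_ptr<Node> left, right;
};

double dist2(const Point& a, const Point& b) {
    double d = 0;
    for (std::size_t i = 0; i < K; ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// Build a kd-tree by median split, cycling through the dimensions.
std::unique_ptr<Node> build(std::vector<Point> pts, std::size_t depth = 0) {
    if (pts.empty()) return nullptr;
    std::size_t axis = depth % K;
    std::size_t mid = pts.size() / 2;
    std::nth_element(pts.begin(), pts.begin() + mid, pts.end(),
        [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
    auto node = std::make_unique<Node>();
    node->pt = pts[mid];
    node->axis = axis;
    std::vector<Point> l(pts.begin(), pts.begin() + mid);
    std::vector<Point> r(pts.begin() + mid + 1, pts.end());
    node->left = build(std::move(l), depth + 1);
    node->right = build(std::move(r), depth + 1);
    return node;
}

// Best-first search: expand nodes in order of a lower bound on their
// distance to the query, and stop after `max_nodes` expansions.  A small
// budget gives the approximate (Best-Bin-First-style) behavior; a large
// budget visits everything that cannot be pruned, i.e. exact search.
Point nearest(const Node* root, const Point& q, std::size_t max_nodes = 200) {
    using Entry = std::pair<double, const Node*>;   // (bound, node)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    pq.push({0.0, root});
    Point best{};
    double best_d = std::numeric_limits<double>::infinity();
    std::size_t visited = 0;
    while (!pq.empty() && visited < max_nodes) {
        auto [bound, n] = pq.top();
        pq.pop();
        if (!n || bound >= best_d) continue;        // prune: cannot improve
        ++visited;
        double d = dist2(n->pt, q);
        if (d < best_d) { best_d = d; best = n->pt; }
        double diff = q[n->axis] - n->pt[n->axis];
        const Node* near_side = diff < 0 ? n->left.get() : n->right.get();
        const Node* far_side  = diff < 0 ? n->right.get() : n->left.get();
        pq.push({0.0, near_side});       // same side: no extra bound known
        pq.push({diff * diff, far_side});// far side: at least diff^2 away
    }
    return best;
}
```

The curse of dimensionality shows up here as the far-side bound diff^2 becoming uninformative when d is large: almost every subtree's bound stays below best_d, so without the max_nodes cutoff nearly the whole tree gets expanded.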