Re: [boost] Geometry and spatial indexes, my opinion

11 Oct 2008

      --------------------------------------------------
From: "Patrick Mihelich" <patrick.mihelich@gmail.com>
Sent: Friday, October 10, 2008 10:55 PM
To: <boost@lists.boost.org>
Subject: Re: [boost] Geometry and spatial indexes, my opinion
...
On Thu, Oct 9, 2008 at 7:25 AM, Brandon Kohn <blkohn@hotmail.com> wrote:
...
Second, there are efficiency and code size concerns. Consider the 
Euclidean
...
distance function. For 2D/3D points, it is nice to use compile-time
indexing
and metaprogramming to guarantee tight, loop-less code. For high
dimensional
points, however, beyond some dimension
(BOOST_GEOMETRY_MAX_UNROLL_DIMENSION?) unrolling the loop is no longer
desirable, and we would rather just do the runtime iteration. If the
points
are RuntimeIndexable, we can choose an appropriate distance algorithm at
compile time depending on the points' dimensionality.
I presume the reason one would not want loop-less code at high dimensions
is due to bloat? I do not understand why runtime looping would be faster 
in
this case. It would seem that even in the largest cases the bloat would
still be small compared to memory sizes today. Runtime indexing can
certainly be accommodated using a tag dispatch on the access traits. I 
think
more research needs to be done to quantify the differences between the 
two
to see if it is worth the maintenance overhead.
Sure, I need to back up my claim. The fundamental problem with loop-less
code at high dimensions is bloat, yes. Remember that bloat also affects
efficiency. Suppose we're calculating distances from one point to a set of
other points. If the size of the distance function makes the inner loop 
code
larger than the L1 instruction cache, we take a performance hit.
I'm attaching a simple benchmark that calculates the distance between two
points some number of times, using both compile-time and runtime access.
Note that this is awfully friendly to the compile-time access code, as 
we're
doing nothing else in the inner loop. Here are some results on my machine
(Intel Core Duo 2.4 GHz, gcc 4.2.3):
$ g++ -DITERATIONS=1000000000 -DDIMENSIONS=2 -O3 runtime_test.cpp -o
runtime_test
$ ./runtime_test
Distance = 2.000000
Compile-time access: 5.460000s
Runtime access: 9.070000s
$ g++ -DITERATIONS=1000000000 -DDIMENSIONS=3 -O3 runtime_test.cpp -o
runtime_test
$ ./runtime_test
Distance = 3.000000
Compile-time access: 7.980000s
Runtime access: 10.970000s
So for 2D/3D we see the benefit to using compile-time access, although it
wouldn't surprise me if the gap could be closed by better compile-flag
twiddling. Let's see what happens at high dimensions. Let's also add in a
partially unrolled runtime access version that operates 4 elements at a 
time
(ideally the compiler would take care of this).
$ g++ -DITERATIONS=10000000 -DDIMENSIONS=36 -O3 runtime_test.cpp -o
runtime_test
mihelich@adt:~/projects/stuff$ ./runtime_test
Distance = 36.000000
Compile-time access: 1.620000s
Runtime access: 1.280000s
Runtime access (unrolled): 0.740000s
$ g++ -I /u/mihelich/packages/boost-spacial-index -I
/u/mihelich/packages/boost/include -DITERATIONS=10000000 -DDIMENSIONS=128
-O3 runtime_test.cpp -o runtime_test
$ ./runtime_test
Distance = 128.000000
Compile-time access: 5.960000s
Runtime access: 4.900000s
Runtime access (unrolled): 3.160000s
Here runtime access is faster. 128 dimensions may sound excessive, but I 
did
not choose these numbers arbitrarily. In vision applications it is very
common to represent image patches by SIFT descriptors, which are points in
128-dimensional space. PCA-SIFT reduces the dimensionality to only 36, but
runtime access is still faster. In either case partial unrolling is a big
win. In other applications we may go to even higher dimensions...
$ g++ -DITERATIONS=1000000 -DDIMENSIONS=1000 -O3 -ftemplate-depth-1024
runtime_test.cpp -o runtime_test
$ ./runtime_test
Distance = 1000.000000
Compile-time access: 8.150000s
Runtime access: 4.090000s
Runtime access (unrolled): 2.460000s
Now runtime access is much faster. And once we get up this far I have to
crank up the template depth just to calculate a distance. 1000 dimensions 
is
high but not ridiculous. Spectra of galaxies from the Sloan Digital Sky
Survey have 4000 dimensions, for example.
I understand if the authors do not want to concern themselves much with
optimizing for non 2D/3D cases, as users who really care can always
substitute their own highly optimized distance function using SSE, etc. 
But
it would be nice to have something decent out of the box, and there is
plenty of reason for a RuntimeIndexable concept. If the geometry library 
is
to be used as a base for spatial index and perhaps clustering libraries,
some consideration has to be taken for high dimensions.
Thank you for running some numbers Patrick, this certainly helps to clarify 
the issues. I'm going to investigate using Fusion as the underlying 
mechanism for access as I hear it supports both models. Out of curiosity, I 
wonder are these random accesses generally done iteratively? If so, how does 
direct iteration compare with index operator accesses? I've noticed in my 
own tests on measuring std::vector accesses that iteration is significantly 
faster than operator [ ] when you are iterating over the container 
(presumably due to bounds checking).

Cheers,

Brandon
...
Cheers,
Patrick

...
_______________________________________________
Unsubscribe & other changes: 
http://lists.boost.org/mailman/listinfo.cgi/boost