
-------------------------------------------------- From: "Patrick Mihelich" <patrick.mihelich@gmail.com> Sent: Friday, October 10, 2008 10:55 PM To: <boost@lists.boost.org> Subject: Re: [boost] Geometry and spatial indexes, my opinion
On Thu, Oct 9, 2008 at 7:25 AM, Brandon Kohn <blkohn@hotmail.com> wrote:
Second, there are efficiency and code size concerns. Consider the Euclidean
distance function. For 2D/3D points, it is nice to use compile-time indexing and metaprogramming to guarantee tight, loop-less code. For high dimensional points, however, beyond some dimension (BOOST_GEOMETRY_MAX_UNROLL_DIMENSION?) unrolling the loop is no longer desirable, and we would rather just do the runtime iteration. If the points are RuntimeIndexable, we can choose an appropriate distance algorithm at compile time depending on the points' dimensionality.
I presume the reason one would not want loop-less code at high dimensions is due to bloat? I do not understand why runtime looping would be faster in this case. It would seem that even in the largest cases the bloat would still be small compared to memory sizes today. Runtime indexing can certainly be accommodated using a tag dispatch on the access traits. I think more research needs to be done to quantify the differences between the two to see if it is worth the maintenance overhead.
Sure, I need to back up my claim. The fundamental problem with loop-less code at high dimensions is bloat, yes. Remember that bloat also affects efficiency. Suppose we're calculating distances from one point to a set of other points. If the size of the distance function makes the inner loop code larger than the L1 instruction cache, we take a performance hit.
I'm attaching a simple benchmark that calculates the distance between two points some number of times, using both compile-time and runtime access. Note that this is awfully friendly to the compile-time access code, as we're doing nothing else in the inner loop. Here are some results on my machine (Intel Core Duo 2.4 GHz, gcc 4.2.3):
$ g++ -DITERATIONS=1000000000 -DDIMENSIONS=2 -O3 runtime_test.cpp -o runtime_test $ ./runtime_test Distance = 2.000000 Compile-time access: 5.460000s Runtime access: 9.070000s $ g++ -DITERATIONS=1000000000 -DDIMENSIONS=3 -O3 runtime_test.cpp -o runtime_test $ ./runtime_test Distance = 3.000000 Compile-time access: 7.980000s Runtime access: 10.970000s
So for 2D/3D we see the benefit to using compile-time access, although it wouldn't surprise me if the gap could be closed by better compile-flag twiddling. Let's see what happens at high dimensions. Let's also add in a partially unrolled runtime access version that operates 4 elements at a time (ideally the compiler would take care of this).
$ g++ -DITERATIONS=10000000 -DDIMENSIONS=36 -O3 runtime_test.cpp -o runtime_test mihelich@adt:~/projects/stuff$ ./runtime_test Distance = 36.000000 Compile-time access: 1.620000s Runtime access: 1.280000s Runtime access (unrolled): 0.740000s $ g++ -I /u/mihelich/packages/boost-spacial-index -I /u/mihelich/packages/boost/include -DITERATIONS=10000000 -DDIMENSIONS=128 -O3 runtime_test.cpp -o runtime_test $ ./runtime_test Distance = 128.000000 Compile-time access: 5.960000s Runtime access: 4.900000s Runtime access (unrolled): 3.160000s
Here runtime access is faster. 128 dimensions may sound excessive, but I did not choose these numbers arbitrarily. In vision applications it is very common to represent image patches by SIFT descriptors, which are points in 128-dimensional space. PCA-SIFT reduces the dimensionality to only 36, but runtime access is still faster. In either case partial unrolling is a big win. In other applications we may go to even higher dimensions...
$ g++ -DITERATIONS=1000000 -DDIMENSIONS=1000 -O3 -ftemplate-depth-1024 runtime_test.cpp -o runtime_test $ ./runtime_test Distance = 1000.000000 Compile-time access: 8.150000s Runtime access: 4.090000s Runtime access (unrolled): 2.460000s
Now runtime access is much faster. And once we get up this far I have to crank up the template depth just to calculate a distance. 1000 dimensions is high but not ridiculous. Spectra of galaxies from the Sloan Digital Sky Survey have 4000 dimensions, for example.
I understand if the authors do not want to concern themselves much with optimizing for non 2D/3D cases, as users who really care can always substitute their own highly optimized distance function using SSE, etc. But it would be nice to have something decent out of the box, and there is plenty of reason for a RuntimeIndexable concept. If the geometry library is to be used as a base for spatial index and perhaps clustering libraries, some consideration has to be taken for high dimensions.
Thank you for running some numbers Patrick, this certainly helps to clarify the issues. I'm going to investigate using Fusion as the underlying mechanism for access as I hear it supports both models. Out of curiosity, I wonder are these random accesses generally done iteratively? If so, how does direct iteration compare with index operator accesses? I've noticed in my own tests on measuring std::vector accesses that iteration is significantly faster than operator [ ] when you are iterating over the container (presumably due to bounds checking). Cheers, Brandon
Cheers, Patrick
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost