[GGL] performance

Hi list,

As there were doubts expressed on this list regarding performance in more challenging environments, an additional test is provided. The resulting figures for GGL are still good, sometimes the best, sometimes the second best.

For example, for a union on polygons with many diagonals, holes, and many points on both inputs:

ggl       3.6 seconds
b.p.      7.8 seconds
terralib  8.2 seconds
geos     35.9 seconds
cgal     77.8 seconds

(A CGAL intersection, instead of a union, is the fastest in these tests.)

The polygons have large diagonals, holes are tested, and the number of vertices of both input polygons can be varied in the tests.

More information: http://trac.osgeo.org/ggl/wiki/IntersectionStarComb, including references to the sources for reproducing the figures; everyone is invited to do that.

Regards, Barend

Barend Gehrels wrote:
Hi list,
As there were doubts expressed on this list regarding performance in more challenging environments, an additional test is provided. The resulting figures of GGL are still good, sometimes the best, sometimes the second best.
For example, for a union on polygons with many diagonals, holes, and many points on both inputs:
ggl 3.6 seconds
b.p. 7.8 seconds
terralib 8.2 seconds
geos 35.9 seconds
cgal 77.8 seconds
(A CGAL intersection, instead of a union, is the fastest in these tests.)
Polygons have large diagonals, holes are tested, and the number of vertices of both input polygons can be varied in the tests.
More information: http://trac.osgeo.org/ggl/wiki/IntersectionStarComb, including references to sources for reproducing the figures, everyone is invited to do that.
Thank you! This benchmark is quite even-handed.

I want to start by explaining my motivation for reviewing your library. If it is accepted, I would like to specialize my polygon_set_data<> class for single, double and long double and rely on your implementation of boolean operations to support floating-point coordinates. Before I do that I want to be sure that any problems with your library have been addressed before the library is accepted into boost. That's one reason I was so keen on seeing your polygon concept fixed to make it generic enough to adapt my polygon_with_holes_concept, since that would be an obstacle to integrating the two libraries. Performance problems, bugs and numerical robustness problems are also reasons why I would not want to integrate your algorithm into my library, so I am investigating these issues.

I followed your suggestion and constructed a benchmark. I found it somewhat challenging because your interface supports operations on only two polygons at a time, so I couldn't use the benchmark I designed for boostcon. In my benchmark I construct a square grid of triangular holes in an enclosing rectangle. I vary the number of rows and columns with the 2nd and 3rd input parameters to your benchmark app. What I found is that your algorithm scales quadratically with respect to the number of edges (as expected), and quickly performs 100X slower than my own implementation. Before reporting that your implementation is 100X faster than my own and faster than any other, you should have done more careful benchmarking to understand whether such an assertion was in fact accurate.

I realize that you plan to fix this performance problem with a query-tree-based algorithm. I've already offered to fix it by applying my own optimizations as a patch. There is no reason not to accept your library into boost due to performance, but I would like to see it required that this performance problem be fixed before your library is accepted, which should be easy.
That said, I feel you owe me an apology for publishing misleading benchmark results during the review of my library that could very easily have prevented its acceptance into boost. As you can well imagine, it caused me no small amount of consternation. If I have ever said or done anything, on the list or off, that you feel I should apologize to you for, please tell me what it is so that I can do so.

Now that we have gotten the issue of performance out of the way, I'd like to move on to the issues of bugs and testing. During the benchmarking of your library I encountered a bug in your polygon intersection algorithm. I've attached the rectangle_bug.cpp file so that you can easily reproduce the problem. Apparently, when the enclosing rectangles in my benchmark happen to be identical, your library drops that polygon from the output of both the union and the intersection operation. This leads to the area computation resulting in an awkwardly negative value. Duplicate edges and points (and indeed polygons) are well-known examples of input degeneracies (as I discussed at boostcon), and it is probably in the handling of the degenerate cases that you will find the error that caused the polygon to be dropped. This isn't too surprising, because the Weiler-Atherton algorithm you implemented is notoriously difficult to get correct: handling all possible degeneracies (and their potential interactions) explicitly in the visitor policy for walking the graph is hard to do correctly. I would require that the bug I've found be fixed before your library is accepted into boost.

Now I'll discuss testing. The fact that there is a bug in your algorithm indicates to me that insufficient testing has been done. Is this algorithm used in production code? It is very difficult to think of all the things that can go wrong in a complicated algorithm and construct tests for each potential problem.
What I did to validate my own implementation was to auto-generate large amounts of random test data of increasing scale until I was confident that anything that could go wrong would have gone wrong. The benchmark I showed at boostcon (the poster on the wall) is a good example of such a test at a large scale. It is no accident that it found a bug in polyboolean; it isn't likely for an algorithm with a bug in it to pass that kind of test.

You should start small, merging many iterations of two randomly generated triangles, then three, then five, and so on until you can perform a union on one million randomly generated triangles. The reason to start small is that it is easier to root-cause a problem in a small test case, so you want to find any problem in the smallest possible test case where it can occur. Since the policies are different for union and intersection, you will also have to do the same for intersecting unions of large numbers of random triangles (which is what my boostcon benchmark did). Compare your results to those produced by CGAL with an XOR operation performed by CGAL, and make sure that no polygons in the result of the XOR have any regions larger than can be explained by numerical error. Actually, CGAL's license probably doesn't allow you to use it for validation purposes, since you work for a commercial enterprise. Use GPC instead, oh wait... use something free.

I'd like to see this kind of testing done before your library is accepted into boost. I'd also like to see XOR as well as AND NOT implemented and supported by your interface before your library is accepted. A good testing methodology will help you write the visitor policies for these algorithms. I'd also like you to add interfaces to intersect containers of polygons with each other.
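The start-small, validate-against-an-independent-oracle methodology described above can be sketched in C++. The example below is illustrative only and is not code from either library; since the polygon union itself would be the code under test, a simpler stand-in computation is used (union area of axis-aligned integer rectangles) and checked against a brute-force cell-counting oracle, growing the input size so any failure reproduces at the smallest possible scale. All names (Rect, unionAreaCompressed, stressTest) are made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

struct Rect { int x1, y1, x2, y2; };  // axis-aligned, x1 < x2 and y1 < y2

// "Implementation under test": exact union area via coordinate compression.
// Every elementary cell between consecutive x and y boundaries is either
// fully inside or fully outside each rectangle, so we sum covered cells.
long long unionAreaCompressed(const std::vector<Rect>& rs) {
    std::vector<int> xs, ys;
    for (const Rect& r : rs) {
        xs.push_back(r.x1); xs.push_back(r.x2);
        ys.push_back(r.y1); ys.push_back(r.y2);
    }
    std::sort(xs.begin(), xs.end());
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
    std::sort(ys.begin(), ys.end());
    ys.erase(std::unique(ys.begin(), ys.end()), ys.end());
    long long area = 0;
    for (std::size_t i = 0; i + 1 < xs.size(); ++i)
        for (std::size_t j = 0; j + 1 < ys.size(); ++j)
            for (const Rect& r : rs)
                if (r.x1 <= xs[i] && xs[i + 1] <= r.x2 &&
                    r.y1 <= ys[j] && ys[j + 1] <= r.y2) {
                    area += (long long)(xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j]);
                    break;  // cell counted once, even if covered by many rects
                }
    return area;
}

// Independent oracle: count every unit cell covered by any rectangle.
// Slow but trivially correct, so disagreement pinpoints a bug above.
long long unionAreaBruteForce(const std::vector<Rect>& rs, int lo, int hi) {
    long long area = 0;
    for (int x = lo; x < hi; ++x)
        for (int y = lo; y < hi; ++y)
            for (const Rect& r : rs)
                if (r.x1 <= x && x < r.x2 && r.y1 <= y && y < r.y2) {
                    ++area;
                    break;
                }
    return area;
}

// Stress loop: start with 2 shapes, then 3, 4, ... so that any failure
// is caught at the smallest reproducible input size.
bool stressTest(int maxShapes, int trials) {
    std::mt19937 rng(42);  // fixed seed: failures stay reproducible
    std::uniform_int_distribution<int> coord(0, 50);
    for (int n = 2; n <= maxShapes; ++n)
        for (int t = 0; t < trials; ++t) {
            std::vector<Rect> rs;
            for (int k = 0; k < n; ++k) {
                int a = coord(rng), b = coord(rng);
                int c = coord(rng), d = coord(rng);
                rs.push_back({std::min(a, b), std::min(c, d),
                              std::max(a, b) + 1, std::max(c, d) + 1});
            }
            if (unionAreaCompressed(rs) != unionAreaBruteForce(rs, 0, 52))
                return false;  // smallest failing n reached; stop and debug
        }
    return true;
}
```

The same shape of harness applies to a real boolean-operations library: replace the stand-in computation with the union under test and the oracle with an independent implementation (or an area/containment invariant), keep the fixed seed, and grow the input size only after the smaller sizes pass.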
I figured out I could probably fake it by adding reverse-winding holes outside of the enclosing polygon, but I'd like your interface for that to be defined and working before I use it to specialize polygon_set_data<>, so I'd like that to be done before your library is accepted into boost as well. Obviously you will need to merge each input before you can intersect them (unless you want to write a fiendishly difficult visitor policy). I know full well that the union of more than two polygons is harder to implement in the Weiler-Atherton algorithm than the union of two polygons. I don't believe your algorithm supports the union of more than two polygons in all cases, but a good set of tests would convince me.

I'll defer discussion of numerical robustness until tomorrow, since this email is already overlong.

Thanks,
Luke

Benchmark results:

GGL: area: 147058 speed: 0.05
dlxc0129> ggl_starcomb 1 30 30 10 200 0
GGL: area: 337291 speed: 0.26
dlxc0129> ggl_starcomb 1 40 40 10 200 0
GGL: area: 605332 speed: 0.77
dlxc0129> ggl_starcomb 1 50 50 10 200 0
GGL: area: 951182 speed: 1.89
dlxc0129> ggl_starcomb 1 60 60 10 200 0
GGL: area: 1.37484e+06 speed: 4.63
dlxc0129> ggl_starcomb 1 70 70 10 200 0
GGL: area: 1.87631e+06 speed: 16.85
dlxc0129> ggl_starcomb 1 80 80 10 200 0
GGL: area: 2.45559e+06 speed: 62.04

(my original algorithm)
dlxc0129> gtl_starcomb 1 30 30 10 200 0
B.P.: area: 338517 speed: 0.05
dlxc0129> gtl_starcomb 1 40 40 10 200 0
B.P.: area: 606996 speed: 0.1
dlxc0129> gtl_starcomb 1 50 50 10 200 0
B.P.: area: 953295 speed: 0.19
dlxc0129> gtl_starcomb 1 60 60 10 200 0
B.P.: area: 1.37741e+06 speed: 0.3
dlxc0129> gtl_starcomb 1 70 70 10 200 0
B.P.: area: 1.87935e+06 speed: 0.45
dlxc0129> gtl_starcomb 1 80 80 10 200 0
B.P.: area: 2.45911e+06 speed: 0.65

(with my optimization from two days ago)
dlxc0129> gtl_starcomb 1 30 30 10 200 0
B.P.: area: 338517 speed: 0.03
dlxc0129> gtl_starcomb 1 40 40 10 200 0
B.P.: area: 606996 speed: 0.06
dlxc0129> gtl_starcomb 1 50 50 10 200 0
B.P.: area: 953295 speed: 0.1
dlxc0129> gtl_starcomb 1 60 60 10 200 0
B.P.: area: 1.37741e+06 speed: 0.15
dlxc0129> gtl_starcomb 1 70 70 10 200 0
B.P.: area: 1.87935e+06 speed: 0.21
dlxc0129> gtl_starcomb 1 80 80 10 200 0
B.P.: area: 2.45911e+06 speed: 0.28

Luke,
That said, I feel you owe me an apology for publishing misleading benchmark results
I sent you the results off-list, long before publishing them, including a report showing where things went wrong on your side. There were months for you to improve things before the figures were published.

Our benchmark compares 6 libraries, not only 2. It compares 7 algorithms, not only 1. If you found them misleading you could easily have objected before publication. It's surprising to read this now. And not necessary, after the acceptance of your library.

We've done our comparisons in a careful and honest way. We didn't frantically hack things in to find bug X. Our website says: "We are aware of the weaknesses of performance tests and that it is hard to design objective benchmarks. So, this comparison is not considered to be objective in terms of intentions. However, we did what we can to achieve most comparable results." and "Therefore, everyone can review it and provide his critique as well as point bugs and weaknesses". So yes, we know that things can go wrong. Maybe I didn't discover option Y in library Z which speeds things up.

We're happy to read that you'll replace your algorithm with ours. I'd moderate your tone then; otherwise you might prevent acceptance. I don't think it wise for me to react to more of your attempts to avoid that.

Regards, Barend

Barend Gehrels wrote:
Luke,
That said, I feel you owe me an apology for publishing misleading benchmark results
I sent you the results off-list, long before publishing them, including a report showing where things went wrong on your side. There were months for you to improve things before the figures were published.
Our benchmark compares 6 libraries, not only 2. It compares 7 algorithms, not only 1. If you found them misleading you could easily have objected before publication. It's surprising to read this now. And not necessary, after the acceptance of your library.
We've done our comparisons in a careful and honest way. We didn't frantically hack things in to find bug X. Our website says: "We are aware of the weaknesses of performance tests and that it is hard to design objective benchmarks. So, this comparison is not considered to be objective in terms of intentions. However, we did what we can to achieve most comparable results." and "Therefore, everyone can review it and provide his critique as well as point bugs and weaknesses". So yes, we know that things can go wrong. Maybe I didn't discover option Y in library Z which speeds things up.
We're happy to read that you'll replace your algorithm with ours. I'd moderate your tone then; otherwise you might prevent acceptance. I don't think it wise for me to react to more of your attempts to avoid that.
Not replace, but put alongside, so that calling a boolean op on polygons with integer coordinates would call my algorithm, while floating-point coordinates would call yours. Even so, I will have to add compile-time static asserts to prevent all the unsupported operations from being called on floating-point polygons. Your implementation cannot keyhole holes to outer shells or slice polygons into trapezoids. You also don't support the n-layer map overlay operation, so I'd have to ensure that users get a good error message if they try to call it on floating-point polygons. However, I won't integrate your algorithm into my library until your algorithm is ready to be used. That is my criterion for judging whether your library should be accepted into boost.

I am not trying to prevent acceptance of your library, but I do think that there are issues that need to be addressed before the library is accepted. Given the unfinished state of the polygon clipping, I'm a little confused that you felt now to be a good time to bring the library up for review. I fully expect your library will be accepted into boost eventually, either in this review, contingent on a list of changes, or in a follow-up review several months from now.

So now I'll discuss robustness. I looked at the implementation of line segment pair intersection. You are actually solving for the intersection point itself and using it in the intermediate stages of the algorithm, even special-casing the unfortunate circumstance where numerical error results in the intersection point being equal to one of the input segment end points. It looks to me like the predicate for determining whether two line segments intersect, produced as a byproduct of computing the intersection point, is not robust. Your suggestion that self-intersecting output polygons can be cleaned up by pushing them back through the algorithm is therefore not a valid solution to the robustness problem.
I'd like to see the code that determines whether a pair of line segments intersect enhanced such that the result is robust for floating point coordinates. I'd also like to see the method for doing so described in the documentation. If we can be confident that the algorithm will find self intersections when the output is fed back through then I would accept that as a solution for the self-intersection/robustness issue. I'll summarize my review presently. Thanks, Luke
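A standard way to make the do-these-segments-intersect decision robust, without computing the intersection point at all, is to base it on the sign of orientation cross products; with integer coordinates the sign is exact, and for floating point an exact or filtered predicate can compute the same sign reliably. The sketch below is a generic textbook formulation, not GGL's or Boost.Polygon's actual code:

```cpp
#include <algorithm>
#include <cassert>

struct Pt { long long x, y; };

// Sign of the cross product (b-a) x (c-a): +1 for a left turn, -1 for a
// right turn, 0 for collinear. With integer coordinates this is exact.
int orientation(Pt a, Pt b, Pt c) {
    long long v = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return (v > 0) - (v < 0);
}

// Given p collinear with segment a-b, is p between a and b?
bool onSegment(Pt a, Pt b, Pt p) {
    return std::min(a.x, b.x) <= p.x && p.x <= std::max(a.x, b.x) &&
           std::min(a.y, b.y) <= p.y && p.y <= std::max(a.y, b.y);
}

// Segments p1-p2 and q1-q2 intersect (touching endpoints included) iff
// each segment's endpoints strictly straddle the other's supporting line,
// or a collinear endpoint lies within the other segment's bounding box.
bool segmentsIntersect(Pt p1, Pt p2, Pt q1, Pt q2) {
    int o1 = orientation(p1, p2, q1), o2 = orientation(p1, p2, q2);
    int o3 = orientation(q1, q2, p1), o4 = orientation(q1, q2, p2);
    if (o1 * o2 < 0 && o3 * o4 < 0) return true;      // proper crossing
    if (o1 == 0 && onSegment(p1, p2, q1)) return true; // degenerate touches
    if (o2 == 0 && onSegment(p1, p2, q2)) return true;
    if (o3 == 0 && onSegment(q1, q2, p1)) return true;
    if (o4 == 0 && onSegment(q1, q2, p2)) return true;
    return false;
}
```

Because the decision depends only on comparisons of exactly computed signs, it never disagrees with a rounded intersection point, which is precisely the property needed for the feed-the-output-back-through approach to be trustworthy.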

What I found is that your algorithm scales quadratically wrt. number of edges (as expected), and quickly performs 100X slower than my own implementation.

Thanks for pointing this out, Luke. This quadratic behaviour is solved now (though we still have room for improvement). This new case is indeed different from the three cases we tested; here the number of input polygons (or holes) is varied. We'll add this case to our suite.

When using 60 polygons (the number you reported), the performance is now similar. Using more polygons, Boost.Polygon is faster, so it scales better. Using a spatial index we can improve further, but for now the solution proves that the design is reasonable and that we can adapt it quickly.

During the benchmarking of your library I encountered a bug in your polygon intersection algorithm. This leads to the area computation resulting in an awkwardly negative value.

Thanks also for pointing this out. It is also solved. What happened was that the outer ring was detected as equal, but the assemble function (which is indeed relatively new) failed to output one of them. Therefore only the inner rings, which were intersected correctly, were output, resulting in a negative area (which is correct without the outer ring).

Luke, you wrote that the outer ring was dropped, and that was correct, but you immediately gave the wrong reason for it, criticizing the whole algorithm. I think you should not guess. The bug found was in the logic of gathering correctly intersected rings, not an error in numerical calculations or in the handling of degeneracies. I'm explaining this explicitly because of the other discussions on this topic. However, I'm glad that these things were discovered and that they could be solved easily.

Regards, Barend
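The negative area that results from dropping the outer ring follows directly from the shoelace formula: a counter-clockwise outer ring contributes positive signed area, a clockwise hole contributes negative signed area, and a polygon's total area is their sum, so with the outer ring missing only the holes' negative contributions remain. A minimal sketch (illustrative, not library code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Shoelace formula: signed area of a closed ring. Counter-clockwise
// rings yield positive area; clockwise rings (holes) yield negative.
double signedArea(const std::vector<Pt>& ring) {
    double s = 0.0;
    for (std::size_t i = 0; i < ring.size(); ++i) {
        const Pt& a = ring[i];
        const Pt& b = ring[(i + 1) % ring.size()];
        s += a.x * b.y - a.y * b.x;  // cross product of consecutive vertices
    }
    return s / 2.0;
}
```

For example, a 10x10 counter-clockwise outer ring has signed area +100 and a 2x2 clockwise hole has signed area -4, giving a total of 96; if the outer ring is dropped from the output, the total collapses to -4, a negative value of the kind reported.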
participants (2)
- Barend Gehrels
- Simonson, Lucanus J