[geometry] robustness approaches

Hi boost geometers ;)

When Barend first presented his geometric library proposal I argued that the so-called "epsilon-based" approach used by the library is just fundamentally wrong and, most importantly, totally unnecessary, since much better methodologies exist. It deeply worries me to find that the GGL proposal hasn't been updated in this regard, because the instrumentation of the proper techniques has a significant impact on the overall design. Robustness can't be treated as an implementation detail to be dealt with later on (and I'm sorry if I haven't made this point clear when I warned about it before). If I were to review the library today I would have to strongly vote for rejection because of this.

Here is a simple example of the fundamental problems I see all over the proposed library: there is a point_in_circle algorithm which (as far as I can see from its implementation) essentially returns whether the distance from the point to the center is smaller than the circle radius. No doubt *that* is the one and only correct *mathematical* formulation for the predicate, but it is one that just doesn't stand in practice at all. Computing *that very particular predicate ROBUSTLY* has been the subject of active research during the past decade, to such an extent that pretty much every paper or article on the subject of robustness in geometric computing uses the in-circle test as a common example.

The library proposed by Lucanus, OTOH, at least acknowledges the problem appropriately and approaches a solution in the right direction (there are still some important design-influencing considerations, but they are not showstoppers).

Here is a starting point: http://www.mpi-inf.mpg.de/departments/d1/ClassroomExamples/

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com
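(For concreteness, the naive formulation being discussed amounts to something like the following sketch; the names are illustrative, not GGL's actual interface. Near the circle boundary, roundoff in the subtractions and multiplications can flip the comparison, which is exactly what the in-circle robustness literature addresses.)

    struct point { double x, y; };

    // Naive formulation: compare squared distance to squared radius.
    // Mathematically correct, but floating point roundoff in the
    // subtractions/multiplications can flip the result for points that are
    // very close to the circle boundary.
    bool naive_point_in_circle(point p, point center, double radius)
    {
        double dx = p.x - center.x;
        double dy = p.y - center.y;
        return dx * dx + dy * dy < radius * radius;
    }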

Hi Fernando,

It deeply worries me to find that the GGL proposal hasn't been updated in this regard, because the instrumentation of the proper techniques has a significant impact on the overall design. Robustness can't be treated as an implementation detail to be dealt with later on (and I'm sorry if I haven't made this point clear when I warned about it before).

I know you warned before and if I'm right you offered to help us with that. I had not forgotten that, although it is more than a year ago. In my opinion we've left this issue open; there were many issues to handle because of requests of the list. Concepts, taking any geometry, coordinate systems, dimensions, ranges, we've handled many. But not all, in that sense you are right. I'm still optimistic that we can handle this issue as well, even now. You are right, it has been unchanged indeed (as I mentioned).

If I were to review the library today I would have to strongly vote for rejection because of this.

That phase is not yet there... It is in preview until we publish it for review.

The library proposed by Lucanus, OTOH, at least acknowledges the problem appropriately and approaches a solution in the right direction (there are still some important design-influencing considerations, but they are not showstoppers).

You probably found this in the mails and not in the library itself. It has no circles so it is a bit unfair to compare the approaches like this. It focusses on integer arithmetic and 45/90 angled polygons so it is different, more or less complementary.

Regards, Barend

Hi Barend,
Hi Fernando,
It deeply worries me to find that the GGL proposal hasn't been updated in this regard, because the instrumentation of the proper techniques has a significant impact on the overall design. Robustness can't be treated as an implementation detail to be dealt with later on (and I'm sorry if I haven't made this point clear when I warned about it before).
I know you warned before and if I'm right you offered to help us with that.
Right... it's just that I don't know *how* to help you yet. It's like I don't know where to start. At the very least I need you to fully understand the problem and totally see why your current approach is just fundamentally wrong. So, can you read very, very carefully the link I posted and come back to me when you can identify for yourself what exactly is wrong with GGL? Then we can talk about what it should do instead.
I had not forgotten that, although it is more than a year ago.
Wow.. that much already? :)
In my opinion we've left this issue open; there were many issues to handle because of requests of the list. Concepts, taking any geometry, coordinate systems, dimensions, ranges, we've handled many. But not all, in that sense you are right. I'm still optimistic that we can handle this issue as well, even now.
OK, but keep in mind that it would affect your design in the same fundamental way concepts did.
You are right, it has been unchanged indeed (as I mentioned).
If I were to review the library today I would have to strongly vote for rejection because of this.

That phase is not yet there... It is in preview until we publish it for review.
Fair enough.
The library proposed by Lucanus, OTOH, at least acknowledges the problem appropriately and approaches a solution in the right direction (there are still some important design-influencing considerations, but they are not showstoppers).
You probably found this in the mails and not in the library itself.
If "this" is my assesment of the robustness of Luke's library, no. I know that from what I privately discussed with him.
It has no circles so it is a bit unfair to compare the approaches like this.
No it isn't, because I picked the in-circle test only because it is a textbook example. But I've looked at the source code of most of the library and it is just totally missing the right approach.
It focusses on integer arithmetic and 45/90 angled polygons so it is different, more or less complementary.
Then you are missing an important point, so let me try to explain it. Please read the following very carefully.

It is well known that algebraic operations on integers are exact *IFF THE RESULT DOES NOT OVERFLOW*. By extension, division of integer rationals is exact (since there is no real division at all), but then again, provided there is no overflow.

Many years ago it was commonly believed that operations on geometries with integer rational coordinates were just incidentally robust because of the integer properties stated above. And that belief proved itself correct in a large number of real world cases, so much so that it was common for geometric libraries, and even algorithm research, to focus only on integer rational coordinates as a fast and simple means to achieve robustness. However, that recipe has a fundamental and very practical limitation: integer overflow. There are always non-degenerate input cases where the algorithm fails, and there are algorithms which fail very fast because integer rationals quickly overflow if you cascade too many multiplications.

If robustness is defined, as it is or should be in our context, as the capability of obtaining correct results for *all* of the non-degenerate input, instead of some (however large), then rationals over plain machine integers are simply not the *right* solution, since in reality that only reduces the probability of failure, in the same way epsilon-geometry or fixed-point arithmetic (which also fails on overflow) does. Machine integer rationals are, as a general technique for achieving robust geometric computations, much better than floating point, but still just not good enough. In fact, some algorithms work "very well" with machine integer rationals because they stay far enough away from overflow, but some other algorithms fail even more quickly and destructively than when using machine floating point because, on the contrary, they overflow in a snap (while floats have a much, much wider range, so floating point overflow is very rare in geometric computing).

One obvious solution is to use software-based number types. Bigints, for example, are easy to understand, implement and optimize, so bigrationals have once been considered perfect drop-in replacements for machine integers in robust geometric algorithms. This would be ideal, since the library code could just pretend that the underlying computational model were the RealRAM. However, those algorithms which "explode" the bitsize requirement due to large cascaded multiplications totally ruin the performance of adaptive unlimited precision integer rationals. Similarly, using unlimited precision floating point numbers, or those which keep the expression trees, etc., is just too slow in practice to be a choice for industrial applications.

Simply using a special number type does produce correct results, but it is still not fast enough (by far) to be used in industry. So, one of the keys of the *modern* approach to robust geometric computing is to compute using native types as far as possible and switch to specialized types only when needed. With integer types this can be done easily, since the only source of errors is overflow and it is entirely possible to know in advance whether an operation will overflow or not.
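(To make that last point concrete, here is a minimal sketch of such an advance overflow check; the names are illustrative only.)

    #include <cstdint>
    #include <limits>

    // Do the multiplication of two 32-bit coordinates in a wider type, where it
    // cannot overflow, and check whether the exact result still fits the narrow
    // type. If it doesn't, the caller can switch to a wider or multiprecision type.
    bool multiply_fits_int32(std::int32_t a, std::int32_t b, std::int32_t& out)
    {
        std::int64_t r = std::int64_t(a) * std::int64_t(b); // exact: 32x32 -> 64 bits
        if (r < std::numeric_limits<std::int32_t>::min() ||
            r > std::numeric_limits<std::int32_t>::max())
            return false;                                   // would overflow in 32 bits
        out = static_cast<std::int32_t>(r);
        return true;
    }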
And in C++ it is possible to abstract all that away using the right design (in dynamic languages the virtual machine could also do the same totally under the hood, but I don't know of any doing it). *THAT* is what Luke's library does, and that's the reason I said it approached robustness in the right direction. It wasn't just the fact that it uses whatever integer coordinates or restricted forms of geometries (which isn't the case anymore).

In the response to Luke I'll sketch the approach for floating point robustness, which follows the same principle, and how that affects the overall design of a library.

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com

Hi Fernando,
Right... it's just that I don't know *how* to help you yet. It's like I don't know where to start.
Thanks, we'll read your links, we'll come back to it (and it will not take another year :-) ). Of course it is (and has always been) our intention to implement robustness, and if it affects design, OK, we're used to that.

Regards, Barend

Hi Barend,
Hi Fernando,
Right... it's just that I don't know *how* to help you yet. It's like I don't know where to start.
Thanks, we'll read your links, we'll come back to it (and it will not take another year :-) ). Of course it is (and has always been) our intention to implement robustness, and if it affects design, OK, we're used to that.
Great, I'm looking forward to seeing your progress :)

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com

Fernando Cacciola wrote:
The library proposed by Lucanus, OTOH, at least acknowledges the problem appropriately and approaches a solution in the right direction (there are still some important design-influencing considerations, but they are not showstoppers).
Thank you, Fernando. I agree with you 100% that robustness needs to be designed in up front and not tacked onto an existing geometry code base as an afterthought. In particular, the robustness strategy needs to be comprehensive and not patchwork. I acknowledge that my library needs a lot of work in this area to make it robust for both integer and floating point, particularly since it was only designed with integer robustness in mind. I will have to redesign the low level stuff quite significantly to get both integer and floating point fully robust and sharing the same algorithm. I don't really have the knowledge I need to make my library floating point robust. I need to read some of the papers you mention.

Making the coordinate data type a template parameter and "allowing" the user to specify an infinite precision numerical data type if they want robustness is not a solution, because there are very simple and effective ways to make integer and floating point operations robust. I don't know if we need to design in points of customization to expose the ability to do that to the user, but it needs to be at least possible for the library developer to specify different strategies for different coordinate data types and different contexts.

I think Brandon's library is trying to handle both integer and floating point robustness, and I've noticed he is at least as interested in multi-precision numerical data types as I am, which is a good sign that he's doing the right things. I'm interested in learning more about his approach.

Performance is fine as a goal, but if the algorithms fail (crash, hang, drop polygons or "gracefully" throw an exception) even a small percentage of the time due to robustness issues then they are worthless, regardless of how fast. If we tried to use a non-robust library and it failed in a mission critical situation it would cost us over one million dollars a day to fix the problem. Failure is not an option. When I say 100% integer robust I'm serious. It's all or nothing and the stakes for me are high.

Regards, Luke

Hi Luke,
Fernando Cacciola wrote:
The library proposed by Lucanus, OTOH, at least acknowledges the problem appropriately and approaches a solution in the right direction (there are still some important design-influencing considerations, but they are not showstoppers).
Thank you, Fernando. I agree with you 100% that robustness needs to be designed in up front and not tacked onto an existing geometry code base as an afterthought. In particular, the robustness strategy needs to be comprehensive and not patchwork. I acknowledge that my library needs a lot of work in this area to make it robust for both integer and floating point, particularly since it was only designed with integer robustness in mind. I will have to redesign the low level stuff quite significantly to get both integer and floating point fully robust and sharing the same algorithm. I don't really have the knowledge I need to make my library floating point robust. I need to read some of the papers you mention.
I'll try to sketch the general ideas below to get you started.
Making the coordinate data type a template parameter and "allowing" the user to specify an infinite precision numerical data type if they want robustness is not a solution
Precisely :) Life would be too easy if that were already fast enough.
, because there are very simple and effective ways to make integer and floating point operations robust. I don't know if we need to design in points of customization to expose the ability to do that to the user, but it needs to be at least possible for the library developer to specify different strategies for different coordinate data types and different contexts.
Exactly... and the library, as a service to other higher level libraries, should and can provide a general methodology and the corresponding tools for doing that. Ultimately users shouldn't be concerned at all with any of this, except when there is a reasonable speed/robustness tradeoff that users would like to control themselves.
I think Brandon's library is trying to handle both integer and floating point robustness, and I've noticed he is at least as interested in multi-precision numerical data types as I am, which is a good sign that he's doing the right things. I'm interested in learning more about his approach.
As I'll try to sketch below, bigfloats are part of the toolset, but there is more to it.
Performance is fine as a goal, but if the algorithms fail (crash, hang, drop polygons or "gracefully" throw an exception) even a small percentage of the time due to robustness issues then they are worthless, regardless of how fast.
Exactly.
If we tried to use a non-robust library and it failed in a mission critical situation it would cost us over one million dollars a day to fix the problem. Failure is not an option. When I say 100% integer robust I'm serious. It's all or nothing and the stakes for me are high.
And so they are for end users... it is the developers that think: "this works in all cases I tried, so that's good enough". But it isn't, and there is no need to compromise, since lots of techniques exist, and while they don't cover all of the possible algorithms, there is clearly little justification for not using the techniques where they are applicable.

=== LONG EXPLANATION BELOW ===

OK, I'll try to sketch the general idea behind floating point robustness in geometric computing. There is a lot on the subject in the literature, it is just difficult to find, so consider this just an introduction.

On the one hand, if an algorithm only needs to perform algebraic operations, a "Bigfloat", i.e. a software-based adaptive unlimited precision floating point type, delivers totally exact results. If division is needed then you can use a rational of bigfloats. If other operations are needed, such as sqrt() or transcendentals (say cos, sin), then you can use a number type that tracks the expression trees and avoids actually performing the non-exact operations as much as possible, especially when you query for discrete properties such as its sign.

On the other hand, it is possible to design and implement any geometric algorithm in terms of "predicates" and "constructions", and it is possible to consider the output as having separate topological and geometric parts. Let's consider an example: the straight skeleton (and the Voronoi diagram in this case) of a square contains one and only one vertex in the center and one and only one edge from that center node to each square corner. That result has a topology (how vertices and skeleton nodes are connected) and a geometry (the coordinates of the center node). The topology might be correct even if the geometry is not: i.e. the center node is not exactly in the center, yet they are connected correctly.

For many, many geometric applications, it is the topology that needs to be correct, not the geometry. If you do clipping, for instance, it might very well not matter at all (for instance in drawing applications) if the vertices are slightly off. But if an entire region is included in the result when it should not be, then that is a disaster for whatever application.

Since the input to an algorithm is error free no matter the representation (or more precisely: there is nothing the algorithm can do about its implicit error), it is easy to see that, if an algorithm is designed in such a way that the topological construction is based exclusively on the results of predicates called exclusively on the input (and never ever on an intermediate result), then the topology will be correct no matter what capabilities the underlying input arithmetic has, provided the predicates can do whatever magic is necessary to produce exact and correct results given input of any type. That is, if there is a predicate that can take machine integer or floating point coordinates, and, in whatever way, produce a guaranteed error free result, then you can write the code of an algorithm totally agnostic of the limitations in the type of input coordinates. That is, the algorithm itself can just pretend the computation is performed in the RealRAM.

Researchers spent a lot of time figuring out how to implement exact predicates based on finite precision integer or floating-point input. While the results are limited and somewhat complicated from an engineering POV, they do allow for incredibly fast algorithms that take simple native type coordinates (int or double) and produce 100% robust code.
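(As a trivial illustration of an exact predicate over finite precision input, here restricted to integer coordinates where plain promotion suffices; the names and the coordinate-range assumption are illustrative only.)

    #include <cstdint>

    // Sign of the orientation determinant (b-a) x (c-a), computed exactly,
    // assuming |coordinate| < 2^30 so every intermediate value fits in 64 bits.
    // Returns +1 (counterclockwise), -1 (clockwise) or 0 (collinear).
    int orientation(std::int32_t ax, std::int32_t ay,
                    std::int32_t bx, std::int32_t by,
                    std::int32_t cx, std::int32_t cy)
    {
        std::int64_t det = (std::int64_t(bx) - ax) * (std::int64_t(cy) - ay)
                         - (std::int64_t(by) - ay) * (std::int64_t(cx) - ax);
        return det > 0 ? 1 : det < 0 ? -1 : 0;
    }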
But even then, the *concept* of a black box predicate is still useful, because there are other programming-specific techniques to provide simple-to-program but fast predicates: the so-called "floating point filters", which I'll explain below.

IMO, any geometric library should provide a toolbox of exact predicates (which should be exact independently of the input type) and state as a guideline that any algorithm should only make critical decisions based on the result of predicates, and, unless totally impossible, the predicates should never be called with intermediate results.

Here's a simple example:

    T d1 = distance(a,b);
    T d2 = distance(c,d);
    if ( d1 < d2 )
        blabla

That is fundamentally flawed beyond repair, because robustness becomes the responsibility of T, and at the end of the day the poor user finds himself wrapping his input into a fancy numeric type just because of that flaw. If OTOH that were written like so:

    if ( compare_distance(a,b,c,d) )
        blabla

then the input could still be of any type, yet some basic library tool could go to whatever extent is needed to give the correct answer. And if you are like me and love to exploit C++'s unsurpassed expressive power, you can even write that like so:

    distance_result d1 = distance(a,b);
    distance_result d2 = distance(c,d);
    if ( d1 < d2 )
        blabla

where the comparison between distance_result types does the magic of producing an exact answer no matter what the type of the input coordinates is. And this would even prevent the algorithms in the library from mistakenly doing the wrong thing.

Following this methodology at large needs support from the bottom layer of the library, i.e. it needs an exact predicates toolbox that can operate on arbitrary input types and always yield correct results, but it also needs a general methodology: i.e. predicates need to be used instead of "direct" logic, and predicates need to be given only the input and never intermediate results (or, as I'll explain below, that must be carefully considered and instrumented with its own special technique). These guidelines can be enforced by the library given the right design (i.e. having "distance" return "distance_result", which actually stores the input).

All of the above allows algorithms to be called on arbitrary input types that don't need to carry the weight of providing robustness, i.e. machine integer and floating point numbers can be used. Couldn't be any simpler for the end user.

Constructions OTOH are similarly black boxes like predicates are, but serve different purposes, since it is presumed that the algorithm is well written and never uses the results from constructions to feed predicates (or rather does so only very carefully and by design, and only in those cases when it was impossible to express the algorithm output as a direct function of its input). That is, while the library provides, say,

    Point p = centroid(a,b,c);

that 'p' is never used to drive the decisions of the algorithm (it is never passed to a predicate), yet still that centroid can be a higher level facility that *internally* uses whatever it needs to come up with a reasonably approximated result.

Whenever you can express an algorithm in terms of predicates and constructions such that predicates only operate on the input and never on the result of constructions, then your algorithm is robust automagically, without any requirement on the input type and without the need to do anything at all to work around numerical issues within the algorithm itself. That's the job of the predicate oracles.
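(A minimal sketch of how such a distance_result could work for integer coordinates; the names are illustrative and, for simplicity, the exact squared value is stored rather than the input points themselves. For floating point input the comparison would instead go through a filter/exact fallback as described below.)

    #include <cstdint>

    struct point { std::int32_t x, y; };

    // Stores the squared distance, evaluated in a wide enough type to be exact
    // (assuming |coordinate| < 2^30 so the sum of squares fits in 64 bits).
    struct distance_result {
        std::int64_t squared;
    };

    distance_result distance(point a, point b)
    {
        std::int64_t dx = std::int64_t(a.x) - b.x;
        std::int64_t dy = std::int64_t(a.y) - b.y;
        return distance_result{ dx*dx + dy*dy };
    }

    // Comparing distances never needs the (irrational) square roots:
    // comparing the exact squared values gives the exact answer.
    bool operator<(distance_result lhs, distance_result rhs)
    {
        return lhs.squared < rhs.squared;
    }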
If an algorithm just can't call predicates with only the input and needs to step over intermediate results, things get more complicated. However, if all those intermediate results that would be fed into predicates are ONLY produced with "constructions", sound alternatives can be explored.

One alternative, the one used the most, is to have the constructions produce higher level abstractions instead of actual numerical results. For instance, they can return expression trees that refer to the input, and have the predicates evaluate them lazily on demand (so the predicate is still only really operating on the input). Again, this needs clever support from the library at design time in that, for example, the result from "centroid" wouldn't be a point but a centroid_result, storing the input points as well as an approximated result, which could be perfectly fed into predicates that could do the right thing having the input at hand. This works reasonably well given the inherent complexity, but it is still the subject of active research. Most of the time the use of DAGs for results is not really needed, so the ideal framework would be able to mix both native input and lazy expressions in a coherent manner. One starting idea (that I had many years ago but never had the time to evolve) would be to have a hybrid geometric type that uses a dynamic dual representation: input type or DAG.

Now a word on predicates. In C++ it is actually somewhat trivial to have a black box predicate that can produce correct results in spite of the input type, using the following technique known as a floating-point filter (and it would be equally trivial *within* a virtual machine for any dynamic language FWIW):

    template<class T>
    bool pred( T a, T b )
    {
        interval<T> ia = a ;
        interval<T> ib = b ;
        interval<T> r  = some operations on ia and ib ;

        if ( certainly( r == 0 ) )
            return true ;
        else if ( certainly( r != 0 ) )
            return false ;
        else
            return pred<some_exact_type>(a,b);
    }

That exploits the capability of interval arithmetic to implement certified logic. I won't explain how or why here, but "r == 0" doesn't return bool but TriBool, so you can explicitly query for the "I don't know" answer, and *only* in that case fall back to the exact solution.

All this machinery must ultimately use some mapping from the input type to the interval and exact types needed. Designing that mapping needs to consider different aspects. For instance, not all algorithms require the same exact_type: some might need an exact sqrt(), for example. So, the mapping could need to be something like this:

    coordinate_traits<T>::template exact<requirements_tag>::type

Also, predicates and constructions serve different purposes, so they also have different requirements on the exact types. Going back to the previous straight skeleton example, while the topology must be right, the coordinates of the center nodes need not be, BUT a user could reasonably expect a black box construction to nevertheless come up with a decent approximation, to avoid, for instance, points totally off because of floating point overflow. In this case the algorithm should use a "constructor" object that can somehow be told what exact type to use, which might be different from the one used for predicates. Thus, the traits might need to be more like this:

    coordinate_traits<T>::template exact<for_predicate_tag, requirements_tag>::type

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com
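(A minimal sketch of the centroid_result idea mentioned above, with illustrative names only: the construction keeps its defining input next to a cheap approximation, so a predicate handed a centroid_result can still work from the original points.)

    struct point { double x, y; };

    // A "lazy" construction result: the exact defining data (the three input
    // points) travels together with an approximated embedding.
    struct centroid_result {
        point a, b, c;   // the input that defines the centroid, kept exactly
        point approx;    // floating point approximation, for output/drawing only
    };

    centroid_result centroid(point a, point b, point c)
    {
        point approx = { (a.x + b.x + c.x) / 3.0, (a.y + b.y + c.y) / 3.0 };
        return centroid_result{ a, b, c, approx };
    }

    // A predicate taking a centroid_result can re-derive whatever it needs from
    // a, b and c (e.g. via a filtered/exact evaluation) instead of trusting approx.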

Fernando Cacciola wrote:
I'll try to sketch the general ideas below to get you started.

... BIG SNIP ...

    distance_result d1 = distance(a,b);
    distance_result d2 = distance(c,d);
    if ( d1 < d2 )
        blabla
The sage has come down from the mountain. I like to see your posts to the list. It makes me feel less self-conscious about my own long winded emails.

Your distance_result example is sort of like my

    template <typename T>
    struct coordinate_traits { typedef T difference_type; };

    template <>
    struct coordinate_traits<int> { typedef long long difference_type; };

    template <typename coordinate_type>
    void foo(coordinate_type a, coordinate_type b, coordinate_type c, coordinate_type d)
    {
        typedef typename coordinate_traits<coordinate_type>::difference_type diff_type;
        diff_type d1 = distance(a, b);
        diff_type d2 = distance(c, d); // distance between coordinates is difference_type
        if ( d1 < d2 )
            blabla
    }

I am extremely careful about overflow because, as you point out, it is pretty much the one thing that causes robustness problems in integer. I need to promote input integers to a larger data type even for addition and subtraction, because these operations add one bit of precision and may overflow. I need to register what types to promote to for given situations. It is really amazing how fast I exceed the limits of builtins and need to resort to gmp to represent the results of simple algebraic expressions.

I have a scheme in mind to make floating point robust using my fixed-point implementation. The idea is to take the bounding box of the Boolean operation at hand and map that to the full integer extents. When mapping the floating point coordinates to integer we construct a map from integer back to floating point to ensure that we don't lose any precision by mapping back. Then we run the Boolean as fixed point, which is 100% robust and cannot fail. Finally we map the output of the Boolean back to floating point and look up each coordinate in our x and y integer to floating point maps to restore the original precision if any was lost in conversion. In this way, any and every floating point input can be handled robustly, with a minimum loss of precision at the intersection points and none at all at the original vertices. To improve precision just use a larger integer data type.

I think this scheme is workable and attractive because it avoids all the well documented pitfalls of making floating point robust. Also integer arithmetic is faster than floating point. This gives an integer algorithm a significant performance advantage that might offset the cost of sorting the input and the extra time and storage for the mapping vector. Most applications need uniform precision (cartography is a good example), whereas graphics is the only application I can think of that needs high precision around the origin and low precision far from the origin. This scheme ought to cover most of the needs for floating point geometry. Can you identify any problem with the idea?

What about the case where the skeleton of a 1x1 integer unit square results in degenerate topologies when the construction of its center point snaps to one of its corners? The algorithm has to be robust to the degenerate cases that it itself might produce, as well as those present in the input. Furthermore, it is preferable not to output degenerate geometries, because this can cause downstream code to choke. I find it very unconvincing when a booleans algorithm claims to handle, for example, self intersecting polygons, but upon closer examination leaves them self intersecting at the output. This is hardly handling the degeneracy in my opinion.
Robust to degeneracy should rightly mean robust to degeneracy at the input and free of degeneracy at the output, otherwise you're setting a trap in the path of the guy who uses the result. Thanks, Luke
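(A minimal sketch of the bounding-box-to-integer mapping described above, one instance per axis; the names and the use of std::map are illustrative assumptions, not the actual implementation.)

    #include <cmath>
    #include <cstdint>
    #include <limits>
    #include <map>
    #include <utility>

    // Scale the floating point bounding interval of one axis onto the
    // non-negative int32 range and remember which original coordinate each
    // snapped integer came from, so original vertices can be restored exactly
    // when mapping the result back to floating point.
    struct axis_snapper {
        double lo;                            // lower bound of the input on this axis
        double step;                          // floating point units per integer step
        std::map<std::int32_t, double> back;  // snapped value -> an original coordinate

        axis_snapper(double lo_, double hi_)
            : lo(lo_),
              step(hi_ > lo_
                   ? (hi_ - lo_) / double(std::numeric_limits<std::int32_t>::max())
                   : 1.0) {}

        std::int32_t to_int(double x)
        {
            std::int32_t i = static_cast<std::int32_t>(std::floor((x - lo) / step));
            back.insert(std::make_pair(i, x)); // note: several originals may snap to one cell
            return i;
        }

        double to_float(std::int32_t i) const
        {
            std::map<std::int32_t, double>::const_iterator it = back.find(i);
            return it != back.end() ? it->second     // an original vertex: restore it exactly
                                    : lo + i * step; // a constructed point (e.g. an intersection)
        }
    };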

Simonson, Lucanus J wrote:
Most applications need uniform precision (cartography is a good example) whereas graphics is the only application I can think of that needs high precision around the origin and low precision far from the origin. This scheme ought to cover most of the needs for floating point geometry. Can you identify any problem with the idea?
I really like the idea, but I can't help pointing out a mismatch between the integer and floating-point philosophies. The typical integer philosophy would allow introducing new points into existing segments, but would "forbid" the computed result (taken literally as rounded to integer coordinates) to have more intersections than actually reported. The typical floating-point philosophy would allow the computed result (when represented by floating-point numbers) to have more intersections than actually reported (as long as these additional intersections would only be artifacts of the floating-point representation), but would "forbid" introducing new points into existing segments.

Because the initial topology might be different than the topology when converted to integer coordinates, the computed result might (in admittedly rare circumstances) have more intersections than actually reported, and the computation might also have introduced additional points into existing segments, in order to assure that the computed result in integer coordinates doesn't have more intersections than actually reported.

I still think it's a good idea to use the described scheme. You already have the integer algorithms, so it's only natural to reuse them. When I read 'Ulrike Bartuschka, Kurt Mehlhorn, Stefan Näher', "A robust and Efficient Implementation of a Sweep Line algorithm for the Straight Line Segment Intersection Problem", it soon became clear that the LEDA rat_point/rat_segment data types were there first, so that it was only natural to benefit from them. (I know that this interpretation is a bit unfair. The authors themselves give a better justification of why they chose their approach: "The first approach is the most general and is known under the exact geometric computation (EGC) paradigm and [...]. The EGC approach has been adopted for the software libraries LEDA and CGAL and other successful implementations.")

Regards, Thomas

Thomas Klimpel wrote:
I really like the idea, but I can't help pointing out a mismatch between the integer and floating-point philosophies. The typical integer philosophy would allow introducing new points into existing segments, but would "forbid" the computed result (taken literally as rounded to integer coordinates) to have more intersections than actually reported. The typical floating-point philosophy would allow the computed result (when represented by floating-point numbers) to have more intersections than actually reported (as long as these additional intersections would only be artifacts of the floating-point representation), but would "forbid" introducing new points into existing segments.
Because the initial topology might be different than the topology when converted to integer coordinates, the computed result might (in admittedly rare circumstances) have more intersections than actually reported, and the computation might also have introduced additional points into existing segments, in order to assure that the computed result in integer coordinates doesn't have more intersections than actually reported.
You are right that I introduce intersections on segments in order to assure that the computed result doesn't have more intersections than actually reported. I'm surprised, however, that unreported intersections are commonly considered allowable in floating point if they are below a threshold. Wouldn't mismatches between thresholds in different floating point modules lead to such unreported intersections in the output causing an invariant to be violated in someone else's input and their code to fail? Isn't it equally possible to introduce intersections on floating point segments if they were "close enough" to intersecting to assure that the computed result in floating point doesn't have more intersections than actually reported?

I was working on devising a way to compute x and y epsilon values from the bounding box of the floating point input such that I could bound the lateral movement of segments due to floating point grid snapping to those epsilons and assure that the computed result doesn't contain unreported intersections. It is basically trying to apply my integer methodology to floating point. Is this the opposite of the normal approach to floating point, which you suggest is to ignore the potential to create such intersections as artifacts of floating point grid snapping since they should be below some threshold and can be ignored?

I know the commercial, floating-point-based, 3D modeling, meshing and physical simulation tools we use are frequently failing due to numerical robustness issues. They place very tight restrictions on the cleanliness of their inputs, and meeting those kinds of tight restrictions should be the goal of a geometry library, or it precludes its interoperability with quite a few applications. Working around these numerical issues in commercial software consumes significant engineering effort inside Intel. It also keeps the software vendors busy providing us with support and software updates. By this evidence, floating point robustness doesn't appear to be a solved problem.

There is no way that a boost library can provide that kind of labor intensive support. In order to be both free and useful, it has to be absolutely reliable and generate no support load for the maintainer due to robustness issues and not need such support to be used for real world applications. I would recommend against accepting a geometry library to boost if I thought it would result in an avalanche of robustness issues being reported by unhappy users. Nor would I want to be responsible for supporting such a boost library. I am only proposing my library because I'm confident in the reliability of what I've done for integer coordinates. Do you think employing my fixed point scheme for floating point coordinates would result in issues when using it in applications? If so I need to understand what those issues are to decide whether going forward with the scheme is a good idea.

Thanks for the LEDA reference, by the way, that will help me with my literature search for the boostcon paper.

Thanks, Luke

Simonson, Lucanus J wrote:
I'm surprised, however, that unreported intersections are commonly considered allowable in floating point if they are below a threshold.
Whether such intersections are allowable or not depends on what you want to do, and on the tool you use for doing it. I didn't intend to say that tools taking their input as floating-point (should) consider unreported intersections allowable in floating-point if they are below a threshold. The tagging with "integer philosophy" and "floating-point philosophy" was just meant to distinguish two possible choices for what is considered a valid result. My underlying statement is that one is forced to make a choice here. More precisely, I think that the following two statements apply in cases where the 'exact' result cannot be represented in the used output format: If you "forbid" introducing new points into existing segments, you must allow for certain unreported (spurious) intersections. If you "forbid" certain unreported (spurious) intersections, you must allow introducing new points into existing segments. The tagging of these choices with "integer philosophy" and "floating-point philosophy" was probably a bad idea.

You asked "Can you identify any problem with the idea?", and I pointed out that you risk violating both conditions for what is considered a valid result. That said, I really think that this idea is the way to go, but that it is important to be explicit about its limitations. In order to be explicit, I will try to describe the conversion process in detail, so that the places where information is lost (without hope of recovery) can clearly be identified.

The input is in floating point, so each x- and y-coordinate of the points has its own exponent. The conversion to fixed point involves fixing a common exponent for all x-coordinates and a common exponent for all y-coordinates. Choosing the highest occurring exponent of the x-coordinates as common exponent for the x-coordinates and the highest occurring exponent of the y-coordinates as common exponent for the y-coordinates looks like a good idea to me, so this is how I will proceed (remember we are just talking about a factor 2 for binary numbers, not a factor 10 as for decimal numbers). Now I assume that the fixed-point type has at least as many bits as the mantissa (including sign) of the floating point type, so that the coordinates with the highest exponent will be representable without loss of accuracy. So the points that will lose the most accuracy will be the ones around the origin, which will basically all be mapped to 0 (assuming their exponent is sufficiently different from the highest occurring exponent). These points might have described some geometry with intersections, but we will not be able to report any of these intersections, because they are "invisible" on the fixed-point grid.

So the statement "In this way, any and every floating point input can be handled robustly with a minimum loss of precision at the intersection points and none at all at the original vertices." might be a bit too optimistic about the loss of precision at the original vertices. Inserting the neighboring grid-points into such segments directly after the real endpoints might remove the unreported intersections, but the effect of this is basically equivalent to modifying the coordinates of the original point (only that we now introduced overlapping segments, which might cause downstream code to choke, as you say). The mapping to the original coordinates might still be useful, but I think that modifying some of the original coordinates might be OK in cases where the alternative would be to introduce overlapping segments.
Regards, Thomas
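(A minimal sketch of that exponent-based conversion for one axis, illustrative only: coordinates with the highest exponent keep their full mantissa, while coordinates with much smaller exponents collapse towards zero, as described above.)

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Convert the coordinates of one axis to fixed point, using the highest
    // occurring exponent as the common exponent for the whole axis.
    std::vector<std::int64_t> to_fixed_point(const std::vector<double>& xs)
    {
        bool found = false;
        int common_exp = 0;
        for (std::size_t k = 0; k < xs.size(); ++k) {
            if (xs[k] == 0.0) continue;
            int e;
            std::frexp(xs[k], &e);             // xs[k] = m * 2^e with 0.5 <= |m| < 1
            if (!found || e > common_exp) { common_exp = e; found = true; }
        }

        std::vector<std::int64_t> out(xs.size(), 0);
        if (!found) return out;                // all coordinates are zero

        // Scale so that a coordinate with the highest exponent lands in
        // [2^52, 2^53), i.e. keeps its full 53-bit mantissa exactly; coordinates
        // with much smaller exponents are truncated towards zero.
        for (std::size_t k = 0; k < xs.size(); ++k)
            out[k] = static_cast<std::int64_t>(std::ldexp(xs[k], 53 - common_exp));
        return out;
    }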

Thomas Klimpel wrote:
The mapping to the original coordinates might still be useful, but I think that modifying some of the original coordinates might be OK in cases where the alternative would be to introduce overlapping segments.
I realize after giving the matter more thought that there is not necessarily a 1:1 mapping from fixed point back to floating point coordinates; this makes the idea of a mapping back to floating point less workable. I suppose I could use a multi-map and map back to the centroid of multiple points. I was already aware that converting back to floating point would cause some small movement of line segments which could result in non-reported intersections. These would not be an issue if fed back into my algorithm, but could pose problems for other code.

As you point out, there is no way to output the correct topology without ignoring intersections that are artifacts of approximation of the intersection points, and there is no way to report those intersections without modifying the output topology. In some circumstances the output topology is paramount, such as the straight skeleton. In my case I think topology is secondary to the output being free of non-reported intersections. If I convert fixed point output to floating point coordinates I cannot guarantee either output topology or absence of unreported intersections, as you also observe.

Topologically, the thing that must be true about the output of my algorithm is that it is closed polygons. That invariant cannot be harmed by introducing vertices as artifacts of snapping, or even by merging vertices that are too close. The other invariant that must be true about my output is that it contains no intersecting line segments. My own code relies upon that invariant, and it is a requirement at the input to all the commercial software we use downstream from my code.

Luke

Hi Luke,
Thomas Klimpel wrote:
The mapping to the original coordinates might still be useful, but I think that modifying some of the original coordinates might be OK in cases where the alternative would be to introduce overlapping segments.
I realize after giving the matter more thought that there is not necessarily a 1:1 mapping from fixed point back to floating point coordinates; this makes the idea of a mapping back to floating point less workable. I suppose I could use a multi-map and map back to the centroid of multiple points. I was already aware that converting back to floating point would cause some small movement of line segments which could result in non-reported intersections. These would not be an issue if fed back into my algorithm, but could pose problems for other code.

As you point out, there is no way to output the correct topology without ignoring intersections that are artifacts of approximation of the intersection points, and there is no way to report those intersections without modifying the output topology. In some circumstances the output topology is paramount, such as the straight skeleton. In my case I think topology is secondary to the output being free of non-reported intersections. If I convert fixed point output to floating point coordinates I cannot guarantee either output topology or absence of unreported intersections, as you also observe.

Topologically, the thing that must be true about the output of my algorithm is that it is closed polygons. That invariant cannot be harmed by introducing vertices as artifacts of snapping, or even by merging vertices that are too close. The other invariant that must be true about my output is that it contains no intersecting line segments. My own code relies upon that invariant, and it is a requirement at the input to all the commercial software we use downstream from my code.
For all that I would recommend that you simply NOT support booleans of floating point polygons at all, and let that one be added later on via an algorithm that considers whatever is appropriate for floats from the ground up. Without any benchmark I would imagine that your integer based solution is faster than, say, CGAL's clipper, which is robust for floating point but uses the floating-point filtering I outlined before, which as I said has some overhead, likely higher than yours.

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com

Hi Luke,
You are right that I introduce intersections on segments in order to assure that the computed result doesn't have more intersections than actually reported. I'm surprised, however, that unreported intersections are commonly considered allowable in floating point if they are below a threshold.
Tolerance-based decisions in floating point computations are the very source of robustness issues.
Wouldn't mismatches between thresholds in different floating point modules lead to such unreported intersections in the output causing an invariant to be violated in someone else's input and their code to fail? Isn't it equally possible to introduce intersections on floating point segments if they were "close enough" to intersecting to assure that the computed result in floating point doesn't have more intersections than actually reported? I was working on devising a way to compute x and y epsilon values from the bounding box of the floating point input such that I could bound the lateral movement of segments due to floating point grid snapping to those epsilons and assure that the computed result doesn't contain unreported intersections.
NO. Please don't go that route. Epsilon-geometry is just NOT the right way, even if it might appear to be in some particular algorithm where the effect of arbitrary snapping within an uncertainty disk is less catastrophic. This would be a non-robust implementation just like any other one.
It is basically trying to apply my integer methodology to floating point. Is this the opposite of the normal approach to floating point, which you suggest is to ignore the potential to create such intersections as artifacts of floating point grid snapping since they should be below some threshold and can be ignored?
FWIW, the "normal approach" is the wrong one.
I know the commercial, floating-point-based, 3D modeling, meshing and physical simulation tools we use are frequently failing due to numerical robustness issues. They place very tight restrictions on the cleanliness of their inputs, and meeting those kinds of tight restrictions should be the goal of a geometry library,
Not even.. those restrictions exist to begin with because of the incorrect handling of robustness.
or it precludes its interoperability with quite a few applications. Working around these numerical issues in commercial software consumes significant engineering effort inside Intel.
Have you ever tried CGAL? I have a customer of my polygon offsetting implementation (the one in CGAL) in the VLSI verification industry. They perform offsetting of polygons whose vertices might very well be separated by less than 1e-5, for instance. And of course there isn't any threshold adjusted for that.
It also keeps the software vendors busy providing us with support and software updates.
Each time I had to provide support for that code it turned out to be a plain old bug. Never a numerical issue.
By this evidence, floating point robustness doesn't appear to be a solved problem.
The general purpose floating-point filtering technique I outlined in my previous post involves an overhead ranging from 20% to 50%, compared to "naive" epsilon-based implementations. If having to pay that overhead is unacceptable, then it is not solved; but if that overhead is acceptable, then it is definitely a solved problem.
There is no way that a boost library can provide that kind of labor intensive support.
Indeed
In order to be both free and useful, it has to be absolutely reliable and generate no support load for the maintainer due to robustness issues and not need such support to be used for real world applications.
Right
I would recommend against accepting a geometry library to boost if I thought it would result in an avalanche of robustness issues being reported by unhappy users. Nor would I want to be responsible for supporting such a boost library. I am only proposing my library because I'm confident in the reliability of what I've done for integer coordinates.
Great
Do you think employing my fixed point scheme for floating point coordinates would result in issues when using it in applications?
Would it use any sort of epsilon to decide on the snapping? If so, then it will result in issues, so don't do it :)

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com

Hi Thomas,
Simonson, Lucanus J wrote:
Most applications need uniform precision (cartography is a good example) whereas graphics is the only application I can think of that needs high precision around the origin and low precision far from the origin. This scheme ought to cover most of the needs for floating point geometry. Can you identify any problem with the idea?
I really like the idea, but I can't help pointing out a mismatch between the integer and floating-point philosophies. The typical integer philosophy would allow introducing new points into existing segments, but would "forbid" the computed result (taken literally as rounded to integer coordinates) to have more intersections than actually reported. The typical floating-point philosophy would allow the computed result (when represented by floating-point numbers) to have more intersections than actually reported (as long as these additional intersections would only be artifacts of the floating-point representation), but would "forbid" introducing new points into existing segments.
It is easy to see that that mismatch comes only from using intermediate results (which includes the snapped input) to drive the topology of the result. If OTOH the algorithm used the input (*as is*, without even the snapping) then it wouldn't matter at all what number type is used, integer or floating point (provided the algorithm uses the predicates correctly).
Because the initial topology might be different than the topology when converted to integer coordinates,
Which is why snap rounding can be used for certain algorithms.
the computed result might (in admittedly rare circumstances) have more intersections than actually reported, and the computation might also have introduced additional points into existing segments, in order to assure that the computed result in integer coordinates doesn't have more intersections than actually reported.
I would say that an implementation shouldn't do any of that at all. Based on my experience, all that hacking becomes needed when the algorithm *logic* is driven by its own intermediate results, so it is *that* which needs to be avoided to begin with. Many, many "existing" algorithms can be refactored to do so.
I still think it's a good idea to use the described scheme. You already have the integer algorithms, so it's only natural to reuse them. When I read 'Ulrike Bartuschka, Kurt Mehlhorn, Stefan Näher', "A robust and Efficient Implementation of a Sweep Line algorithm for the Straight Line Segment Intersection Problem", it soon became clear that the LEDA rat_point/rat_segment data types were there first, so that it was only natural to benefit from them.
Indeed. As I said to Luke, intersection-based problems naturally fit into snap rounding. But others don't.

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com

Hi Fernando,

The clear position of Lucanus Simonson with respect to the "allowed" output of a line segments intersection algorithm and your reactions to his position make me wonder which output you consider "allowed". The figures 1a), 1b), 2) and 3b) of John Hobby's publication http://cutebugs.net/files/curve/2-27.ps clearly illustrate the discussed issue. I considered 2) as non-acceptable output and Lucanus Simonson considered 1b) as non-acceptable. (The output 3b) seems to be the most attractive one for this example.)

Your position that geometric algorithms should use exactly evaluated "predicates" called explicitly on the input (and never ever on an intermediate result) to drive the topological construction doesn't seem to settle this question. Why not? Because the output 1b) would not be allowed as input to your algorithm, it's hard to see why you should allow it as output from your algorithm. The output 2) and 3b) on the other hand clearly fall into the category "modification of input" (because a straight segment was replaced by a bent segment), which you explicitly call "hacking". And it clearly is hacking in the sense that it doesn't generalize to other problems (like the straight skeleton for example). I somehow got the impression that 1b) is indeed the output that you consider "allowable", but I would like you to confirm or correct this impression. Just to be clear, output 1a) is not an option, because the "output-format" doesn't allow reporting it.

Regards, Thomas

Thomas Klimpel wrote:
Hi Fernando,
The clear position of Lucanus Simonson with respect to the "allowed" output of a line segments intersection algorithm and your reactions to his position makes me wonder which output you consider "allowed".
The figures 1a), 1b), 2) and 3b) of John Hobby's publication http://cutebugs.net/files/curve/2-27.ps clearly illustrate the discussed issue.
I considered 2) as non-acceptable output and Lucanus Simonson considered 1b) as non-acceptable. (The output 3b) seems to be the most attractive one for this example.)
Just to be clear, 3b is what my algorithm does. I always round fractional values down with a floor() operation whereas Hobby suggests rounding to the closest whole number. I think he also has a complicated scheme for collapsing point clusters whereas I just collapse points together if they are in the same 1x1 unit grid. I'll be sure to cite Hobby in my boostcon submission. I did read his paper before I devised my robustness strategy.
Your position that geometric algorithms should use exactly evaluated "predicates" called explicitly on the input (and never ever on an intermediate result) to drive the topological construction doesn't seem to settle this question.
I think Fernando did mention that the requirements for the output of line intersection differ from those of the straight skeleton, which was his example. Also, my implementation never stores intermediate results as keys in the tree and never computes intersections based on intermediate results. It works purely by storing input line segments in the tree and comparing them with exact predicates, and it also computes intersections based on the input line segments only, exactly as Fernando suggests. The snapping of intersection points may introduce benign topological changes in the output, as you point out. These changes are benign because in the case of line intersection for polygon clipping the critical topological consideration is that all closed cycles remain closed (or degenerate) after any topological change. Since the algorithm is robust to all collinear/duplicate point type degeneracy, it is also robust to such degeneracy introduced by intersection snapping. The algorithms that consume the straight skeleton have different topological requirements.

In order to understand what is allowable in the output of an algorithm you have to understand the broader context in which that algorithm is applied. For example, it is frequently the case that subsequent to polygon clipping the polygon is fed into meshing. That means that the output of the clipping should satisfy the input requirements of the meshing algorithms. If it doesn't, you've got a square peg and a round hole. I don't know the context for the straight skeleton, since my own uses for it are limited to centerline extraction of VLSI physical layout data. We don't really encounter numerical issues except if the width of a wire happens to be an odd number, in which case some un-cautious implementations of integer skeleton of manhattan geometry fail spectacularly. ;) I wonder if my generic scanline framework can be adapted to provide the straight skeleton; the problem is amenable to scanline if I'm not mistaken....

Regards, Luke

Hi Thomas, Sorry for the late response.
Hi Fernando,
The clear position of Lucanus Simonson with respect to the "allowed" output of a line segments intersection algorithm and your reactions to his position makes me wonder which output you consider "allowed".
The figures 1a), 1b), 2) and 3b) of John Hobby's publication http://cutebugs.net/files/curve/2-27.ps clearly illustrate the discussed issue.
3b) would be a reasonable result. 1b) could also be, depending on the application: in rendering, for example, it could be totally acceptable.
I considered 2) as non-acceptable output and Lucanus Simonson considered 1b) as non-acceptable. (The output 3b) seems to be the most attractive one for this example.) Your position that geometric algorithms should use exactly evaluated "predicates" called explicitly on the input (and never ever on an intermediate result) to drive the topological construction doesn't seem to settle this question.
Put like that, no it doesn't :) But notice that I carefully said "unless you really have to use intermediate results, and if you do, use them very very carefully"
Why not? Because the output 1b) would not be allowed as input to your algorithm, it's hard to see why you should allow it as output from your algorithm. The outputs 2) and 3b), on the other hand, clearly fall into the category "modification of input" (because a straight segment was replaced by a bent segment), which you explicitly call "hacking". And it clearly is hacking in the sense that it doesn't generalize to other problems (like the straight skeleton, for example). I somehow got the impression that 1b) is indeed the output that you consider "allowable", but I would like you to confirm or correct this impression.
Yes, (1b) could be considered allowed as well, depending on what you need to do with the result.

These questions are excellent for putting what I've said before in a much better perspective: what I outlined are general guidelines for all types of geometric computation problems. As such, there are particular cases where something case-specific must be done, as in this one. OTOH, the very fact that the guidelines don't directly apply calls for reconsidering the immediate solution a bit further.

In the case of booleans, for instance, one could notice the following. Let's label the points in the figures, in topological order, for the sake of discussion: P, Q, R, S, such that the segments are PQ, QR, RS, and segments PQ and RS intersect at point I.

Given exact predicates operating exclusively on the input, it is possible to know that segments PQ and RS intersect at I, but that there is no other intersection, especially NOT with QR. That indicates that all the problems stated in the cited paper come from approximating the intersection point by embedding it into a finite space, which is a problem that can be considered separate from the boolean operation algorithm (which a researcher could express in terms of a RealRAM).

Therefore, a boolean operations algorithm could be designed in such a way that it just doesn't attempt to output ANY approximation of the intersection points at all. Instead, it could merely return the Planar Directed Graph (PDG) that corresponds to the result, with all intersection points given "expressively": i.e. using a representation that actually stores the four points that define the two intersecting segments. Such an algorithm would have the luxury of totally ignoring the potential side effect of introducing "unreported intersections", as you call them. It is extremely useful to be able to implement a geometric algorithm with such freedom.

Then a second, separate and composable "embedding" algorithm could take on the job of approximating intersection points while fulfilling certain output requirements. Given the PDG, such an algorithm can produce (3b), since it can know, using the exact predicates, that point I is to one side of segment QR but that the best approximation it can generate is (as reported by the predicate, this time called on the potential result) on the other side, so it MUST modify the idealized result, splitting and snapping QR. Granted, this is using the predicate on an intermediate result, but in a place specifically designed to carefully consider the effect of rounding, and with the "ideally correct" result already in hand (the PDG).

Notice how the general exact computation paradigm led us to decompose the algorithm into concerns that could not have been obviously separated. As a consequence, one can now consider the possibility of providing a toolbox of distinct "embedding" steps that, for instance, produce (1b), which could be more than acceptable, for example, for rendering. Users would choose the one they need.

Even more, the entire geometric computing framework could follow a similar design by pushing the actual computation of intermediate results as far ahead as possible. That is, there could be several algorithms taking the PDG produced by the "error-free" boolean operation and going from there. Only at the very end of the processing pipeline would the results have to be embedded, but by then in a context where approximation errors have little, if any, impact.

Best -- Fernando Cacciola SciSoft Consulting, Founder http://www.scisoft-consulting.com
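Roughly, the kind of representation meant by "expressively" could look like this (hypothetical types, not any existing library's API):

  // An intersection node stores the four endpoints that define it; the boolean
  // algorithm never reads an approximated coordinate while building the topology.
  struct point   { double x, y; };
  struct segment { point p, q; };
  struct intersection_node { segment s1, s2; };

  // One possible embedding step, run only at the end of the pipeline: plain
  // rounded construction.  Other embeddings (snap to a grid, Hobby-style
  // splitting of crossed segments) could be provided as interchangeable steps.
  point embed(const intersection_node& n) {
      double rx = n.s1.q.x - n.s1.p.x, ry = n.s1.q.y - n.s1.p.y;
      double sx = n.s2.q.x - n.s2.p.x, sy = n.s2.q.y - n.s2.p.y;
      double den = rx * sy - ry * sx;   // nonzero: the exact predicate on the input
                                        // already established that s1 and s2 cross
      double t = ((n.s2.p.x - n.s1.p.x) * sy - (n.s2.p.y - n.s1.p.y) * sx) / den;
      return point{ n.s1.p.x + t * rx, n.s1.p.y + t * ry };
  }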

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Fernando Cacciola Sent: 19 March 2009 15:48 To: boost@lists.boost.org Subject: Re: [boost] [geometry] robustness approaches
I've been following this erudite discussion with interest and a modest degree of understanding. But one thing seems clear to me - that any quality implementation is going to require at the very least 'big' ints and 'big' floats, and probably exact int and exact floats. It would seem that we need tried and tested Boost license implementations - preferably before starting work on a complex geometry problem. No solution can be considered for Boost if it uses any restrictive-licensed components. Is this/these something that someone could propose and mentor for a GSoC project? (I'm not offering - just asking ;-) Paul --- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com

Paul A. Bristow wrote:
I've been following this erudite discussion with interest and a modest degree of understanding.
But one thing seems clear to me - that any quality implementation is going to require at the very least 'big' ints and 'big' floats, and probably exact int and exact floats.
It would seem that we need tried and tested Boost license implementations - preferably before starting work on a complex geometry problem. No solution can be considered for Boost if it uses any restrictive-licensed components.
GMP is licensed under the LGPL. That's not such a bad license. I don't know why we need to bypass it with a free implementation. Even so, I don't depend on gmp header files in my geometry library. Instead I allow the "big" type to be overridden by specializing a meta-function that looks it up. I don't need to depend on GMP header files in my library code to use GMP with my library in an application. There is, incidentally, a boost multi-precision library under development. Brandon tried it out about a year ago. It is a tall order to get performance up to the level of GMP, which uses inline assembly and clever C++ tricks to eliminate unnecessary copying of the heavy number types. Luke
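A minimal sketch of that kind of hook (the names are hypothetical, not GTL's actual trait); the library only ever names the trait, while the dependency on GMP or anything else lives entirely in the application:

  // Hypothetical "big type" lookup meta-function.
  template <typename Coordinate>
  struct big_coordinate_type;            // specialized per coordinate type

  template <>
  struct big_coordinate_type<int> { typedef long long type; };

  // In the application, after #include <gmpxx.h>, the user can widen it further:
  //
  //   template <>
  //   struct big_coordinate_type<long> { typedef mpz_class type; };

  // Library code only ever refers to big_coordinate_type<T>::type:
  template <typename T>
  typename big_coordinate_type<T>::type cross(T ax, T ay, T bx, T by) {
      typedef typename big_coordinate_type<T>::type big;
      return big(ax) * big(by) - big(ay) * big(bx);   // evaluated in the wide type
  }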

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Simonson, Lucanus J Sent: 19 March 2009 20:11 To: boost@lists.boost.org Subject: Re: [boost] [geometry] robustness approaches
Paul A. Bristow wrote:
I've been following this erudite discussion with interest and a modest degree of understanding.
But one thing seems clear to me - that any quality implementation is going to require at the very least 'big' ints and 'big' floats, and probably exact int and exact floats.
It would seem that we need tried and tested Boost license implementations - preferably before starting work on a complex geometry problem. No solution can be considered for Boost if it uses any restrictive-licensed components.
GMP is licensed under the LGPL. That's not such a bad license. I don't know why we need to bypass it with a free implementation. Even so, I don't depend on gmp header files in my geometry library. Instead I allow the "big" type to be overridden by specializing a meta-function that looks it up. I don't need to depend on GMP header files in my library code to use GMP with my library in an application. There is, incidentally, a boost multi-precision library under development. Brandon tried it out about a year ago. It is a tall order to get performance up to the level of GMP, which uses inline assembly and clever C++ tricks to eliminate unnecessary copying of the heavy number types.
I'm sure GMP and mpfr are good. But the license issue should be fully resolved before you/we get too far down the line? Paul --- Paul A. Bristow Prizet Farmhouse Kendal, UK LA8 8AB +44 1539 561830, mobile +44 7714330204 pbristow@hetp.u-net.com

Hi Paul,
I'm sure GMP and mpfr are good.
But the license issue should be fully resolved before you/we get too far down the line?
Right. IIRC there has never been a precedent of a Boost library practically requiring (to be actually useful) an external non-boost library, so this deserves its own discussion. While in the case of GTL it is technically up to the user to pick up "their own" multiprecision number types, they actually can't do that without an external non-boost library, since such number types do not exist in the SCL. So I think you are right in that, without that discussion, the formal review of GTL will quickly wind up in a silly licensing war. Paul, can you start a new thread on the topic? TIA -- Fernando Cacciola SciSoft Consulting, Founder http://www.scisoft-consulting.com

Fernando Cacciola wrote:
Hi Paul,
I'm sure GMP and mpfr are good.
But the license issue should be fully resolved before you/we get too far down the line?
Right.
IIRC there has never been a precedent of a Boost library practically requiring (to be actually useful) an external non-boost library, so this deserves its own discussion. As has been pointed out to me during an XML discussion, there's Boost.MPI and Boost.Python.
Sebastian

Fernando Cacciola wrote:
IIRC there has never been a precedent of a Boost library practically requiring (to be actually useful) an external non-boost library, so this deserves its own discussion.
While in the case of GTL it is technically up to the user to pick up "their own" multiprecision number types, they actually can't do that without an external non-boost library, since such number types do not exist in the SCL.
That may be too strong a statement. Long double provides quite a few bits of precision. There is some upper limit of input bit width for which long double is still sufficient to compute the exact result; as that bit width is exceeded, the odds of a fault increase. If we say the library is by default robust only up to 25 bits (or whatever I determine to be the case) then users have the option of performing robust calculations within that constraint. If they want to handle larger inputs they can provide high precision data types to override long double. I have been quite successful in executing very large test cases without needing gmp. I performed the n-layer map overlay on dozens of layers of thousands of segmented circles each that were intersecting quite frequently, and it ran just fine without gmp. Luke
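To make the bit-budget argument concrete, a back-of-the-envelope sketch (these are illustrative numbers, not the library's documented bound):

  // Signed b-bit coordinates give (b+1)-bit differences, roughly (2b+2)-bit
  // products and a (2b+3)-bit 2x2 orientation determinant, so the 64-bit
  // mantissa of an x86 extended-precision long double evaluates that particular
  // predicate exactly for inputs up to about 30 bits.  Predicates that compare
  // constructed intersection points need considerably more working bits, which
  // is why the guaranteed input width for a full Boolean operation is smaller.
  constexpr int orientation_bits(int b) { return 2 * b + 3; }
  static_assert(orientation_bits(30) <= 64, "orientation on 30-bit input fits");
  static_assert(orientation_bits(31) >  64, "but 31-bit input may not");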

Hi Luke,
Fernando Cacciola wrote:
IIRC there has never been a precedent of a Boost library practically requiring (to be actually useful) an external non-boost library, so this deserves its own discussion.
While in the case of GTL it is technically up to the user to pick up "their own" multiprecision number types, they actually can't do that without an external non-boost library, since such number types do not exist in the SCL.
That may be too strong a statement. Long double provides quite a few bits of precision. There is some upper limit of input bit width for which long double is still sufficient to compute the exact result; as that bit width is exceeded, the odds of a fault increase. If we say the library is by default robust only up to 25 bits (or whatever I determine to be the case) then users have the option of performing robust calculations within that constraint.
The problem with this view is that users have no way to restrain the input in such a way that all calculations are exact within N bits, for whatever N. Users are completely helpless, since they just have their input, whatever that is, and they can't do anything about it to make the algorithm work with a number type of N bits, because they just wouldn't know what to do. That is, there is no known normalization process that can be applied to the input so as to keep the required number of working bits within long double and that applies to general geometric algorithms.

There has been some research, for example, on using single precision in the input so as to keep exact results within long double when doing line intersections (so that applies to clipping), but it doesn't generalize to other algorithms and poses certain restrictions on the input data itself (it sort of forces a snapping on it), which may very well be unacceptable if the input uses floating point to begin with (instead of integer or fixed-precision coordinates).

Keep in mind that we don't see GTL as a "clipping" library only, so its robustness must be general. Floating point operations won't even trap if they lose significant precision (if programmed in portable C++), so the library will just silently fail to produce a correct result. I like to consider robustness as an all or nothing game: it can't work in 100%-epsilon of the cases. It is either 100% robust or not at all.
If they want to handle larger inputs they can provide high precision data types to override long double.
This won't work. There is no definition of "larger input" that a user can measure. Here's a practical example: implement a predicate, using long double, that compares the distances from a third point to the intersection points of two pairs of *lines*, each line given as two points. That is, the predicate takes 9 points.

Now feed into it two almost parallel lines like these: the first being vertical and VERY SHORT, say with the top point 1e-2 away from the bottom; the second almost vertical but VERY LARGE, with the top and bottom vertices separated by, say, 1e4 or so, and with one of the vertices slightly shifted by a very small amount, say 1e-3 (to make it NOT really vertical). And a similar setup but with vertical lines.

Now play with the small distances in there to see how far beyond long double precision you need to go to compute that predicate exactly.
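A naive long double version of that predicate, just to show where the precision goes (hypothetical helper names, no particular API implied): a tiny denominator, a huge extrapolation parameter, and finally a comparison of two nearly equal large numbers.

  struct pt { long double x, y; };

  // Intersection of the infinite lines through (a,b) and (c,d).
  pt line_intersection(pt a, pt b, pt c, pt d) {
      long double rx = b.x - a.x, ry = b.y - a.y;
      long double sx = d.x - c.x, sy = d.y - c.y;
      long double den = rx * sy - ry * sx;   // almost-parallel lines -> den is tiny
      long double t = ((c.x - a.x) * sy - (c.y - a.y) * sx) / den;   // -> t is huge
      return pt{ a.x + t * rx, a.y + t * ry };   // far extrapolation: large coordinates,
  }                                              // absolute error ~ |coordinate| * eps

  // The 9-point predicate: which constructed intersection point is closer to q?
  // Each squared distance inherits the error of its intersection point; once d1
  // and d2 agree in more digits than long double carries, the sign returned
  // below is meaningless, and nothing in the code will warn you.
  int closer_intersection(pt a, pt b, pt c, pt d,
                          pt e, pt f, pt g, pt h, pt q) {
      pt i1 = line_intersection(a, b, c, d);
      pt i2 = line_intersection(e, f, g, h);
      long double d1 = (i1.x - q.x) * (i1.x - q.x) + (i1.y - q.y) * (i1.y - q.y);
      long double d2 = (i2.x - q.x) * (i2.x - q.x) + (i2.y - q.y) * (i2.y - q.y);
      return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
  }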
I have been quite successful in executing very large test cases without needing gmp. I performed the n-layer map overlay on dozens of layers of thousands of segmented circles each that were intersecting quite frequently and it ran just fine without gmp.
That experience just isn't sufficient. Here's a tale of a similar experience: when I first incorporated GPC (which, btw, easily has thousands of users) into a project, I spent a lot of time studying its robustness by feeding it specially designed test cases. It passed them all, so it appeared evident that it was robust for all practical purposes.

The project evolved until one day I started to receive, with increasing frequency, bug reports that boiled down (after many wasted hours) to robustness issues within GPC. Soon a geometric pattern popped up as the culprit: a complex figure F and another figure F2 obtained by shifting F a really *really* small amount, then subtracting F2 from F. That pattern totally choked GPC, even after years of it successfully producing correct results. In the end my client had to buy the CGAL clipper, which is slower and nowhere near cheap, but truly 100% robust.

For general geometric problems there is no *practical* constraint on the input geometry (that a user can measure and handle) which would guarantee that only M bits of working precision are needed, so 100% robustness can ONLY come from the ability of the implementation to adaptively use as many bits as needed. Only on a case-by-case basis is it possible to measure the upper bound on the number of bits needed and then state whether long double is good enough or not. But then, for the specific case of clipping, it turns out that the number of required working bits for ALL possible input can be shown to be way beyond the precision of long double, even if the input is restricted to floats, unless it is restricted to a float grid.

Best -- Fernando Cacciola SciSoft Consulting, Founder http://www.scisoft-consulting.com

Hi,
That pattern totally choked GPC, even after years of it successfully producing correct results. In the end my client had to buy the CGAL clipper, which is slower and nowhere near cheap, but truly 100% robust.
Btw, since then I have a "classic" procedure to stress-test the robustness of a polygon clipper: take a typical real-world GIS polygon, which has the interesting property of having regions with very very short segments interleaved with regions with very very long ones. Call it polygon A. Create a polygon B by moving A a very short amount horizontally: say 1e-5. Then the following: A - ( (A-B) + (B-A) ), without any expression simplification (that is, with the intermediate binary results effectively computed), should be EXACTLY the empty set. This is a test that can be automated, so it doesn't rely on the *impression* that the result is correct. Best -- Fernando Cacciola SciSoft Consulting, Founder http://www.scisoft-consulting.com
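An automated form of that check, written against a hypothetical clipper interface (difference, union_, translate_x and is_empty are stand-ins resolved at instantiation, not any particular library's API):

  template <typename PolygonSet>
  bool clipper_stress_test(const PolygonSet& a) {
      PolygonSet b         = translate_x(a, 1e-5);     // B = A shifted a tiny amount
      PolygonSet a_minus_b = difference(a, b);         // intermediates computed
      PolygonSet b_minus_a = difference(b, a);         // explicitly, no simplification
      PolygonSet ring      = union_(a_minus_b, b_minus_a);
      return is_empty(difference(a, ring));            // A - ((A-B) + (B-A)) must be empty
  }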

Fernando Cacciola wrote:
When I first incorporated GPC (which, btw, easily has thousands of users) into a project, I spent a lot of time studying its robustness by feeding it specially designed test cases. It passed them all, so it appeared evident that it was robust for all practical purposes.
The project evolved until one day I started to receive, with increasing frequency, bug reports that boiled down (after many wasted hours) to robustness issues within GPC. Soon a geometric pattern popped up as the culprit: a complex figure F and another figure F2 obtained by shifting F a really *really* small amount, then subtracting F2 from F.
That pattern totally choked GPC, even after years of it successfully producing correct results. In the end my client had to buy the CGAL clipper, which is slower and nowhere near cheap, but truly 100% robust.
That's really interesting. I didn't realize GPC was so used. We are testing GPC to compare its performance to GTL. Our impression of GPC was not very favorable based on its web site, API and feature set. I figured out that its input polygons can be "multiple disjoint contours" but it doesn't handle multiple overlapping contours. If I have thousands of polygons and they are not disjoint, how do I merge them together with GPC? Luke

Hi Luke,
Fernando Cacciola wrote:
When I first incorporated GPC (which, btw, easily has thousands of users) into a project, I spent a lot of time studying its robustness by feeding it specially designed test cases. It passed them all, so it appeared evident that it was robust for all practical purposes.
The project evolved until one day I started to receive, with increasing frequency, bug reports that boiled down (after many wasted hours) to robustness issues within GPC. Soon a geometric pattern popped up as the culprit: a complex figure F and another figure F2 obtained by shifting F a really *really* small amount, then subtracting F2 from F.
That pattern totally choked GPC, even after years of it successfully producing correct results. In the end my client had to buy the CGAL clipper, which is slower and nowhere near cheap, but truly 100% robust.
That's really interesting. I didn't realize GPC was so used. We are testing GPC to compare its performance to GTL. Our impression of GPC was not very favorable based on its web site, API and feature set. I figured out that its input polygons can be "multiple disjoint contours" but it doesn't handle multiple overlapping contours. If I have thousands of polygons and they are not disjoint, how do I merge them together with GPC?
That's actually possible. I just found that it is missing from the online docs, though, but it is in the actual API. At least in the one I still have lurking in the code base (though unused). There is this function:

  void gpc_polygon_overlap (xgpc_op       overlap_operation,
                            int           depth,
                            gpc_polygon  *subject_polygon,
                            gpc_polygon  *result_polygon);

Using this operation code:

  typedef enum
  {
    GPC_EQUAL,        /* Overlaps == depth value           */
    GPC_EQUAL_MORE,   /* Overlaps >= depth value           */
    GPC_EQUAL_B,      /* Overlaps == depth value (bridged) */
    GPC_EQUAL_MORE_B  /* Overlaps >= depth value (bridged) */
  } xgpc_op;

Best -- Fernando Cacciola SciSoft Consulting, Founder http://www.scisoft-consulting.com

Hi Paul,
I've been following this erudite discussion with interest and a modest degree of understanding.
But one thing seems clear to me - that any quality implementation is going to require at the very least 'big' ints and 'big' floats, and probably exact int and exact floats.
Yes, at the very least.
It would seem that we need tried and tested Boost license implementations - preferably before starting work on a complex geometry problem. No solution can be considered for Boost if it uses any restrictive-licensed components.
I thought about this as well. I think we would really appreciate it if Boost could finally get a big int/float number type. There is even partial work that has been lurking in the vault for so long I've lost count. OTOH, I don't think that a library CAN'T push that into user space, even if in practice it *requires* users to use a non-boost-licensed implementation, provided it is LGPLed or otherwise reasonably free. At the very least, I think gmp, which is LGPL and "needed" by GTL (for all practical purposes), deserves special consideration. It is a de-facto standard to such an extent that I even question whether it is really worth the effort of reproducing it. The same goes for mpfr in the floating point arena. Best -- Fernando Cacciola SciSoft Consulting, Founder http://www.scisoft-consulting.com

Fernando Cacciola wrote:
Sorry for the late response.
Your response was very helpful. It made it clear to me how you intend the geometry library to work. I will have to think about whether the proposed output converter will be able to resolve the problem of the floating-point output. (Note that the other discussed approaches were unable to produce floating-point output fulfilling Luke's requirements, so this is a nontrivial question.) Regards, Thomas

Hi Luke,
Fernando Cacciola wrote:
I'll try to sketch the general ideas below to get you started. ... BIG SNIP ...

  distance_result d1 = distance(a,b)
  distance_result d2 = distance(c,d)
  if ( d1 < d2 ) blabla
The sage has come down from the mountain. I like to see your posts to the list. It makes me feel less self-conscious about my own long-winded emails.
:) it was a bit long, wasn't it?
Your distance_result example is sort of like my

  template <typename T>
  struct coordinate_traits { typedef T difference_type; };

  template <>
  struct coordinate_traits<int> { typedef long long difference_type; };
Nice. And gracefully correct for int... on most platforms :o)
  template <typename coordinate_type>
  void foo(coordinate_type a, coordinate_type b, coordinate_type c, coordinate_type d) {
      typedef typename coordinate_traits<coordinate_type>::difference_type diff_type;
      diff_type d1 = distance(a, b);   // distance between coordinates is difference_type
      diff_type d2 = distance(c, d);
      if ( d1 < d2 ) blabla
  }
I am extremely careful about overflow because, as you point out, it is pretty much the one thing that causes robustness problems in integer arithmetic. I need to promote input integers to a larger data type even for addition and subtraction because these operations add one bit of precision and may overflow. I need to register what types to promote to for given situations. It is really amazing how fast I exceed the limits of builtins and need to resort to gmp to represent the results of simple algebraic expressions.
Indeed.
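Just to spell out the additive case (nothing library-specific here, only the standard limits):

  #include <climits>

  // The difference of two ints needs one more bit than int has: evaluated in int
  // it overflows (undefined behaviour); evaluated in the promoted type it is exact.
  long long full_range_difference() {
      // int bad = INT_MAX - INT_MIN;                     // would overflow
      return static_cast<long long>(INT_MAX) - INT_MIN;   // 4294967295 where int is 32 bits
  }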
I have a scheme in mind to make floating point robust using my fixed-point implementation. The idea is to take the bounding box of the Boolean operation at hand and map that to the full integer extents. When mapping the floating point coordinates to integer we construct a map from integer back to floating point to ensure that we don't lose any precision by mapping back. Then we run the Boolean as fixed point, which is 100% robust and cannot fail. Finally we map the output of the Boolean back to floating point and look up each coordinate in our x and y integer to floating point maps to restore the original precision if any was lost in conversion. In this way, any and every floating point input can be handled robustly with a minimum loss of precision at the intersection points and none at all at the original vertices.
Indeed.. this is a particular implementation of the well known snap rounding technique, which was first developed in the context of intersections, and thus it naturally fits boolean operations between polylines.
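For concreteness, a rough sketch of the forward/backward mapping as described (one reading of the scheme, with invented names, not the actual code):

  #include <cstdint>
  #include <map>

  // Scale the bounding box of the operation onto an integer range, remember the
  // original float for every snapped input coordinate, run the integer Boolean,
  // then map back, restoring the exact original value wherever one exists.
  struct float_to_int_axis {
      double lo, scale;                       // maps [lo, hi] onto [0, 2^30]
      std::map<std::int32_t, double> back;    // snapped value -> original float

      float_to_int_axis(double lo_, double hi_)
          : lo(lo_), scale(double(1u << 30) / (hi_ - lo_)) {}

      std::int32_t to_int(double v) {
          std::int32_t i = static_cast<std::int32_t>((v - lo) * scale);
          back.emplace(i, v);                 // keep the first original seen for this cell
          return i;
      }
      double to_float(std::int32_t i) const {
          std::map<std::int32_t, double>::const_iterator it = back.find(i);
          return it != back.end() ? it->second        // original input vertex: nothing lost
                                  : i / scale + lo;   // constructed intersection point
      }
  };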
To improve precision just use a larger integer data type.
Who uses that? The library or the user?
I think this scheme is workable and attractive because it avoids all the well documented pitfalls of making floating point robust.
FWIW snap rounding can be done directly on the float coordinates, but that's a detail we don't need to discuss now.
Also integer arithmetic is faster than floating point.
I'm not sure if that's true these days... you'll quickly find arguments against this old-timer truth on the internet. Never benchmarked it myself, though.
This gives an integer algorithm a significant performance advantage that might offset the cost of sorting the input and extra time and storing the mapping vector.
It would be interesting to see some benchmarks of ints vs doubles to measure how far that holds today.
Most applications need uniform precision (cartography is a good example) whereas graphics is the only application I can think of that needs high precision around the origin and low precision far from the origin.
Can you elaborate on this?
This scheme ought to cover most of the needs for floating point geometry. Can you identify any problem with the idea?
Well, it doesn't cover most of the needs for floating point geometry, just some. That is, it is undoubtedly useful when doing booleans but, for example, it can't be used for the straight skeleton without risking a complete topological change (for the same reason perturbations can't be used). And for, say, convex hulls or triangulations, it is overkill and most likely slower than the general purpose technique I outlined before (floating-point filters).
What about the case where the skeleton of a 1x1 integer unit square results in degenerate topologies when the construction of its center point snaps to one of its corners?
This is a good example of the importance of separating topology from geometry *in the algorithm*. It shouldn't matter at all if the coordinates of the center point happen to be coincident with one of the corners. The topology must still be correct. The only way to achieve this is by simply not using the coordinates of the center point (which is the result of a construction, i.e. an intermediate result) *at all* when constructing the output topology. My straight skeleton and polygon offset implementation in CGAL: http://www.cgal.org/Manual/3.4/doc_html/cgal_manual/Straight_skeleton_2/Chap... does exactly that: it will always give you the correct topology even if several output nodes end up at the same coordinate. They would nevertheless be distinct nodes properly connected to each other. You can't achieve this with snap rounding.
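A toy illustration of what "distinct nodes with coincident coordinates" means in practice (hypothetical types, not CGAL's API):

  #include <cstddef>
  #include <vector>

  // Node identity and connectivity are what define the skeleton;
  // coordinates are just attributes attached to each node.
  struct skeleton_node {
      std::size_t              id;          // identity: this is what "same node" means
      double                   x, y;        // approximate embedding, possibly shared
      std::vector<std::size_t> adjacent;    // topology: edges to other node ids
  };

  // Two distinct nodes may legitimately carry the same (x, y) -- e.g. a skeleton
  // node whose approximated coordinates land on an input corner -- and the result
  // is still a valid, fully connected skeleton, because nothing in the topology
  // ever compares coordinates.
  bool same_node(const skeleton_node& a, const skeleton_node& b) {
      return a.id == b.id;                  // never (a.x == b.x && a.y == b.y)
  }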
The algorithm has to be robust to the degenerate cases that it itself might produce,
If the algorithm follows the general principle I stated before of using exclusively its input to drive any logic, then it is just naturally immune to the degeneracies it might produce itself.
as well as those present in the input.
Well, this is a different story. Handling degenerate input typically involves an overhead some users won't appreciate, so this falls within the design decisions of the algorithm in particular and the library philosophy in general. From a user POV I generally prefer algorithms that require a certain degree of non-degeneracy in the input and provide orthogonal tools to fix that if needed (say, to remove self-intersections).
Furthermore, it is preferable not to output degenerate geometries because this can cause downstream code to choke.
Absolutely. Following the general methodologies I outlined before, since algorithms have separate predicates and constructions, each with its own exactness property, a user can choose to use exact constructions (with the usually significant added overhead) if they need to pipeline results into further processing. In all cases, of course, the topologies must be correct, hence predicates must be exact.
I find it very unconvincing when a booleans algorithm claims to handle, for example, self intersecting polygons, but upon closer examination leaves them self intersecting at the output. This is hardly handling the degeneracy in my opinion.
Indeed. It is much better not to attempt to handle the degeneracy at all, and instead provide other tools to "fix" it before clipping.
Robust to degeneracy should rightly mean robust to degeneracy at the input and free of degeneracy at the output, otherwise you're setting a trap in the path of the guy who uses the result.
Precisely. Best -- Fernando Cacciola SciSoft Consulting, Founder http://www.scisoft-consulting.com

On Sat, Mar 14, 2009 at 2:26 PM, Fernando Cacciola <fernando.cacciola@gmail.com> wrote:
... For many many geometric applications, it is the topology what needs to be correct, not the geometry.
In the early years of digital geography, it was not understood that the topology had to be correct, and virtually every digital mapping project turned into a disaster. Once it became known (in the late 1960's) that the topology had to always be totally correct (no topological errors whatsoever, no matter how large the database) then it became possible to build large scale digital maps, and digital mapping became a reality.
If you do clipping for instance, it might very well not matter at all (for instance in drawing applications) if the vertices are slightly off.
In commercial digital mapping, data like water features are often "cartooned" to save memory and speed processing. In effect, intermediate coordinates are deliberately distorted, but only when it does not break the topology.
But if an entire region is included in the result when it should not, then that is a disaster for whatever application.
The GIS industry has a bunch of colorful names for such topological errors; zingers, leaks, etc, based on the appearance of map plots with various characteristic mistakes. And, yes, once the underlying rules were understood, most or even all of the geometric computation code had to be not just rewritten, but usually thrown in the trash and a fresh start made. It was actually possible to reduce the precision of coordinates and yet produce far more robust code. --Beman

Hi Beman, So we finally discover your area of expertise besides C++ :)
On Sat, Mar 14, 2009 at 2:26 PM, Fernando Cacciola <fernando.cacciola@gmail.com> wrote:
... For many many geometric applications, it is the topology what needs to be correct, not the geometry.
In the early years of digital geography, it was not understood that the topology had to be correct, and virtually every digital mapping project turned into a disaster. Once it became known (in the late 1960's) that the topology had to always be totally correct (no topological errors whatsoever, no matter how large the database) then it became possible to build large scale digital maps, and digital mapping became a reality.
I can imagine.
If you do clipping for instance, it might very well not matter at all (for instance in drawing applications) if the vertices are slightly off.
In commercial digital mapping, data like water features are often "cartooned" to save memory and speed processing. In effect, intermediate coordinates are deliberately distorted, but only when it does not break the topology.
Right.
But if an entire region is included in the result when it should not, then that is a disaster for whatever application.
The GIS industry has a bunch of colorful names for such topological errors; zingers, leaks, etc, based on the appearance of map plots with various characteristic mistakes.
:) interesting to know
And, yes, once the underlying rules were understood, most or even all of the geometric computation code had to be not just rewritten, but usually thrown in the trash and a fresh start made. It was actually possible to reduce the precision of coordinates and yet produce far more robust code.
Indeed :) The straight skeleton code that exists in CGAL, and which I'm using on this thread to show how floating point numerical issues *can* be handled correctly, was my third round at that algorithm. Once I figured out "the right way to do it" I just had to trash my two previous implementations and start over from scratch. There was no other choice, because the algorithm as stated in the paper was based on computing intersections between bisectors to generate new bisectors that would subsequently intersect. There is just no way to make that robust (as it uses intermediate results to drive the logic) without heavy support from the input numeric type (meaning without asking users to buy a Cray to run this). It just had to work with plain double, or even float, to be usable in industry. So I had to refactor the algorithm itself, that is, develop a new algorithm, to make it work exclusively using the input vertices... outputting skeleton nodes with approximated coordinates as a sort of side effect which the algorithm itself never looks at. -- Fernando Cacciola SciSoft Consulting, Founder http://www.scisoft-consulting.com

On Sat, Mar 14, 2009 at 04:26:35PM -0200, Fernando Cacciola wrote:
IMO, any geometric library should provide a toolbox of exact predicates (which should be exact independently of the input type)
I agree. I've always thought that the adaptive floating-point arithmetic of Douglas Priest [1] and Jonathan Shewchuk [2] would be a good way to do this for floats and doubles. Any interest in developing a boost version of this code? [1] http://www.cs.cmu.edu/afs/cs/project/quake/public/papers/related/Priest.ps [2] http://www.cs.cmu.edu/~quake/robust.html -Steve

Hi Steve,
On Sat, Mar 14, 2009 at 04:26:35PM -0200, Fernando Cacciola wrote:
IMO, any geometric library should provide a toolbox of exact predicates (which should be exact independently of the input type)
I agree.
I've always thought that the adaptive floating-point arithmetic of Douglas Priest [1] and Jonathan Shewchuk [2] would be a good way to do this for floats and doubles.
These techniques are excellent tools for providing (adaptive) extended precision number types. OTOH, it would hurt performance unnecessarily if they are not coupled with interval arithmetic to detect the need for the extra precision in the first place. I know that for a fact because I've implemented Priest's expansions once, many years ago. IOW, following the simple floating-point filter I sketched before, Priest's expansions would be the exact number type in there.
Any interest in developing a boost version of this code?
[1] http://www.cs.cmu.edu/afs/cs/project/quake/public/papers/related/Priest.ps [2] http://www.cs.cmu.edu/~quake/robust.html
I would love to see a boost implementation of Priest's expansions. For algebraic operations it is, AFAICT, faster than conventional big-floats. A downside, though, is that you can't have exact roots with it, and there are some algorithms that can't do without them. Best -- Fernando Cacciola SciSoft Consulting, Founder http://www.scisoft-consulting.com
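For reference, the shape of the filter being referred to, in a minimal sketch (an orientation predicate with a crude static error bound; a real filter would use interval arithmetic or Shewchuk's adaptive stages, and the exact type is left as a template parameter):

  #include <cmath>
  #include <limits>

  // Evaluate in double, accept the sign if it clears a deliberately generous
  // error bound, otherwise redo the computation in an exact type (Priest/Shewchuk
  // expansions, GMP's mpq_class, ...) that converts doubles exactly.
  template <typename ExactRational>
  int orientation_filtered(double ax, double ay, double bx, double by,
                           double cx, double cy) {
      double det = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);

      // Crude forward error bound: generous constant * magnitude * epsilon
      // (tight bounds are derived in Shewchuk's paper).
      double mag   = std::fabs((bx - ax) * (cy - ay)) + std::fabs((by - ay) * (cx - ax));
      double bound = 8.0 * mag * std::numeric_limits<double>::epsilon();

      if (det >  bound) return  1;     // filter succeeded: the sign is certain
      if (det < -bound) return -1;

      // Filter failed (nearly collinear points): redo exactly.
      ExactRational d =
            (ExactRational(bx) - ExactRational(ax)) * (ExactRational(cy) - ExactRational(ay))
          - (ExactRational(by) - ExactRational(ay)) * (ExactRational(cx) - ExactRational(ax));
      return d > ExactRational(0) ? 1 : d < ExactRational(0) ? -1 : 0;
  }

Instantiated with, say, GMP's mpq_class, the exact branch only runs for (near-)degenerate inputs, which is where the adaptive schemes pay off.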

On Sun, Mar 15, 2009 at 11:01:06PM -0200, Fernando Cacciola wrote:
Hi Steve,
On Sat, Mar 14, 2009 at 04:26:35PM -0200, Fernando Cacciola wrote:
IMO, any geometric library should provide a toolbox of exact predicates (which should be exact independently of the input type)
I agree.
I've always thought that the adaptive floating-point arithmetic of Douglas Priest [1] and Jonathan Shewchuk [2] would be a good way to do this for floats and doubles.
These techniques are excellent tools for providing (adaptive) extended precision number types. OTOH, it would hurt performance unnecessarily if they are not coupled with interval arithmetic to detect the need for the extra precision in the first place. I know that for a fact because I've implemented Priest's expansions once, many years ago.
IOW, following the simple floating-point filter I sketched before, Priest's expansions would be the exact number type in there.
Yes. Roughly speaking, that's what Shewchuk's predicates do: a floating point filter that, if the filter fails, is followed by an exact calculation using Priest's expansions. It's been a while since I read the papers but IIRC, the filter computations provide the initial expansions so you don't have to re-start the exact computation from scratch. This could be an additional speed benefit. -Steve

On Sat, Mar 14, 2009 at 04:26:35PM -0200, Fernando Cacciola wrote:
IMO, any geometric library should provide a toolbox of exact predicates (which should be exact independently of the input type)
Steve M. Robbins wrote:
I agree.
I've always thought that the adaptive floating-point arithmetic of Douglas Priest [1] and Jonathan Shewchuk [2] would be a good way to do this for floats and doubles. Any interest in developing a boost version of this code?
I'm aware of Shewchuk's work. He is a friend of one of my team mates. His 2D Delaunay library (Triangle) is considered definitive. However, his software license is not free. I think providing in boost a library such as the adaptive floating point arithmetic Shewchuk's Triangle is based on is a great idea, but it can't be a derivative work of Shewchuk's code. To be successful, such a library should be developed alongside an application, such as his Delaunay triangulation, that will drive its development. It would be hard to develop in isolation. Luke
participants (8)
- Barend Gehrels
- Beman Dawes
- Fernando Cacciola
- Paul A. Bristow
- Sebastian Redl
- Simonson, Lucanus J
- Steve M. Robbins
- Thomas Klimpel