
Hi Luke,
Fernando Cacciola wrote:
The library proposed by Lucanus, OTOH, at least acknowledges the problem appropriately and approaches a solution in the right direction (there are still some important design-influencing considerations, but they are not showstoppers).
Thank you, Fernando. I agree with you 100% that robustness needs to be designed in up front and not tacked onto an existing geometry code base as an afterthought. In particular, the robustness strategy needs to be comprehensive and not patchwork. I acknowledge that my library needs a lot of work in this area to make it robust for both integer and floating point, particularly since it was only designed with integer robustness in mind. I will have to redesign the low level stuff quite significantly to get both integer and floating point fully robust and sharing the same algorithm. I don't really have the knowledge I need to make my library floating point robust. I need to read some of the papers you mention.
I'll try to sketch the general ideas below to get you started.
Making the coordinate data type a template parameter and "allowing" the user to specify an infinite precision numerical data type if they want robustness is not a solution
Precisely :) life would be too easy if that were already fast enough.
, because there are very simple and effective ways to make integer and floating point operations robust. I don't know if we need to design in points of customization to expose the ability to do that to the user, but it needs to be at least possible for the library developer to specify different strategies for different coordinate data types and different contexts.
Exactly... and the library, as a service to other higher level libraries, should and can provide a general methodology and the corresponding tools for doing that. Ultimately users shouldn't be concerned at all with any of this, except when there is a reasonable speed/robustness tradeoff that users would like to tune themselves.
I think Brandon's library is trying to handle both integer and floating point robustness, and I've noticed he is at least as interested in multi-precision numerical data types as I am, which is a good sign that he's doing the right things. I'm interested in learning more about his approach.
As I'll try to sketch below, bigfloats are part of the toolset, but there is more to it.
Performance is fine as a goal, but if the algorithms fail (crash, hang, drop polygons or "gracefully" throw an exception) even a small percentage of the time due to robustness issues then they are worthless, regardless of how fast.
Exactly.
If we tried to use a non-robust library and it failed in a mission-critical situation, it would cost us over one million dollars a day to fix the problem. Failure is not an option. When I say 100% integer robust I'm serious. It's all or nothing and the stakes for me are high.
And so they are for end users... it is the developers who think: "this works in all the cases I tried, so that's good enough". But it isn't, and there is no need to compromise, since lots of techniques exist, and while they don't cover all possible algorithms, there is clearly little justification for not using them where they are applicable.

=== LONG EXPLANATION BELOW ===

OK, I'll try to sketch the general idea behind floating point robustness in geometric computing. There is a lot on the subject in the literature, it is just difficult to find, so consider this merely an introduction.

On the one hand, if an algorithm only needs to perform algebraic operations, a "Bigfloat", i.e. a software-based adaptive unlimited precision floating point type, delivers totally exact results. If division is needed you can use a rational of bigfloats. If other operations are needed, such as sqrt() or transcendentals (say cos, sin), then you can use a number type that tracks the expression trees and avoids actually performing the non-exact operations as much as possible, especially when you query for discrete properties such as its sign.

On the other hand, it is possible to design and implement any geometric algorithm in terms of "predicates" and "constructions", and it is possible to consider the output as having separate topological and geometric parts. Let's consider an example: the straight skeleton (and the Voronoi diagram in this case) of a square contains one and only one vertex in the center and one and only one edge from that center node to each square corner. That result has a topology (how vertices and skeleton nodes are connected) and a geometry (the coordinates of the center node). The topology might be correct even if the geometry is not: i.e. the center node is not exactly in the center, yet the nodes are connected correctly.

For many, many geometric applications it is the topology that needs to be correct, not the geometry. If you do clipping, for instance, it might very well not matter at all (say, in drawing applications) if the vertices are slightly off. But if an entire region is included in the result when it should not be, then that is a disaster for whatever application.

Since the input to an algorithm is error free no matter the representation (or more precisely: there is nothing the algorithm can do about its implicit error), it is easy to see that, if an algorithm is designed in such a way that the topological construction is based exclusively on the results of predicates called exclusively on the input (and never ever on an intermediate result), then the topology will be correct no matter what capabilities the underlying input arithmetic has, provided the predicates can do whatever magic is necessary to produce exact and correct results given input of any type.

That is, if there is a predicate that can take machine integer or floating point coordinates and, in whatever way, produce a guaranteed error-free result, then you can write the code of an algorithm totally agnostic of the limitations of the type of the input coordinates. That is, the algorithm itself can just pretend the computation is performed on the RealRAM.

Researchers spent a lot of time figuring out how to implement exact predicates based on finite precision integer or floating-point input. While the results are limited and somewhat complicated from an engineering POV, they do allow for incredibly fast algorithms that take simple native type coordinates (int or double) and produce 100% robust code.
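To make that concrete, here is a minimal sketch of such a predicate for machine-integer input (the names and the coordinate bound are just illustrative assumptions on my part, not part of any proposed library; a version for double needs adaptive techniques or the filter I explain further below):

  #include <cstdint>

  enum orientation_t { collinear, left_turn, right_turn };

  // Exact as long as |coordinate| < 2^30: the 64-bit differences stay
  // below 2^31, the products below 2^62 and their sum below 2^63, so
  // nothing can overflow and nothing rounds.
  orientation_t orientation( std::int32_t ax, std::int32_t ay,
                             std::int32_t bx, std::int32_t by,
                             std::int32_t cx, std::int32_t cy )
  {
    std::int64_t det = ( std::int64_t(bx) - ax ) * ( std::int64_t(cy) - ay )
                     - ( std::int64_t(by) - ay ) * ( std::int64_t(cx) - ax );
    return det == 0 ? collinear : ( det > 0 ? left_turn : right_turn );
  }

The caller just passes plain ints and never sees any of this: the predicate alone knows how much precision the determinant needs.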
But even then, the *concept* of a black box predicate is still useful, because there are other programming-specific techniques to provide simple-to-program but fast predicates: the so-called "floating point filters", which I'll explain below.

IMO, any geometric library should provide a toolbox of exact predicates (which should be exact independently of the input type) and state as a guideline that any algorithm should only make critical decisions based on the result of predicates, and, unless totally impossible, the predicates should never be called with intermediate results.

Here's a simple example:

  T d1 = distance(a,b);
  T d2 = distance(c,d);
  if ( d1 < d2 ) blabla

That is fundamentally flawed beyond repair, because robustness becomes the responsibility of T, and at the end of the day the poor user finds himself wrapping his input into a fancy numeric type just because of that flaw. If OTOH that were written like so:

  if ( compare_distance(a,b,c,d) ) blabla

then the input could still be of any type, yet some basic library tool could go to whatever extent is needed to give the correct answer. And if you are like me and love to exploit C++'s unsurpassed expressive power, you can even write that like so:

  distance_result d1 = distance(a,b);
  distance_result d2 = distance(c,d);
  if ( d1 < d2 ) blabla

where the comparison between distance_result types does the magic of producing an exact answer no matter what the type of the input coordinates is (I'll sketch this concretely at the end of this section). And this would even prevent the algorithms in the library from mistakenly doing the wrong thing.

Following this methodology at large needs support from the bottom layer of the library, i.e. it needs an exact predicates toolbox that can operate on arbitrary input types and always yield correct results, but it also needs a general methodology: i.e. predicates need to be used instead of "direct" logic, and predicates need to be given only the input and never intermediate results (or, as I'll explain below, that must be carefully considered and instrumented with its own special technique). These guidelines can be enforced by the library given the right design (i.e. having "distance" return a "distance_result" that actually stores the input).

All of the above allows algorithms to be called on arbitrary input types that don't need to carry the weight of providing robustness, i.e. machine integer and floating point numbers can be used. Couldn't be any simpler for the end user.

Constructions, OTOH, are similarly black boxes like predicates are, but serve a different purpose, since it is presumed that the algorithm is well written and never uses the results from constructions to feed predicates (or rather it does so only very carefully and by design, and only in those cases where it was impossible to express the algorithm output as a direct function of its input). That is, while the library provides, say,

  Point p = centroid(a,b,c);

that 'p' is never used to drive the decisions of the algorithm (it is never passed to a predicate), yet still that centroid can be a higher level facility that *internally* uses whatever it needs to come up with a reasonable approximated result.

Whenever you can express an algorithm in terms of predicates and constructions such that predicates only operate on the input and never on the result of constructions, your algorithm is robust automagically, without any requirement on the input type and without the need to do anything at all to work around numerical issues within the algorithm itself. That's the job of the predicate oracles.
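Going back to the distance example above, here is a minimal sketch of the idea (again with purely hypothetical names and assuming integer coordinates below 2^30 in magnitude, so the 64-bit arithmetic is exact):

  #include <cstdint>

  struct point { std::int32_t x, y; };

  // The "construction" keeps the input points themselves instead of a
  // rounded numeric distance.
  struct distance_result { point a, b; };

  inline distance_result distance( point a, point b )
  {
    distance_result r = { a, b };
    return r;
  }

  // Exact squared length: with |coordinate| < 2^30 the 64-bit products
  // and their sum cannot overflow.
  inline std::int64_t squared_length( distance_result const& d )
  {
    std::int64_t dx = std::int64_t(d.b.x) - d.a.x;
    std::int64_t dy = std::int64_t(d.b.y) - d.a.y;
    return dx * dx + dy * dy;
  }

  // The comparison owns the robustness: no sqrt, no rounding, and the
  // caller never has to wrap its coordinates in a fancy numeric type.
  inline bool operator<( distance_result const& l, distance_result const& r )
  {
    return squared_length(l) < squared_length(r);
  }

With that in place the user-level code stays exactly as written above, "if ( distance(a,b) < distance(c,d) ) ...". A floating point version would keep the same interface and only change the comparison to use a filter such as the one sketched next.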
If an algorithm just can't call predicates with only the input and needs to step over intermediate results, things get more complicated. However, if all those intermediate results that would be fed into predicates are ONLY produced with "constructions", sound alternatives can be explored.

One alternative, the one used the most, is to have the constructions produce higher level abstractions instead of actual numerical results. For instance, they can return expression trees that refer to the input, and have the predicates evaluate them lazily on demand (so the predicate is still only really operating on the input). Again, this needs clever support from the library at design time in that, for example, the result from "centroid" wouldn't be a point but a centroid_result, storing the input points as well as an approximated result, which could be perfectly fed into predicates that could then do the right thing having the input at hand. This works reasonably well given the inherent complexity, but is still the subject of active research. Most of the time the use of DAGs for results is not really needed, so the ideal framework would be able to mix both native input and lazy expressions in a coherent manner. One starting idea (that I had many years ago but never had the time to evolve) would be to have a hybrid geometric type that uses a dynamic dual representation: input type or DAG.

Now a word on predicates. In C++ it is actually somewhat trivial to have a black box predicate that can produce correct results in spite of the input type, using the following technique known as a floating-point filter (and it would be equally trivial *within* a virtual machine for any dynamic language FWIW):

  template<class T>
  bool pred( T a, T b )
  {
    interval<T> ia = a;
    interval<T> ib = b;
    interval<T> r = /* some operations on ia and ib */;
    if ( certainly( r == 0 ) )
      return true;
    else if ( certainly( r != 0 ) )
      return false;
    else
      return pred<some_exact_type>(a,b);
  }

That exploits the capability of interval arithmetic to implement certified logic. I won't explain how or why here, but "r == 0" doesn't return bool but a tribool, so you can explicitly query for the "I don't know" answer, and *only* in that case fall back to the exact solution.

All this machinery must ultimately use some mapping from the input type to the interval and exact types needed. Designing that mapping needs to consider different aspects. For instance, not all algorithms require the same exact type: some might need an exact sqrt(), for example. So the mapping could need to be something like this:

  coordinate_traits<T>::template exact<requirements_tag>::type

Also, predicates and constructions serve different purposes, so they also have different requirements on the exact types. Going back to the previous straight skeleton example, while the topology must be right, the coordinates of the center nodes need not be, BUT a user could reasonably expect a black box construction to nevertheless come up with a decent approximation, to avoid for instance points totally off because of floating point overflow. In this case the algorithm should use a "constructor" object that can somehow be told what exact type to use, which might be different from the one used for predicates. Thus, the traits might need to be more like this:

  coordinate_traits<T>::template exact<for_predicate_tag, requirements_tag>::type

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com