[geometry] area, length, centroid, behavior

I would like to hear the opinion of potential geometry users on the list about the following. For the three geometries, point, linestring (= sequence of points), polygon:

What should the generic "area" algorithm return for point and linestring?
- 0 (zero)
- compiler error (so this function is not defined for them at all)
- exception
Personally I would go for 0 (zero), because you can then request the "area" of a generic collection, regardless of what is inside.

What should the generic "length" algorithm return for polygon?
- 0 (zero)
- its perimeter
- compiler error
- exception

What should the generic "centroid" algorithm return for point and linestring?
- the point / the average of its points
- compiler error
- exception

And for an invalid polygon (e.g. no points at all, or not closed)?
- exception
- return "false" (if centroid is passed as a reference parameter)
- ?
(we don't always have the compiler-error option here because points are usually stored in a runtime collection)

Thanks, Barend

On Tue, Feb 24, 2009 at 7:29 AM, Barend Gehrels <barend@geodan.nl> wrote:
What should the generic "area" algorithm return for point and linestring? - 0 (zero) - compiler error (so this function is not defined for them at all) - exception
I can't decide between compiler error and 0. To be formally correct, I believe it should be a compiler error because the quantity doesn't exist, but 0 also signals non-existence.
What should the generic "length" algorithm return for polygon? - 0 (zero) - its perimeter - compiler error - exception
Compiler error. Provide functions like perimeter, chord_length, length_along_axis, or whatever else.
What should the generic "centroid" algorithm return for point and linestring? - the point / the average its points - compiler error - exception
To be formally correct, I believe linestring should be compiler error. Centroid is the point on which it would balance when placed on a needle, and obviously that doesn't often happen with a non-straight linestring. You could provide average_point (not sure I like the name) that returns the average of the points and have centroid call it, only with a stricter interface.
and for an invalid polygon (e.g. no points at all, or not closed)? - exception - return "false" (if centroid is passed as a reference parameter) - ? (we don't always have the compiler-error option here because points are usually stored in a runtime collection)
This is a much bigger question than you ask. Where do you want the error checking to occur? Do you provide an is_closed, or is_overlapped, or things of that nature for checking the validity of a polygon? Is a valid polygon a precondition of all functions that work on polygons, or do you intend to check for validity in every function? If I *know* my polygon is valid, I don't want to pay for the error checking. --Michael Fawcett

-------------------------------------------------- From: "Barend Gehrels" <barend@geodan.nl> Sent: Tuesday, February 24, 2009 7:29 AM To: <boost@lists.boost.org> Subject: [boost] [geometry] area, length, centroid, behavior
I would like to hear the opinion of potential geometry users on the list about the following.
For the three geometries, point, linestring (= sequence of points), polygon:
What should the generic "area" algorithm return for point and linestring? - 0 (zero) - compiler error (so this function is not defined for them at all) - exception Personally I would feel for 0 (zero) because you can then request the "area" in a generic collection, regardless of what is inside.
I think failing to compile would be the most reasonable. I cannot think of any reasonable use case where one would want the area of a point or a line. I suspect the most often encountered case would be that the user has made a logic error. The same applies to volume (or any integral quantity which is a function of dimension).

I would suggest that area require a 2D concept and that the shape of the object be at minimum a 2-simplex (triangle, possibly degenerate). Then volume would require a 3D concept and at minimum a 3-simplex (tetrahedron, also possibly degenerate), and so forth. By assuming a value of 0 for the area of a point or line, you are saying that these represent a degenerate 2-simplex (even though these representations do not maintain the topology of a 2-simplex).

The only quibble I can see would be for calculating things like 'surface area' or 'perimeter length', which implies a dimensionally constrained integration on a higher-dimensional object. I think being more rigorous here will give users a better experience than having a higher level of abstraction.
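To make that concrete, here is a minimal sketch (not part of the original post, all names invented for illustration) of the compile-time restriction being argued for: area() dispatches on a geometry tag and only an areal overload exists, so area(point) simply fails to compile.

#include <cstddef>
#include <vector>

// Hypothetical tags and tag-lookup trait.
struct point_tag {};
struct linestring_tag {};
struct polygon_tag {};

template <typename Geometry> struct geometry_tag;   // undefined primary template

struct point { double x, y; };
struct ring  { std::vector<point> points; };        // closed: last point equals first

template <> struct geometry_tag<point> { typedef point_tag type; };
template <> struct geometry_tag<ring>  { typedef polygon_tag type; };

namespace detail
{
    // Only the areal case is implemented; there is deliberately no overload
    // for point_tag or linestring_tag, so area(point) is a compile error.
    inline double area_dispatch(ring const& r, polygon_tag)
    {
        double sum = 0.0;                            // shoelace formula
        for (std::size_t i = 0; i + 1 < r.points.size(); ++i)
            sum += r.points[i].x * r.points[i + 1].y
                 - r.points[i + 1].x * r.points[i].y;
        return sum / 2.0;
    }
}

template <typename Geometry>
double area(Geometry const& g)
{
    return detail::area_dispatch(g, typename geometry_tag<Geometry>::type());
}

// area(ring{...}) compiles; area(point{1, 2}) does not, by design.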
What should the generic "length" algorithm return for polygon? - 0 (zero) - its perimeter - compiler error - exception
I would not define 'length' for a polygon. Perimeter has a distinct meaning from length, and so, I think, should the name of the function that calculates it. As for the length of a polyline/linestring, I think this falls in the same camp as 'surface area': these really describe breaking up the higher-dimensional object into many lower-dimensional objects, solving them piecemeal under their dimensional constraints and then aggregating the results. For these I don't know what would be best. My advice would be to err on the side of making sure the user knows what she's doing. I should take my own advice :D.
What should the generic "centroid" algorithm return for point and linestring?
The centroid of a collection of points is well defined. Perhaps this is the function you should provide? I would suggest having this separate from functions calculating the centroid of a solid.
and for an invalid polygon (e.g. no points at all, or not closed)? - exception - return "false" (if centroid is passed as a reference parameter) - ? (we don't always have the compiler-error option here because points are usually stored in a runtime collection)
Thanks, Barend
HTH, Brandon

Brandon Kohn wrote:
I think failing to compile would be the most reasonable. I cannot think of any reasonable use case where one would want the area of a point or a line. I suspect the most often encountered case would be that the user has made a logic error. The same applies to volume (or any integral quantity which is a function of dimension). I would suggest that area require a 2D concept and that the shape of the object be at minimum a 2-simplex (triangle, possibly degenerate).
Shapes, volumes, etc. are sets of points. Various sets have a null area. This is perfectly valid and is not a logic error. Points (a point is just a set of points with one element) and lines have zero area; this is common knowledge. This is even stated in the definition of "area" on Wikipedia, for example.

On Tue, Feb 24, 2009 at 12:37 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Shapes, volumes, etc. are sets of points. Various sets have a null area. This is perfectly valid and is not a logic error.
Points (a point is just a set of points with one element) and lines have zero area; this is common knowledge. This is even stated in the definition of "area" on Wikipedia, for example.
http://mathworld.wolfram.com/Area.html says: "Area - The area of a surface or lamina is the amount of material needed to "cover" it completely. The area of a surface or collection of surfaces bounding a solid is called, not surprisingly, the surface area." which seems to restrict "area" to a surface or lamina. --Michael Fawcett

Barend Gehrels wrote:
What should the generic "area" algorithm return for point and linestring? What should the generic "length" algorithm return for polygon? What should the generic "centroid" algorithm return for point and linestring?
There are two cases I can think of here: - Maybe I have a heterogeneous collection of geometric objects and I want to do something involving one of these algorithms to each element. I can't imagine doing this in a real scenario, so I have no comment about what the behaviour should be. - I have made a typo in my code. In this case I really want a compiler error.
and for an invalid polygon (e.g. no points at all, or not closed)?
One option is to prohibit invalid polygons, i.e. to make it impossible to construct one with too few points. I suspect that this would lead to other problems though. So I suggest that you have a simple is_valid() test that the user can call if they care about this case, and make centroid (etc.) return an arbitrary answer like (0,0), but guarantee not to crash (e.g. not divide by zero). Phil.
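A minimal sketch of that behaviour, assuming for illustration that a polygon is simply a std::vector of points; is_valid and the vertex-average fallback are stand-ins rather than a real interface (a proper centroid would be area-weighted), the point here is only the guarantee against dividing by zero.

#include <cstddef>
#include <vector>

struct point { double x, y; };
typedef std::vector<point> polygon;      // outer ring only, for illustration

// Simple validity test the user can call if they care about this case.
inline bool is_valid(polygon const& poly)
{
    return poly.size() >= 4                          // at least a closed triangle
        && poly.front().x == poly.back().x
        && poly.front().y == poly.back().y;          // first point repeated at the end
}

// Never divides by zero: an empty or degenerate polygon gets an arbitrary
// but well-defined answer instead of a crash.
inline point centroid(polygon const& poly)
{
    if (poly.size() < 2)
        return point{0.0, 0.0};                      // arbitrary, but no crash

    double sx = 0.0, sy = 0.0;
    std::size_t n = poly.size() - 1;                 // skip the duplicated closing point
    for (std::size_t i = 0; i < n; ++i) { sx += poly[i].x; sy += poly[i].y; }
    return point{ sx / n, sy / n };                  // vertex average, n >= 1 here
}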

Barend Gehrels wrote:
I would like to hear the opinion of potential geometry users on the list about the following.
For the three geometries, point, linestring (= sequence of points), polygon:
What should the generic "area" algorithm return for point and linestring? - 0 (zero) - compiler error (so this function is not defined for them at all) - exception Personally I would feel for 0 (zero) because you can then request the "area" in a generic collection, regardless of what is inside.
0, indeed.
What should the generic "length" algorithm return for polygon? - 0 (zero) - its perimeter - compiler error - exception
Rename length to perimeter, since that is more generic.
What should the generic "centroid" algorithm return for point and linestring? - the point / the average its points - compiler error - exception
For all geometries, which are only sets of points, the centroid should be the average point.
and for an invalid polygon (e.g. no points at all, or not closed)? - exception - return "false" (if centroid is passed as a reference parameter) - ? (we don't always have the compiler-error option here because points are usually stored in a runtime collection)
Undefined behaviour. An invalid polygon shouldn't be allowed in the first place. Ideally, that should be prevented by the type system.

and for an invalid polygon (e.g. no points at all, or not closed)? - exception - return "false" (if centroid is passed as a reference parameter) - ? (we don't always have the compiler-error option here because points are usually stored in a runtime collection)
Undefined behaviour. An invalid polygon shouldn't be allowed in the first place.
Ideally, that should be prevented by the type system.
We are not defining the type system, we are defining a system of concepts. It places an unreasonable restriction on the definition of polygon to insist that it can't be default-constructible and empty. I'm OK with undefined behavior being the result of calling area() on a default-constructed legacy polygon type, but preferably the undefined behavior should be deterministic.

I'd also like to point out that roughly half the legacy polygon types floating around are invalid if you insist that they are "closed", because the last edge may be implied and it is more memory efficient to omit the redundant point. In such cases it is better to assume the implied edge that closes the polygon and produce the correct result rather than throw exceptions. It is also unreasonable to assume a winding direction convention. If your polygon concept requires these three conditions (not empty, closed and CCW winding) you are excluding 7/8 of legacy data types if we assume each decision is made arbitrarily. Making algorithms invariant to these conditions takes some extra coding and testing, but isn't too difficult. Luke
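To illustrate the "assume the implied closing edge" point, a small sketch over a plain vector of points (illustrative only, no traits layer): the edge back to the first point is always taken implicitly, so open and closed storage give the same answer, and taking the absolute value makes the result winding-invariant as well.

#include <cmath>
#include <cstddef>
#include <vector>

struct point { double x, y; };

// Shoelace area over a vertex sequence that may or may not repeat the first
// point at the end.  The wrap-around edge is always taken, so both storage
// conventions produce the same result; a duplicated closing point only adds
// a degenerate edge contributing zero.
inline double signed_area(std::vector<point> const& pts)
{
    std::size_t n = pts.size();
    if (n < 3) return 0.0;

    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        point const& a = pts[i];
        point const& b = pts[(i + 1) % n];           // the implied closing edge
        sum += a.x * b.y - b.x * a.y;
    }
    return sum / 2.0;                                // sign encodes winding direction
}

inline double area(std::vector<point> const& pts)
{
    return std::fabs(signed_area(pts));              // winding-agnostic magnitude
}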

Simonson, Lucanus J wrote:
It places an unreasonable restriction on the definition of polygon to insist that it can't be default-constructible and empty.
Construction is not part of the concept, no. Only Copy-construction.
I'd also like to point out that roughly half the legacy polygon types floating around are invalid if you insist that they are "closed", because the last edge may be implied and it is more memory efficient to omit the redundant point. In such cases it is better to assume the implied edge that closes the polygon and produce the correct result rather than throw exceptions. It is also unreasonable to assume a winding direction convention. If your polygon concept requires these three conditions (not empty, closed and CCW winding) you are excluding 7/8 of legacy data types if we assume each decision is made arbitrarily.
If it's not closed, or doesn't contain at least one vertex, it's not a polygon. Whether the last edge is inferred or not is irrelevant; that's only a means of definition. The retroactive modeling should adapt whatever means the legacy type uses to define a polygon to the means the library expects.

-------------------------------------------------- From: "Mathias Gaunard" <mathias.gaunard@ens-lyon.org> Sent: Tuesday, February 24, 2009 12:37 PM To: <boost@lists.boost.org> Subject: Re: [boost] [geometry] area, length, centroid, behavior
Shapes, volumes, etc. are sets of points. Various sets have a null area. This is perfectly valid and is not a logic error.
I don't agree. There are perhaps conventions that say a point or line has zero area, but it doesn't follow that these should be viable arguments to calculate an area. My thoughts are that you really need at minimum a representation which supports 3 points (the 2-simplex ... 2 vectors ... etc.) to define an area. If you have three equal points, fine, 0 area. If you have 3 points where the third lies on the line defined by the other two, fine, 0 area. If you have a circle with 0 radius, fine, 0 area. For anything else you are making assumptions or accepting the assumptions made implicit via a convention. I do not think that these should dictate the interface. In my opinion the only shapes which would make sense are those that may actually have a non-zero area. Can anyone think of a use case?
Points (a point is just a set of points with one element) and lines have zero area; this is common knowledge. This is even stated in the definition of "area" on Wikipedia, for example.
I don't agree that this is common knowledge. It may seem like 'common-sense', but I don't think such notions have much merit in the face of something as mathematically rigorous as a computational geometry library.

Brandon Kohn wrote:
-------------------------------------------------- From: "Mathias Gaunard" <mathias.gaunard@ens-lyon.org> Sent: Tuesday, February 24, 2009 12:37 PM To: <boost@lists.boost.org> Subject: Re: [boost] [geometry] area, length, centroid, behavior
Points (a point is just a set of points with one element) and lines have zero area; this is common knowledge. This is even stated in the definition of "area" on Wikipedia, for example.
I don't agree that this is common knowledge. It may seem like 'common-sense', but I don't think such notions have much merit in the face of something as mathematically rigorous as a computational geometry library.

It's a matter of interpretation. If you consider a point a 0-dimensional object placed in the n-dimensional space you're working with, it doesn't have an extent over any number of dimensions. If, however, you consider it an n-dimensional object with an extent of 0 along every dimension, it does have length, area, volume, and whatever you want to call hyper-volumes in more than 3 dimensions, all of them being 0.
Sebastian

-------------------------------------------------- From: "Sebastian Redl" <sebastian.redl@getdesigned.at>
It's a matter of interpretation. If you consider a point a 0-dimensional object placed in the n-dimensional space you're working with, it doesn't have an extent over any number of dimensions. If, however, you consider it an n-dimensional object with an extent of 0 along every dimension, it does have length, area, volume, and whatever you want to call hyper-volumes in more than 3 dimensions, all of them being 0.
I agree that you can interpret these things as you have stated. The part that I find hard to accept is that I should somewhere in the code want:

point a;
line b;
....
//! Assuming the "dimensionally agnostic" function for the 'area' metric is called content.
double aArea = content( a );
double bArea = content( b );

My intuition tells me that such a call is far more likely to be a logic error than the case that I really want to know the area of a point or line. So if I were defining the interface I would make the choice that I have advocated. Of course the world won't end if zero area is returned for points, lines (or type void for that matter.)

Brandon

On Tue, Feb 24, 2009 at 12:33 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
0, indeed.
Rename length to perimeter, since that is more generic.
For all geometries, which are only sets of points, the centroid should be the average point.
Undefined behaviour. An invalid polygon shouldn't be allowed in the first place. Ideally, that should be prevented by the type system.
Agreed with all of these. As I see it, with any reasonable generic definition of the functions mentioned, their behavior seems to fall into place just as Mathias specified for points, lines, etc. I understand that some people may be worried that you might accidentally pass points to an area function when you didn't mean to, but in my opinion, that's not very likely and it is not worth putting an unnecessary requirement on a function simply because in some cases people might use it incorrectly.

On Tue, Feb 24, 2009 at 2:13 PM, Brandon Kohn <blkohn@hotmail.com> wrote:
I agree that you can interpret these things as you have stated. The part that I find hard to accept is that I should somewhere in the code want:
point a;
line b;
....
//! Assuming the "dimensionally agnostic" function for the 'area' metric is called content.
double aArea = content( a );
double bArea = content( b );
My intuition tells me that such a call is far more likely to be a logic-error than the case that I really want to know the area of a point or line. So if I were defining the interface I would make the choice that I have advocated. Of course the world won't end if zero area is returned for points, lines (or type void for that matter.)
I question both how common such a logical error would be and why the library should be responsible for trying to prevent it. While I would agree that all attempts should be made to make the library (or rather any library) intuitive and difficult to misuse, I don't believe preventing a hypothetical mistake by making a function have more strict requirements than necessary is a good design decision. If in practice a lot of people were accidentally passing points to the area function, then maybe this would be more of a concern, but I really don't see that as being a problem here. I'd rather have the function work in the way that makes sense. -- -Matt Calabrese

-------------------------------------------------- From: "Matt Calabrese" <rivorus@gmail.com> Sent: Tuesday, February 24, 2009 3:43 PM To: <boost@lists.boost.org> Subject: Re: [boost] [geometry] area, length, centroid, behavior
I question both how common such a logical error would be and why the library should be responsible for trying to prevent it. While I would agree that all attempts should be made to make the library (or rather any library) intuitive and difficult to misuse, I don't believe preventing a hypothetical mistake by making a function have more strict requirements than necessary is a good design decision.
I think in this case it's not a matter of making the function more restrictive. The function to calculate area must be specialized for each type (or categorical concept) in some way. To make area work for point or line types, one would have to write an additional specialization explicitly; i.e. the developer has to add this functionality rather than curtail existing functions.
If in practice a lot of people were accidentally passing points to the area function, then maybe this would be more of a concern, but I really don't see that as being a problem here. I'd rather have the function work in the way that makes sense.
I suppose this is just where I differ from everyone. A point is a 0-simplex. It has no units (dimensionless) in content space. A line segment is a 1-simplex. It has units of length in content space. The 2-simplex (triangle) has units of area. A 3-simplex has units of volume and so forth with increasing dimension. It doesn't make sense IMO to query the area of a point any more than it does to query the length or volume of a point. I cannot see a viable use-case for wanting it, and based on that I don't see why the developer would want to add additional code to support it. Brandon

From: "Matt Calabrese" <rivorus@gmail.com>
If in practice a lot of people were accidentally passing points to the area function, then maybe this would be more of a concern, but I really don't see that as being a problem here. I'd rather have the function work in the way that makes sense.
Brandon Kohn wrote:
I suppose this is just where I differ from everyone. A point is a 0-simplex. It has no units (dimensionless) in content space. A line segment is a 1-simplex. It has units of length in content space. The 2-simplex (triangle) has units of area. A 3-simplex has units of volume and so forth with increasing dimension. It doesn't make sense IMO to query the area of a point any more than it does to query the length or volume of a point. I cannot see a viable use-case for wanting it, and based on that I don't see why the developer would want to add additional code to support it.
I'm going to back Brandon up on this one. There is no valid use case for asking for the area of a point. It should be a syntax error. I also agree that writing such a function is a waste of time, even if it were not also harmful to the integrity of the API by breaking down conceptual type safety.

I also think it is a bad idea to provide a catchall default area() function that returns zero for every concept that doesn't have an override area() function. This changes compile-time errors related to type misidentification into runtime errors, the worst possible thing the library author could do. It also makes the job of the library author much harder, because it turns his own mistakes into runtime errors rather than compile-time errors.

Case 1:

Polygon p1;
//lots of code
if (some condition) {
    Point p1;
    //some more code
    result = something + area(p1); //oops, which p1 did they mean? Is it really so unlikely?
}

Case 2:

Polygon p1; //this polygon doesn't properly model the polygon concept
result = something + area(p1); //default version of area assumes the object type must not have an area and returns 0!

All user error looks unlikely in isolation. In fact, programmer error only happens at a rate of about 1%. For every 100 lines of code the library user writes we want them to get an extra syntax error instead of an extra runtime error, because the runtime error costs 10 to 100X more to fix. Luke

I agree entirely with Brandon that this is a frequent way of defining what a point is. I disagree that it is the only useful way of defining a point. A point is equally the limit of a sequence of polygons as their diameter shrinks to zero. There is nothing intrinsically, mathematically (and certainly not practically) wrong with considering a polyline as a special ("degenerate") case of a polygon, a line segment as a special case of a polyline and thus of a polygon (or as a special case of a polygon directly -- one whose "width" has reached the limit of zero), or a point as a special case of a line segment. What /would/ you call a line whose length happened to be zero?

The centroid is even more clear cut. It has always been a concept defined for an arbitrary set of points, which includes lines, line segments, polylines and individual points.

On the other hand, I would say that the "length" of a polygon is too ambiguous -- some degeneracy needs to be available to disambiguate. Is the "length" the perimeter, the minimum diameter, the maximum diameter, the average diameter, or, as most developments of fractal geometry would have it, an infinite length (the fractal dimensionality of a geometric object is the dimension at which the measure on that object is neither zero nor infinite; by the way this would give a line or a point an area of 0)? I would go with an error here.

Use case: I have a collection of objects in a 2-d space. Some do not have a measurable extent, so they are modeled as points. Some have a measurable extent only in a single dimension, so they are modeled as oriented line segments. I want the centroid of the system. I also want the center of mass, where mass is proportional to substance, i.e., area. Here points and line segments have an area of zero and hence a mass of zero. I could handle this, of course, by adding code that extends the concept of area so that by explicit test the area+ is equal to the area of a polygon and equal to 0 for a line segment, a polyline or a point, but that is just saying that the model is inadequate for this purpose and I would need to write special code to extend it in this quite reasonable way.

If I go to the trouble of declaring things so that points and line-segments and polylines and polygons can all be used "together" then I probably want to treat them this way. If I don't, then I should have to eliminate that case explicitly by testing -- just as I would if I wanted to exclude the use of area of a concave or self-intersecting polygon.

Topher

Brandon Kohn wrote:
I suppose this is just where I differ from everyone. A point is a 0-simplex. It has no units (dimensionless) in content space. A line segment is a 1-simplex. It has units of length in content space. The 2-simplex (triangle) has units of area. A 3-simplex has units of volume and so forth with increasing dimension. It doesn't make sense IMO to query the area of a point any more than it does to query the length or volume of a point. I cannot see a viable use-case for wanting it, and based on that I don't see why the developer would want to add additional code to support it.
Brandon

Thanks for all your answers!
I have a collection of objects in a 2-d space. Some do not have a measurable extent, so they are modeled as points. Some have a measurable extent only in a single dimension, so they are modeled as oriented line segments. I want the centroid of the system. I also want the center of mass, where mass is proportional to substance, i.e., area. Here points and line segments have an area of zero and hence a mass of zero. I could handle this, of course, by adding code that extends the concept of area so that by explicit test the area+ is equal to the area of a polygon and equal to 0 for a line segment, a polyline or a point, but that is just saying that the model is inadequate for this purpose and I would need to write special code to extend it in this quite reasonable way.
If I go to the trouble of declaring things so that points and line-segments and polylines and polygons can all be used "together" then I probably want to treat them this way. If I don't, then I should have to eliminate that case explicitly by testing -- just as I would if I wanted to exclude the use of area of a concave or self-intersecting polygon.
I agree with this completely. I have a layer, templated by geometry object, so layer<G>. Inside it has a geometry collection, e.g. std::vector<G>. I have a selection dialog where users can define expressions such as area > 0. The displayed functions/properties might be hidden for certain geometry types. However, if area is not defined at all, the system will not compile, and as a library user I would have to invent tricks such as the area+ or templatizing the whole list of functions/properties. This is not a hypothetical use case: I have a layer like this and a dialog like this.
Shapes, volumes, etc. are sets of points. Various sets have a null area. This is perfectly valid and is not a logic error.

I am glad about this definition, because this is also the way we use it. We have modelled our library on Egenhofer/OGC mathematical concepts, so relations like intersection/union/intersects/overlaps are all "Point-Set Topological Spatial Relations". So null area makes sense.

If you consider a point a 0-dimensional object placed in the n-dimensional space you're working with, it doesn't have an extent over any number of dimensions. If, however, you consider it an n-dimensional object with an extent of 0 along every dimension, it does have length, area, volume, and whatever you want to call hyper-volumes in more than 3 dimensions, all of them being 0.
"Formally" we have a point defined as an object with topological dimension of 0 (so a 0-dimensional object, there are many types of dimensions around)
I also think it is a bad idea to provide a catchall default area() function that returns zero for every concept that doesn't have an override area() function.

You can make a default which returns an area of zero for any geometry, and a separate specialization which gives a compilation error for any type that is not a defined geometry.
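A minimal sketch of that dispatch pattern (names invented, not the library's actual interface): the catch-all handles every tagged geometry by returning zero, an areal partial specialization is where the real computation would go (omitted here), and a type without a geometry_tag specialization fails to compile.

// Hypothetical tags and trait; only types that specialize geometry_tag are
// geometries at all -- anything else fails at compile time because the
// primary template is left undefined.
struct point_tag {};
struct linestring_tag {};
struct polygon_tag {};

template <typename Geometry> struct geometry_tag;   // deliberately undefined

namespace detail
{
    // Catch-all for geometries whose area is zero by definition.
    template <typename Geometry, typename Tag>
    struct area_impl
    {
        static double apply(Geometry const&) { return 0.0; }
    };

    // Areal geometries get their own implementation.
    template <typename Geometry>
    struct area_impl<Geometry, polygon_tag>
    {
        static double apply(Geometry const& poly)
        {
            // Ring traversal (shoelace formula) omitted in this sketch.
            (void)poly;
            return 0.0;
        }
    };
}

template <typename Geometry>
double area(Geometry const& g)
{
    // Compile error for any type without a geometry_tag specialization,
    // 0.0 for points and linestrings, the polygon computation otherwise.
    return detail::area_impl<Geometry,
        typename geometry_tag<Geometry>::type>::apply(g);
}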
Do you provide an is_closed, or is_overlapped, or things of that nature for checking the validity of a polygon? Is a valid polygon a precondition of all functions that work on polygons, or do you intend to check for validity in every function?

is_closed will be provided. is_valid will be provided. is_simple (meaning there are no self-intersections and self-tangencies) will be provided. There is also a "correct" algorithm which will close the polygon and correct its winding direction.
it is also unreasonable to assume a winding direction convention. If your polygon concept requires these three conditions (not empty, closed and CCW winding) you are excluding 7/8 of legacy data types if we assume each decision is made arbitrarily.

We assume closed polygons with specific windings in our algorithms. For legacy polygons we provide the "correct" algorithm.
An invalid polygon shouldn't be allowed in the first place. Ideally, that should be prevented by the type system.

One option is to prohibit invalid polygons, i.e. to make it impossible to construct one with too few points. I suspect that this would lead to other problems though
I do not think that is possible, we handle polygons from 4 to millions of points, it is a runtime collection.
A point is a 0-simplex. It has no units (dimensionless) in content space. A line segment is a 1-simplex. It has units of length in content space. The 2-simplex (triangle) has units of area. A 3-simplex has units of volume and so forth with increasing dimension.

Topological dimension is indeed a very useful meta-function, with which we templatize some dispatch structures.
On the other hand, I would say that the "length" of a polygon is too ambiguous -- some degeneracy needs to be available to disambiguate.

I think most if not all people considered returning the perimeter as the length of a polygon a bad idea. Zero then? Because there still is the use case for the generic collection and my selection dialog having the length function/property as well, so it should compile. We still have the exception to be thrown; nobody reacted to that. In this specific case it might be a compromise: it compiles, supporting generic collections, but should not be called.
We want to provide a library which is mathematically correct and at the same time offers enough flexibility. The general idea is that the results should be as "homogeneous" as possible from one geometry to another, so the particular case of a point shouldn't break the whole machine. Therefore I'm glad that there is a mathematical viewpoint which says that it is perfectly valid to consider a point as having an area of zero and a polygon as being a pointset.

So if I may summarize/propose:
- area: defined for all, returning 0 for point/linestring, the area for polygons, exception for polyhedrons or DIM > 3
- length: defined for all, returning the length for linestrings, 0 for the rest of them, exception for polygon
- centroid: defined for all

Other relevant functions mentioned here:
- topological_dimension
- correct
- is_valid
- is_closed
- is_simple
- volume, returning 0 for all objects with topological dimension < 3, exception for > 3
- perimeter, returning 0 for all except polygon

And finally we might define:
- content, returning 0 for a point, the length of a linestring, the area of a polygon, the volume of a polyhedron, the hypervolume of a polytope

As explanation, a quote (off-list) from my co-author Bruno:
All this can be seen as a conceptual approximation, as we force ourselves to view every dimension as a particular case here. A really generic view would state that the length of a linestring, the area of a polygon and the volume of a polyhedron are conceptually exactly the same thing, that is: the quantity of space units inside a geometry.
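A sketch of what such a "content" could look like, dispatching on a hypothetical topological_dimension metafunction (names invented; the per-dimension length and area are trivial stand-ins here, only the dispatch is the point):

// Hypothetical metafunction giving the topological dimension of a geometry type.
template <typename Geometry> struct topological_dimension;

struct my_point {};
struct my_linestring {};
struct my_polygon {};

template <> struct topological_dimension<my_point>      { static const int value = 0; };
template <> struct topological_dimension<my_linestring> { static const int value = 1; };
template <> struct topological_dimension<my_polygon>    { static const int value = 2; };

// Stand-ins for the real per-dimension measures.
template <typename G> double length(G const&) { return 0.0; }
template <typename G> double area(G const&)   { return 0.0; }

namespace detail
{
    template <int Dimension> struct content_impl;

    template <> struct content_impl<0>   // points: no extent, content is 0
    {
        template <typename G> static double apply(G const&) { return 0.0; }
    };
    template <> struct content_impl<1>   // linear geometries: content is length
    {
        template <typename G> static double apply(G const& g) { return length(g); }
    };
    template <> struct content_impl<2>   // areal geometries: content is area
    {
        template <typename G> static double apply(G const& g) { return area(g); }
    };
}

// "content" = the quantity of space units inside a geometry, whatever its dimension.
template <typename Geometry>
double content(Geometry const& g)
{
    return detail::content_impl<topological_dimension<Geometry>::value>::apply(g);
}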
Regards, Barend

Luke wrote:
it is also unreasonable to assume a winding direction convention. If your polygon concept requires these three conditions (not empty, closed and CCW winding) you are excluding 7/8 of legacy data types if we assume each decision is made arbitrarily.
Barend Gehrels wrote:
We assume closed polygons with specific windings in our algorithms. For Legacy-polygons we provide the "correct" algorithm.
You cannot "correct" a polygon type that enforces the class invariant that the polygon is not-closed or has opposite winding of your convention. Such practice is considered good object oriented design, and is common. More than half of the polygon object I'm familiar with enforce winding, but do so with arbitrary decision of which winding they enforce. Such polygons cannot model your polygon concept, and will be copy converted to something that does (such as your polygon data type) to satisfy the requirements of your library API. This defeats the purpose of C++ concepts based geometry API design. Requiring the user data types conform to these semantics fails to create a general interface for polygon objects. It may be possible for a motivated user to create interfaces in their polygon traits specialization that would iterate over their polygon in reverse order and tack on an extra point for wrap-around closed semantics. I could do that easily, but I don't think it is reasonable to expect that of a geometry library user as opposed to geometry library author. It requires extra programming effort to setup the traits and also a lot more knowledge and understanding of the semantics requried of polygons by the library. Such attempts are likely to fail on the first try. It is likely to lead to users resorting to copy conversion. I believe design decisions should lead to a library that is convenient to use rather than convenient to implement. Assuming the winding direction and closed semantic is a convenience for implementing the algorithms only. These minor semantic differences are the swamp of geometry data type conversion. The library should elevate the user above these low level annoyances and allow them to work at a higher level of abstraction, not force them to trudge through the swamp the same way they always have in the past to use every other geometry library. I also want to reiterate that area(point) and length(polygon) should be syntax errors. Type safety isn't something we should break lightly. Regards, Luke

Simonson, Lucanus J wrote:
I believe design decisions should lead to a library that is convenient to use rather than convenient to implement.
I agree with Luke on this one. I have written a contouring algorithm which produces either polygons for filling or polylines for stroking. The filling will work just fine if I use the even-odd filling rule regardless of the orientation of any individual subpolygon. However, I'd hate to have to produce a certain winding rule for the exterior and the holes in order to use the same data for more complicated GIS calculations such as logical operations and buffering. IMHO it should be just as easy as choosing to use the even-odd filling rule instead of the winding rule. --> Mika Heiskanen

Mika Heiskanen wrote:
Simonson, Lucanus J wrote:
I believe design decisions should lead to a library that is convenient to use rather than convenient to implement.
I agree with Luke on this one. I have written a contouring algorithm which produces either polygons for filling or polylines for stroking. The filling will work just fine if I use the even-odd filling rule regardless of the orientation of any individual subpolygon. However, I'd hate to have to produce a certain winding rule for the exterior and the holes in order to use the same data for more complicated GIS calculations such as logical operations and buffering. IMHO it should be just as easy as choosing to use the even-odd filling rule instead of the winding rule.
We currently follow the "rules" of the OGC, stating:
The exterior boundary LinearRing defines the “top” of the surface which is the side of the surface from which the exterior boundary appears to traverse the boundary in a counter clockwise direction. The interior LinearRings will have the opposite orientation, and appear as clockwise when viewed from the “top”. The assertions for Polygons (the rules that define valid Polygons) are as follows: a) Polygons are topologically closed; b) The boundary of a Polygon consists of a set of LinearRings that make up its exterior and interior boundaries; (source: http://www.opengeospatial.org/standards/sfa#downloads)
I'm talking about OGC more on this list. They have defined a well-defined interface which is followed by the majority of the GIS world and which is based on a mathematical/topological foundation. So we'd better follow their guidelines than make decisions about every possibility ourselves. Most if not all Open Source implementations are following these rules.

Of course we can concentrate on making polygons "winding-agnostic" and "closing-point-agnostic". I'm not convinced, however, that that is so easy. If you calculate a convex hull, do you want it returned in the same winding direction as you defined it? It will have an influence, plus a performance malus, on many algorithms. By just providing the "correct" algorithm we enable library users to use our library with every polygon. It cannot be called from the algorithms themselves: to verify the winding rule the area has to be calculated beforehand, which would drastically decrease performance.

The solution might be a traits class telling the winding rule the polygon type has. That might eventually be implemented, but for now we'd better concentrate on other things. Same for the closing point. Regards, Barend
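A minimal sketch of that traits idea (hypothetical names and defaults, not an existing interface): the winding and closing-point conventions of a polygon type are described at compile time, so an algorithm can adapt without computing the orientation at runtime.

// Hypothetical compile-time description of a polygon type's conventions.
enum class winding_rule { clockwise, counterclockwise };
enum class closure_rule { closed, open };

// Defaults follow the OGC-style convention assumed so far (closed exterior
// ring, counter-clockwise); a legacy type specializes these to describe
// itself instead of being copied and "corrected" at runtime.
template <typename Polygon>
struct winding_of { static constexpr winding_rule value = winding_rule::counterclockwise; };

template <typename Polygon>
struct closure_of { static constexpr closure_rule value = closure_rule::closed; };

// A legacy type that stores its outer ring clockwise and open:
struct legacy_polygon { /* vertex storage omitted */ };

template <> struct winding_of<legacy_polygon> { static constexpr winding_rule value = winding_rule::clockwise; };
template <> struct closure_of<legacy_polygon> { static constexpr closure_rule value = closure_rule::open; };

// An algorithm can then branch at compile time -- for example flipping the
// sign of a signed area instead of reversing the point sequence.
template <typename Polygon>
constexpr double area_sign()
{
    return winding_of<Polygon>::value == winding_rule::counterclockwise ? +1.0 : -1.0;
}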

On Thu, Feb 26, 2009 at 4:26 PM, Barend Gehrels <barend@geodan.nl> wrote:
(source: http://www.opengeospatial.org/standards/sfa#downloads)
I'm talking about OGC more on this list. They have defined a well-defined interface which is followed by the majority of the GIS world and which is based on a mathematical/topological foundation. So we better follow their guidelines than take decisions about every possibility ourselves. Most if not all Open Source implementations are following these rules.
Might I suggest that since you are aiming for OGC compliance, and from your recent mention of including coordinate system projections, that you narrow your library's focus into something more like Boost.GIS instead of Boost.Geometry?
Of course we can concentrate on making polygons "winding-agnostic" and "closing-point-agnostic". I'm not convinced however that that is so easy. If you calculate a convex hull, do you want it returned in the same winding direction as you defined it? It will have an influence, plus a performance malus, on many algorithms. By just providing the "correct" algorithm we enable library users to use our library with every polygon. It cannot be called from the algorithms themselves: to verify the winding rule the area has to be calculated beforehand. This will drastically decrease performance.
The solution might be a traits class telling the winding rule the polygon type has. That might eventually be implemented, but for now we'd better concentrate on other things. Same for the closing point.
GIS is my work area, so I know it's huge, but it seems like you are trying to tackle GIS + generic geometry library. I don't get the impression that game programmers, for example, will enjoy or use this library. Polygons are not closed (and rarely used, only triangles are) in games, and cannot have holes, and any error checking to determine otherwise is wasteful. Winding is handled by the rendering API, and is also rarely, if ever, checked since the model exporter will have enforced the correct winding.

It seems easier to please one group of people (GIS) than to try to please GIS, scientific simulation, game, and CAD developers, and more. I believe it will also lead to less discussion about the correctness of interfaces since there is already a standard (OGC) to back you up if you choose GIS. Choose none of them, and you'll have programmers from all industries telling you why your way is wrong and their way is right. --Michael Fawcett

P.S. I would love to start using a really fast and elegant Boost.GIS rather than ESRI ArcObjects...

Hi Michael,
Might I suggest that since you are aiming for OGC compliance, and from your recent mention of including coordinate system projections, that you narrow your library's focus into something more like Boost.GIS instead of Boost.Geometry?
(snipped)
Thanks, will think about what you wrote. It is true that the geometry world is huge. It is not that strict OGC compliance is our major goal; the main reason to use this standard is that it is a sound framework created by a consortium. Actually there is also an ISO standard, ISO 19107 (a.k.a. Topic 1 - Feature Geometry), but that is more or less the same, as far as I know. I don't know of other standards on geometry.

It is true that we're going for a sort of Geometry+GIS library and we'll probably never meet everyone's requirements. However, the current authors of this GGL come from GIS / geometry / gaming, so we have a reasonable mixture. Still, I will think about it. So maybe we'd have to support counter-clockwise, hole-less, non-closed polygons as well... but indeed we'll get other wishes then... fractals...
P.S. I would love to start using a really fast and elegant Boost.GIS rather than ESRI ArcObjects...

Seems that we're doing our best :-)
Regards, Barend

Hi,
GIS is my work area, so I know it's huge, but it seems like you are trying to tackle GIS + generic geometry library. I don't get the impression that game programmers, for example, will enjoy or use this library. Polygons are not closed (and rarely used, only triangles are) in games, and cannot have holes, and any error checking to determine otherwise is wasteful. Winding is handled by the rendering API, and is also rarely, if ever, checked since the model exporter will have enforced the correct winding.
Barend and I discussed this a bit, as I'm personally more concerned about game programming than GIS. So I guess that until now, my participation in the library has been influenced by this point of view. What you say is true about what the library currently *provides* (GIS-oriented algorithms, even if polygons with holes can be used in games contrary to what you say, to display 2D vectorial fonts for instance). But it's false about the *potential* of the library, and this has always been my main concern since the beginning.

I focused my efforts until now on the kernel because I think that the different application domains (GIS, games or whatever) won't necessarily share the same algorithms but can share the same kernel. If an algorithm is too slow because it can handle polygons with holes while game programmers use plain triangles, the key of the problem is to know whether the library has been designed generically enough to sort that out. If mechanisms like tag dispatching and/or concept refinement are present and designed so well that I can add a triangle concept, take advantage of the algorithms already present for polygons, and specialize the ones that deserve to be, there's no problem. We can do that and game programmers will be happy with the library, like GIS ones were happy with the stuff already present. So the kernel is the key.

Right now Barend wants to write more algorithms, and those algorithms are GIS-oriented since this is his domain. This doesn't remove anything from the library. It can even improve the kernel: as Douglas Gregor writes in http://tinyurl.com/cw3qk5, "concepts are neither designed nor invented, they are discovered through the process of lifting many algorithms within the same domain." So writing more and more algorithms, even targeted toward one application domain, can only give a chance to improve the kernel. Of course it would be good to do the same in other domains as well, and as soon as I have more time, I'll try to test the kernel against some more game-oriented algorithms.

Now the question is: is a Boost.Geometry library meant to contain all the stuff necessary for every application domain (which is basically impossible)? Personally, my approach would rather be to have Boost.Geometry proposing just a kernel and the most basic algorithms like point arithmetics, dot and cross products, etc., and having extensions for more specific applications, containing potentially a lot of algorithms... Regards Bruno
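To make the tag-refinement idea concrete, a small hypothetical sketch (all names invented): a triangle's tag derives from the polygon tag, so the triangle reuses the generic polygon perimeter as-is, and a cheaper triangle-specific overload could be added later without touching any caller.

#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct point { double x, y; };

// Tag refinement: a triangle "is a" polygon as far as dispatching goes.
struct polygon_tag {};
struct triangle_tag : polygon_tag {};

struct polygon  { std::vector<point> ring; };
struct triangle { std::array<point, 3> corners; };

template <typename G> struct geometry_tag;
template <> struct geometry_tag<polygon>  { typedef polygon_tag type; };
template <> struct geometry_tag<triangle> { typedef triangle_tag type; };

// Uniform vertex access so one algorithm serves both models.
inline std::vector<point> vertices(polygon const& p)  { return p.ring; }
inline std::vector<point> vertices(triangle const& t) { return std::vector<point>(t.corners.begin(), t.corners.end()); }

namespace detail
{
    // Written once for polygon_tag.  A call with triangle_tag lands here too,
    // through the tag's base class; a triangle-specific overload taking
    // triangle_tag would be preferred automatically if it were added.
    template <typename G>
    double perimeter_dispatch(G const& g, polygon_tag)
    {
        std::vector<point> v = vertices(g);
        double sum = 0.0;
        for (std::size_t i = 0; i < v.size(); ++i)
        {
            point const& a = v[i];
            point const& b = v[(i + 1) % v.size()];  // implied closing edge
            sum += std::hypot(b.x - a.x, b.y - a.y);
        }
        return sum;
    }
}

template <typename G>
double perimeter(G const& g)
{
    return detail::perimeter_dispatch(g, typename geometry_tag<G>::type());
}

// perimeter(triangle{{{ {0,0}, {4,0}, {0,3} }}}) == 12 (a 3-4-5 triangle).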

On Sat, Feb 28, 2009 at 7:18 PM, Bruno Lalande <bruno.lalande@gmail.com> wrote:
Barend and I discussed this a bit, as I'm personally more concerned about game programming than GIS. So I guess that until now, my participation in the library has been influenced by this point of view.
Good to know. While I work in GIS, I used to work in games, and it's still a hobby of mine.
What you say is true about what the library currently *provides* (GIS-oriented algorithms, even if polygons with holes can be used in games contrary to what you say, to display 2D vectorial fonts for instance).
Sure games *can*, but the base rendering API (e.g. D3D or OpenGL) doesn't take care of it. It's extra CPU work, and in the end, it still becomes just a triangle. For instance, OpenGL GL_POLYGON can only do convex polygons with holes, and if the vertices are non-planar you get undefined behavior. Direct3D DrawPrimitives takes a PrimitiveType to define how to interpret the data in the vertex stream - valid PrimitiveTypes are LineList, LineStrip, PointList, TriangleFan, TriangleList, and TriangleStrip.

Most video game teams will allow artists to model however they are most comfortable. As part of the export process, a custom exporter will convert their 3D models (maybe with concave polygons, maybe with holes, maybe non-planar) into the triangle-only equivalent; triangles, tri-strips, or fans, but it doesn't matter - they're still triangles (3 vertices only) by the time they get to the game. The modeling package also handles polygon winding.
I focused my efforts until now on the kernel because I think that the different application domains (GIS, games or whatever) won't necessarily share the same algorithms but can share the same kernel. If an algorithm is too slow because it can handle polygons with holes while game programmers use plain triangles, the key of the problem is to know whether the library has been designed generically enough to sort that out. If mechanisms like tag dispatching and/or concept refinement are present and designed so well that I can add a triangle concept, take advantage of the algorithms already present for polygons, and specialize the ones that deserve to be, there's no problem. We can do that and game programmers will be happy with the library, like GIS ones were happy with the stuff already present.
So the kernel is the key.
What types of things does the kernel encompass? If the data types are user-defined and the algorithms simply work over types that match particular concepts, what is left? It's all user defined types and algorithms...or do you mean the kernel is all the concepts and tag-dispatching machinery?
Now the question is: is a Boost.Geometry library meant to contain all the stuff necessary for every application domain (which is basically impossible)? Personally, my approach would rather be to have Boost.Geometry proposing just a kernel and the most basic algorithms like point arithmetics, dot and cross products, etc., and having extensions for more specific applications, containing potentially a lot of algorithms...
Well, I sort of agree. I can't figure out what the kernel encompasses. I think that GIS algorithms belong in Boost.GIS or Boost.Geometry.GIS and that they should use concepts. I also think that Lucanus Simonson's work could co-exist in some other library or extension since he only works with polygons that have angles of multiples of 45 degrees (or something like that, I'm not sure on specifics). I also think that they should use the same concepts and tags where appropriate. I'm ambivalent as to whether they would be separate libraries or simply extensions to a Boost.Geometry library as long as they work together. --Michael Fawcett

Hi Michael,
What you say is true about what the library currently *provides* (GIS-oriented algorithms, even if polygons with holes can be used in games contrary to what you say, to display 2D vectorial fonts for instance).
Sure games *can*, but the base rendering API (e.g. D3D or OpenGL) doesn't take care of it. It's extra CPU work, and in the end, it still becomes just a triangle. For instance, OpenGL GL_POLYGON can only do convex polygons with holes, and if the vertices are non-planar you get undefined behavior. Direct3D DrawPrimitives takes a PrimitiveType to define how to interpret the data in the vertex stream - valid PrimitiveTypes are LineList, LineStrip, PointList, TriangleFan, TriangleList, and TriangleStrip.
I know all that, I was only thinking about the trick that consists in using GL_TRIANGLE_FAN and the stencil buffer to display concave polygons with concave holes. But indeed, it's a particular use case and the base shape remains plain triangle.
Most video game teams will allow artists to model however they are most comfortable. As part of the export process, a custom exporter will convert their 3D models (maybe with concave polygons, maybe with holes, maybe non-planar) into the triangle-only equivalent; triangles, tri-strips, or fans, but it doesn't matter - they're still triangles (3 vertices only) by the time they get to the game. The modeling package also handles polygon winding.
Agree completely.
What types of things does the kernel encompass? If the data types are user-defined and the algorithms simply work over types that match particular concepts, what is left? It's all user defined types and algorithms...or do you mean the kernel is all the concepts and tag-dispatching machinery?
This is the good question, precisely. I think you guessed well in your last sentence. The kernel could be a bunch of concepts with a ready-made tag dispatching mechanism (or any other generic mechanism(s)). It can seem very abstract, but if you look at the time we've already spent on it and the number of discussions around it, it's far from being trivial. And until an "official" library states all that, geometry developers will continue to reinvent everything everywhere and work with their do-it-all point classes.

In addition, I'd see a few algorithms on points. You can do what you want: if you don't have points and basic arithmetics on them, you have nothing, whatever the application domain. Things like dot and cross product (wedge product if dimension-agnostic?) would also be included. I find it pretty stupid to see programmers constantly rewriting such widely used mathematical tools.

But as I was saying to Barend, finding the frontier is the hard part. Do intersection algorithms belong in the kernel, and if yes, which ones?... Anyway, it doesn't sound ridiculous to me to have a very tiny (though the result of a long work) kernel. But again, if that kernel is not validated by lots of algorithms in a wide range of specific domains, it's worthless.
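As a tiny illustration of the kind of dimension-agnostic kernel algorithm meant here, a hypothetical dot product written against a coordinate-access traits class (names invented); the recursion is unrolled at compile time for whatever dimension the traits declare, e.g. dot(vec3{{1,2,3}}, vec3{{4,5,6}}) gives 32.

#include <cstddef>

// Hypothetical point traits: dimension plus compile-time coordinate access.
template <typename Point> struct point_traits;

struct vec3 { double c[3]; };

template <> struct point_traits<vec3>
{
    static const std::size_t dimension = 3;
    template <std::size_t I> static double get(vec3 const& p) { return p.c[I]; }
};

namespace detail
{
    // Compile-time recursion over the coordinates.
    template <typename P, std::size_t I, std::size_t N>
    struct dot_impl
    {
        static double apply(P const& a, P const& b)
        {
            return point_traits<P>::template get<I>(a) * point_traits<P>::template get<I>(b)
                 + dot_impl<P, I + 1, N>::apply(a, b);
        }
    };

    template <typename P, std::size_t N>
    struct dot_impl<P, N, N>   // past the last coordinate: stop
    {
        static double apply(P const&, P const&) { return 0.0; }
    };
}

// Dimension-agnostic dot product over any type that specializes point_traits.
template <typename P>
double dot(P const& a, P const& b)
{
    return detail::dot_impl<P, 0, point_traits<P>::dimension>::apply(a, b);
}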
Well, I sort of agree. I can't figure out what the kernel encompasses. I think that GIS algorithms belong in Boost.GIS or Boost.Geometry.GIS and that they should use concepts.
Barend and me think like you.
I also think that Lucanus Simonson's work could co-exist in some other library or extension since he only works with polygons that have angles of multiples of 45 degrees (or something like that, I'm not sure on specifics).
It's also my feeling but I can't be affirmative until I take a closer look at his library. But indeed, as 45 degrees computations are clearly a specific domain, it would belong in an extension, like GIS.
I also think that they should use the same concepts and tags where appropriate.
Yes definitely.
I'm ambivalent as to whether they would be separate libraries or simply extensions to a Boost.Geometry library as long as they work together.
This is something I had thought about. I was wondering if all the library authors could produce a kernel together, and continue to work separately but on this same kernel. And the kernel would be proposed as Boost.Geometry before any other more specific work. But I have to elaborate all this, it's a bit unclear to me for now... Bruno

On Tue, Mar 3, 2009 at 4:29 AM, Bruno Lalande <bruno.lalande@gmail.com> wrote:
Hi Michael,
What types of things does the kernel encompass? If the data types are user-defined and the algorithms simply work over types that match particular concepts, what is left? It's all user defined types and algorithms...or do you mean the kernel is all the concepts and tag-dispatching machinery?
This is the good question, precisely. I think you guessed well in your last sentence. The kernel could be a bunch of concepts with a ready-made tag dispatching mechanism (or any other generic mechanism(s)). It can seem very abstract, but if you look at the time we've already spent on it and the number of discussions around it, it's far from being trivial. And until an "official" library states all that, geometry developers will continue to reinvent everything everywhere and work with their do-it-all point classes.
I really like where you're headed if that's the case. A core set of concepts and tags that are generic and flexible enough to be used across closely related domains seems like a great thing. Stopping all the reinvention would be awesome!
In addition, I'd see a few algorithms on points. You can do what you want: if you don't have points and basic arithmetics on them, you have nothing, whatever the application domain. Things like dot and cross product (wedge product if dimension-agnostic?) would also be included. I find it pretty stupid to see programmers constantly rewriting such widely used mathematical tools.
I'd almost not even want to see those algorithms included. It seems really elegant to me to have a set of concepts and tags that a bunch of domain specific algorithms can use. I think dot product and cross product belong in a linear algebra extension. You could, however, ship the library with a few extensions already included. I think GIL did that with the goal of adding more later.
But as I was saying to Barend, finding the frontier is the hard part. Do intersection algorithms belong in the kernel, and if yes, which ones?...
See above. As a side note: I don't think kernel is really the best term for this piece of the library. I know it's central, but it isn't the central computing mechanism of the library since it doesn't really compute anything.
Anyway, it doesn't sound ridiculous to me to have a very tiny (though the result of a long work) kernel. But again, if that kernel is not validated by lots of algorithms in a wide range of specific domains, it's worthless.
Agreed, wholeheartedly.
I'm ambivalent as to whether they would be separate libraries or simply extensions to a Boost.Geometry library as long as they work together.
This is something I had thought about. I was wondering if all the library authors could produce a kernel together, and continue to work separately but on this same kernel. And the kernel would be proposed as Boost.Geometry before any other more specific work. But I have to elaborate all this, it's a bit unclear to me for now...
I think this is a good direction to take. I know both parties will be reluctant since they have invested so much time and effort, but surely it would be better for the community. --Michael Fawcett

Hi Michael,
I'd almost not even want to see those algorithms included. It seems really elegant to me to have a set of concepts and tags that a bunch of domain specific algorithms can use. I think dot product and cross product belong in a linear algebra extension. You could, however, ship the library with a few extensions already included. I think GIL did that with the goal of adding more later.
Agreed, this approach makes sense.
See above. As a side note: I don't think kernel is really the best term for this piece of the library. I know it's central, but it isn't the central computing mechanism of the library since it doesn't really compute anything.
I chose this term because the same term is used in CGAL. I thought what they call "kernel" was precisely the concepts part of the library, but maybe I'm wrong, I don't know that library very well.
I think this is a good direction to take. I know both parties will be reluctant since they have invested so much time and effort, but surely it would be better for the community.
For sure it would be better. And the fact of trying to "extract" a kernel from both libraries is much less frustrating than forcing one to give up the core of his work. We don't throw anything away, we rather take the "quintessence" of each one. Bruno

Michael Fawcett wrote:
I also think that Lucanus Simonson's work could co-exist in some other library or extension since he only works with polygons that have angles of multiples of 45 degrees (or something like that, I'm not sure on specifics).
Bruno Lalande wrote:
It's also my feeling but I can't be affirmative until I take a closer look at his library. But indeed, as 45 degrees computations are clearly a specific domain, it would belong in an extension, like GIS.
What was true of my library eight months ago has changed recently as I have added full support for arbitrary angle geometry. I have implemented 100% numerically robust integer polygon clipping operations (AND, OR, NOT, XOR) based upon my N-layer scanline algorithm that produces all geometries with unique sets of overlapping input layers. When the number of input layers is two it can be used for Booleans, but we collapse the entire layer stack of an Intel microprocessor (dozens of input layers) including circular and non-manhattan, non-45 package layout using a single pass of this algorithm.
When I say 100% numerically robust integer polygon operations I mean that any input polygons with 32-bit integer coordinates will produce an output that closely approximates the idealized output which is representable only with infinite precision rational coordinates. Details of how I accomplish this will be presented at boostcon. The algorithm is of course parameterized by coordinate data type and with the appropriate traits 64-bit integer would also be robust. Floating point will work up to a point, and requires some further effort to support in a way that would make fans of floating point coordinates happy. Collaboration with Brandon or Barend to get the floating point robustness right would be appreciated since I don't care to reinvent a wheel I don't expect to use.
I also think that they should use the same concepts and tags where appropriate.
Yes definitely.
Definitely.
I'm ambivalent as to whether they would be separate libraries or simply extensions to a Boost.Geometry library as long as they work together.
There will initially be several implementations of some algorithms. It will be difficult to choose which ones to make "the" library algorithm because there will be semantic differences in their behavior and in the way their approaches to robustness produce slightly different outputs. GIS may have different requirements than VLSI, which has different requirements than physical modeling. It will take some time to unify these requirements in a single implementation that provides a superset of the specific needs of (most) everyone who might want to use it.
This is something I had thought about. I was wondering if all the library authors could produce a kernel together, and continue to work separately but on this same kernel. And the kernel would be proposed as Boost.Geometry before any other more specific work. But I have to elaborate all this, it's a bit unclear to me for now...
This sounds fine to me. I have already released my concepts based API to internal users who are very happy with it. Changes to the names of concepts and mechanisms of generic function overloading would be easy for them to accommodate as long as I can do with an eventual boost geometry concepts type system what I can currently do with what I have. I will be sure to point out which design decisions are important for me to support my existing API.
See above. As a side note: I don't think kernel is really the best term for this piece of the library. I know it's central, but it isn't the central computing mechanism of the library since it doesn't really compute anything.
I chose this term because the same term is used in CGAL. I thought what they call "kernel" was precisely the concepts part of the library, but maybe I'm wrong, I don't know that library very well.
Funny you should mention CGAL, I have been clarifying the way in which CGAL's type system works to include discussion of it in my paper. CGAL kernel is not in any way analogous to our concepts core. They use the term concept in their documentation, but its definition is different. It is merely a mechanism for them and their users to easily define new geometry data types that are able to interoperate with their API, not a way to allow data types that don't depend on CGAL header files to interoperate with their API. That is not possible with CGAL, but I am told they are considering adding it. We should not use the term kernel. I would treat it as almost as sacrosanct as a registered trademark and steer clear of confusing people by avoiding it. I suggest geometry concepts core.
I think this is a good direction to take. I know both parties will be reluctant since they have invested so much time and effort, but surely it would be better for the community.
For sure it would be better. And the fact of trying to "extract" a kernel from both libraries is much less frustrating than forcing one to give up the core of his work. We don't throw anything away, we rather take the "quintessence" of each one.
I am not reluctant to collaborate. We will have a boost geometry library much sooner by working together than by competing. It is not really accurate to say both parties, however, because Brandon Kohn, while not as vocal as myself, has proposed a geometry library of at least equal merit. Ideally I'd like to see Brandon also attend boostcon so that we can all sit down together and hash out what the concepts core of a boost-geometry library will look like. I would be happy to implement such a design if we can come to a consensus on its goodness. I can easily ensure portability to gcc, MSVC and edg based compilers such as icc. Regards, Luke

Simonson, Lucanus J wrote:
I have implemented 100% numerically robust integer polygon clipping operations (AND, OR, NOT, XOR) based upon my N-layer scanline algorithm that produces all geometries with unique sets of overlapping input layers.
Do you have any plans to also add a robust buffering function? Regards, --> Mika Heiskanen

Simonson, Lucanus J wrote:
I have implemented 100% numerically robust integer polygon clipping operations (AND, OR, NOT, XOR) based upon my N-layer scanline algorithm that produces all geometries with unique sets of overlapping input layers.
Mika Heiskanen wrote:
Do you have any plans to also add a robust buffering function?
Do you mean polygon fill? Buffering is a pretty general term and I'm not sure what you mean, but I'm guessing you mean writing the geometry to an image buffer. I have the ability to tile output polygons as trapezoids (may degenerate to rectangles) by slicing along the scanline orientation. These tiles are trivially split (if necessary) into triangles and fed into any graphics API that renders triangles. There will be an example in the documentation of how to convert a black and white GIL image into polygons with edge detection, smooth away pixel scale features, perform operations on the resulting polygons, then put the result back into a GIL image by use of an algorithm that performs trapezoid filling of enclosed pixel points in a 2d array. The trapezoid approach introduces numerical artifacts and extra vertices, which would be of no consequence when converting to an image if the coordinate data type were on the order of the pixel size or smaller. If you meant something else please elaborate so that I can answer properly. Thanks, Luke

Simonson, Lucanus J wrote:
Mika Heiskanen wrote:
Do you have any plans to also add a robust buffering function?
Do you mean polygon fill?
Sorry, my fault. The terminology comes from GIS, and I now realize it is probably not familiar to anyone outside GIS circles, since nobody else probably needs the operation. After a quick Google search, this seems to sum things up pretty well: "Buffer operations in GIS" http://www-users.cs.umn.edu/~npramod/enc_pdf.pdf A very simple summary would be: a buffered polygon is the polygon which includes all points within the given distance from the original polygon. The definition can be extended to points (circles) and polylines (lines with width). For more details please see for example the above link. The operation is just as important in GIS as logical operations. Regards, --> Mika Heiskanen
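For readers who have never met the term, a minimal sketch of the simplest case, buffering a single point by distance d into an approximating regular n-gon; the names and the fixed n are assumptions for illustration only.

#include <cmath>
#include <cstdio>
#include <vector>

struct pt { double x, y; };

// Buffer of one point: every location within distance d of it,
// approximated as a regular n-gon (n >= 3 assumed).
std::vector<pt> buffer_point(pt const& c, double d, int n = 32) {
    std::vector<pt> ring;
    const double pi = 3.14159265358979323846;
    for (int i = 0; i < n; ++i) {
        double a = 2.0 * pi * i / n;
        pt v = { c.x + d * std::cos(a), c.y + d * std::sin(a) };
        ring.push_back(v);
    }
    pt first = ring.front();
    ring.push_back(first);          // close the ring, as GIS polygons usually are
    return ring;
}

int main() {
    pt center = { 1.0, 2.0 };
    std::vector<pt> ring = buffer_point(center, 0.5);
    std::printf("%u vertices\n", (unsigned)ring.size());   // 33, closed ring
}

Buffering a polygon or a linestring is the same idea applied to every point of the geometry, which is where the self-intersection and union machinery discussed below comes in.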

Mika Heiskanen wrote:
Simonson, Lucanus J wrote:
Mika Heiskanen wrote:
Do you have any plans to also add a robust buffering function?
Do you mean polygon fill?
Sorry, my fault. The terminology comes from GIS, and I now realize is probably not familiar to anyone outside GIS circles, since nobody else probably needs the operation.
After a quick google search, this seems to sum things up pretty well:
"Buffer operations in GIS" http://www-users.cs.umn.edu/~npramod/enc_pdf.pdf
A very simple summary would be: a buffered polygon is the polygon which includes all points within the given distance from the original polygon. The definition can be extended to points (circles) and polylines (lines with width). For more details please see for example the above link.
The operation is just as important in GIS as logical operations.
Regards,
--> Mika Heiskanen
Ah, yes, we call this resizing, sizing, inflate, deflate, etc. I have also heard the term erosion, which would be the inverse operation to what you describe. In fact, nearly everyone needs the operation, or some variation of it. The answer to your original question is yes.
When resizing there is typically an option of how to handle convex corners. The common case is to maintain the topology and move the corner. This means that an arbitrarily acute corner would move arbitrarily far, which is often not really what the user wants. There are commonly options such as clipping the corner off at the resizing distance or making a segmented piecewise-linear circular-arc around the corner that joins the resized edges before and after it. It is this circular arc option that seems to be what you are describing since it approximates the set of points within a distance of a polygon. Other options for the algorithm would relate to snapping preferences for how to handle cases where the resizing distance must be approximated due to limitations of the coordinate data type.
Resizing operations are generally lumped into "Booleans" when I use that term. They are planned. I have them for manhattan and 45-degree and they are robust to 45-degree non-integer events. For arbitrary angle I plan to implement them as soon as a request for them comes in from one of my internal users or as part of eventual boost submission. It is trivial to add rectangles to the edges to inflate them (these are sometimes called Minkowski rectangles because the approach is described in his book and often attributed to him) and then fill in the concavities around convex corners with circular-arc pie-slices, then pass that geometry through an OR operation. I could probably implement the arbitrary angle version of resizing and integrate it fully into the library API in one to two days of programming effort and perhaps another one to two days of testing and validation effort. The algorithm is linear except for the OR operation which is loosely O(n log n) for a certain definition of n and so is considered equivalent to a Boolean operation.
As an aside, the algorithm requires three points at a time instead of two, which means when you "wrap around" at the end of the sequence of polygon points the redundant point that "closes" the polygon is just in your way, because you need the last (non-redundant) point and the first two to complete the algorithm. My code checks for and discards the redundant point that "closes" a polygon in such circumstances.
Another algorithm equivalent to Booleans is connectivity extraction. This algorithm extracts the connectivity graph between polygons that are touching or overlapping. I have it for manhattan and 45-degree geometry and can trivially add it through my flexible arbitrary angle scanline framework when the need arises.
The key thing about providing the resizing algorithm (or any other geometry algorithm) in a boost geometry library is not that it is there, but that it provides enough options to meet the needs of the majority of people who want to use it. For instance, there may be GIS specific requirements that I wouldn't know to implement based upon my background in unrelated application domains. This requires that people co-operate to at least the minimum extent of sharing requirements, which is why I'm not under the illusion that I can implement a boost geometry library alone. Regards, Luke
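A minimal sketch of the edge-inflation step described here (hypothetical names; corner pie-slices, snapping and the final OR pass are all left out): each edge is expanded into a rectangle of half-width d built from the edge's unit normal.

#include <cmath>
#include <cstddef>
#include <vector>

struct pt { double x, y; };
struct quad { pt corner[4]; };    // one "Minkowski rectangle" per edge

// Inflate every edge of a point sequence by distance d. The rectangles
// (plus pie-slices at convex corners) would then be unioned with the
// original polygon by the Booleans engine.
std::vector<quad> edge_rectangles(std::vector<pt> const& ring, double d) {
    std::vector<quad> out;
    for (std::size_t i = 0; i + 1 < ring.size(); ++i) {
        pt a = ring[i], b = ring[i + 1];
        double dx = b.x - a.x, dy = b.y - a.y;
        double len = std::sqrt(dx * dx + dy * dy);
        if (len == 0.0) continue;                      // skip degenerate edges
        double nx = -dy / len * d, ny = dx / len * d;  // unit normal scaled by d
        quad q = { { { a.x + nx, a.y + ny }, { b.x + nx, b.y + ny },
                     { b.x - nx, b.y - ny }, { a.x - nx, a.y - ny } } };
        out.push_back(q);
    }
    return out;
}

int main() {
    pt square[] = { {0,0}, {4,0}, {4,4}, {0,4}, {0,0} };   // closed ring
    std::vector<pt> ring(square, square + 5);
    std::vector<quad> rects = edge_rectangles(ring, 0.5);
    return rects.size() == 4 ? 0 : 1;                      // one rectangle per edge
}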

Hi Mika,
Do you have any plans to also add a robust buffering function?
You asked this to Luke and I saw his answers, but here is also an answer from our side.
Yes, we've planned a buffer algorithm in the near future. I'm currently busy with self-intersections, which have to be addressed by the buffering. Then the buffer algorithm might be implemented for a geometry. Buffering is special in that sense that buffering one polygon can have a different output from buffering two polygons. So besides the buffer(polygon) you'll probably also need a buffer algorithm on a collection. Same for linestrings, points. As also described in the PDF-link you gave. Other issues are buffer-outside, buffer-inside, holes which might disappear. And multi geometries (of course related to the collection). Regards, Barend

Barend wrote:
Yes, we've planned a buffer algorithm in the near future. I'm currently busy with self-intersections, which have to be addressed by the buffering. Then the buffer algorithm might be implemented for a geometry.
I'm confused, are you trying to inflate the polygon, find intersections it causes and remove interior edges with some kind of graph traversal of an edge graph data structure that you build from the output of generalized line intersection? This is one common way to implement booleans, but it is horribly more complicated and memory intensive than a scanline based approach. Honestly, it should only take under a week to implement booleans once you have edge intersection.
The scanline that computes whether edges are interior can either be factored into the scanline that computes edge intersections or a second pass. I use a second pass because it is simpler to implement and dramatically easier to validate each algorithm in isolation. I use a third pass to form up and output polygons and associate their holes to the outer shells. Once again, it could all be done in one pass, but forming polygons is something you don't want to do if the output is fed immediately into another boolean operation. Before you cry foul about performance for three passes of scanline, let me state that the memory consumption of my algorithms is significantly lower than other algorithms I've benchmarked, and I've never found a faster implementation. I'll make sure to have a detailed explanation of my approach to booleans in the boostcon paper and presentation.
If we get to the point where we can share code you could use directly my algorithms for removing interior edges and forming polygons as long as you provide a robust line intersection. Then buffering becomes a one day programming exercise.
Buffering is special in that sense that buffering one polygon can have a different output from buffering two polygons. So besides the buffer(polygon) you'll probably also need a buffer algorithm on a collection. Same for linestrings, points. As also described in the PDF-link you gave. Other issues are buffer-outside, buffer-inside, holes which might disappear. And multi geometries (of course related to the collection).
No, you need one algorithm that works on a collection, and if the collection happens to have only one polygon, you also have your algorithm that works on one polygon:

result_of_inflate = inflatePolygons(&polygon, &polygon+1);

Regards, Luke
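A hedged sketch of that iterator-range convention (the names inflate_polygons, polygon and polygon_set are hypothetical, and the body is a placeholder rather than a real inflate): the same template serves a single polygon, a std::vector of them, or any other iterator pair.

#include <vector>

struct polygon { /* vertices omitted for brevity */ };
struct polygon_set { std::vector<polygon> polygons; };

// One interface for "one or many": any iterator range of polygons.
template <typename PolygonIterator>
polygon_set inflate_polygons(PolygonIterator first, PolygonIterator last,
                             double distance) {
    (void)distance;                     // unused in this placeholder body
    polygon_set result;
    for (; first != last; ++first) {
        // ... build edge rectangles and corner arcs for *first,
        //     then feed everything into a single OR/union pass ...
        result.polygons.push_back(*first);
    }
    return result;
}

int main() {
    polygon one;
    std::vector<polygon> many(3);
    polygon_set a = inflate_polygons(&one, &one + 1, 2.0);            // one polygon
    polygon_set b = inflate_polygons(many.begin(), many.end(), 2.0);  // a collection
    return (a.polygons.size() == 1 && b.polygons.size() == 3) ? 0 : 1;
}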

Luke,
I'm confused, are you trying to inflate the polygon, find intersections it causes and remove interior edges with some kind of graph traversal of an edge graph data structure that you build from the output of generalized line intersection? This is one common way to implement booleans, but it is horribly more complicated and memory intensive than a scanline based approach.
You're working with integers, Luke, therefore the scanline approach is appropriate for you. As far as I know scanline is not used for floating point. Didn't plan graph traversal for this one, by the way. That was the other problem :-)
If we get to the point where we can share code you could use directly my algorithms for removing interior edges and forming polygons as long as you provide a robust line intersection. Then buffering becomes a one day programming exercise.
Sounds good. But is it also feasible if you use the boolean/scanline approach and I use doubles?
Buffering is special in that sense that buffering one polygon can have a different output from buffering two polygons. So besides the buffer(polygon) you'll probably also need a buffer algorithm on a collection. Same for linestrings, points. As also described in the PDF-link you gave. Other issues are buffer-outside, buffer-inside, holes which might disappear. And multi geometries (of course related to the collection).
No, you need one algorithm that works on a collection, and if the collection happens to have only one polygon, you also have your algorithm that works on one polygon:
result_of_inflate = inflatePolygons(&polygon, &polygon+1);
Sure. But the internal algorithm is different: either self-intersection, or that plus unions. They are related of course, so indeed we might see if they can be combined. Regards, Barend

Barend Gehrels wrote:
I'm confused, are you trying to inflate the polygon, find intersections it causes and remove interior edges with some kind of graph traversal of an edge graph data structure that you build from the output of generalized line intersection? This is one common way to implement booleans, but it is horribly more complicated and memory intensive than a scanline based approach. You're working with integers, Luke, therefore the scanline approach is appropriate for you. As far as I know scanline is not used for floating point. Didn't plan graph traversal for this one, by the way. That was the other problem :-)
I'm still confused. Why isn't scanline suitable for floating point? Isn't generalized line intersection with floating point generally implemented as a scanline patterned after Bentley-Ottmann? If it can be used to identify the intersection points it can also be used to identify interior edges. It can be the same algorithm that does both in a single pass. How does your generalized line intersection work, if not line scan? I know that it is technically challenging to write a robust linescan on floating point geometry but what practical alternative is there? Even with floating point I can bound how far a segment might move at the current point due to a future intersection snapping to the floating point grid based upon its end points and collect nearby points that may need to intersect it. I haven't done it, but it is possible. Once that algorithm is in place everything else becomes easy. If your union algorithm doesn't work by scanline and doesn't work by rule-based graph traversal how does it work? These are the only two methods I've heard of. Regards, Luke

Hi Luke,
I'm still confused. Why isn't scanline suitable for floating point? Isn't generalized line intersection with floating point generally implemented as a scanline patterned after Bentley-Ottmann?
OK, you mean the sweep line. Yes, that is suitable for floating point, sure. So we agree.
If your union algorithm doesn't work by scanline and doesn't work by rule-based graph traversal how does it work? These are the only two methods I've heard of.
It uses Weiler Atherton (http://tinyurl.com/ajnqso), but right, that is a form of graph traversal (but they don't call it that way). So here also we agree.
Barend

Barend Gehrels wrote:
Hi Luke,
I'm still confused. Why isn't scanline suitable for floating point? Isn't generalized line intersection with floating point generally implemented as a scanline patterned after Bentley-Ottmann?
OK, you mean the sweep line. Yes, that is suitable for floating point, sure. So we agree.
Is scanline the wrong term? I've always thought it was interchangeable with sweepline, plane sweep, line scan and line sweep. If it is associated with some particular polygon fill implementation or some such and sweep line is the general term, I'd be glad to know of it.
If your union algorithm doesn't work by scanline and doesn't work by rule-based graph traversal how does it work? These are the only two methods I've heard of. It uses Weiler Atherton (http://tinyurl.com/ajnqso), but right, that is a form of graph traversal (but they don't call it that way). So here also we agree.
Weiler Atherton is an example of what I meant by rule-based graph traversal. You have a visitor and he runs around a graph where polygon-vertices are graph-nodes and polygon-edges are graph-edges and tries to walk the outer boundaries of polygons. This algorithm is really a pain to implement because you have to anticipate all the unfortunate circumstances your visitor will find himself in and write rules to get him out of them correctly. Frequent mistakes are joining together polygons that touch at only one vertex, and of course getting the wrong answer. Another drawback is that while the visitor can find the outer shells and the holes, he can't find which holes belong to which shells, meaning an additional algorithm is needed for that. Finally, the algorithm is completely inflexible in the sense that to implement a new operation you need to write a complex set of rules, which is a non-trivial amount of work. This leads to lazy shortcuts like using a combination of AND and SUBTRACT (AND NOT) to implement XOR, which could be done as a single operation if it were easier to write the rules. The algorithm is difficult to validate because you can never convince yourself that it has seen all the cases you didn't think about when writing the rules.
Aside from being difficult to implement and validate, the algorithm is very expensive because the graph data structure is heavy in memory and slow to traverse due to pointer chasing and cache being poorly utilized (neighbor nodes are not adjacent in memory). Unless you need the graph data structure for some other reason (such as meshing) you are probably happier to forego using it at all. Given a black box booleans implementation I can usually tell if it is graph traversal based because it is a) slow, b) uses too much memory and c) has O(n^2) runtime associating n holes to an outer polygon because a shortcut is usually taken when taking care of that detail. When compared to an alternative that doesn't suffer these limitations it is at a competitive disadvantage.
Please note, Barend, I'm not criticizing you, I'm criticizing an algorithm invented in the '70s. It is an ongoing frustration of mine that software tools we have to use were implemented with this algorithm. I'd really like to see a superior open source alternative to it plugged into those tools and improve them. That's sort of my goal, actually. Regards, Luke

Hi Luke,
We've corresponded about this off-list as well, so here a short answer (in the end not that short...). I propose to note in the subject line that discussions like these are about implementation details, to make it clear for "users" that they might skip these ones.
> Given a black box booleans implementation I can usually tell if it is graph traversal based because it is a) slow
A polygon intersection/union/difference algorithm consists of:
- finding the intersections
- gathering the results
The last one can be done by graph traversal, of which the performance is not the problem, vertices are visited only once. Finding the intersections is the challenge and the sweepline can be a useful utility there. As I didn't finish the full polygon intersection yet (we have polygon clipping against a rectangle), for me finding intersections was no issue until now. Will look at the sweepline but also at monotonic sections and indexing. And benchmark.
> b) uses too much memory
The list of intersections is in normal cases very short and will even in extreme cases not exceed the number of vertices of the polygons. So I don't think of it as "too much".
> and c) has O(n^2) runtime associating n holes to an outer polygon
Can you give a reference to this calculation? I'm not convinced. A hole either:
- disappears by intersection (so will be part of the exterior boundary)
- stays in the polygon because it is included by both input polygons, or falls outside; you indeed have to determine this but this is O(n) where n is a hole, and this probably can be done while traversing it above
- in case of a union, new holes can be formed, lying in the union
> Please note, Barend, I'm not criticizing you, I'm criticizing an algorithm invented in the '70s.
Pythagoras' theorem is 2500 years old. You refer to Bentley-Ottmann, it is from 1979. Weiler is from 1980. The age of an algorithm is not a measure of its usability.
Polygon/polygon relating algorithms (intersection, union, etc.) have to:
- handle convex as well as concave polygons
- handle holes
- handle floating point
- handle collinear or equal segments, horizontal/vertical cases
- handle "degenerate cases" (some authors call it like that, but a segment intersecting / tangenting a vertex is i.m.o. not degenerate)
- not produce degenerate lines
- be fast and flexible
- you might add that self-intersecting polygons or arbitrary directions or winding rules should be handled
There are:
- Sutherland-Hodgman, only convex
- Weiler-Atherton, the first graph traversal one
- Vatti, used in GPC (I now saw that they indeed call it scan beam although I think sweep line is the more general term)
- Leonov-Nikitin, polyboolean * (still graph traversal, only integer?)
- Greiner-Hormann (still graph traversal) (1998)
- Kim/Kim (modified version of the above) (2006)
- ...
Some are modifications of each other. Most ones are in fact modified Weiler-Atherton versions, therefore I still refer to it as such; I've made modifications as well.
* is fast but I quote here:
> /PolyBoolean/ uses John Hobby's rounding cell technique to avoid extraneous intersections and is therefore completely robust. /Boolean/ also rounds the intersection points to the integer grid, then repeats until no new intersection points are found.
Are you using one of the above or another one?
> That's sort of my goal, actually.
Mine too! So we might be happy to cooperate. Regards, Barend

Barend Gehrels wrote:
>> Given a black box booleans implementation I can usually tell if it is graph traversal based because it is a) slow
> A polygon intersection/union/difference algorithm consists of:
> - finding the intersections
> - gathering the results
> The last one can be done by graph traversal, of which the performance is not the problem, vertices are visited only once. Finding the intersections is the challenge and the sweepline can be a useful utility there.
Well, you are looking at the big-O. Traverse a linked list and traverse a vector and you will find that there is a significant constant factor runtime penalty. The graph is very much like a linked list, I use vectors to store my data and avoid the linked list runtime penalty for the majority of code.
>> b) uses too much memory
> The list of intersections is in normal cases very short and will even in extreme cases not exceed the number of vertices of the polygons. So I don't think of it as "too much"
Well, I have to assume people will run the algorithm on data that barely fits in memory on a 256 GB memory monster of a machine. Space is time, ask Einstein. A linked list uses four times more memory than a vector. I store the vertices of partial polygons in linked list form only while they are currently intersecting my sweepline and then copy them out into a vector when they are closed by the algorithm, minimizing the amount of data stored in that form. In the graph algorithm the entire data set is stored in that form, and that means several times more memory is needed to do the same work. You fall out of cache, your performance goes way down. You fall out of physical memory and you might as well kill the job. Consuming less memory and having less traffic allocating and deallocating are the first and second most important things to consider when optimizing an algorithm for the constant factor. Because it impacts which data structures you choose, it also impacts which algorithms you choose.
>> and c) has O(n^2) runtime associating n holes to an outer polygon
> Can you give a reference to this calculation? I'm not convinced. A hole either:
> - disappears by intersection (so will be part of the exterior boundary)
> - stays in the polygon because it is included by both input polygons, or falls outside; you indeed have to determine this but this is O(n) where n is a hole, and this probably can be done while traversing it above
> - in case of a union, new holes can be formed, lying in the union
Two of the old internal implementations I replaced had this problem. I've also observed it in commercial tools. It can be made O(n), but often this is neglected because people assume it won't be a problem. It also isn't obvious how to make it O(n) with the graph traversal algorithm. If you subtract all polygons from the bounding box of the entire set you have a heck of a lot of holes and nothing in the graph to help you.
> There are:
> - Sutherland-Hodgman, only convex
> - Weiler-Atherton, the first graph traversal one
> - Vatti, used in GPC (I now saw that they indeed call it scan beam although I think sweep line is the more general term)
> - Leonov-Nikitin, polyboolean * (still graph traversal, only integer?)
> - Greiner-Hormann (still graph traversal) (1998)
> - Kim/Kim (modified version of the above) (2006)
Can you send me a reference for Vatti? I've seen polyboolean before. It is limited to 22 integer bits of precision or some such, which is kind of odd.
> * is fast but I quote here:
>> /PolyBoolean/ uses John Hobby's rounding cell technique to avoid extraneous intersections and is therefore completely robust. /Boolean/ also rounds the intersection points to the integer grid, then repeats until no new intersection points are found.
> Are you using one of the above or another one?
I am using my own. It is probably similar to John Hobby, I always round downward within a single integer grid. I don't repeatedly scan because the algorithm is carefully crafted to not create new intersections in the first pass as I describe in another mail posted earlier today to this list. Also, it is a work in progress. Regards, Luke

Hi Luke,
Well, you are looking at the big-O. Traverse a linked list and traverse a vector and you will find that there is a significant constant factor runtime penalty. The graph is very much like a linked list, I use vectors to store my data and avoid the linked list runtime penalty for the majority of code.
We're using vectors all the time (actually it is flexible, deques are also possible). I also implemented the intersection list using a vector. Our approach probably is not that different.
Well, I have to assume people will run the algorithm on data that barely fits in memory on a 256 GB memory monster of a machine. Space is time, ask Einstein. A linked list uses four times more memory than a vector.
See my answer above, not using LL or DLL, just vector.
Can you send me a reference for Vatti?
Vatti, B.R. "A Generic Solution to Polygon Clipping"; Communications of the ACM, 35(7), July 1992, pp.56-63. As far as I know it is not directly/freely available.
I am using my own. It is probably similar to John Hobby, I always round downward within a single integer grid. I don't repeatedly scan because the algorithm is carefully crafted to not create new intersections in the first pass as I describe in another mail posted earlier today to this list. Also, it is a work in progress.
OK, you might publish it then, might be a valuable addition :-)
I still have the feeling it is specific to integers, or to floating point to a small extent, and not to doubles in arbitrary coordinate systems as I'm always using... However, no problem, the more complementary it is, the more possibilities for merging. Regards, Barend

Well, you are looking at the big-O. Traverse a linked list and traverse a vector and you will find that there is a significant constant factor runtime penalty. The graph is very much like a linked list, I use vectors to store my data and avoid the linked list runtime penalty for the majority of code.
Barend Gehrels wrote:
We're using vectors all the time (actually it is flexible, deques are also possible). I also implemented the intersection list using a vector. Our approach probably is not that different.
That's what I'm curious about. Can you describe your approach in more detail? I'm left guessing based upon my knowledge of other implementations. What data structure do you use for the graph? Is it vectors? Most people would implement the graph data structure such that each node corresponds to one polygon vertex, is dynamically allocated and contains a vector or linked list of pointers to other nodes. It should be possible to implement a better data structure than that, but I wouldn't expect to see it done. Regards, Luke

Barend Gehrels wrote:
Can you send me a reference for Vatti?
Vatti, B.R. "A Generic Solution to Polygon Clipping"; Communications of the ACM, 35(7), July 1992, pp.56-63. As far as I know it is not directly/freely available.
It looks like my algorithm is most closely related to Vatti, but I have improved on it in several ways. So far I can only find references to people who are improving on the graph based algorithm and performing all-pairs brute force line intersection. Do you know of any work that improves directly on Vatti rather than merely comparing yet-another-graph-traversing algorithm to Vatti? Thanks, Luke

Vatti, B.R. "A Generic Solution to Polygon Clipping"; Communications of the ACM, 35(7), July 1992, pp.56-63. As far as I know it is not directly/freely available.
It looks like my algorithm is most closely related to Vatti, but I have improved on it in several ways. So far I can only find references to people who are improving on the graph based algorithm and performing all-pairs brute force line intersection. Do you know of any work that improves directly on Vatti rather than merely comparing yet-another-graph-traversing algorithm to Vatti?
Thought I already mentioned that, GPC: http://www.cs.man.ac.uk/~toby/alan/software/ Barend

Barend Gehrels wrote:
I just thought I ought to mention that that code is NOT FREE, so you may want to "talk to your lawyer" before looking at it. (I'm aware of one application that removed that code during a license audit; I was wondering if any future Boost geometry code could end up replacing it there.) Phil.

Phil Endecott wrote:
Barend Gehrels wrote:
I just thought I ought to mention that that code is NOT FREE, so you may want to "talk to your lawyer" before looking at it. (I'm aware of one application that removed that code during a license audit; I was wondering if any future Boost geometry code could end up replacing it there.)
I think we are safe to benchmark it, writing an academic paper is non-commercial use, I don't plan on reading their code. If they haven't written any papers (I've never seen any from them) I may choose not to discuss them at all other than as a curve in a runtime comparison graph. I certainly hope a future boost geometry code could end up replacing GPC as a good free alternative. I believe what I currently have is good enough to be a drop in replacement. I provide a superset of their features, I am probably no worse in runtime even with my sub-optimal line intersection or arbitrary angle geometries and I am 100% robust for the full integer coordinate domain and all degenerate inputs. I'll be updating what is in the sandbox today, if you would like to explore the applicability. I'll check in my unit testing cpp file, which can serve as example code for how to use just about everything in the library since the purpose of that thing is to get my code coverage up and verify correctness of everything I write. Luke

Phil: These are the libraries I have been using for development under Windows, Linux and Mac OS:
- Asio
- Bind
- Circular buffer
- Date time
- Filesystem
- Program options
- Smart pointers
- statechart
- Thread
- Timer
- Tokenizer
I would like to use these libraries with my embedded applications too, for code reuse. I understand some libraries need OS support, like thread. Using thread for embedded development would allow me to develop applications I could reuse with many RTOSes. I wouldn't mind developing a port for the RTOSes I have been using. For these I guess I need to develop a facade/stub class... Other things I am interested in for code reuse and portability are generic interfaces for peripherals, like CAN bus, SPI, I2C, analog to digital converters, GPIOs... Boost does not have interfaces for these, but that would be a nice add-on. Any idea if these types of interfaces are being developed? Can I ask you what embedded platform and toolchain you have been developing on? What were the memory footprints for the libraries you used? Jean On 13-Mar-09, at 10:32 AM, Phil Endecott wrote:
Barend Gehrels wrote:
I just thought I ought to mention that that code is NOT FREE, so you may want to "talk to your lawyer" before looking at it. (I'm aware of one application that removed that code during a license audit; I was wondering if any future Boost geometry code could end up replacing it there.)
Phil.

It looks like my algorithm is most closely related to Vatti, but I have improved on it in several ways. So far I can only find references to people who are improving on the graph based algorithm and performing all-pairs brute force line intersection. Do you know of any work that improves directly on Vatti rather than merely comparing yet-another-graph-travering algorithm to Vatti?
Thought I already mentioned that, GPC: http://www.cs.man.ac.uk/~toby/alan/software/
Oh, I didn't recognize the acronym. I've looked at GPC before when I was trying to find a free alternative to writing my own booleans 2+ years ago. Their license is not free. Revisiting their web site, their feature set looks awfully familiar. Trapezoidization and holes, Vatti based for sure. I'll need to look into their publications (do they have any?) and be sure to give them their deserved citation in my paper. They cite Leonov's boolean comparison which shows them in a very poor light compared to polyboolean. Their runtime is not scaling very nicely in his benchmarks. Also the scale of his benchmarks is very small. You can just barely start to see the runtimes of those two libraries hit the wall. I'll have to benchmark GPC and PolyBoolean myself before I can say more. It looks like in 1998, however, their line intersection at least was suboptimal, suggesting a naïve implementation of Vatti. They also lack keyholing output, which is a hard requirement of just about 50% of use cases, including mine. Thanks for the link! Luke

Hi Luke,
This is something I had thought about. I was wondering if all the library authors could produce a kernel together, and continue to work separately but on this same kernel. And the kernel would be proposed as Boost.Geometry before any other more specific work. But I have to elaborate all this, it's a bit unclear to me for now...
This sounds fine to me. I have already released my concepts based API to internal users who are very happy with it. Changes to the names of concepts and mechanisms of generic function overloading would be easy for them to accommodate as long as I can do with an eventual boost geometry concepts type system what I can currently do with what I have. I will be sure to point out which design decisions are important for me to support my existing API.
Nice to see you agree with this idea :-) We'll have to identify exactly what makes our cores diverge and what makes them converge, and see where we can merge and where we have to make a choice. I remember that you use SFINAE after having tried tag dispatching, exactly the inverse of what we did. A good start would be to compare both approaches. I have to take a closer look at your library to understand clearly all that. Anyway, whatever the design we end up with, it will have to ensure that you can still do what you can do right now, as well as for us. The contrary would be an obvious proof of failure.
Funny you should mention CGAL, I have been clarifying the way in which CGAL's type system works to include discussion of it in my paper. CGAL kernel is not in any way analogous to our concepts core. They use the term concept in their documentation, but its definition is different. It is merely a mechanism for them and their users to easily define new geometry data types that are able to interoperate with their API, not a way to allow data types that don't depend on CGAL header files to interoperate with their API. That is not possible with CGAL, but I am told they are considering adding it.
Indeed that's really different from what I thought, for me CGAL was able to handle any foreign data types, exactly like what we do. If things are that way, then "kernel" doesn't really reflect the nature of our work indeed.
We should not use the term kernel. I would treat it as almost as sacrosanct as a registered trademark and steer clear of confusing people by avoiding it. I suggest geometry concepts core.
I had also thought about "core" before so I'm OK with that.
I am not reluctant to collaborate. We will have a boost geometry library much sooner by working together than by competing. It is not really accurate to say both parties, however, because Brandon Kohn, while not as vocal as myself, has proposed a geometry library of at least equal merit.
Yes of course, I was mentioning you because we have communicated much more with you for now, but if Brandon is also seduced by the approach, it would be great to see him join us. Anyway all this sounds really interesting. Don't know yet if it's even possible, but at least we can give it a try. Regards Bruno

I also want to reiterate that area(point) and length(polygon) should be syntax errors. Type safety isn't something we should break lightly.
In your previous reaction you gave the example:

Polygon p1;
//lots of code
if (some condition)
{
    Point p1;
    //some more code
    result = something + area(p1); //oops, which p1 did they mean? Is it really so unlikely?
}
I agree that such typos are possible. But why would they only occur for area and length? What about other (generic) operations? There the mix-up would of course not be caught by the compiler either. We're building a generic library, where all algorithms apply to all geometries as much as possible. We should not exclude possibilities just because of the risk of a typo in an implementation. I assume that you have algorithms operating on more than one geometry type alone. Area is not an exception. Regards, Barend
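To make the two options concrete, here is a hedged sketch (hypothetical names, not either library's actual design) of a tag-dispatched area(): with no specialization for point_tag, the call area(some_point) is a compile error, while adding the trivial specialization switches to the "return 0" behaviour.

#include <cstddef>
#include <vector>

// Illustrative tags and traits only.
struct point_tag {};
struct polygon_tag {};

struct my_point   { double x, y; };
struct my_polygon { std::vector<my_point> outer; };   // outer ring, closed

template <typename G> struct tag_of;                   // primary: undefined
template <> struct tag_of<my_point>   { typedef point_tag   type; };
template <> struct tag_of<my_polygon> { typedef polygon_tag type; };

namespace dispatch {
    // Option 1: no primary definition, so area(point) simply does not compile.
    template <typename Tag, typename G> struct area;

    template <typename G> struct area<polygon_tag, G> {
        static double apply(G const& poly) {
            // shoelace formula; assumes the ring is closed (last point == first)
            double sum = 0.0;
            for (std::size_t i = 0; i + 1 < poly.outer.size(); ++i)
                sum += poly.outer[i].x * poly.outer[i + 1].y
                     - poly.outer[i + 1].x * poly.outer[i].y;
            return 0.5 * sum;
        }
    };

    // Option 2: uncomment to make area(point) return 0 instead of failing.
    // template <typename G> struct area<point_tag, G> {
    //     static double apply(G const&) { return 0.0; }
    // };
}

template <typename G>
double area(G const& g) {
    return dispatch::area<typename tag_of<G>::type, G>::apply(g);
}

int main() {
    my_polygon square;
    my_point ring[] = { {0,0}, {4,0}, {4,3}, {0,3}, {0,0} };
    square.outer.assign(ring, ring + 5);
    double a = area(square);            // 12
    // area(my_point());                // compile error under option 1
    return a == 12.0 ? 0 : 1;
}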

On Tue, Feb 24, 2009 at 5:29 AM, Barend Gehrels <barend@geodan.nl> wrote:
What should the generic "area" algorithm return for point and linestring?... What should the generic "length" algorithm return for polygon?... What should the generic "centroid" algorithm return for point and linestring?... and for an invalid polygon (e.g. no points at all, or not closed)?...
Thanks, Barend
Are these generic "algorithms" or part of the concepts themselves? I think a number of options make sense for any particular library but not for the concepts -- I think area(p) should not be part of the point concept, and should not be available for a point archetype, but it may exist anyways in your particular library. Concepts don't usually "disallow" functions do they? (or do they?) As far as invalid polygons --- I am curious how one would go about creating a polygon that is not closed. I would tend to think that since you have a linestring, then a polygon must mean that the shape is closed. --John

-------------------------------------------------- From: "Simonson, Lucanus J" <lucanus.j.simonson@intel.com> Sent: Friday, March 06, 2009 5:22 PM To: <boost@lists.boost.org> Subject: Re: [boost] [geometry] area, length, centroid, behavior
I'm still confused. Why isn't scanline suitable for floating point? Isn't generalized line intersection with floating point generally implemented as a scanline patterned after Bentley-Ottmann?
I don't really know what algorithm is most often used for floating point (I would suspect bsp tree based for boolean ops due to the graphics applications). For generalized line intersection in FP my bet would be that the brute force is the most often used. The Bentley-Ottmann is a great algorithm, but it is a serious pain to implement in floating point due to problems maintaining a stable sorting of the segment intersections with the sweep-line. Precision really becomes a hairy issue with this. I have a faithful implementation of the algorithm from a text-book which still manages to fail on occasion (even with fuzzy comparisons though not so frequently as without.)
If it can be used to identify the intersection points it can also be used to identify interior edges. It can be the same algorithm that does both in a single pass. How does your generalized line intersection work, if not line scan? I know that it is technically challenging to write a robust linescan on floating point geometry but what practical alternative is there? Even with floating point I can bound how far a segment might move at the current point due to a future intersection snapping to the floating point grid based upon its end points and collect nearby points that may need to intersect it. I haven't done it, but it is possible. Once that algorithm is in place everything else becomes easy.
If your union algorithm doesn't work by scanline and doesn't work by rule-based graph traversal how does it work? These are the only two methods I've heard of.
I have been working on a bsp-tree based version of the boolean ops. The other I know of is the scan-line version (which I gather is the best known.) I suspect that there is a grid based version out there, but I haven't researched it. I would be curious to learn about another as well. Cheers, Brandon

Brandon Kohn wrote:
Simonson, Lucanus J wrote:
I'm still confused. Why isn't scanline suitable for floating point? Isn't generalized line intersection with floating point generally implemented as a scanline patterned after Bentley-Ottmann?
I don't really know what algorithm is most often used for floating point (I would suspect bsp tree based for boolean ops due to the graphics applications). For generalized line intersection in FP my bet would be that the brute force is the most often used. The Bentley-Ottmann is a great algorithm, but it is a serious pain to implement in floating point due to problems maintaining a stable sorting of the segment intersections with the sweep-line. Precision really becomes a hairy issue with this.
Is it really just the problem of maintaining a stable sorting of the segment intersections with the sweep-line? This will surely require some care, but doesn't sound like a fundamental issue ruling out Bentley-Ottmann for floating point. What looks more nasty to me is that for nearly parallel segments, Bentley-Ottmann has to compute the intersection point of the corresponding lines, even if it is not an intersection point of the segments. But perhaps a simple modification of Bentley-Ottmann can avoid the need to depend on "false" intersection points. I'm just wondering whether there are more fundamental issues ruling out a Bentley-Ottmann type algorithm applied to floating point than "simple" stable sorting issues.
I have a faithful implementation of the algorithm from a text-book which still manages to fail on occasion (even with fuzzy comparisons though not so frequently as without.)
I don't think that fuzzy comparison is a good idea when you want to make Bentley-Ottmann robust for floating point. You will have to ensure that "important decisions" are only computed once, and enforce that no other intermediate/computed result can contradict these "decisions". Assuming it is possible to make a Bentley-Ottmann type sweep-line algorithm robust against floating point issues, what other issues will have to be considered when deciding which algorithm to use? Regards, Thomas

-------------------------------------------------- From: "Thomas Klimpel" <Thomas.Klimpel@synopsys.com> Sent: Sunday, March 08, 2009 5:49 AM To: <boost@lists.boost.org> Subject: Re: [boost] [geometry] area, length, centroid, behavior
Is it really just the problem of maintaining a stable sorting of the segment intersections with the sweep-line? This will surely require some care, but doesn't sound like a fundamental issue ruling out Bentley-Ottmann for floating point.
It does seem simple, doesn't it? Perhaps you should give it a try to see... I would certainly welcome a more robust solution. I would like to point out that even the LEDA Bentley-Ottmann implementation has issues on FP (at least the version I tried a couple of years ago.) Here's another paper describing (in sometimes painful detail ;) some of the issues. ftp://ftp.inria.fr/INRIA/publication/publi-pdf/RR/RR-3270.pdf
What looks more nasty to me is that for nearly parallel segments, Bentley-Ottmann has to compute the intersection point of the corresponding lines, even if it is not an intersection point of the segments. But perhaps a simple modification of Bentley-Ottmann can avoid the need to depend on "false" intersection points.
I don't have anything like this in my implementation. I use segment intersection testing, not line (though these tests do involve some of the same mechanisms as line intersection testing, but with constraints). The intersection tests are only computed on segments which are adjacent to each other in the sweep line. I don't see how being nearly parallel makes any difference. If the segments (not lines) do not intersect then they do not report a 'false' intersection. The issues that I have encountered come from the computation of the intersection point of one of the segments with the sweep line (not between the segments) and the subsequent comparison with similar intersection points from neighboring segments in close vicinity.
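For readers following the segment-versus-line distinction, a minimal sketch of an orientation-based segment intersection predicate (hypothetical code; the collinear and endpoint-touching special cases a real implementation has to handle are ignored here): nearly parallel but disjoint segments simply report no intersection, with no need for a line/line intersection point.

#include <cstdio>

struct pt { double x, y; };

// Sign of the cross product (b - a) x (c - a):
// > 0 left turn, < 0 right turn, 0 collinear.
double orientation(pt a, pt b, pt c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Proper intersection of the *segments* [p1,p2] and [q1,q2]:
// each segment must straddle the line supporting the other one.
bool segments_intersect(pt p1, pt p2, pt q1, pt q2) {
    double d1 = orientation(q1, q2, p1);
    double d2 = orientation(q1, q2, p2);
    double d3 = orientation(p1, p2, q1);
    double d4 = orientation(p1, p2, q2);
    return ((d1 > 0 && d2 < 0) || (d1 < 0 && d2 > 0))
        && ((d3 > 0 && d4 < 0) || (d3 < 0 && d4 > 0));
}

int main() {
    pt p1 = { 0, 0 }, p2 = { 4, 4 };
    pt q1 = { 0, 4 }, q2 = { 4, 0 };
    std::printf("%d\n", segments_intersect(p1, p2, q1, q2) ? 1 : 0);   // 1
}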
I'm just wondering whether there are more fundamental issues ruling out a Bentley-Ottmann type algorithm applied to floating point than "simple" stable sorting issues.
I have a faithful implementation of the algorithm from a text-book which still manages to fail on occasion (even with fuzzy comparisons though not so frequently as without.)
I don't think that fuzzy comparison is a good idea when you want to make Bentley-Ottmann robust for floating point. You will have to ensure that "important decisions" are only computed once, and enforce that no other intermediate/computed result can contradict these "decisions".
Perhaps you are right here, but I would have to think about it further. I'm not sure that there are cases where you are calculating the same information multiple times. The segment intersections are calculated once. The sorting is only done when a segment is inserted into the sweepline structure (so one comparison between each segment inserted and those already in the structure). I suppose the one bit of information that is calculated many times is the topology relation to neighboring segments when interpolating the current intersection point with the sweep-line over a range of event points. When points become very close (to within an epsilon tolerance) the ordering on the line becomes indeterminate when simply using fp comparison. It may be possible to do something along the lines of what you suggest here. Perhaps this idea is similar to what is done here: http://cutebugs.net/files/curve/2-27.ps If you would like to give this a go, I'd be happy to collaborate. The code is available at: http://sourceforge.net/svn/?group_id=240335 (project name gengeomalg). svn checkout: svn co https://gengeomalg.svn.sourceforge.net/svnroot/gengeomalg gengeomalg
Assuming it is possible to make a Bentley-Ottmann type sweep-line algorithm robust against floating point issues, what other issues will have to be considered when deciding which algorithm to use?
I would say the same considerations as any other algorithm: correctness, run-time complexity, and memory use. Regards, Brandon

I don't think that fuzzy comparison is a good idea when you want to make Bentley-Ottmann robust for floating point. You will have to ensure that "important decisions" are only computed once, and enforce that no other intermediate/computed result can contradict these "decisions".
Brandon Kohn wrote:
Perhaps you are right here, but I would have to think about it further. I'm not sure that there are cases where you are calculating the same information multiple times. The segment intersections are calculated once. The sorting is only done when a segment is inserted into the sweepline structure (so one comparison between each segment inserted and those already in the structure). I suppose the one bit of information that is calculated many times is the topology relation to neighboring segments when interpolating the current intersection point with the sweep-line over a range of event points. When points become very close (to within an epsilon tolerance) the ordering on the line becomes indeterminate when simply using fp comparison. It may be possible to do something along the lines of what you suggest here.
There is a trick to perform the comparison of slopes more accurately. If you want to know whether a/b < c/d you can either write:

if (a/b < c/d)

or (assuming b and d are positive, as they are for segments stored left to right):

if (a*d < c*b)

Since division tends to do unfortunate things it is better to use multiplication. If your coordinate is double you can use long double for the multiply and it will fit better. You can extend this trick to compare two scanline intercept points:

a = y2 - y1;
b = x2 - x1;
Intercept1 = y1 + (x - x1) * a/b;
Intercept2 = y2_1 + (x - x2_1) * a2/b2;
Proportional_to_intercept1 = y1 * b * b2 + (x - x1) * a * b2;
Proportional_to_intercept2 = y2_1 * b * b2 + (x - x2_1) * a2 * b;
Intercept1 < Intercept2 == Proportional_to_intercept1 < Proportional_to_intercept2

However, you run out of room in even the long double with so many multiplies, so you should investigate higher precision floating point data types to achieve robust arithmetic.

My main concern isn't whether the scanline is maintained properly, it is whether spurious intersections are introduced when snapping an intersection point to the floating point coordinate grid. Because snapping moves the edge laterally by some small amount it may cause it to cross to the other side of a vertex than the original segment, leading to the invariant that all output segments are non-intersecting and touch only at their end points being violated, which breaks downstream code. This is a bad problem in floating point because the amount the segment moves laterally depends on where the intersection point was snapped in the floating point domain, and can become arbitrarily large. I'm much happier with integer coordinates. Regards, Luke
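Restated as code, a sketch of this division-free comparison under the stated assumptions (non-vertical segments stored left to right so both runs are positive; the names are made up, and the widened long double only postpones, rather than removes, the precision limit):

struct segment { double x1, y1, x2, y2; };   // assumed stored with x1 < x2

// True if segment s crosses the vertical sweep line at x strictly below
// where segment t crosses it, comparing with multiplications instead of
// divisions in a wider floating point type.
bool intercept_less(segment const& s, segment const& t, double x) {
    long double a  = s.y2 - s.y1, b  = s.x2 - s.x1;   // rise and run of s (b  > 0)
    long double a2 = t.y2 - t.y1, b2 = t.x2 - t.x1;   // rise and run of t (b2 > 0)
    // intercept_s = s.y1 + (x - s.x1) * a  / b
    // intercept_t = t.y1 + (x - t.x1) * a2 / b2
    // Multiply both sides by b * b2 (positive, so the comparison is preserved):
    long double lhs = (long double)s.y1 * b * b2 + ((long double)x - s.x1) * a * b2;
    long double rhs = (long double)t.y1 * b * b2 + ((long double)x - t.x1) * a2 * b;
    return lhs < rhs;
}

int main() {
    segment s = { 0.0, 0.0, 10.0, 1.0 };        // shallow segment
    segment t = { 0.0, 0.5, 10.0, 9.0 };        // steeper segment above it
    return intercept_less(s, t, 5.0) ? 0 : 1;   // s is lower at x = 5
}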

Simonson, Lucanus J:
My main concern isn't whether the scanline is maintained properly, it is whether spurious intersections are introduced when snapping an intersection point to the floating point coordinate grid.
It is an interesting question whether talking about a floating point coordinate grid is appropriate. An important difference between integer coordinates and floating point coordinates is that integer coordinates input must often be treated as exactly given, while it is often OK to modify floating point input slightly. (But this is problem dependent, and 2D geometry problems are probably more naturally associated with fixed-point/integer coordinates than with floating point coordinates. So following "floating point rules" for 2D geometry problems will probably often be inappropriate.)
Because snapping moves the edge laterally by some small amount it may cause it to cross to the other side of a vertex than the original segment, leading to the invariant that all output segments are non-intersecting and touch only at their end points being violated, which breaks downstream code.
It may be OK in a floating point context to break this invariant, if enough information is given to downstream code that it can avoid breakage. If this is not an option, conversion of the initial input to fixed-point may be the more appropriate way to go. (Remember that it is often OK to modify floating point input slightly, so the inexact conversion to fixed-point is not necessarily a problem). A computed floating point coordinate would simply be allowed to have a tolerance of epsilon.
This is a bad problem in floating point because the amount the segment moves laterally depends on where the intersection point was snapped in the floating point domain, and can become arbitrarily large. I'm much happier with integer coordinates.
I find it a bit difficult to understand which output is considered valid in an integer coordinates context. The output in figure 2 of http://cutebugs.net/files/curve/2-27.ps (the reference suggested by Brandon) feels wrong to me, but it is apparently the expected output in this case. I'm not sure whether I should try to implement a Bentley-Ottmann line-intersection algorithm along the way I suggested to Brandon, but if I did, this implementation would not be allowed to report the two additional intersection points, because this would contradict the decision computed previously at the event-point (0,0) that the segment [(0,0),(9,5)] is "greater" than the segment [(0,0),(13,6)] along the sweep-line for x>0. But the implementation would still be allowed to report the intersection point (5.45,2.52) as (5,3), because it would be understood that this result has a tolerance on the order of epsilon=1. Regards, Thomas

Thomas Klimpel wrote:
It is an interesting question whether talking about a floating point coordinate grid is appropriate. An important difference between integer coordinates and floating point coordinates is that integer coordinates input must often be treated as exactly given, while it is often OK to modify floating point input slightly. (But this is problem dependent, and 2D geometry problems are probably more naturally associated with fixed-point/integer coordinates than with floating point coordinates. So following "floating point rules" for 2D geometry problems will probably often be inappropriate.)
Hmm, if I move a point during the algorithm I move its edges laterally and propagate the badness throughout the network. We do have to keep in mind the fact that there is a floating point grid and not pretend that floating point is continuous.
Because snapping moves the edge laterally by some small amount it may cause it to cross to the other side of a vertex than the original segment, leading to the invariant that all output segments are non-intersecting and touch only at their end points being violated, which breaks downstream code.

It may be OK in a floating point context to break this invariant, if enough information is given to downstream code that it can avoid breakage. If this is not an option, conversion of the initial input to fixed-point may be the more appropriate way to go. (Remember that it is often OK to modify floating point input slightly, so the inexact conversion to fixed-point is not necessarily a problem). A computed floating point coordinate would simply be allowed to have a tolerance of epsilon.
This might be true in the case that I were writing the downstream code. However, if the downstream code is inside Abaqus or a solid modeling program such as ACIS, or a mesher, I have to meet their requirements for well formed inputs, which means no monkey business. We actually do convert floating point to fixed point, perform the geometry operations and convert back. This could be problematic if the fixed point had a finer resolution than floating point. However, if your input extent is bounded to the 32-bit integer domain and you use double for floating point there is no approximation needed to convert integer to floating point, and so nothing goes wrong in that step.
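A quick, self-contained check of that last claim (the function name is made up, not library code): a double carries a 53-bit mantissa, so every 32-bit integer coordinate survives the round trip to floating point and back unchanged.

#include <cassert>
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<double>::digits >= 32,
              "double must represent a 32-bit integer exactly");

inline bool round_trips_exactly(std::int32_t x)
{
    double d = static_cast<double>(x);          // exact, no rounding
    return static_cast<std::int32_t>(d) == x;   // back to the same value
}

int main()
{
    assert(round_trips_exactly(std::numeric_limits<std::int32_t>::max()));
    assert(round_trips_exactly(std::numeric_limits<std::int32_t>::min()));
}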
This is a bad problem in floating point because the amount the segment moves laterally depends on where the intersection point was snapped in the floating point domain, and can become arbitrarily large. I'm much happier with integer coordinates.
I find it a bit difficult to understand which output is considered valid in an integer coordinates context.
Whatever the algorithm does is right by its own definition of "right". No approximation is correct in a pedantically strict sense of the word, meaning both floating point and integer operations will produce artifacts when things are happening below the threshold of the coordinate grid. Regards, Luke

Simonson, Lucanus J wrote:
Thomas Klimpel wrote:
So following "floating point rules" for 2D geometry problems will probably often be inappropriate.)
Hmm, if I move a point during the algorithm I move its edges laterally and propagate the badness throughout the network. We do have to keep in mind the fact that there is a floating point grid and not pretend that floating point is continuous.
Questioning whether talking about a floating point grid is appropriate is not the same thing as pretending that floating point is continuous. It's more about the statement: "A computed floating point coordinate would simply be allowed to have a tolerance of epsilon." When this tolerance is not an option, using fixed point should be considered. (Using fixed point, the same program compiled with two different compilers will (should) yield exactly identical results, which is not true when using floating point.)
It may be OK in a floating point context to break this invariant, if enough information is given to downstream code that it can avoid breakage. If this is not an option, conversion of the initial input to fixed-point may be the more appropriate way to go. (Remember that it is often OK to modify floating point input slightly, so the inexact conversion to fixed-point is not necessarily a problem). A computed floating point coordinate would simply be allowed to have a tolerance of epsilon.
This might be true in the case that I were writing the downstream code. However, if the downstream code is inside Abaqus or a solid modeling program such as ACIS, or a mesher, I have to meet their requirements for well formed inputs, which means no monkey business.
The monkey business here is that the typical downstream code (written by others) doesn't clearly define what it considers as well formed inputs. (Are keyholes allowed? Are overlapping edges that don't share common endpoints allowed?) As soon as there is a clear definition of the requirements for well formed inputs (of a particular downstream code), one can try to conform to these requirements. If I would implement a Bentley-Ottmann line-intersection algorithm along the way I suggested to Brandon, it would probably not conform to all commonly occurring requirements of downstream code. However, I don't think that such an implementation would be "monkey business", because of that.
I'm much happier with integer coordinates.
I find it a bit difficult to understand which output is considered valid in an integer coordinates context.
Whatever the algorithm does is right by its own definition of "right".
The cited example looked like a severe change of topology without any good reason to me, and the authors weren't yet talking about algorithms at all. They just showed the example and the expected result (which was quite unexpected to me). However, it was not just that paper; I had already wondered before about how to characterize the expected result in an integer coordinates context. When I suggested to Brandon how I think a Bentley-Ottmann line-intersection algorithm could be implemented for floating point, I was mostly concerned with the internal consistency of the implementation itself, not about special requirements on the generated output. However, the typical paper talking about issues of inexact arithmetic for geometric algorithms is more concerned about accuracy requirements of the output than about a fast and consistent implementation without using multiple precision libraries. And the typical paper often takes a fixed-point/integer number point of view. I understand that there are good reasons for this, but I still don't understand why a severe change of topology should be OK, but the (nearly) unavoidable "tolerance of epsilon" of a floating point result should be "monkey business".

Regards,
Thomas

Simonson, Lucanus J wrote:
My main concern isn't whether the scanline is maintained properly, it is whether spurious intersections are introduced when snapping an intersection point to the floating point coordinate grid. Because snapping moves the edge laterally by some small amount it may cause it to cross to the other side of a vertex than the original segment, leading to the invariant that all output segments are non-intersecting and touch only at their end points being violated, which breaks downstream code.
Let me see if I understand this correctly. Say you have two line segments AB and CD that cross at E:

A   D
 \ /
  \ /F
   \ /
    E
   / \
  /   \
 C     B

Due to the floating point approximation, C-E-D is not perfectly straight and there is some other vertex F which lies on one side of CD but on the other side of ED. This can also happen with integer co-ordinates, but it's easier to reason about in that case. Right?

It seems to me that the fix must be to forget about C-D as soon as E has been found, and only use the new edges CE and ED from then on.

If this is true, what are the implications for the interfaces to the algorithms that do this? Does it mean that if I have:

polygon P;
polygon Q;
polygon R = intersection(P,Q);

then intersection() should be allowed to modify P and Q to add new points? Otherwise, we may end up with points that are inside R but not inside P or Q.

Thinking about it from the application requirements point of view it may be acceptable to have algorithms that are guaranteed to err one way and not the other, e.g.

union1(A,B) contains every point in A and B and perhaps some extra ones
union2(A,B) contains most points from A and B and no points not in A or B

I would like to think that this has all been considered many times before....

Phil.
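To make the picture concrete, here is a rough sketch (all names invented) of the check involved: does snapping E to some grid point E' put F on the other side of the replacement edges C-E' and E'-D than it was of the original C-D? The orientation test below is done naively in double and would need an exact predicate in real code.

struct pt { double x, y; };

// > 0 : p is left of a->b, < 0 : right, == 0 : collinear (up to rounding).
inline double side(pt a, pt b, pt p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Did replacing C-D by C-E'-D (with E' the snapped intersection) carry the
// edge across the vertex F?
inline bool snapping_crossed_vertex(pt c, pt d, pt snapped_e, pt f)
{
    double before = side(c, d, f);           // side of F w.r.t. original C-D
    double after1 = side(c, snapped_e, f);   // side w.r.t. C-E'
    double after2 = side(snapped_e, d, f);   // side w.r.t. E'-D
    // A sign change on either half-edge indicates the snapped path may have
    // crossed F, which is exactly the broken invariant described above.
    return (before > 0 && (after1 < 0 || after2 < 0))
        || (before < 0 && (after1 > 0 || after2 > 0));
}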

Let me see if I understand this correctly. Say you have two line segments AB and CD that cross at E:
A   D
 \ /
  \ /F
   \ /
    E
   / \
  /   \
 C     B
Due to the floating point approximation, C-E-D is not perfectly straight and there is some other vertex F which lies on one side of CD but on the other side of ED. This can also happen with integer co-ordinates, but it's easier to reason about in that case. Right?
It seems to me that the fix must be to forget about C-D as soon as E has been found, and only use the new edges CE and ED from then on.
You are overlooking the fact that F may be passed by the scanline before we reach E, in which case we considered F with respect to CD, and can't put humpty dumpty back together again once we cross it by snapping E. What I do is find the points such as F that may be crossed due to intersection snapping and intersect CD with them, assuming an E will come along later. These intersections could be made optional and performed only in the cases where they turn out to be necessary, but it is rare enough that I don't bother.
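A rough sketch of that idea; the names, the tolerance handling and the fix-up-pass framing are mine (the real code does this inside the scanline, not as a separate pass):

#include <cmath>
#include <vector>

struct pt { double x, y; };

// Distance from point p to the segment a-b.
inline double point_segment_distance(pt a, pt b, pt p)
{
    double dx = b.x - a.x, dy = b.y - a.y;
    double len2 = dx * dx + dy * dy;
    double t = len2 > 0 ? ((p.x - a.x) * dx + (p.y - a.y) * dy) / len2 : 0.0;
    t = t < 0 ? 0 : (t > 1 ? 1 : t);             // clamp to the segment
    double qx = a.x + t * dx, qy = a.y + t * dy; // closest point on a-b
    return std::hypot(p.x - qx, p.y - qy);
}

// Split edge a-b at every vertex that intersection snapping could carry it
// across, so a later snapped E cannot move the edge to the other side of F.
inline std::vector<pt> split_at_near_vertices(pt a, pt b,
                                              const std::vector<pt>& vertices,
                                              double max_snap_distance)
{
    std::vector<pt> result;
    result.push_back(a);
    for (const pt& f : vertices)
        if (point_segment_distance(a, b, f) <= max_snap_distance)
            result.push_back(f);                 // treat F as lying on the edge
    result.push_back(b);
    // A real implementation would also order the inserted vertices along a-b.
    return result;
}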
If this is true, what are the implications for the interfaces to the algorithms that do this? Does it mean that if I have:
polygon P; polygon Q; polygon R = intersection(P,Q);
then intersection() should be allowed to modify P and Q to add new points? Otherwise, we may end up with points that are inside R but not inside P or Q.
Thinking about it from the application requirements point of view it may be acceptable to have algorithms that are guaranteed to err one way and not the other, e.g.
union1(A,B) contains every point in A and B and perhaps some extra ones
union2(A,B) contains most points from A and B and no points not in A or B
I would like to think that this has all been considered many times before....
You would be right. Commercial tools have options to snap vertices outward and "grow" the output polygon slightly or inward and "shrink" the output polygon slightly, or just snap to the closest. This allows for the type of reasoning you outline above. Regards, Luke

Hi, On Mon, Mar 09, 2009 at 11:01:33AM -0700, Simonson, Lucanus J wrote:
There is a trick to perform the comparision of slopes more accurately.
If you want to know whether a/b < c/d you can either write:
if(a/b < c/d)
Or:
if(a*d < c*b)
[ ... ]
However, you run out of room in even the long double with so many multiplies, so you should investigate higher precision floating point data types to achieve robust arithmetic.
Indeed. A lot of work in Computational Geometry (e.g. Exact Geometric Computation [1]) divides the numeric computation into two types. First, we have decisions ("are three points collinear or not?") that are based on comparing computed numbers like the slopes example above. Second, we have newly-computed geometric entities like the point of intersection of two line segments. These two types of computations are sometimes called Predicates and Constructions, respectively. Libraries that support Exact Geometric Computing (e.g. CGAL) mandate that Predicates are exact, but sometimes the Constructions can be approximate, depending on the application needs.

Can the proposed Boost libraries support this kind of exact Predicates with approximate Constructions paradigm?

Regards,
-Steve

[1] ftp://ftp-sop.inria.fr/geometrica/pion/publis/egc_survey_jlap.pdf
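As a purely illustrative sketch of that split, assuming integer input coordinates of magnitude below 2^30 so the 64-bit determinant cannot overflow (none of the names below come from any proposed library):

#include <cstdint>

struct ipoint { std::int32_t x, y; };
struct dpoint { double x, y; };

// Exact predicate: orientation of c relative to the directed line a->b.
// Returns +1 (left), -1 (right) or 0 (collinear) with no rounding at all,
// provided |coordinates| < 2^30 so the determinant fits in 64 bits.
inline int orientation(ipoint a, ipoint b, ipoint c)
{
    std::int64_t det = std::int64_t(b.x - a.x) * (c.y - a.y)
                     - std::int64_t(b.y - a.y) * (c.x - a.x);
    return (det > 0) - (det < 0);
}

// Approximate construction: the intersection of lines a-b and c-d is
// computed in double and will in general not be exactly representable.
// The caller must ensure the lines are not parallel (det != 0).
inline dpoint intersection(ipoint a, ipoint b, ipoint c, ipoint d)
{
    double a1 = b.y - a.y, b1 = a.x - b.x, c1 = a1 * a.x + b1 * a.y;
    double a2 = d.y - c.y, b2 = c.x - d.x, c2 = a2 * c.x + b2 * c.y;
    double det = a1 * b2 - a2 * b1;
    return dpoint{ (b2 * c1 - b1 * c2) / det, (a1 * c2 - a2 * c1) / det };
}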

Steve M. Robbins wrote:
Can the proposed Boost libraries support this kind of exact Predicates with approximate Constructions paradigm?
I don't see that we have a choice. Inexact Predicates lead to undefined program behavior, incorrect outputs, hangs and crashes. Exact Constructions imply the output coordinate data type is multi-precision. Most users would typically prefer it the other way around for both. The point of the slope comparison trick is that it is a way to avoid loss of precision without resorting to multi-precision data types. It needs only ~66 bits of precision for 32 bit integer coordinates, and so long double is suitable in all cases to correctly compare the slopes of integer line segments. My comparisons in integer geometry are exact if a multi-precision data type is specified, otherwise they use long double, which turns out to be good enough almost all the time. My constructions are approximate because I snap to the integer grid. Luke

On Wed, Mar 11, 2009 at 09:58:31AM -0700, Simonson, Lucanus J wrote:
Steve M. Robbins wrote:
Can the proposed Boost libraries support this kind of exact Predicates with approximate Constructions paradigm?
I don't see that we have a choice. Inexact Predicates lead to undefined program behavior, incorrect outputs, hangs and crashes. Exact Constructions imply the output coordinate data type is multi-precision. Most users would typically prefer it the other way around for both.
Arguably, once in a while, someone will want to use an extended precision number type and get exact constructions. But the common case would be as you state.
The point of the slope comparison trick is that it is a way to avoid loss of precision without resorting to multi-precision data types. It needs only ~66 bits of precision for 32 bit integer coordinates, and so long double is suitable in all cases to correctly compare the slopes of integer line segments.
OK. But I'm assuming that I can define my own point class and use my favourite number type for the coordinate values. (I admit that I haven't been looking closely at the current proposed library code, so that may be a mistaken assumption.) If I use doubles for the coordinate values, then the slope comparison trick needs more precision. Basically, the intersection test is a degree 2 predicate so it needs about double the precision of the input (coordinate) data types. So each predicate needs to be templated over the coordinate number type. Or something similar. How is this achieved in the proposed libraries? Cheers, -Steve

The point of the slope comparison trick is that it is a way to avoid loss of precision without resorting to multi-precision data types. It needs only ~66 bits of precision for 32 bit integer coordinates, and so long double is suitable in all cases to correctly compare the slopes of integer line segments.
Steve M. Robbins wrote:
OK. But I'm assuming that I can define my own point class and use my favourite number type for the coordinate values. (I admit that I haven't been looking closely at the current proposed library code, so that may be a mistaken assumption.) If I use doubles for the coordinate values, then the slope comparison trick needs more precision.
Basically, the intersection test is a degree 2 predicate so it needs about double the precision of the input (coordinate) data types. So each predicate needs to be templated over the coordinate number type. Or something similar. How is this achieved in the proposed libraries?
If you use your favorite number type you can supply your favorite multi-precision number type. I use a big hammer and allow the user to specialize a meta-function that supplies the high-precision data type to use in all such cases for their given coordinate. It defaults to long double. I use gmp rational for my integer arithmetic when I use the library, but don't depend on gmp in the library header files because it can be customized at the application level. While it could be more efficient to specify a different high-precision type for each predicate (long double is sufficient for my integer slope comparisons, but gmp rational is used) it is less reasonable to expect such sophisticated customization to be applied by the user. Each predicate could be individually overridden by specializing the function that implements it for the given coordinate type, but this would be very challenging because it would require knowledge of the internals of the library, and only expert users might do it. So far I'm the only person with such knowledge. I could, for example, optimize the integer slope comparison to use long double instead of whatever the high-precision data type is parameterized to. I wouldn't expect anyone else to do so. It might be worthwhile to create a separate module of robust geometry primitives that provides a nice interface with points of customization instead of letting them be implementation details of specific algorithms. Regards, Luke
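A sketch of what such a "big hammer" customization point could look like; the name high_precision_type and the placeholder my_rational are invented here and are not the library's actual interface:

// Single metafunction mapping a coordinate type to the type used internally
// for arithmetic that can lose precision.
template <typename Coordinate>
struct high_precision_type
{
    typedef long double type;   // default: good enough for most inputs
};

// A user who needs exactness points the metafunction at a wider type once,
// e.g. a rational type from a multi-precision library (placeholder name):
//
// template <>
// struct high_precision_type<int>
// {
//     typedef my_rational type;
// };

// An algorithm then promotes before doing anything lossy:
template <typename Coordinate>
bool less_slope(Coordinate a, Coordinate b, Coordinate c, Coordinate d)
{
    typedef typename high_precision_type<Coordinate>::type big;
    return big(a) * big(d) < big(c) * big(b);   // assumes b > 0 and d > 0
}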

Steve M. Robbins wrote:
Can the proposed Boost libraries support this kind of exact Predicates with approximate Constructions paradigm?
OK. But I'm assuming that I can define my own point class and use my favourite number type for the coordinate values. (I admit that I haven't been looking closely at the current proposed library code, so that may be a mistaken assumption.) If I use doubles for the coordinate values, then the slope comparison trick needs more precision.
Basically, the intersection test is a degree 2 predicate so it needs about double the precision of the input (coordinate) data types. So each predicate needs to be templated over the coordinate number type. Or something similar. How is this achieved in the proposed libraries?

Currently we have concentrated on concepts and the core of the library, and still have the original (naive) approach to floating point comparisons. However, we had discussed internally to enhance this by providing a traits-based solution. Users often need to define their own tolerance, especially when it comes to using an algorithm such as intersection or neighbourhood detection. Others want only performance. So we will let them define the way in which they want comparisons to be made (strict or tolerant) via the geometry traits already in place. The Point Concept will have a traits class, e.g. "comparison". This will be an enhanced version of the "equals" function used now. However, library users can implement another comparison class and attach this via traits to their point type, which will then be used for floating point equality evaluation. This way we handle the defaults correctly, we enable matching of points which are more distant and are still to be considered equal, and we enable configurable floating point (comparison) models. If things other than equality are checked, those cases would be handled in the same way.

So, in summary, it will be part of the concept, configurable. How FP equality is then implemented is still to be defined; it would be useful if it lived somewhere within Boost because its scope is broader than geometry alone.

Regards, Barend
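To make that concrete, one possible shape of such a traits class is sketched below; every name (compare_traits, my_point, the 1e-9 tolerance) is invented for illustration and does not reflect the actual library:

#include <cmath>

struct my_point { double x, y; };

// Default policy: strict equality on the coordinates.
template <typename Point>
struct compare_traits
{
    static bool equals(Point const& a, Point const& b)
    {
        return a.x == b.x && a.y == b.y;
    }
};

// A user attaches a tolerant policy to his own point type instead:
template <>
struct compare_traits<my_point>
{
    static bool equals(my_point const& a, my_point const& b)
    {
        const double eps = 1e-9;   // application-chosen tolerance
        return std::fabs(a.x - b.x) <= eps && std::fabs(a.y - b.y) <= eps;
    }
};

// Algorithms then compare points only through the traits:
template <typename Point>
bool same_point(Point const& a, Point const& b)
{
    return compare_traits<Point>::equals(a, b);
}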

Barend Gehrels wrote:
So we will let them define the way in which they want comparisons to be made (strict or tolerant) via the geometry traits already in place. The Point Concept will have a traits class, e.g. "comparison". This will be an enhanced version of the "equals" function used now. However, library users can implement another comparison class and attach this via traits to their point type, which will then be used for floating point equality evaluation. This way we handle the defaults correctly, we enable matching of points which are more distant and are still to be considered equal, and we enable configurable floating point (comparison) models. If things other than equality are checked, those cases would be handled in the same way.
So, in summary, it will be part of the concept, configurable. How FP equality is then implemented is still to be defined; it would be useful if it lived somewhere within Boost because its scope is broader than geometry alone.
Doesn't it make more sense to make this a trait of the coordinate concept than adding it onto a specific geometry concept? You could then use the trait of the coordinate for all coordinate comparisons. I would expect there are cases where you need to compare two x values and don't care about the y, and it would be awkward to construct points just to use the traits of the point for this. Luke
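Sketched with invented names, the coordinate-level alternative would look something like this; a single specialization then serves point comparison and bare x/y comparisons alike:

#include <cmath>

template <typename T>
struct coordinate_compare
{
    static bool equals(T a, T b) { return a == b; }   // exact by default
};

template <>
struct coordinate_compare<double>
{
    static bool equals(double a, double b)
    {
        return std::fabs(a - b) <= 1e-9;   // application-chosen tolerance
    }
};

// Comparing just two x values needs no point object:
// bool same_x = coordinate_compare<double>::equals(x1, x2);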
participants (15)
- Barend Gehrels
- Brandon Kohn
- Bruno Lalande
- Jean-Sebastien Stoezel
- John Femiani
- Mathias Gaunard
- Matt Calabrese
- Michael Fawcett
- Mika Heiskanen
- Phil Endecott
- Sebastian Redl
- Simonson, Lucanus J
- Steve M. Robbins
- Thomas Klimpel
- Topher Cooper