
I am pleased to be able to review this important library. Boost would definitely benefit from the addition of a geometry library of the right sort. I am certainly not a geometry expert, but I have written a number of things in the last few years that use geometry in GUIs and for geographic data. I have written this review as a potential user, by evaluating how well the library would work in a couple of example applications.

My overall experience has been a little disappointing. I think this is mainly because I had very high hopes for the library, and the reality has not quite met my expectations. As I will explain in detail below, my complaints are mainly things like excessive warnings and odd misfeatures in the interface. These are all things that could be fixed, and they perhaps only indicate that the library has arrived a little prematurely. Based only on these issues, my verdict would be that the library could be accepted after some revisions. But we must also look at the bigger picture, i.e. the existence of other competing libraries "in the wings"; I will discuss this after describing my practical evaluation.

Points starting ** in the following are issues that I think should be fixed for the library to be acceptable.

Application 1
-------------

I have an application that can display placename search result markers on a world map (see http://chezphil.org/tmp/six_thousand_lakes.jpeg). When no markers are shown the screen can be updated at about 20 frames per second; when many markers are shown the frame rate drops to the point where usability is impaired. In this case the vast majority of the markers overlap each other, and this could be exploited to reduce the amount of work that needs to be done by the graphics hardware and hence boost the frame rate.

The first step was to add the "magic" required by the library to support my rectangle type; a sketch of roughly what this involved is given below. The examples don't seem to include a custom rectangle, and the decomposition of a rectangle into a pair of intervals, rather than the more common pair of points or point-and-size, made this a bit tricky. More serious were the following problems:

** The library does not seem to use any sort of explicit concept checking; if types do not satisfy their concepts, random errors from deep inside the library are emitted. The verbosity and incomprehensibility of errors from complex C++ code is, in my opinion, the greatest single failing of the language today. In the face of these ridiculous messages, some users will simply abandon trying to use the library (or the whole language, and go back to C), or limit themselves to only what can be done by copying examples. Fixing this needs effort from the language designers, compiler writers, and libraries. I am not certain that concept checking can totally cure this problem, but it can only help.

** The library spews out enormous quantities of warnings (gcc 4.1.2), I think mainly related to unused parameters. This has to be fixed (and I don't believe fixing it is hard).

** The library failed to work with std::pair being used as the interval type, despite this being mentioned as a possibility in the documentation. The problem seems to be that the std namespace is brought in by ADL and std::set is found rather than the library's own set function. I have not investigated how hard this would be to fix; instead I wrote my own pair template.
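For reference, the following is a minimal sketch of the kind of traits "magic" I mean, pieced together from the custom point and polygon examples since there is no custom rectangle example to copy. my_rect and its members are placeholders for my own type, and the exact trait signatures here are only my best guess at what the library expects:

    #include <boost/polygon/polygon.hpp>

    struct my_rect { int x0, y0, x1, y1; };

    namespace boost { namespace polygon {

      // Tell the library that my_rect models the rectangle concept.
      template <>
      struct geometry_concept<my_rect> { typedef rectangle_concept type; };

      // Read access: the library views a rectangle as one interval per axis.
      template <>
      struct rectangle_traits<my_rect> {
        typedef int coordinate_type;
        typedef interval_data<int> interval_type;
        static interval_type get(const my_rect& r, orientation_2d orient) {
          return orient == HORIZONTAL ? interval_type(r.x0, r.x1)
                                      : interval_type(r.y0, r.y1);
        }
      };

      // Write access: set one axis, or construct from two intervals.
      template <>
      struct rectangle_mutable_traits<my_rect> {
        static void set(my_rect& r, orientation_2d orient,
                        const interval_data<int>& iv) {
          if (orient == HORIZONTAL) { r.x0 = iv.low(); r.x1 = iv.high(); }
          else                      { r.y0 = iv.low(); r.y1 = iv.high(); }
        }
        static my_rect construct(const interval_data<int>& ix,
                                 const interval_data<int>& iy) {
          my_rect r = { ix.low(), iy.low(), ix.high(), iy.high() };
          return r;
        }
      };

    }}

Whether those are exactly the signatures the library wants is part of what better concept checking, or a custom rectangle example in the documentation, would make obvious.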
Having completed the "magic", I started to adapt my marker drawing code. My first attempt was a minimally-invasive version that would check whether each marker covered any new area and not plot it otherwise:

    if (!contains(markers, b)) { markers |= b; plot(b); }

Unfortunately this failed, since contains() does not seem to be defined for a polygon_90_set and a rectangle. This surprised me somewhat. A valuable addition to the documentation would be a matrix indicating which operations are supported between which types. Somebody will need every combination of operations and types, and it would be useful to see quickly which are provided, which are planned for a future version of the library, and which cannot be implemented due to some restriction of the library design.

I could have tried to implement contains(polygon_90_set, rectangle) in terms of e.g. intersection, but I was concerned that it would have much worse complexity than an optimised contains() algorithm could have. Which raises another issue:

** Algorithms should indicate their complexity guarantees, and (where different) their expected complexity for common inputs.

So I moved on to a more invasive modification, where I accumulate all of the rectangles and use get_rectangles() at the end to extract a minimal set of areas to plot (pseudocode; a fuller version is sketched at the end of this section):

    for (...) { markers |= b; }
    get_rectangles(rects, markers);
    foreach (rect in rects) { plot(rect); }

This works, and reduces the number of rectangles that are plotted from about 6,000 to about 900. Unfortunately get_rectangles() takes about 100 ms to run, which is slightly longer than plotting the redundant markers would have taken, so overall the frame rate is not improved.

In that code, I was surprised to see that get_rectangles() takes an output container by reference. In this case I have no need to allocate memory and store these rectangles at all; I would have preferred some way to get the rectangles as they are generated and pass them directly to my plot function, e.g.

    for_each_rectangle(markers, &plot);

An output iterator would be a more conventional (but in this case more verbose) way to achieve this.

** Unless there is some good reason for using output reference parameters, algorithms should use output iterators.

I was also a little confused by the difference - if any - between "markers |= b" and "markers.insert(b)". I believe that I know what the former does, but I worry that unioning-in each rectangle in turn is inefficient, and that it would be better to construct a set of rectangles and self-union them in one go. Perhaps this is what insert() does - yet there is no self_union function, and in practice I seem to get the same results. Maybe get_rectangles() has done the self-union. Some clarification of this in the documentation would be useful; the sketch below shows both variants.
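For concreteness, here is a fuller version of the accumulate-and-extract approach with both variants shown. This is only a sketch: plot() and marker_rects stand in for my application's drawing function and raw marker boxes, it uses the library's own rectangle_data rather than my adapted rectangle type for simplicity, the typedefs reflect my understanding of the library's data types, and whether insert() really defers the merging is exactly the question I am asking above.

    #include <vector>
    #include <boost/polygon/polygon.hpp>

    typedef boost::polygon::rectangle_data<int>      rect_t;
    typedef boost::polygon::polygon_90_set_data<int> marker_set_t;

    void plot(const rect_t& r);   // defined elsewhere: draws one rectangle

    void plot_markers(const std::vector<rect_t>& marker_rects)
    {
      using namespace boost::polygon::operators;   // enables |= etc.
      marker_set_t markers;

      // Variant A: union-in each marker as it arrives.
      //   for (const rect_t& b : marker_rects) markers |= b;

      // Variant B: just insert, hoping the merge is deferred until needed.
      for (const rect_t& b : marker_rects) markers.insert(b);

      // Extract a minimal set of non-overlapping rectangles and plot those.
      std::vector<rect_t> rects;
      get_rectangles(rects, markers);        // ~100 ms for ~6,000 markers
      for (const rect_t& r : rects)          // ~900 rectangles remain
        plot(r);
    }

In my tests both variants produced the same ~900 rectangles, which is what prompted the question about what insert() actually does.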
Application 2
-------------

I have data for the borders of N. American states and provinces. My original plan was to try to reduce these state outline polygons to a set of polylines; my viewer does this as a pre-processing step to halve the work it does when plotting these borders. But the library really is limited to polygons (hence the name), with no support for even representing polylines, let alone doing anything with them. It would be useful for the documentation to include a "roadmap" section indicating whether this is being considered for a future version. It might be worthwhile to define polyline and other concepts now, if they are relatively uncontroversial, to avoid multiple implementations later.

So I decided instead to use this data to implement a "what state is this point in?" function. This boils down to reading in the state border polygons and testing points using contains(polygon, point).

One issue that I noted while reading in the polygons is that the provided polygon_data type is immutable: to read from a file, I need either an exotic iterator that can be passed to its constructor, or a temporary. I'm unsure why polygon_data can't be more like a mutable std::container, so that I can simply read points from a file and push_back() them in a loop. Trying to use a std::vector<point_data> instead required more traits specialisation than I expected; can't this be pre-defined?

** The polygon_data type should be mutable (i.e. support things like push_back()), or a mutable version should be provided, or std::vector/list<point_data> should work with minimal traits specialisation.

I also found that polygon_data is specialised by the coordinate type, not the point type, so anyone using their own point type will have to provide their own polygon type. Is there some reason why I can't have polygon_data<my_point>? Again, making it easier to use std::container<my_point> would help.

I also found little mention of winding direction in the documentation. I haven't checked whether the data I have winds consistently; what does unknown_winding actually do?

As I guessed, contains(polygon, point) is very slow if done naively: the library's polygon concept (and the provided default implementation) have no way to determine whether a point lies within a polygon in better than O(N) time. Testing first against the bounding box makes it O(1) in the common case. So I tried to write a polygon type that would do this, something like this (but not exactly this!):

    struct my_polygon: polygon_data {
      rectangle bbox;
      template <typename ITER>
      my_polygon(ITER begin, ITER end):
        polygon_data(begin, end), bbox(extents(*this)) {}
    };

    bool contains(const my_polygon& poly, point_t pt) {
      return contains(poly.bbox, pt) && contains(poly, pt);
    }

Hmm, well that looks OK until the very last "contains"... no doubt someone will immediately spot the "right" way to do this. I ended up making contains() a member of my_polygon, and this worked reasonably well; a rough sketch of what I ended up with is shown below. Note that the extents() function does not work as written above: it takes a reference to a rectangle as an out parameter. center() does something similar. This seems wrong to me; returning a point or rectangle would be more conventional, I think.

** extents() and center() should return their results, rather than using an output reference.
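For what it's worth, this is roughly the shape of what I ended up with. It is only a sketch under my assumptions about the interface (polygon_data's iterator constructor, extents() with its output reference, and contains() for rectangles and polygons); the casts to the base class are there because the library knows about polygon_data but not about my derived type.

    #include <boost/polygon/polygon.hpp>

    typedef boost::polygon::point_data<int>     point_t;
    typedef boost::polygon::rectangle_data<int> rect_t;
    typedef boost::polygon::polygon_data<int>   base_polygon_t;

    struct my_polygon : base_polygon_t {
      rect_t bbox;

      template <typename Iter>
      my_polygon(Iter begin, Iter end) : base_polygon_t(begin, end) {
        // extents() fills in its first argument rather than returning it.
        boost::polygon::extents(bbox, static_cast<const base_polygon_t&>(*this));
      }

      // A member function avoids the clash with the library's free contains().
      bool contains(const point_t& pt) const {
        return boost::polygon::contains(bbox, pt)     // O(1) bounding-box rejection
            && boost::polygon::contains(              // full O(N) test only if needed
                 static_cast<const base_polygon_t&>(*this), pt);
      }
    };

The "which state?" function then just loops over the state polygons calling p.contains(pt) and returns the first hit, which is where the boundary question discussed next comes in.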
An important question is what this function should do when given a coordinate that lies on the boundary between two (or more) states. One can argue for various different behaviours. My preference would be to never return "no state" and to be deterministic, but otherwise I don't care. The contains() function has a parameter "consider_touch" that makes it possible to implement different policies; this seems sufficient in most cases, though I'm unsure how you would consider the left and bottom edges included and the right and top edges excluded in a 90-degree shape (which might be needed for a GUI).

I have a fixed-point type that I use for GPS coordinates (32 bits is good for lat/lng GPS positions, but 24 bits isn't), and using it with this library worked well. The Coordinate Concept page doesn't say much about the type requirements ("expected to be integral"), and spelling out in more detail what is needed would be useful. In practice I must have already implemented everything that it needed.

I would have liked to pursue this example further and see just how fast the "which state?" function can be made. For example, simplified versions of the state borders could be stored for fast lookup of points not near the borders. But I ran out of time. If anyone would like to experiment with this, the data can be found at http://chezphil.org/tmp/states.tgz .

Other general notes
-------------------

Operator Overloading:

I am not a huge fan of "gratuitous" operator overloading. In this case, I find |= for union acceptable, particularly since I think I can disable it by not using the boost::polygon::operators namespace. I would prefer to avoid the duplicate operators (i.e. + == |). However, I can't see any way to perform these boolean operations other than using the operators, and I think this is an omission. If I were writing code that I expected someone else unfamiliar with the library to understand, I would like the important calls to be conspicuous.

** The library should provide functions to perform the boolean operations, as well as operators, for the benefit of those who want to make their code more verbose.

I dislike the use of operators with integers for the grow/shrink operations. My concern is that this meaning is non-obvious, and since these operators are also defined for boolean operations, there is potential for error:

    polygon_set a, b, i;
    ...
    for (int i = 0; i < 10; ++i) {
      ...
      a -= b + i;   // oops, forgot i is now an int
    }

Isotropic Style
---------------

I recall Luke describing his isotropic style on the list a long time ago, and the documentation would benefit from an explanation along those lines. One "FAQ" that it would need to address is the question of the performance of a run-time selection of X vs. Y, compared to a compile-time selection in more conventional code; a sketch of the sort of thing I mean follows at the end of this section.

I'm unsure whether the use of caps for e.g. HORIZONTAL and VERTICAL enum members is Boost style; they look like macros to me. I'm not very happy with the use of compass directions as direction labels, since I have code in which these directions are definitely not what the axes actually mean. Other labels, including LEFT and RIGHT, would have the same problem in other contexts. GUI APIs are inconsistent about whether y increases up or down the screen, so terms like "top left" are also problematic. I think about the only safe terms are "increasing X axis" etc. I tend to use x0y0 ... x1y1 for corners.
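To illustrate the question, using a throwaway type of my own rather than the library's actual classes: the isotropic style passes the axis as a run-time value, where conventional code picks it at compile time.

    // Generic illustration only, not the library's types.
    enum axis { HORIZ = 0, VERT = 1 };

    struct pt { int coord[2]; };   // coord[HORIZ] = x, coord[VERT] = y

    // Isotropic: one function, axis chosen at run time by indexing.
    inline int get(const pt& p, axis a) { return p.coord[a]; }

    // Conventional: two functions, axis fixed at compile time.
    inline int get_x(const pt& p) { return p.coord[HORIZ]; }
    inline int get_y(const pt& p) { return p.coord[VERT]; }

The FAQ is whether a call like get(p, HORIZ), where the axis is in fact a compile-time constant, optimises down to the same code as get_x(p) once inlined. I would expect so with current compilers, but a documented answer (or a measurement) would be reassuring.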
Documentation
-------------

I find the documentation satisfactory, though it could benefit from some style improvements (e.g. the font used for the example code). I suggest that an introductory tutorial should be added. This should fully explain everything that a new user who has no legacy code needs to know (i.e. it ignores the traits and uses the provided _data types). Then a second tutorial (more like the current material) could explain how users with their own types can use them with the library. Answers to basic questions like "is it header-only?" and "what other Boost libraries does it depend on?" should also be added somewhere prominent in the introduction.

That concludes my practical examination of the library. We then have the question of rival geometry libraries to consider.

Barend has presented a few previews of his library and I have looked at these, though not in much detail. My feeling was that Barend had a long way to go before he would have something useful, and that even then he might never produce something that I would actually be able to use. For example, he has decided to adopt the conventions of the OGC; this is not entirely unreasonable, but it means that the semantics of things like whether border points are inside or outside a polygon are non-negotiable. Barend has also recently reminded us of the spatial indexes GSOC project and says that it is now included in his library, despite it being the worst code I have ever seen presented to Boost. Despite all that, Barend's recent comments suggest that he has made significant progress and that he may have something useful to show us in a few weeks.

Barend has also raised questions about the worst-case performance of Luke's line intersection algorithm, and it seems that Luke accepts that it is not optimal. Achieving the best possible algorithmic complexity is something that I would consider very important, if not essential, for a Boost library. At the very least, Luke needs to document the complexity of his current algorithm and explain why he cannot do better. The views of experts should be sought if this cannot be resolved by the review manager.

In view of all this, I suggest that this library should be rejected for now. This will tell Barend that he still has an opportunity to present his library for review, and that it will be considered on a level playing field. If Barend's library is reviewed and found to be more complete, more performant, and at least as usable as this library, then it should be accepted. On the other hand, if Barend's library is found to be deficient in some way (or is not submitted for review), then Luke will have an opportunity to resubmit an updated version of this library, which I anticipate should be accepted.

Regards,

Phil.