[intrusive] My review

20 Mar 2007

      Ion:  I have limited time to make a review, so I'm sorry that I had  
to concentrate on the documentation without reviewing the code, which  
I will do at a later date.  However, what I saw from the  
documentation is by-and-large how I would have done a design, no  
surprises there but some nice twists and a very well thought-out  
interface. The devil being in the detail I'm glad you validated the  
design with some working code.

This being said, this is a first-rate and very useful library, and I  
vote whole-heartedly to accept it to boost.

I have a few spot-comments on the documentation, to follow at the end  
of this message.

My main criticism is with one aspect of the design:  in std::set and  
other associative container, the key (part of the value_type that  
determines the order) is constant, and cannot be changed (once  
constructed) since the std::set interface only grants constant access  
to it.  This makes it possible to maintain the invariant of the  
std::set class.  It is no longer possible to guarantee this with  
intrusive containers.  You need to mention that upfront in the doc of  
iset with a big warning sign!

But now I must mention that this is actually a plus!!!  In geometric  
algorithms (Fortune's sweep for Voronoi diagram, to be specific), the  
key is an intersection with the sweep line, and varies over time. To  
simplify a long story, the algorithm has discrete events at which the  
order can change, but the invariant is maintained by swapping two  
elements whose keys "cross".  In order to guarantee logarithmic time  
per event, some kind of balanced tree is needed.  Naturally one would  
like to use std::set, but because the key changes and the user (no  
longer the class) is responsible for maintaining the set invariant,  
one must go through hoops, and use mutable data members, etc.  It  
gets ugly!  The worse part is that one must add two nodes which are  
equal (at the critical sweep event, they'll get ordered as soon as  
the sweep line starts to move), but std::set does not allow to add  
the two nodes with guaranteed relative position, without performing  
some comparisons (which have no geometric meaning: the comparison  
object must be tweaked).  It gets *very* ugly!

For a long time, I have advocated separating the rbtree algorithms  
from the actual data structure, for reuse.  I wrote a paper with  
Jyrki Katajainen (now a CPH STL report) about this. One can imagine  
applying rbtree algorithms, or avl_algorithms, or  
splay_tree_algorithms, all of them without a class;  each offers a  
different tradeoff.  In particular, I'd expect treaps to perform very  
well.  In essence, this is exactly what your library does: it  
separates storage from algorithms, and thus offers the potential for  
reuse.  This is the promise of generic programming even beyond the C+ 
+ STL.

You might add my Fortune's sweep example to the others you provide  
(legacy objects, etc.)  I imagine kinetic data structures and some  
graph algorithms would be fine consumers of this approach as well.

I think it is very natural, and would very much like to (but won't  
have the time), extend your framework with (geometric) halfedge and  
(combinatorial) graph-like data structures.  I think it will be a  
great winner there, even more than standard C++ containers (one  
dimensional).  Ultimately, because geometric algorithms typically  
require building several data structures on top of geometric objects,  
the framework here has even more value than the implementations.

I would really like to see a chapter in the documentation "Guidelines  
for extending the framework to domain-specific containers".  of  
course, it is not a requirement for the library, merely a suggestion  
but given the potential for extension, you will have questions.  A  
graph-like example compatible with the Boost Graph Library would  
superbly demonstrate the power of the approach!   Who knows, it could  
even become part of the intrusive library one day.  After reading  
your library, I feel that this is how my generic HDSTL (half-edge  
data structure library) should be implemented (it remains as a half- 
baked research project, having been published in WAE'2001).

I will take a longer look but the end of the review approaching, I  
feel it's best to dump this now and promise later :)

Thanks for a very promising library, and regards,
--
Herve Bronnimann

PS: Here are the more important comments I jotted down while reading  
the docs:  I spent about two hours doing so, so forgive the  
sketchiness and incompleteness of the review, but one more review is  
always better than nothing :)

* documentation:  I wanted to see, while browsing the doc and  
clicking on the class name, the synopsis for header, in addition to  
the class interface for the class (following the C++ standard).  But  
actually, by the end of the documentation, we get the header synopsis  
so the physical architecture of the library is clear.

* documentation: iterator and const_iterator types seems to be  
missing consistently for each class interface (from islist, ilist,  
but not from iset/imultiset.)

* documentation: bool ConstantTimeSize = (default not provided in  
ilist and islist).

* design: const/mutable value type?  While it is possible to violate  
the invariants of a std::set, that requires willfull wrongdoing from  
the programmer, whereas it seems much easier to inadvertently do so  
with intrusive set.  Must be pointed out with a big warning sign.

* documentation: iterator validity guarantees for islist are part of  
individual functions, an overview of the iterator validity rules in  
the description would be welcome.

* indeed, the two foremost advantages of intrusive containers are:  
non-invalidation rules for iunordered_set, and possibility to store  
derived objects in the intrusive container (without extra  
indirection, that is - it's possible with storing pointers to object  
in std containers).

* combination of smart pointers + intrusive containers:  the  
intrusive container can't use simple pointers if the object lifetimes  
are managed by smart pointers.  Thus it appears that having managed  
objects actually either forces the intrusive containers to use same  
managed pointers, or else to use the auto-unhook feature.  This is  
not well discussed in the relevant page (about smart pointers and  
intrusive containers).

BTW, I don't understand the requirement on smart pointer that "It  
must have the same ownership semantics as a raw pointer."  I don't  
see why a shared_ptr couldn't be used, in fact, I would find some  
advantages to doing so (preferable imho to the auto-unhook feature).   
But I haven't thought too long about it.

* rbtree_algorithms:  here's a great opportunity to incorporate the  
order-statistic tree (augmented rbtree) with additional requirements  
on the node traits.  If there is an int to spare in the data, then  
you can use it.  You only have to make sure you use the correct  
algorithms (if rotating, etc., then the node's size field must be  
recomputed too for the order-statistic operations to be correct).   
The extra field can be computed at any given time (not necessary to  
maintain it through the insertions/manipulation) if only needed in  
one section of the code, and even reclaimed later if needed.  This is  
similar to having a singly or doubly linked list sometimes or in  
other situations (see page "custom valueTraits").

Here is another use for that schema of using storage and reclaiming  
it when no longer needed: insertion in singly linked list (which is  
faster) and some operations on it (e.g. move-to-front heuristic which  
only requires islist), then making doubly-linked list (in linear time  
by setting the reverse pointer) to allow more operations when that is  
need (e.g. entering another section of the code).  Or reusing the two  
same list pointers for an iset if the ilist structure is no longer  
needed.

Hervé Brönnimann

Ion Gaztañaga

tags

participants (2)