
Hi Dan! Even though I'm not an "experienced booster" nor do I consider this to be "expert advice"... I've spent some time playing with designs and implementations for a generic tree framework and hope you'll find my comments useful. Dan Rosen wrote:
Of the two "how to design a generic tree container" camps, I'm finding myself mostly in the one where individual nodes are not exposed to users, and the tree is an opaque structure.
In my design nodes were made an implementation detail but I also attempted to define some node concepts to allow advanced users to plug their own node types into pre-defined trees. All pre-defined trees would take a TreeNode parameter which could (should!) be ignored for most use cases.
Also, nodes are not identical to iterators, and the iterator concept is not extended in any significant way from the standard. (Anybody who would like more detail, I'm happy to provide my draft of the concepts, but I don't want to get into that detail here -- it's not particularly relevant to my questions that follow.)
I'd be interested in looking at the details, however draft they are (Feel free to mail them privately).
I feel like I've been making decent progress so far, mostly working on paper, just documenting the concepts, and doing some coding exercises for proofs of concept.
This is where I am ;-)
But I don't have anything I'd consider ready for primetime yet -- not even remotely -- and in fact as I progress more solidly into the coding phase, I'm finding myself getting somewhat mired in the details. So I thought I'd solicit opinions from experienced Boosters here on some of the issues I'm trying to work through.
I'm primarily finding myself conflicted about what degree of flexibility the concepts and their concrete implementations need to offer. I thought it would be useful to allow clients of tree<> to be able to specify the container type for a node's children. In the code I'm currently playing with, I have a couple "selector" structs, basically identical to Boost Graph Library's vecS, listS, etc, and the tree<> class template takes one of these selectors as a template parameter. That much is all well and good, but I quickly run into trouble once I start dealing with nodes and iterators.
I'm not sure if you're referring to the standard containers here. I initially considered using those, after considering allocator issues and the (potentially) large per-node overhead I almost completely dropped the idea. Right now, I'm pondering on a node concept based on boost::range, plus some free functions (insert, update, erase...) that would allow greater flexibility and extensibility, with no overhead.
First of all, I'd like to provide access to the children of a given node as a sequence of the type specified by the client. So if you have an iterator to an element in the tree (the structure of the tree being otherwise opaque), I want to be able to say something like:
tree<foo, list_selector> t; tree<foo, list_selector>::some_kind_of_iterator i; // assume t contains something useful and i points to an element in t std::list<foo> children = t.children(i);
Personally, I prefer a tree-iterator concept that allows traversal of the tree without needing to carry the tree around. Having children as a member function of the tree prevents this.
At this point, I've got a problem: internal to the tree<>, the lists representing child sequences contain not instances of type "foo" but instances of something like node<foo>. So I can't simply provide access like this unless I can guarantee that my nodes are convertible in all contexts to their stored data. That seems like a rat hole.
That's one of the reasons I decided to define a TreeIterator concept that overlaps with the standard Iterator concept when it comes to sibling traversal.
Another thing I'm stuck on is, even if I don't expose the container structure like this, I still have a problem with iterators. The above example omits the actual iterator type, but to get into some specifics, one type of iterator I'd like to provide is a pre-order traversal iterator.
On the thread Justin Gottschlich started on this issue (http://tinyurl.com/44e3u), I suggested pre-order and other linearizations of a tree could be provided externally. I think putting these on the tree unnecessarily complicates the design of the tree and the iterators.
In my current code, this iterator is basically just a wrapper for iterators into the flat sequences I've been instructed to use, and some minimal code that knows how to jump between these sequences appropriately (for example: if a node has children, operator++ yields the first child of that node). It's easy to get an iterator into the sequence containing a node's children (internally, it's just something like node.children.begin()), but it's not so easy to get an iterator into the sequence containing a node's parent, unless a node contains not a pointer to its parent but an iterator instead. And that scheme breaks down for children of the root node, which is not in any sequence, as it has no siblings. I'm sure there are other options for how to implement the node structure and iterator relationships, but none jump out at me.
I solved parent/root issue by having the tree class promise only the most basic TreeIterator: for traversal from parent to child. I think this is the basic form of tree traversal all tree-iterators must provide. From the initial root iterator one could get SiblingIterators for iteration over its children. Depending on the underlying node, both iterators could actually be the same. This could be inspected through separate traits classes.
In short, although I'd like my concepts for trees to allow users to specify and access the sequences representing the children of a node, my implementation of this is proving very clumsy. So I'm wondering two things:
1. Is this a good idea in the first place? Are there really any compelling use cases, where you'd want one tree to store its node structure in vectors and another in lists? Or is this a case where a policy-based approach is misplaced effort?
In my design I planned to use a fixed size array for n-ary trees and a single or double-linked list for general trees.
2. If this actually is a good idea (which I'm starting to get the feeling is not the case), my experience so far is that there's some significant overhead -- both in terms of the size of the structures, and in terms of speed (extra pointer traversals when doing anything with iterators). Does anybody have any brilliant ideas about how to minimize this overhead?
Right! My idea was to write the basic single and double-linked lists operations from scratch. We don't be need the whole std::list interface for the nodes anyway. João