[boost] Re: boost::tree advice?

28 Mar 2005

      Hi Dan!

Even though I'm not an "experienced booster" nor do I consider this to 
be "expert advice"... I've spent some time playing with designs and 
implementations for a generic tree framework and hope you'll find my 
comments useful.

Dan Rosen wrote:
...
Of the two "how to design a generic tree container" camps, I'm finding
myself mostly in the one where individual nodes are not exposed to
users, and the tree is an opaque structure.
In my design nodes were made an implementation detail but I also 
attempted to define some node concepts to allow advanced users to plug 
their own node types into pre-defined trees.

All pre-defined trees would take a TreeNode parameter which could 
(should!) be ignored for most use cases.
...
Also, nodes are not
identical to iterators, and the iterator concept is not extended in
any significant way from the standard. (Anybody who would like more
detail, I'm happy to provide my draft of the concepts, but I don't
want to get into that detail here -- it's not particularly relevant to
my questions that follow.)
I'd be interested in looking at the details, however draft they are 
(Feel free to mail them privately).
...
I feel like I've been making decent progress so far, mostly working on
paper, just documenting the concepts, and doing some coding exercises
for proofs of concept.
This is where I am ;-)
...
But I don't have anything I'd consider ready
for primetime yet -- not even remotely -- and in fact as I progress
more solidly into the coding phase, I'm finding myself getting
somewhat mired in the details. So I thought I'd solicit opinions from
experienced Boosters here on some of the issues I'm trying to work
through.
I'm primarily finding myself conflicted about what degree of
flexibility the concepts and their concrete implementations need to
offer. I thought it would be useful to allow clients of tree<> to be
able to specify the container type for a node's children. In the code
I'm currently playing with, I have a couple "selector" structs,
basically identical to Boost Graph Library's vecS, listS, etc, and the
tree<> class template takes one of these selectors as a template
parameter. That much is all well and good, but I quickly run into
trouble once I start dealing with nodes and iterators.
I'm not sure if you're referring to the standard containers here. I 
initially considered using those, after considering allocator issues and 
the (potentially) large per-node overhead I almost completely dropped 
the idea.

Right now, I'm pondering on a node concept based on boost::range, plus 
some free functions (insert, update, erase...) that would allow greater 
flexibility and extensibility, with no overhead.
...
First of all, I'd like to provide access to the children of a given
node as a sequence of the type specified by the client. So if you have
an iterator to an element in the tree (the structure of the tree being
otherwise opaque), I want to be able to say something like:
tree<foo, list_selector> t;
  tree<foo, list_selector>::some_kind_of_iterator i;
  // assume t contains something useful and i points to an element in t
  std::list<foo> children = t.children(i);
Personally, I prefer a tree-iterator concept that allows traversal of 
the tree without needing to carry the tree around. Having children as a 
member function of the tree prevents this.
...
At this point, I've got a problem: internal to the tree<>, the lists
representing child sequences contain not instances of type "foo" but
instances of something like node<foo>. So I can't simply provide
access like this unless I can guarantee that my nodes are convertible
in all contexts to their stored data. That seems like a rat hole.
That's one of the reasons I decided to define a TreeIterator concept 
that overlaps with the standard Iterator concept when it comes to 
sibling traversal.
...
Another thing I'm stuck on is, even if I don't expose the container
structure like this, I still have a problem with iterators. The above
example omits the actual iterator type, but to get into some
specifics, one type of iterator I'd like to provide is a pre-order
traversal iterator.
On the thread Justin Gottschlich started on this issue 
(http://tinyurl.com/44e3u), I suggested pre-order and other 
linearizations of a tree could be provided externally. I think putting 
these on the tree unnecessarily complicates the design of the tree and 
the iterators.
...
In my current code, this iterator is basically
just a wrapper for iterators into the flat sequences I've been
instructed to use, and some minimal code that knows how to jump
between these sequences appropriately (for example: if a node has
children, operator++ yields the first child of that node). It's easy
to get an iterator into the sequence containing a node's children
(internally, it's just something like node.children.begin()), but it's
not so easy to get an iterator into the sequence containing a node's
parent, unless a node contains not a pointer to its parent but an
iterator instead. And that scheme breaks down for children of the root
node, which is not in any sequence, as it has no siblings. I'm sure
there are other options for how to implement the node structure and
iterator relationships, but none jump out at me.
I solved parent/root issue by having the tree class promise only the 
most basic TreeIterator: for traversal from parent to child. I think 
this is the basic form of tree traversal all tree-iterators must provide.

 From the initial root iterator one could get SiblingIterators for 
iteration over its children.

Depending on the underlying node, both iterators could actually be the 
same. This could be inspected through separate traits classes.
...
In short, although I'd like my concepts for trees to allow users to
specify and access the sequences representing the children of a node,
my implementation of this is proving very clumsy. So I'm wondering two
things:
1. Is this a good idea in the first place? Are there really any
compelling use cases, where you'd want one tree to store its node
structure in vectors and another in lists? Or is this a case where a
policy-based approach is misplaced effort?
In my design I planned to use a fixed size array for n-ary trees and a 
single or double-linked list for general trees.
...
2. If this actually is a good idea (which I'm starting to get the
feeling is not the case), my experience so far is that there's some
significant overhead -- both in terms of the size of the structures,
and in terms of speed (extra pointer traversals when doing anything
with iterators). Does anybody have any brilliant ideas about how to
minimize this overhead?
Right! My idea was to write the basic single and double-linked lists 
operations from scratch. We don't be need the whole std::list interface 
for the nodes anyway.

João

[boost] Re: boost::tree advice?

Joao Abecasis