boost::tree advice?

Hi all, Over the past few weeks I've been slowly working on (what I hope to be) a candidate for a boost::tree. I've reviewed the comments on previous submissions by Kasper Peeters (http://www.damtp.cam.ac.uk/user/kp229/tree) and Justin Gottschlich (http://nodeka.com/TreePart1.pdf), and I'm trying to put together a set of concepts to address some of the issues that were discussed here. Of the two "how to design a generic tree container" camps, I'm finding myself mostly in the one where individual nodes are not exposed to users, and the tree is an opaque structure. Also, nodes are not identical to iterators, and the iterator concept is not extended in any significant way from the standard. (Anybody who would like more detail, I'm happy to provide my draft of the concepts, but I don't want to get into that detail here -- it's not particularly relevant to my questions that follow.) I feel like I've been making decent progress so far, mostly working on paper, just documenting the concepts, and doing some coding exercises for proofs of concept. But I don't have anything I'd consider ready for primetime yet -- not even remotely -- and in fact as I progress more solidly into the coding phase, I'm finding myself getting somewhat mired in the details. So I thought I'd solicit opinions from experienced Boosters here on some of the issues I'm trying to work through. I'm primarily finding myself conflicted about what degree of flexibility the concepts and their concrete implementations need to offer. I thought it would be useful to allow clients of tree<> to be able to specify the container type for a node's children. In the code I'm currently playing with, I have a couple "selector" structs, basically identical to Boost Graph Library's vecS, listS, etc, and the tree<> class template takes one of these selectors as a template parameter. That much is all well and good, but I quickly run into trouble once I start dealing with nodes and iterators. First of all, I'd like to provide access to the children of a given node as a sequence of the type specified by the client. So if you have an iterator to an element in the tree (the structure of the tree being otherwise opaque), I want to be able to say something like: tree<foo, list_selector> t; tree<foo, list_selector>::some_kind_of_iterator i; // assume t contains something useful and i points to an element in t std::list<foo> children = t.children(i); At this point, I've got a problem: internal to the tree<>, the lists representing child sequences contain not instances of type "foo" but instances of something like node<foo>. So I can't simply provide access like this unless I can guarantee that my nodes are convertible in all contexts to their stored data. That seems like a rat hole. Another thing I'm stuck on is, even if I don't expose the container structure like this, I still have a problem with iterators. The above example omits the actual iterator type, but to get into some specifics, one type of iterator I'd like to provide is a pre-order traversal iterator. In my current code, this iterator is basically just a wrapper for iterators into the flat sequences I've been instructed to use, and some minimal code that knows how to jump between these sequences appropriately (for example: if a node has children, operator++ yields the first child of that node). It's easy to get an iterator into the sequence containing a node's children (internally, it's just something like node.children.begin()), but it's not so easy to get an iterator into the sequence containing a node's parent, unless a node contains not a pointer to its parent but an iterator instead. And that scheme breaks down for children of the root node, which is not in any sequence, as it has no siblings. I'm sure there are other options for how to implement the node structure and iterator relationships, but none jump out at me. In short, although I'd like my concepts for trees to allow users to specify and access the sequences representing the children of a node, my implementation of this is proving very clumsy. So I'm wondering two things: 1. Is this a good idea in the first place? Are there really any compelling use cases, where you'd want one tree to store its node structure in vectors and another in lists? Or is this a case where a policy-based approach is misplaced effort? 2. If this actually is a good idea (which I'm starting to get the feeling is not the case), my experience so far is that there's some significant overhead -- both in terms of the size of the structures, and in terms of speed (extra pointer traversals when doing anything with iterators). Does anybody have any brilliant ideas about how to minimize this overhead? If these questions are too vague, or lacking in context, I'm more than willing to provide the rough draft I've got of the structural concepts, and the very incomplete code (such as it is). Thanks in advance for your expert advice! dr

Hi Dan! Even though I'm not an "experienced booster" nor do I consider this to be "expert advice"... I've spent some time playing with designs and implementations for a generic tree framework and hope you'll find my comments useful. Dan Rosen wrote:
Of the two "how to design a generic tree container" camps, I'm finding myself mostly in the one where individual nodes are not exposed to users, and the tree is an opaque structure.
In my design nodes were made an implementation detail but I also attempted to define some node concepts to allow advanced users to plug their own node types into pre-defined trees. All pre-defined trees would take a TreeNode parameter which could (should!) be ignored for most use cases.
Also, nodes are not identical to iterators, and the iterator concept is not extended in any significant way from the standard. (Anybody who would like more detail, I'm happy to provide my draft of the concepts, but I don't want to get into that detail here -- it's not particularly relevant to my questions that follow.)
I'd be interested in looking at the details, however draft they are (Feel free to mail them privately).
I feel like I've been making decent progress so far, mostly working on paper, just documenting the concepts, and doing some coding exercises for proofs of concept.
This is where I am ;-)
But I don't have anything I'd consider ready for primetime yet -- not even remotely -- and in fact as I progress more solidly into the coding phase, I'm finding myself getting somewhat mired in the details. So I thought I'd solicit opinions from experienced Boosters here on some of the issues I'm trying to work through.
I'm primarily finding myself conflicted about what degree of flexibility the concepts and their concrete implementations need to offer. I thought it would be useful to allow clients of tree<> to be able to specify the container type for a node's children. In the code I'm currently playing with, I have a couple "selector" structs, basically identical to Boost Graph Library's vecS, listS, etc, and the tree<> class template takes one of these selectors as a template parameter. That much is all well and good, but I quickly run into trouble once I start dealing with nodes and iterators.
I'm not sure if you're referring to the standard containers here. I initially considered using those, after considering allocator issues and the (potentially) large per-node overhead I almost completely dropped the idea. Right now, I'm pondering on a node concept based on boost::range, plus some free functions (insert, update, erase...) that would allow greater flexibility and extensibility, with no overhead.
First of all, I'd like to provide access to the children of a given node as a sequence of the type specified by the client. So if you have an iterator to an element in the tree (the structure of the tree being otherwise opaque), I want to be able to say something like:
tree<foo, list_selector> t; tree<foo, list_selector>::some_kind_of_iterator i; // assume t contains something useful and i points to an element in t std::list<foo> children = t.children(i);
Personally, I prefer a tree-iterator concept that allows traversal of the tree without needing to carry the tree around. Having children as a member function of the tree prevents this.
At this point, I've got a problem: internal to the tree<>, the lists representing child sequences contain not instances of type "foo" but instances of something like node<foo>. So I can't simply provide access like this unless I can guarantee that my nodes are convertible in all contexts to their stored data. That seems like a rat hole.
That's one of the reasons I decided to define a TreeIterator concept that overlaps with the standard Iterator concept when it comes to sibling traversal.
Another thing I'm stuck on is, even if I don't expose the container structure like this, I still have a problem with iterators. The above example omits the actual iterator type, but to get into some specifics, one type of iterator I'd like to provide is a pre-order traversal iterator.
On the thread Justin Gottschlich started on this issue (http://tinyurl.com/44e3u), I suggested pre-order and other linearizations of a tree could be provided externally. I think putting these on the tree unnecessarily complicates the design of the tree and the iterators.
In my current code, this iterator is basically just a wrapper for iterators into the flat sequences I've been instructed to use, and some minimal code that knows how to jump between these sequences appropriately (for example: if a node has children, operator++ yields the first child of that node). It's easy to get an iterator into the sequence containing a node's children (internally, it's just something like node.children.begin()), but it's not so easy to get an iterator into the sequence containing a node's parent, unless a node contains not a pointer to its parent but an iterator instead. And that scheme breaks down for children of the root node, which is not in any sequence, as it has no siblings. I'm sure there are other options for how to implement the node structure and iterator relationships, but none jump out at me.
I solved parent/root issue by having the tree class promise only the most basic TreeIterator: for traversal from parent to child. I think this is the basic form of tree traversal all tree-iterators must provide. From the initial root iterator one could get SiblingIterators for iteration over its children. Depending on the underlying node, both iterators could actually be the same. This could be inspected through separate traits classes.
In short, although I'd like my concepts for trees to allow users to specify and access the sequences representing the children of a node, my implementation of this is proving very clumsy. So I'm wondering two things:
1. Is this a good idea in the first place? Are there really any compelling use cases, where you'd want one tree to store its node structure in vectors and another in lists? Or is this a case where a policy-based approach is misplaced effort?
In my design I planned to use a fixed size array for n-ary trees and a single or double-linked list for general trees.
2. If this actually is a good idea (which I'm starting to get the feeling is not the case), my experience so far is that there's some significant overhead -- both in terms of the size of the structures, and in terms of speed (extra pointer traversals when doing anything with iterators). Does anybody have any brilliant ideas about how to minimize this overhead?
Right! My idea was to write the basic single and double-linked lists operations from scratch. We don't be need the whole std::list interface for the nodes anyway. João

Hi João,
In my design nodes were made an implementation detail but I also attempted to define some node concepts to allow advanced users to plug their own node types into pre-defined trees.
I'm curious what one would do with that ability ... what kind of use cases were you picturing? I've been attempting not to expose nodes in the interface, but maybe there's a reason why I should?
I'd be interested in looking at the details, however draft they are (Feel free to mail them privately).
Actually, I shouldn't be such a tease. I'll try to get the document semi-workable and post it here when I get back from work this evening.
I have a couple "selector" structs, basically identical to Boost Graph Library's vecS, listS, etc, and the tree<> class template takes one of these selectors as a template parameter.
I'm not sure if you're referring to the standard containers here. I initially considered using those, after considering allocator issues and the (potentially) large per-node overhead I almost completely dropped the idea.
Well, vecS and listS do refer specifically to the standard containers, but my idea was that you could plug any container conforming to the standard Sequence concept into the tree. But yes, the overhead is one of the aspects of this approach that's been plaguing me.
Right now, I'm pondering on a node concept based on boost::range, plus some free functions (insert, update, erase...) that would allow greater flexibility and extensibility, with no overhead.
I haven't looked at boost::range at all. I'll check that out!
Personally, I prefer a tree-iterator concept that allows traversal of the tree without needing to carry the tree around. Having children as a member function of the tree prevents this.
Oh, I think I was unclear, sorry. I wanted to provide children() as a convenience, rather than as a necessary method for traversal and mutation of the tree. In lieu of having a document prepared, let me give you a little more detail on my basic idea for the concepts behind the tree. The basic building blocks are what I've been calling a "Multiply Traversable Container" and a "Traversal", which go hand in hand. The notion is that if there is no single, canonical traversal of a container from beginning to end (unlike the present standard containers which are all "flat"), I still want to be able to ask for a begin() and an end() and use standard algorithms on an iterator range. So iterators and member functions such as begin() and end() are parametrized on a traversal policy. In the case of the Tree concept, I've provided the obvious ones: X::pre_order_traversal X::post_order_traversal X::depth_first_traversal X::breadth_first_traversal X::sibling_traversal X::first_descendant_traversal X::last_descendant_traversal The first four traverse the whole tree, whereas the second three may traverse only a subset of the tree. One outcome of having multiple potential traversals is that, although it still makes sense for a Multiply Traversable Container to be EqualityComparable, it no longer makes sense to be LessThanComparable. So I also thought, in the event that you actually want to compare two trees (or other multiply-traversable containers) via operator <, I could provide a little wrapper utility (I've called this "single_traversal") modelling the standard Container concept, which hangs onto a traversal policy and gives you only a single iterator type. Anyway, I'm omitting a lot of detail here (Multiply Traversable Container is split up into Forward, Reversable and Random Access variants, and I also have a separate Binary Tree concept, which provides in-order iterators, etc.), but hopefully this gives you a better basic idea of what I'm going for.
That's one of the reasons I decided to define a TreeIterator concept that overlaps with the standard Iterator concept when it comes to sibling traversal. [snip] On the thread Justin Gottschlich started on this issue (http://tinyurl.com/44e3u), I suggested pre-order and other linearizations of a tree could be provided externally. I think putting these on the tree unnecessarily complicates the design of the tree and the iterators.
So, hopefully my comments above illustrate my opinions on this. I think the minimum set of iterators necessary to build all other tree iterators on (with reasonable efficiency) is first_descendant_iterator, last_descendant_iterator, and sibling_iterator. If I understand you correctly, I think that jives with what you're saying, and seems to be in rough agreement with the general consensus in Justin's thread. But I think it's a good thing to have the full set of traversals in the tree as well, even if for no other reason than for consistency with those three "fundamental" traversals.
I solved parent/root issue by having the tree class promise only the most basic TreeIterator: for traversal from parent to child. [snip] From the initial root iterator one could get SiblingIterators for iteration over its children.
So, from your description, I'm picturing two things: - A TreeIterator is an extension of the standard iterator concept that provides an additional method, something like children(), which returns a SiblingIterator, to provide access to a node's children. - The only way to get a SiblingIterator is via a TreeIterator's children() method. I'm not sure if this is what you were implying or not. But in my own design, I was thinking two things: first, my iterators do not extend the standard Iterator concept, and second, all iterator types should be convertible to each other. For example, if I'm doing a pre-order traversal of a tree -- maybe as a means of searching -- and I find an interesting element, I might want to convert my iterator to a sibling iterator to deal with the siblings of the element I've found. That implies that even for the root of the tree, I should be able to get a sibling iterator to point to it (though it would have no previous or next siblings).
Depending on the underlying node, both iterators could actually be the same. This could be inspected through separate traits classes.
Hmm, this gets back to the question about providing a node concept and interface. I'm having trouble picturing what you're describing here.
My idea was to write the basic single and double-linked lists operations from scratch. We don't be need the whole std::list interface for the nodes anyway.
I think I tend to agree now. I'd like to be able to plug in arbitrary sequences (including std::list) to represent a node's children, but that seems to come encumbered with too many penalties. On the other hand, I'm still soliciting clever approaches to this problem, and I'm still curious whether (despite having given up on it) you think it's a good idea conceptually. I think at this point (lacking a decent solution for the pluggable child sequence problem) if I were to provide a tree::children() method to expose a child sequence, the sequence returned still should conform to the standard Sequence concept, but needn't literally be a std::vector, std::list, or whatever. Thanks again for your insight, dr

It is a big problem to specify a fixed interface and implementation for something diverse like a tree. There are so many aspects and wishes that could be fulfilled. Jeff Mirwaisi bypassed that issue by extending the policy based approach. Since Jeff is moving soon, and has no time to follow the discussion I try to discuss in his name. Thus I uploaded treelib.tar.bz2 to the sf.net vault at. http://boost-sandbox.sourceforge.net/vault/index.php The tree type is composed out of a set of feature. A feature is a set of fragment, or a single fragment. The feature type list is specified by the user. `A fragment is a binary meta function that produces a step-wise refinement of a base type by adding, hiding,overloading or overriding member data and member functions resulting in an augmented version of the base.' [JM05]. The user is allowed to add his own fragments, so it is possible to change the internals even after the library has been ``shipped'' to the user. That is also the main difference to policy based design, which would define a fixed set of extension points. The library in its current state is not yet complete, erase methods, and proper rebalancing of the different tree variants is missing. Regards, Andreas Pokorny [JM05] gen - FOP Utility Library - Documentation of the feature oriented programming utility library, which is currently part of the treelib source code.

Andreas, Sounds like a very different paradigm. I'll have a look, and hope to reply in detail later this evening. Thanks, dr On Mon, 28 Mar 2005 21:35:30 +0200, Andreas Pokorny <andreas.pokorny@gmx.de> wrote:
[JM05] gen - FOP Utility Library - Documentation of the feature oriented programming utility library, which is currently part of the treelib source code.

Hi again, I had the chance to review Jeff's treelib docs and code this evening. I have a basic understanding of template metaprogramming, but no real experience with it. So I can't comment in much detail on Jeff's design and implementation, other than to say that the "feature-oriented programming" technique seems like a very interesting approach to design in general (though I imagine the paradigm might be more elegant if implemented as a brand new language). The one thing of substance I can say after having read through treelib is that it both solves some problems I believe don't need solving, and fails to solve some problems that I believe do. I'll address one at a time. First, problems that don't need solving. The tree design goes out of its way to provide an elegant system of mix-and-match descriptors. Four of these descriptor categories are key, order, balance and search optimization. I believe these four to be unnecessary, for the following reasons: - Specifying a sort ordering for the tree other than what the library refers to as "oriented" (i.e. elements stay where you put them) implies that you care not about the tree structure itself, but the strict ordering of elements in the tree. This need is already met by std::set and std::multiset (or in concept terms, "Sorted Associative Container"). - Specifying a balance or a search optimization for the tree (if I understand these concepts correctly) implies that you care not about the relative position of elements in the tree, but the algorithmic complexity of operations in the tree (such as insert, remove, find, etc.). This need is also already met by Sorted Associative Container. - Specifying a key type separate from value type could maybe be construed to imply that you're using the tree primarily as a fast search mechanism. I'm less confident about this assertion, but it seems like client's uses might well be met by std::map here. In short, my belief is that programmers only want to use a tree when it's the tree structure in particular that's important to them. Otherwise they will go with something like std::map, which may be a facade to a red-black tree, but offers a simpler interface isolated from implementation details. So eliminating these four descriptors leaves behind -- at least in the documentation -- "primary" (i.e. binary tree, k-ary tree, multi-way tree), value type, and allocation descriptors. But this is a manageable number of axes to represent using the more traditional model of template programming exemplified by the C++ standard library. Or, more concretely, it's reasonable with only these three axes to provide a few tree types (tree<>, binary_tree<>, k_ary_tree<>) parametrized on value and allocator type. Now, on to problems that do need solving. To restate my assumptions, I believe the most significant use case for trees is direct manipulation and traversal of the tree structure. For example, my personal interest in seeing a good generic tree container implemented is so I can represent turn-based games (where the data stored at each node is a player's turn, and players can undo turns or explore variations when reviewing games). Part of the value that I think would be provided by a well-designed tree structure is a set of concepts that can operate nicely with standard algorithms, such as std::find(), std::remove_if(), etc. Those needs don't seem to be a primary concern for treelib. Of course, that's not to say that a feature-oriented design and interoperability on the "concept" level with standard algorithms are mutually exclusive, and I also recognize that treelib is a proof of concept illustrating the successful application of a fairly radical approach to design. My only complaint is that treelib's area of focus seems orthogonal to the problems I'm hoping to address. If you think I might be glossing over or missing some important, applicable points in treelib, or if you disagree with my assessment, please do let me know! At this point, I'm a willing student. What I hope to get out of this exercise are learning, first and foremost, and a boost::tree to benefit everybody. Thanks very much for your help! dr On Mon, 28 Mar 2005 21:35:30 +0200, Andreas Pokorny <andreas.pokorny@gmx.de> wrote:
http://boost-sandbox.sourceforge.net/vault/index.php
The tree type is composed out of a set of feature. A feature is a set of fragment, or a single fragment. The feature type list is specified by the user. `A fragment is a binary meta function that produces a step-wise refinement of a base type by adding, hiding,overloading or overriding member data and member functions resulting in an augmented version of the base.' [JM05]. The user is allowed to add his own fragments, so it is possible to change the internals even after the library has been ``shipped'' to the user. That is also the main difference to policy based design, which would define a fixed set of extension points.

On Mon, Mar 28, 2005 at 10:35:36PM -0800, Dan Rosen <dan.rosen@gmail.com> wrote:
- Specifying a sort ordering for the tree other than what the library refers to as "oriented" (i.e. elements stay where you put them) implies that you care not about the tree structure itself, but the strict ordering of elements in the tree. This need is already met by std::set and std::multiset (or in concept terms, "Sorted Associative Container").
- Specifying a balance or a search optimization for the tree (if I understand these concepts correctly) implies that you care not about the relative position of elements in the tree, but the algorithmic complexity of operations in the tree (such as insert, remove, find, etc.). This need is also already met by Sorted Associative Container.
- Specifying a key type separate from value type could maybe be construed to imply that you're using the tree primarily as a fast search mechanism. I'm less confident about this assertion, but it seems like client's uses might well be met by std::map here.
Well, From my point of view these three aspects are important, I already had uses for that in my applications. E.g. sometimes std::set could be improved by if you know that the key space is restricted. Then I would appreciate the fact that such a library exists, allowing me to reuse lots of code, and just adding my special optimization. My point in this case is, that you do not have to use these features. You can just not specify them. From what I know from Jeff, he implemented these features because the untangled tree template paper by Matt Austern, Stroustrup et al. included them too.
Now, on to problems that do need solving. To restate my assumptions, I believe the most significant use case for trees is direct manipulation and traversal of the tree structure. For example, my personal interest in seeing a good generic tree container implemented is so I can represent turn-based games (where the data stored at each node is a player's turn, and players can undo turns or explore variations when reviewing games). Part of the value that I think would be provided by a well-designed tree structure is a set of concepts that can operate nicely with standard algorithms, such as std::find(), std::remove_if(), etc.
So you are here talking about a tree iterator that represents the contents as an ordered sequence (sequence in terms of being accessible by calling ++it)? In the treelib that would be a: struct depth_first { templat<typename,typename Base> struct appy : public Base { struct depth_first_iterator : public boost::iterator_facade<...> { ... }; } }; or breadth first, or maybe some templated fragments which provides a dynamic iteration concept. Maybe I misconceive your needs, but I believe that these problems are solvable with the design of the treelib.
Those needs don't seem to be a primary concern for treelib. Of course, that's not to say that a feature-oriented design and interoperability on the "concept" level with standard algorithms are mutually exclusive, and I also recognize that treelib is a proof of concept illustrating the successful application of a fairly radical approach to design. My only complaint is that treelib's area of focus seems orthogonal to the problems I'm hoping to address.
I fully agree with you! Due to the orthognal views, I beliebe a combination of both efforts could create a really versatile overhead free tree library. Even though you do not need the features cited above, there are people that would like to use them, and there is still more that can be done with the library. Lets say someone wants to serializer a big tree into a file, but not keep the full tree in memory. By defining a link_type feature, that turns the previous link_type into a lazy pointer, which would get the node structure from a file using boost::serialization, one could reuse lots of code, provided that the other features do not make assumptions that break with the new link_type. So in general I did not want to imply that the treelib solves your problems, but that it provides a flexibility that I would really like to see in boost::tree. Your points are valid, these issues (missing iteration concept, or algorithm concept) need to be addressed, and solved. Regards Andreas Pokorny

Hi Andreas, Sorry for the delayed reply.
Well, From my point of view these three aspects are important, I already had uses for that in my applications. E.g. sometimes std::set could be improved by if you know that the key space is restricted. Then I would appreciate the fact that such a library exists, allowing me to reuse lots of code, and just adding my special optimization.
Good point. I think there are a large set of uses I wasn't considering, where the user of this tree is not writing some arbitrary domain-specific code, but instead is writing a general container! For example, I wasn't considering using this tree as a basis for implementing std::set. I am kind of on the fence as to whether I should consider this type of scenario or not. It certainly seems appealing in some ways, but I have to admit, I don't have even remotely enough experience with template metaprogramming to have an intuition for the drawbacks. Would you mind writing up a little bit more description of the advantages/disadvantages of Jeff's approach? I'd really benefit from having a better understanding there.
... the untangled tree template paper by Matt Austern, Stroustrup et al. included them too.
Where can I find this?
Due to the orthognal views, I beliebe a combination of both efforts could create a really versatile overhead free tree library. Even though you do not need the features cited above, there are people that would like to use them, and there is still more that can be done with the library.
You may be convincing me :) My feeling was that the feature-oriented approach seems like a lot of trouble to go to, to support a fairly homogeneous set of uses, but if we consider supporting container implementers in addition to general users, the uses become far more diverse. Additionally, your scenario here:
Lets say someone wants to serializer a big tree into a file, but not keep the full tree in memory.
... convinces me that the distinction between users with "specialized" needs (such as container implementors) and the average user is not so black-and-white. I'll keep working through Jeff's code to try to understand it better. If you wouldn't mind, could you please send me a link or two to any other documents you think would help me put this code in context? Thanks very much! dr

Good point. I think there are a large set of uses I wasn't considering, ...
From reading this thread and the tree thread a while back I feel perhaps there should be "Boost Tree Libraries", rather than "The Boost Tree Library". Even if you can find a design that satisfies everyone it'll probably take too long to compile or be too difficult to use.
My main current need for a good tree class is for merging large numbers of game trees then being able to analyze them in various ways. So I've understood your points better than I have other's :-). Darren

I'd be interested in looking at the details, however draft they are (Feel free to mail them privately).
Actually, I shouldn't be such a tease. I'll try to get the document semi-workable and post it here when I get back from work this evening.
Okay, as promised, I have cleaned up my document a bit and am posting it here (see the .txt file attached; beware UNIX line-endings). It has huge, gaping holes but I hope it illustrates at least the direction I was trying to go with my tree. Any comments would be much appreciated. In the meantime, I'll go back over Justin's thread from last month again, and see if it gives me any ideas. I'm hoping that some of the folks who contributed to that thread (Thorsten, Justin, Beman, etc.) will take an interest in this one as well. Cheers, dr

Dan Rosen wrote:
In my design nodes were made an implementation detail but I also attempted to define some node concepts to allow advanced users to plug their own node types into pre-defined trees.
I'm curious what one would do with that ability ... what kind of use cases were you picturing? I've been attempting not to expose nodes in the interface, but maybe there's a reason why I should?
One possibility would be to define a node that queries another data source for the data and children (e.g., a database). This means our tree would be like a (possibly mutating) view. I also like the idea of interchangeable node types and from here to defining and exposing the concept it's only a short walk. For example, where space constraints are important a node could have two pointers: * next sibling * first child Where one will be constantly going up and down the tree, back and forth between siblings, the gains from keeping the extra previous sibling and parent are important. In the end, there are many possible node designs, and I can't say that one size fits all (or which, for that matter), so I planned to implement a few that I might want and let the advanced user write his own. Having a defined interface for nodes also means that I can start by implementing simple, less-than-perfect nodes and later on (when I've got things working...) implement other variations.
Personally, I prefer a tree-iterator concept that allows traversal of the tree without needing to carry the tree around. Having children as a member function of the tree prevents this.
Oh, I think I was unclear, sorry. I wanted to provide children() as a convenience, rather than as a necessary method for traversal and mutation of the tree. In lieu of having a document prepared, let me give you a little more detail on my basic idea for the concepts behind the tree.
The basic building blocks are what I've been calling a "Multiply Traversable Container" and a "Traversal", which go hand in hand. The notion is that if there is no single, canonical traversal of a container from beginning to end (unlike the present standard containers which are all "flat"), I still want to be able to ask for a begin() and an end() and use standard algorithms on an iterator range. So iterators and member functions such as begin() and end() are parametrized on a traversal policy. In the case of the Tree concept, I've provided the obvious ones:
X::pre_order_traversal X::post_order_traversal X::depth_first_traversal X::breadth_first_traversal X::sibling_traversal X::first_descendant_traversal X::last_descendant_traversal
Here, I would leave the pre- and post-order traversals outside of the tree, but we can agree to disagree ;) My rationale is that given the traversal categories of tree iterators, the pre- and post- order can be provided in a general external way. Keeping them out of the tree would keep the tree simpler for me. There are applications that don't require the "linearization" of the tree. As for breadth- and depth-first and I was hoping to leave them to the BGL. I'd say they're really graph algorithms.
The first four traverse the whole tree, whereas the second three may traverse only a subset of the tree.
One outcome of having multiple potential traversals is that, although it still makes sense for a Multiply Traversable Container to be EqualityComparable, it no longer makes sense to be LessThanComparable. So I also thought, in the event that you actually want to compare two trees (or other multiply-traversable containers) via operator <, I could provide a little wrapper utility (I've called this "single_traversal") modelling the standard Container concept, which hangs onto a traversal policy and gives you only a single iterator type.
I'll admit I didn't consider the possibility of having the tree be LessThanComparable. But I think we could agree on having the default behaviour to be a recursive node-wise comparison. This would be similar to in-order, but structure would matter for the comparison. Anyway, anyone is free to define their own comparison function for specialized purposes, so I wouldn't bother too much with this.
Anyway, I'm omitting a lot of detail here (Multiply Traversable Container is split up into Forward, Reversable and Random Access variants, and I also have a separate Binary Tree concept, which provides in-order iterators, etc.), but hopefully this gives you a better basic idea of what I'm going for.
Yes.
That's one of the reasons I decided to define a TreeIterator concept that overlaps with the standard Iterator concept when it comes to sibling traversal. [snip] On the thread Justin Gottschlich started on this issue (http://tinyurl.com/44e3u), I suggested pre-order and other linearizations of a tree could be provided externally. I think putting these on the tree unnecessarily complicates the design of the tree and the iterators.
So, hopefully my comments above illustrate my opinions on this. I think the minimum set of iterators necessary to build all other tree iterators on (with reasonable efficiency) is first_descendant_iterator, last_descendant_iterator, and sibling_iterator. If I understand you correctly, I think that jives with what you're saying, and seems to be in rough agreement with the general consensus in Justin's thread. But I think it's a good thing to have the full set of traversals in the tree as well, even if for no other reason than for consistency with those three "fundamental" traversals.
I'm not convinced those are "fundamental" traversals. I can imagine many cases where all you need is basic tree-traversal: parent to children. Different traversals can be provided outside the tre based on the operations its TreeIterators can provide.
I solved parent/root issue by having the tree class promise only the most basic TreeIterator: for traversal from parent to child. [snip] From the initial root iterator one could get SiblingIterators for iteration over its children.
So, from your description, I'm picturing two things:
- A TreeIterator is an extension of the standard iterator concept that provides an additional method, something like children(), which returns a SiblingIterator, to provide access to a node's children.
I had child_begin, child_end, ascend, descend (overloaded).
- The only way to get a SiblingIterator is via a TreeIterator's children() method.
Something like that.
I'm not sure if this is what you were implying or not. But in my own design, I was thinking two things: first, my iterators do not extend the standard Iterator concept, and second, all iterator types should be convertible to each other.
For the same reason that led me to split the iterators, a SiblingIterator might always be convertible to a TreeIterator, but the reverse wouldn't always be true.
For example, if I'm doing a pre-order traversal of a tree -- maybe as a means of searching -- and I find an interesting element, I might want to convert my iterator to a sibling iterator to deal with the siblings of the element I've found. That implies that even for the root of the tree, I should be able to get a sibling iterator to point to it (though it would have no previous or next siblings).
In general, I think this cannot be provided without some ugly hacks. Consider for example the following node for an n-ary tree: struct node { // node * parent; children_type children; value_type data; }; children_type is some kind of Sequence (or Range). Here, a sibling iterator would be an adaptor on top of the iterator provided by children_type. Because the root node is held directly by the tree, we have no way to provide a children_type::iterator pointing to it.
Depending on the underlying node, both iterators could actually be the same. This could be inspected through separate traits classes.
Hmm, this gets back to the question about providing a node concept and interface. I'm having trouble picturing what you're describing here.
Consider this node: struct node { // node * parent; node * previous_sibling, * next_sibling; node * begin_child, * end_child; }; Here, we can provide "sibling traversal" for the root node. So the tree_iterator exposed by the tree could be the same typedef as the sibling_iterator.
My idea was to write the basic single and double-linked lists operations from scratch. We don't be need the whole std::list interface for the nodes anyway.
I think I tend to agree now. I'd like to be able to plug in arbitrary sequences (including std::list) to represent a node's children, but that seems to come encumbered with too many penalties. On the other hand, I'm still soliciting clever approaches to this problem, and I'm still curious whether (despite having given up on it) you think it's a good idea conceptually.
Given there are many possible node designs, providing different size/traversal tradeoffs, I chose to define the Node concept. Anyway, I think the main part for a tree library would be the interface for the tree and its iterators. I can imagine, or rather hope, others would come up with trees conforming to the interface without imposing on them the node concepts.
I think at this point (lacking a decent solution for the pluggable child sequence problem) if I were to provide a tree::children() method to expose a child sequence, the sequence returned still should conform to the standard Sequence concept, but needn't literally be a std::vector, std::list, or whatever.
Right. In general you don't need to directly expose children. Remember this could be kept as a fixed size array. I inclined for child_begin, child_end. Best, João

Hi João, Just wanted to send a brief reply. I think you make some good points about the diversity of possible node designs, and they're reaffirmed by Andreas and David's points as well. I think Andreas has swayed me to explore Jeff Mirwaisi's "feature-oriented" tree ideas some more, and I think your comments are particularly applicable in that context. Have you had a chance to look at the treelib source and docs that Andreas posted? I'd be grateful for your opinions on that, since I personally have very little experience with that style of programming. Cheers, dr On Tue, 29 Mar 2005 10:20:27 +0100, Joao Abecasis <jpabecasis@zmail.pt> wrote:
I also like the idea of interchangeable node types and from here to defining and exposing the concept it's only a short walk. For example, where space constraints are important a node could have two pointers:
* next sibling * first child
Where one will be constantly going up and down the tree, back and forth between siblings, the gains from keeping the extra previous sibling and parent are important.
In the end, there are many possible node designs, and I can't say that one size fits all (or which, for that matter), so I planned to implement a few that I might want and let the advanced user write his own.

Hi Dan! Dan Rosen wrote:
Hi João,
Just wanted to send a brief reply. I think you make some good points about the diversity of possible node designs, and they're reaffirmed by Andreas and David's points as well. I think Andreas has swayed me to explore Jeff Mirwaisi's "feature-oriented" tree ideas some more, and I think your comments are particularly applicable in that context.
Have you had a chance to look at the treelib source and docs that Andreas posted? I'd be grateful for your opinions on that, since I personally have very little experience with that style of programming.
I've downloaded the code but have yet to take an in-depth look. The approach looks interesting, to say the least, and as to the possibilities it seems very powerful. However I don't think we've got to that point of the library design. Well... I haven't, anyway ;-) IMO, wether, in the end, you go for an all-powerful, adaptable, extensible tree that performs optimally for any requirement set and scenario, for many different single-purpose trees, or you pick a mix of both approaches is not really important, right now. Perhaps the most important part of a Boost Tree library will be the set of concepts it endorses. Coming up with a bunch concepts that doesn't limit the implementation possibilities, I think, is the hard part and it is where such a Boost Tree library should begin. Best, João

Dan Rosen wrote:
[...] In short, although I'd like my concepts for trees to allow users to specify and access the sequences representing the children of a node, my implementation of this is proving very clumsy. So I'm wondering two things:
1. Is this a good idea in the first place? Are there really any compelling use cases, where you'd want one tree to store its node structure in vectors and another in lists? Or is this a case where a policy-based approach is misplaced effort? [...]
My personal intuition, not at all tested on real experience, is that you want to make the nodes as thin as possible. Trees are already fat structures by design, and something like std::list as a node type would bloat them beyond all recognition. The only reason I can think of to justify a dynamically sized node is to construct a tree where the size of the nodes varies considerably. But to recoup the cost, this variance would have to be rather significant (I'd say there would have to be a fairly even distribution of nodes having anywhere from 2 to 30 children). This seems to me to be a rather unusual tree that probably won't occur often in practice. The only exception I can think of offhand would be a trie, but that should probably be implemented on its own anyway. Even a vector is not appropriate in my mind, because a fixed-size node should be sufficient in the vast majority of cases. Thus, I would suggest making your node container be an array of fixed size determined by a template parameter. It would be very difficult to justify something more elaborate than this given the speed and simplicity advantages. I think in the cases where someone needs very specialized nodes, it shouldn't be much bother for them to go straight to boost::graph. Dave

Hi Dave,
My personal intuition, not at all tested on real experience, is that you want to make the nodes as thin as possible.
Agreed.
The only reason I can think of to justify a dynamically sized node is to construct a tree where the size of the nodes varies considerably.
Well, the concepts I'm hoping to support, roughly speaking, are binary trees (a node contains at most 2 children), k-ary trees (a node contains at most k children), and trees with no limit on the number of children per node. To say that users who want the third type of tree should just use boost::graph, or that a trie shouldn't be implemented as a tree, seems draconic to me. I think the tree concept should be general-purpose enough to encompass these uses, even if it's embodied by multiple implementations with different space/time tradeoffs. But the question I really wanted to ask in response to your thoughts was, how big do you think is too big? The way I'd think of implementing a node in a binary tree is something like: struct node { node* parent_; node* first_; node* second_; T* value_; }; In a k-ary tree, it would be: struct node { node* parent_; node* children_; // allocated using allocator::allocate(k, 0); T* value_; }; And in a general-purpose tree, it would be: struct node { node* parent_; node* first_child_; node* last_child_; node* prev_sibling_; node* next_sibling_; T* value_; }; Would you consider five pointers to be too much? Cheers, dr

Dan Rosen wrote:
[...]
The only reason I can think of to justify a dynamically sized node is to construct a tree where the size of the nodes varies considerably.
Well, the concepts I'm hoping to support, roughly speaking, are binary trees (a node contains at most 2 children), k-ary trees (a node contains at most k children), and trees with no limit on the number of children per node. To say that users who want the third type of tree should just use boost::graph, or that a trie shouldn't be implemented as a tree, seems draconic to me.
Well, we could implement a single container type with a massive number of policies that implements all the containers in the STL, but is that a good design decision? The points you are talking about are {2, N, oo}. I agree with 2 and N. oo is a different type of beast altogether, and trying to stuff it in a tree-sized box doesn't seem like sound judgement to me. I guarantee that if you tried to implement a trie as a generic tree with std::list nodes, or perhaps even a std::vector, that I would probably not use it. The whole point of a trie is to get super-fast access to strings, and fattening it up with those containers is the antithesis of that goal. Also, I'm not familiar with any use-cases for a tree with an unbounded number of keys per node. Perhaps you could cite some? Just because you can build in the kitchen sink doesn't mean it's a good idea. It's often a better idea to build a component that does one thing well.
I think the tree concept should be general-purpose enough to encompass these uses, even if it's embodied by multiple implementations with different space/time tradeoffs.
I'm not convinced that a trie necessarily has the same interface as a generic tree. And if you have a beast with nodes having an unbounded number of keys, I'm even more skeptical of the consistency of that interface with a generic tree. I imagine such a container would have very special needs that would best be served by a specialized data structure. I can tell you right now that such a tree, in the absence of any further constraints, would have almost no relevant performance guarantees (except possibly that it would be provably easy to find use cases where its performance is terrible).
[...] And in a general-purpose tree, it would be:
struct node { node* parent_; node* first_child_; node* last_child_; node* prev_sibling_; node* next_sibling_; T* value_; };
Would you consider five pointers to be too much?
Yes. I don't see what's wrong with my original suggestion: template <int K = 2, int N = 1> struct node { node* parent_; node* child_[K]; T value_[N]; }; Dave

On Tue, 29 Mar 2005 01:53:44 -0600, David B. Held <dheld@codelogicconsulting.com> wrote:
Also, I'm not familiar with any use-cases for a tree with an unbounded number of keys per node.
I'm getting the feeling we're talking about different things. I'm not suggesting that the tree would have multiple keys per node -- I'm not even sure what you mean by "keys" in this context -- I was saying that a node in a tree may have an unbounded number of *children*.
Would you consider five pointers to be too much?
Yes. I don't see what's wrong with my original suggestion:
template <int K = 2, int N = 1> struct node { node* parent_; node* child_[K]; T value_[N]; };
Well, a couple things: 1. There is no precedent for having an array of values at any given node. In my experience, the precedent in container design is to let clients fully specify the value type (and key type if appropriate). If a client wants a node to contain an array of values, the code indicates that by passing an array as the value type. 2. For the list of children, you've also specified array storage. But what if I want to insert between two children? What if I don't want to allocate K pointers for each node? What if I simply don't want the number of children per node to be bounded? The implementation you suggest here is decent for a k-ary tree, but it does have its inherent restrictions. Simply put, I can't use it in any context where I don't know how many children a given node might have, such as a parse tree or a game tree, hence my "draconian" comment. Anyway, I've posted a more complete document in response to João's post, which I hope will better illustrate my intent. Have a look! Thanks, dr

Dan Rosen wrote:
On Tue, 29 Mar 2005 01:53:44 -0600, David B. Held <dheld@codelogicconsulting.com> wrote:
Also, I'm not familiar with any use-cases for a tree with an unbounded number of keys per node.
I'm getting the feeling we're talking about different things. I'm not suggesting that the tree would have multiple keys per node -- I'm not even sure what you mean by "keys" in this context -- I was saying that a node in a tree may have an unbounded number of *children*.
In general terms, a tree node has N children and K keys. The keys are simply the values stored in that node. For a typical binary tree, K == 1. For a btree, K typically is N - 1. However, there is no reason to disallow configurations that have a different relationship between K and N. I didn't invent this terminology.
[...] Well, a couple things:
1. There is no precedent for having an array of values at any given node.
Apparently, you've never heard of a btree. Perhaps you'd like to brush up on your database implementation theory going back several decades. ;)
In my experience, the precedent in container design is to let clients fully specify the value type (and key type if appropriate). If a client wants a node to contain an array of values, the code indicates that by passing an array as the value type.
Well, there's two things wrong with that. First, that is not the precedent (witness std::map, for which the value_type is std::pair<key_type, mapped_type>). Second, you are confusing the value_type with an implementation detail of the node_type. Setting T = U[N] is entirely different from implementing node_type as T[N]. In the former, the node does not know whether T is an array or not, which implies that each tree node has a single value. If you think that nodes should all have single values, then you not only need to look at btrees (and all its relatives), but also compact tries.
2. For the list of children, you've also specified array storage. But what if I want to insert between two children?
Obviously, you have to shift the keys, if that makes sense for the tree type. But more typically, you would insert the value further down the tree. A tree isn't an array, after all. The arrays in each node merely represent the branching factor of the tree.
What if I don't want to allocate K pointers for each node?
Then set N = K.
What if I simply don't want the number of children per node to be bounded?
Then you obviously need a different design.
The implementation you suggest here is decent for a k-ary tree, but it does have its inherent restrictions.
Obviously. But those restrictions circumscribe a very large class of trees which can be implemented fairly efficiently with the given design. If you were to implement std::map with a tree that had, say, a std::list in each node, there's no way you could convince me to use such a beast. The point being, of course, that if total generality significantly raises the cost of the average case, then it's more expensive than many people will want to pay. Remember one of the guiding principles of C++ design: "Don't make them pay for what they don't use." Most std::map<>s are backed by a simple red-black tree that has three pointers per node. Changing that to five would significantly increase the size of the tree, for absolutely no benefit to std::map users. If the node type itself were parameterized, and defined by an interface rather than a type, then you could simply provide an appropriate node type. But then you end up with something closer to Jeff Mirwaisi's design, which is something you should carefully consider anyway.
Simply put, I can't use it in any context where I don't know how many children a given node might have, such as a parse tree or a game tree, hence my "draconian" comment.
Depends on the game. Tic-tac-toe, for instance, has a rather fixed branching game tree, if you collapse rotations. Connect-4 has a fixed branching factor as well. Chess is obviously a more difficult proposition. But if you're writing a chess program with a general tree type, you get what you deserve. Parse trees typically don't have a large unbounded number of children, though some of the nodes may certainly get arbitrarily large.
Anyway, I've posted a more complete document in response to João's post, which I hope will better illustrate my intent. Have a look!
I will. Dave

David, Let me take a step back. I think we're fundamentally in agreement, but we have very different use cases in mind. I had considered neither btrees nor tries in my initial work on developing general-purpose tree containers, so the examples I provided and the language I used might have made it seem like I didn't *want* to consider them. What I'm doing right now is just gathering requirements and ideas. One requirement is clearly "keep your tree nodes on a diet," which was the slant of my initial questions in this thread (i.e. how to satisfy a variety of needs without making my nodes and iterators obese); I see that as sort of a common-sense implementation-level requirement. I'm also just as interested in functional requirements, and it's clear that having a fast k-ary tree (single and multi-keyed alike) such as you're proposing is one such requirement. Another such requirement is my use case of trees with unbounded children per node. I make no claim that both requirements can be met optimally, or even adequately, by the same tree. Regarding Jeff's design: as I said, I think the "feature-oriented" paradigm seems extremely elegant from the client perspective (generally speaking). But because I'm not yet aware of enough different requirements, I don't yet see a reason to go with that type of implementation. Emphasis on "yet." But through the process of requirements gathering, it may turn out that there are compelling reasons to vary a tree's implementation on a significant enough number of axes to warrant going with that type of design. Let me ask a question which I hope will direct this discussion down a productive path: In what scenarios (as many as you can think of) would users need to be directly exposed to the structure of some sort of tree? This is in contrast to the situations where trees are the sensible implementation (such as std::set, std::map, etc.) but needn't be exposed. If we can put that list together, I think that would be a good start. Best, dr

"David B. Held" <dheld@codelogicconsulting.com> wrote in message news:d2b14r$ekf$1@sea.gmane.org... | Dan Rosen wrote: | > [...] | > And in a general-purpose tree, it would be: | > | > struct node { | > node* parent_; | > node* first_child_; | > node* last_child_; | > node* prev_sibling_; | > node* next_sibling_; | > T* value_; | > }; | > | > Would you consider five pointers to be too much? | | Yes. I don't see what's wrong with my original suggestion: | | template <int K = 2, int N = 1> | struct node | { | node* parent_; | node* child_[K]; | T value_[N]; | }; There is an alternative that doesn't really need a node concept. It goes like this template< class T > class tree { T data_; boost::optional< boost::ptr_vector<tree<T> > > children_; tree* parent_; ... }; for n-ary trees, exchange ptr_vector<> with ptr_array<,N>. -Thorsten

Thorsten Ottosen wrote:
[...] There is an alternative that doesn't really need a node concept. It goes like this
template< class T > class tree { T data_; boost::optional< boost::ptr_vector<tree<T> > > children_; tree* parent_; ... };
for n-ary trees, exchange ptr_vector<> with ptr_array<,N>.
This has "fat" written all over it. First, you have overhead with optional, and second, you have overhead with vector. If the vector conforms to the standard, the capacity will most likely be larger than the size, wasting space, unless you intentionally resize it every time, defeating the performance guarantees. On top of that, you need to store both the size and the capacity in each vector, as well as a pointer to the vector itself. That's three words not counting any of the pointer themselves. Furthermore, the vector is stored on the heap, so there is no possibility of creating a specialized node heap. I find this design to be, err..."less than optimal." Dave

"David B. Held" <dheld@codelogicconsulting.com> wrote in message news:d2cvju$lui$1@sea.gmane.org... | Thorsten Ottosen wrote: | | > [...] | > There is an alternative that doesn't really need a node concept. | > It goes like this | > | > template< class T > | > class tree | > { | > T data_; | > boost::optional< boost::ptr_vector<tree<T> > > children_; | > tree* parent_; | > ... | > }; | > | > for n-ary trees, exchange ptr_vector<> with ptr_array<,N>. | | This has "fat" written all over it. First, you have overhead | with optional, then use the state children_.empty() instead. | and second, you have overhead with vector. If | the vector conforms to the standard, the capacity will most | likely be larger than the size, wasting space, if you need flexibility, you won't get it for free. | unless you | intentionally resize it every time, defeating the performance | guarantees. On top of that, you need to store both the size | and the capacity in each vector, as well as a pointer to the | vector itself. That's three words not counting any of the | pointer themselves. 3 pointers (words) should be enough (first,last,capacity). | Furthermore, the vector is stored on | the heap, so there is no possibility of creating a specialized | node heap. I find this design to be, err..."less than optimal." what do you mean? you can define allocators for everything in a ptr_vector. Speaking of design | template <int K = 2, int N = 1> | struct node | { | node* parent_; | node* child_[K]; | T value_[N]; | }; if you know it is a k-ary tree, then use ptr_array<T,N>. No overhead, no flexibility, but a lot easier to make exception-safe. If you use ptr_array<T,N> in the no-node design, I can't see how it gives any overhead. It also seems to be simpler. -Thorsten
participants (6)
-
Andreas Pokorny
-
Dan Rosen
-
Darren Cook
-
David B. Held
-
Joao Abecasis
-
Thorsten Ottosen