[boost] Re: Boost library inclusion: generic tree containers?

28 Feb 2005

      Justin Gottschlich wrote:
...
"David B. Held" <dheld@codelogicconsulting.com> wrote in message 
news:cvqlqp$fnk$1@sea.gmane.org...
[...]
...
Second, trees are complicated because there are two sets of
iterators that need to be addressed.  First is the iterator
over objects in the root, and second is the iterator over
children.  However, they are two separate kinds of iterators.
Well, I think this depends on how the tree's structure is formed.
I think it's possible to achieve the tree building with a single
iterator type and then add additional iterators as needed. The
reason I say this is that if you consider the initial tree object
as the "root node" (owning one of the contained objects of the
tree), you should only need a single iterator type since by
definition a tree can only have one root node. Since you wouldn't 
need to iterator over the objects contained within the root level
as by definition, they'd be children of the root node. At least,
this is how I handled this problem in my tree container.
[...]
It sounds to me like you are not considering the general case
where a given node can have K keys and N children.  Now typically,
K = 1, which is why I made it the default in the sample code.  That
is what you are talking about.  But for a Btree, you often have
K = N - 1.  That is, each node contains K "objects" or "keys",
and N children.  Yes, you could provide a single iterator type
that iterates over keys both *within* a node and *among* nodes,
but it would be a fat iterator (at least algorithmically, if not
structurally).  I think a better decomposition is to recognize that
for trees with large aggregate nodes, the fundamental operations
are to access the keys within the node, and access the children
separately.

To illustrate, let me offer an exaggeration.  Suppose we build a
Btree for use in a disk-based container.  We could build it so that
K = 64, N = 4.  That is, each node contains 64 "keys", but only
4 children.  There are numerous ways to distribute keys and children
in such a configuration, and there is no a priori reason for an
implementation to exclude the possibility of any given distribution.
So such a tree would have a narrow branching factor, but fat nodes.
That might better correspond to the size of disk sectors while
not wasting too much space on a wide branching factor.

While it may seem wasteful to treat the key set for a node as an
array, I suspect that for the degenerate cases where K = 1, the
array access key_[0] will get optimized at compile time (and we
could certainly provide convenience functions that aid in this
optimization).

Dave