Re: [boost] [stl_ext_adv] ] Interest in advanced data structures

4 Dec 2011

      On Sun, Dec 4, 2011 at 10:43 AM, Nathan Ridge <zeratul976@hotmail.com>wrote:
...
...
...
...
...
...
* Question #1: why augmenting on B+ trees? The augmenting idea
can be as easily built on top of RB-trees, which are
the weapon of choice for implementing associative containers.
Unless proved otherwise, RB-trees are assumed to be faster than
B+
trees for in-memory structures (which your own tests hint at, cf.
"insert N random elements" table.)
Could a B+ tree library be a basis for implementing disk-backed B+
trees? I think that would be useful. In a clever implementation
the memory-backed and disk-backed B+ tree might even be the same
class, with the difference factored out into a policy template
parameter.
Regards,
Nate
I understood this comment is about adding persistence or
serialization
support to the proposed B+ trees. Please correct if this is not the
case.
There are two levels of complexity in this area.
A trivial one is just saving and reading data as a list, since these
B+
trees are build on top of doubly linked lists. This might be
sufficient
for
many practical applications, but certainly not for all.
Hence there is the second and less trivial problem of saving and
reading
operations which preserve topology of these B+ trees that are
designed to
work in RAM. (Note: "topology" has been used here as a state of
connectivity of nodes with data stored in the nodes).
In principle, the problem can be solved for the submitted B+ trees,
but I
will need some Boost specific requirements to implement this feature.
What I mean by a disk-backed B+ tree is a B+ tree where only the nodes
that
are needed at a given time are loaded into memory from disk - the
entire
tree on disk might be too large to even fit into memory.
Thank you for this clarification.
This is certainly the third level of complexity, this is why below only
ideas about possible solutions.
Assuming that problem #2 has been already solved the reading operation
can
be implemented through reading a sub-tree from the highest internal node
which represents the root node of the sub-tree.
Direct writing back the same sub-tree, which has been modified while
working in RAM, is not allowed, since it can destroy invariants of the
whole structure saved on disk. It seems to me the most reasonable option
is
to implement writing as efficient splice operations (in theory they have
logarithmic cost), which are applied to the structure stored on disk. I
know the efficient splice operations are possible for B+ trees in RAM,
but
I am not sure if these algorithms are also suitable for disk systems.
Hope this quick consideration makes some useful sense.
How is it done by relational database indexes? Aren't those commonly
implemented as disk-backed B+ trees?
The proposed B+ trees have not been designed for these applications. They
are implemented as dynamically allocated data structures, such as linked
lists and red black trees. One node stores only one element. Thus, they do
not store data contiguously in blocks (also named nodes!) as B+ trees for
external storage. For more details please have a look at the section:

http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/bp_trees.html

Please send me a reference if you know methods of disk backing for
dynamically allocated B+ trees. This can be useful in future development.

Regards,
Vadim

Re: [boost] [stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik