
On Sun, Dec 4, 2011 at 10:43 AM, Nathan Ridge <zeratul976@hotmail.com>wrote:
* Question #1: why augmenting on B+ trees? The augmenting idea can be as easily built on top of RB-trees, which are the weapon of choice for implementing associative containers. Unless proved otherwise, RB-trees are assumed to be faster than B+ trees for in-memory structures (which your own tests hint at, cf. "insert N random elements" table.)
Could a B+ tree library be a basis for implementing disk-backed B+ trees? I think that would be useful. In a clever implementation the memory-backed and disk-backed B+ tree might even be the same class, with the difference factored out into a policy template parameter.
Regards, Nate
I understood this comment is about adding persistence or serialization support to the proposed B+ trees. Please correct if this is not the case. There are two levels of complexity in this area. A trivial one is just saving and reading data as a list, since these B+ trees are build on top of doubly linked lists. This might be sufficient for many practical applications, but certainly not for all. Hence there is the second and less trivial problem of saving and reading operations which preserve topology of these B+ trees that are designed to work in RAM. (Note: "topology" has been used here as a state of connectivity of nodes with data stored in the nodes). In principle, the problem can be solved for the submitted B+ trees, but I will need some Boost specific requirements to implement this feature.
What I mean by a disk-backed B+ tree is a B+ tree where only the nodes that are needed at a given time are loaded into memory from disk - the entire tree on disk might be too large to even fit into memory.
Thank you for this clarification. This is certainly the third level of complexity, this is why below only ideas about possible solutions. Assuming that problem #2 has been already solved the reading operation can be implemented through reading a sub-tree from the highest internal node which represents the root node of the sub-tree. Direct writing back the same sub-tree, which has been modified while working in RAM, is not allowed, since it can destroy invariants of the whole structure saved on disk. It seems to me the most reasonable option is to implement writing as efficient splice operations (in theory they have logarithmic cost), which are applied to the structure stored on disk. I know the efficient splice operations are possible for B+ trees in RAM, but I am not sure if these algorithms are also suitable for disk systems.
Hope this quick consideration makes some useful sense.
How is it done by relational database indexes? Aren't those commonly implemented as disk-backed B+ trees?
The proposed B+ trees have not been designed for these applications. They
are implemented as dynamically allocated data structures, such as linked lists and red black trees. One node stores only one element. Thus, they do not store data contiguously in blocks (also named nodes!) as B+ trees for external storage. For more details please have a look at the section: http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/bp_trees.html Please send me a reference if you know methods of disk backing for dynamically allocated B+ trees. This can be useful in future development. Regards, Vadim