
Vadim Stadnik <vadimstdk <at> gmail.com> writes:
Hi all,
Advanced data structures (ADS) offer useful STL extensions that are not possible or not efficient with basic data structures, such as arrays, linked lists and red-black trees.
The project
https://github.com/vstadnik/stl_ext_adv
http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/index.html
focuses mainly on a generalized and unified computer representation of sequences using the augmented B+ trees. It provides the following STL extensions [...]
Hi, This is a rough study of the functionality provided by your lib in general and as compared with Boost.MultiIndex. * Container_bptree's internal data structure relies on B+ trees with two variants based on the "augmented data structure" idea explained in Cormen, Leiserson, Rivest and Stein's "Algorithms" (an online explanation can be found at http://tinyurl.com/d8lze7v , for instance.) Briefly put, such augmenting scheme boils down to adding additional information to the nodes updated on insertion time and erasure time which is based on the additional info of the sibling nodes. The variants are: - bp_tree_idx: augmented to allow for efficient indexing - bp_tree_idx_acc: doubly augmented to support efficient indexing plus a generalized "accumulate" notion usable for calculating summatories, mean values, etc. * Question #1: why augmenting on B+ trees? The augmenting idea can be as easily built on top of RB-trees, which are the weapon of choice for implementing associative containers. Unless proved otherwise, RB-trees are assumed to be faster than B+ trees for in-memory structures (which your own tests hint at, cf. "insert N random elements" table.) You say: "The red black trees are yet out of the scope of this project for the reasons explained in the documentation." I've failed to find such explanation. Also, I suspect memory consumption is higher with B+ trees than RB trees. - Request #1: explain why B+ trees rather than RB trees. - Request #2: provide code for performance tests so that Boost.MultiIndex and oher data structures can be compared. - Request #3: augment the performance tests so as to yield memory consumption figures. * Usage scenario 1: Plain B+ trees as a replacement for RB tress for associative containers. My hunch is that RB-trees would perform better, as explained above. * Usage scenario 2: sequence-like containers with efficient insertion/erasure. Directly based on bp_tree_idx, with a do-nothing sorting critera. These containers are not truly random-access, but rather provide O(log n) access, which can be acceptable (or not) for some applications. * Usage scenario 3: associative container with random access iterators. This roughly compares with a multi_index_container equipped with two indices, one ordered, the other random-access. Basic differences between both: - The multi_index_container provides truly O(1) random access whereas Container_bptree is O(logN) (which is not random access strictly speaking.) - With the multi_index_container, sorting order and insertion order are separate --some sort of synchronization between both indices should be needed to mimic the Continer_bptree case. - Insertion/erasure with the multi_index_container is linear (though much faster than with std::vector), logarithmic with Container_bptree. - The memory overhead introduced by the multi_index_container is two pointers vs. only one word for Container_bptree. * Usage scenario 4: containers with fast accumulate, a direct application of the augmenting idea explained above. There is nothing in Boost that provides this functionality. My conclusions: 1. Augmented data structures (of which efficient indexing is just a particular application) are definitely useful and worth having in Boost. 2. Unless proved otherwise, we should be using RB trees (or maybe AVL) rather than B+ trees. 3. Depending on the particular needs, some usage scenarios can be better served with Boost.MultiIndex (note I'm saying "depending on needs", since differences between approaches are substantial.) 4 To me, rather than providing out of the box containers with augmenting we should provide this via two libraries: - As an additional capability of Boost.Intrusive associative containers (and AVL-tree based containers as well, if technically possible.) - As an additional capability of Boost.MultiIndex ordered indices. Hope this helps, Joaquín M López Muñoz Telefónica Digital