Re: [boost] new proposal: order statistics tree

8 Jan 2013

      On Tue, Jan 8, 2013 at 7:31 AM, Gottlob Frege <gottlobfrege@gmail.com>wrote:

 Basically both an order statistics tree and boost multi-index are examples
...
of containers with multiple indices.  I suspect there may be interesting
results from a compare/contrast type of study.
Boost MultiIndex allows indexing of a container via any number/type of
indices.  So you could have a container indexed via key, and also indexed
via integer.  Now the difference, I think, is that MultiIndex uses
independent indexes - changing a entry's key from "alpha" to "beta" doesn't
change its integer index from 1 to 2.
So that is a significant difference.  Might be worth mentioning in docs or
somewhere, just to avoid confusion.
But furthermore, I could probably implement the same behaviour as a order
statistics tree by using Boost.MultiIndex internally (and maintaining the
interrelationships of the indices as needed).  Would that be drastically
less efficient?
+1.
This is a very good comment and I think there is no simple answer. I was
asked similar questions during Boost discussion of augmented dynamically
allocated B+ trees, this is why I would like to make a few comments based
on my analysis.

The fact that creates an overlap with Boost.MultIndex and can cause
confusion is that augmented data structures support multiple augmenting.

For example, basic balanced tree provides efficient search of elements by
keys. This tree requires a comparison object. Augmenting of such tree by
counter of elements adds efficient access to elements by rank, which is
equivalent to index value in the subscript operator of an array or
std::vector. This variant of tree has extended functionality and can work
both with and without a comparison object. In other words, it offers
equally efficient operations for both ordered and unordered containers. Yet
another augmenting by the sum of values in children
nodes improves computational complexity of calculation of sum of any number
of consecutive elements from linear to logarithmic. As a result of the
double augmenting, the calculation of the mean value and standard deviation
can be done in O(log N) time only.

Such variants of augmented data structures support STL sequences and
associative containers with improved efficiency of specific member
functions. In applications that require multiple views of a data set
these augmented data structures and containers cannot replace
Boost.MultiIndex functionality.

On the other hand, replacing functionality of augmented data structures
with Boost.MultiIndex should be generally possible. However, this solution
will be less efficient in the case when an augmented data structure offers
a set of more efficient algorithms compared to data structures of
Boost.MultiIndex. The disadvantage of this approach is that one less
efficient operation can cause a performance bottleneck. Normally, augmented
data structures offer the improvement of the running time from O(N) down to
O(log N) for a single operation or an algorithm.

This performance benefit of augmented data structures certainly should not
be ignored by Boost.MultiIndex. Unfortunately, at this stage it is not
possible for a user algorithm to take advantage of Boost.MultiIndex with
advanced data structures. This example shows declaration of a multiset:

    typedef multi_index_container<int,
indexed_by<ordered_non_unique<identity<int>>>>
                multi_set_int ;

The obvious idea of a parameterization of containers used by views of
Boost.MultiIndex is potentially useful, but deeper analysis shows that it
is quite difficult and costly to implement. Boost.MultiIndex has special
non-standard requirements to the view containers. For example, std::vector
cannot be used in place of random access indices, since it does not remain
stable after insertion or erasure operations.

Hope that the discussion helps better understand the complexity of these
issues.

Regards,
Vadim Stadnik