Re: [boost] new proposal: order statistics tree

13 Jan 2013

      On Sun, Jan 13, 2013 at 7:10 PM, Szymon Wojciechowski <sw10@interia.pl>wrote:
...
...
...
I implemented augmented red-black tree which doesn't check the
uniqueness
of values. I have conducted a simple survey: I prepared 5 input files
with
unique 2,7*10e7 integers. The first data set has sorted values as
other has
randomized. I repeated 7 times on every input. A simple program reads
from
input integers and puts it to set (maybe I should use multiset) from
VisualStudio or my order statistics tree. After that I measured time.
Then
program erases values from data structure in ordered way (from 0, 1,
2...).
After that I also measured how much time it takes. I uploaded the
results
here: http://wyslijto.pl/plik/q5om39aoua - it's a little bit strange.
What kind of performance test would you think about?
Did you implement or is it possible to implement for this augmented red
black tree split, join and splice operations with logarithmic
computational
complexity?
As for tests, there are a number of performance tests of basic container
operations in the following projects:
1. For augmented dynamically allocated B+ trees:
https://github.com/vstadnik/stl_ext_adv
I have added this link, since this project has results of performance
tests
for Boost.MultiIndex.
2. For augmented array based B+ trees:
https://github.com/vstadnik/stl_ext_adv_review
This project is prepared for Boost review. It compares performance of new
containers with standard containers and Boost.Container.
The file "boost_performance.cpp" uses Boost.Chrono timer. If your
containers are STL compliant you should not have any problem with this
test
code.
Regards,
Vadim Stadnik
I have performed such test for 6 data containers (including
boost::multi_index described by Tony and Vadim): the comparison is uploaded
here: http://wyslijto.pl/plik/mxxpjbpmfn
I prepared 5 datasets (first one was sorted). I repeated 5 times whole
test for each container and dataset. The test inserts and accumulates as in
Vadim's tests. However I added the erasure performance.
The performance of your trees looks quite promising.
What are the sizes of the data sets used in the tests?
...
I haven't thought about possiblility to implement such operations like
splice, join, split. However, it seems to me to be impossible. When adding
treeA + treeB, the elements of treeB should be considered independently
 where to place them in result. For example if thereA contains 1, 3, 5, 7,
9 and treeB contains 2, 4, 6, 8, 10 all treeB nodes will be independently
located and inserted in O(n).
The support of these operations with logarithmic running time would be
quite beneficial for your variant of augmented trees.

The augmented trees can support not only STL associative containers, but
also STL sequences, such as std::vector and std::list. String algorithms
can also take advantage of these efficient operations. The split operation
is general. The join and splice operations are general for STL sequences.
For associative containers they can be used only in special cases. You can
find more details in documentation to augmented B+ trees.

The CGAL library
http://www.cgal.org/Manual/latest/doc_html/cgal_manual/STL_Extension_ref/Cla...

implements split and join (as function catenate()) operations for
Multiset. However, these trees are not augmented, this is why container
sizes must be re-counted in linear time, which creates a performance
problem for STL variants of containers. Note also that this is a problem of
all dynamically allocated data structures that can support bidirectional
iterators only, including std::list.

With augmented red-black trees the cost of these operations and
corresponding member functions of STL containers should be logarithmic.

Regards,
Vadim Stadnik