
On Sun, Jan 13, 2013 at 7:10 PM, Szymon Wojciechowski <sw10@interia.pl>wrote:
I implemented augmented red-black tree which doesn't check the uniqueness of values. I have conducted a simple survey: I prepared 5 input files with unique 2,7*10e7 integers. The first data set has sorted values as other has randomized. I repeated 7 times on every input. A simple program reads from input integers and puts it to set (maybe I should use multiset) from VisualStudio or my order statistics tree. After that I measured time. Then program erases values from data structure in ordered way (from 0, 1, 2...). After that I also measured how much time it takes. I uploaded the results here: http://wyslijto.pl/plik/q5om39aoua - it's a little bit strange. What kind of performance test would you think about?
Did you implement or is it possible to implement for this augmented red black tree split, join and splice operations with logarithmic computational complexity?
As for tests, there are a number of performance tests of basic container operations in the following projects:
1. For augmented dynamically allocated B+ trees: https://github.com/vstadnik/stl_ext_adv
I have added this link, since this project has results of performance tests for Boost.MultiIndex.
2. For augmented array based B+ trees: https://github.com/vstadnik/stl_ext_adv_review
This project is prepared for Boost review. It compares performance of new containers with standard containers and Boost.Container. The file "boost_performance.cpp" uses Boost.Chrono timer. If your containers are STL compliant you should not have any problem with this test code.
Regards, Vadim Stadnik
I have performed such test for 6 data containers (including boost::multi_index described by Tony and Vadim): the comparison is uploaded here: http://wyslijto.pl/plik/mxxpjbpmfn I prepared 5 datasets (first one was sorted). I repeated 5 times whole test for each container and dataset. The test inserts and accumulates as in Vadim's tests. However I added the erasure performance.
The performance of your trees looks quite promising. What are the sizes of the data sets used in the tests?
I haven't thought about possiblility to implement such operations like splice, join, split. However, it seems to me to be impossible. When adding treeA + treeB, the elements of treeB should be considered independently where to place them in result. For example if thereA contains 1, 3, 5, 7, 9 and treeB contains 2, 4, 6, 8, 10 all treeB nodes will be independently located and inserted in O(n).
The support of these operations with logarithmic running time would be quite beneficial for your variant of augmented trees. The augmented trees can support not only STL associative containers, but also STL sequences, such as std::vector and std::list. String algorithms can also take advantage of these efficient operations. The split operation is general. The join and splice operations are general for STL sequences. For associative containers they can be used only in special cases. You can find more details in documentation to augmented B+ trees. The CGAL library http://www.cgal.org/Manual/latest/doc_html/cgal_manual/STL_Extension_ref/Cla... implements split and join (as function catenate()) operations for Multiset. However, these trees are not augmented, this is why container sizes must be re-counted in linear time, which creates a performance problem for STL variants of containers. Note also that this is a problem of all dynamically allocated data structures that can support bidirectional iterators only, including std::list. With augmented red-black trees the cost of these operations and corresponding member functions of STL containers should be logarithmic. Regards, Vadim Stadnik