new proposal: order statistics tree

Szymon Wojciechowski

2 Jan 2013 2 Jan '13

9:46 p.m.

Hi, I would like to propose to add new data structure - order statistics tree. Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010. Is there anybody interested in? Best regards, Szymon Wojciechowski

Show replies by date

Klaim - Joël Lamotte

3 Jan 3 Jan

3:21 p.m.

Hi, On Wed, Jan 2, 2013 at 10:46 PM, Szymon Wojciechowski <sw10@interia.pl>wrote:

...

Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order.

Can you clarify what you call key order? Maybe some examples? Joel Lamotte

Szymon Wojciechowski

4 Jan 4 Jan

9:37 p.m.

Hi, @Joel: If one has order_statistics_tree<string, less<string> > with string values and less comparator, and the container has been inserted: cat, dog, bird, fish, dinosaur then the elements can by accessed by ranks of the keys in that container. In example tree[0] returns bird while bird is the least element, tree[1] = cat, tree[2] = dinosaur, tree[3] = dog, tree[4] = fish, which is the greatest. Best regards, Szymon Wojciechowski "Klaim - Joël Lamotte" <mjklaim@gmail.com> pisze:

...

Hi,

On Wed, Jan 2, 2013 at 10:46 PM, Szymon Wojciechowski wrote:

...
Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order.

Can you clarify what you call key order? Maybe some examples?

Joel Lamotte

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Rene Rivera

3 Jan 3 Jan

3:29 p.m.

On Wed, Jan 2, 2013 at 3:46 PM, Szymon Wojciechowski <sw10@interia.pl>wrote:

...

I would like to propose to add new data structure - order statistics tree. Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010.

Is there anybody interested in?

How does it compare to < http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2101.html#tr.hierarchy.augment> and the half dozen other augmented tree implementations that have appeared in the Boost list? (one as recent as last year). -- -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rahul Sr

4 Jan 4 Jan

4:33 a.m.

Will it be like this? http://stackoverflow.com/questions/11230734/c-stl-order-statistic-tree http://www.opensource.apple.com/source/llvmgcc42/llvmgcc42-2336.9/libstdc++-... Regards, Rahul On Thu, Jan 3, 2013 at 8:59 PM, Rene Rivera <grafikrobot@gmail.com> wrote:

...

On Wed, Jan 2, 2013 at 3:46 PM, Szymon Wojciechowski <sw10@interia.pl

...
wrote:

...
I would like to propose to add new data structure - order statistics

tree.

...
Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010.

Is there anybody interested in?

How does it compare to <

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2101.html#tr.hierar...

...
and the half dozen other augmented tree implementations that have appeared in the Boost list? (one as recent as last year).

-- -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Szymon Wojciechowski

9:40 p.m.

"Rahul Sr" <srivasrrahul@gmail.com> pisze:

...

Will it be like this? http://stackoverflow.com/questions/11230734/c-stl-order-statistic-tree

http://www.opensource.apple.com/source/llvmgcc42/llvmgcc42-2336.9/libstdc++-...

Regards, Rahul

You are right. Although I haven't implemented it with policies, this is the same data structure as I thought. Best regards, Szymon Wojciechowski

...

On Thu, Jan 3, 2013 at 8:59 PM, Rene Rivera wrote:

...
On Wed, Jan 2, 2013 at 3:46 PM, Szymon Wojciechowski >wrote:

...
I would like to propose to add new data structure - order statistics

tree.

...
Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010.

Is there anybody interested in?

How does it compare to <

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2101.html#tr.hierar...

...
and the half dozen other augmented tree implementations that have appeared in the Boost list? (one as recent as last year).

-- -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Szymon Wojciechowski

9:39 p.m.

"Rene Rivera" <grafikrobot@gmail.com> pisze:

...

On Wed, Jan 2, 2013 at 3:46 PM, Szymon Wojciechowski wrote:

...
I would like to propose to add new data structure - order statistics tree. Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010.

Is there anybody interested in?

How does it compare to < http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2101.html#tr.hierarchy.augment> and the half dozen other augmented tree implementations that have appeared in the Boost list? (one as recent as last year).

It is exactly, what you have shown in the article. However, is this implementation available somewhere? Generally, I can't find any implementation of this kind of tree except that from Rahul's links. Best regards, Szymon Wojciechowski

Rene Rivera

11:19 p.m.

On Fri, Jan 4, 2013 at 3:39 PM, Szymon Wojciechowski <sw10@interia.pl>wrote:

...

...
On Wed, Jan 2, 2013 at 3:46 PM, Szymon Wojciechowski wrote:

...
I would like to propose to add new data structure - order statistics

"Rene Rivera" <grafikrobot@gmail.com> pisze: tree.

...
...
Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010.

Is there anybody interested in?

How does it compare to <

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2101.html#tr.hierar...

...
and the half dozen other augmented tree implementations that have

appeared

...
in the Boost list? (one as recent as last year).

It is exactly, what you have shown in the article. However, is this implementation available somewhere? Generally, I can't find any implementation of this kind of tree except that from Rahul's links.

The current state of that TR is at < http://svn.boost.org/svn/boost/sandbox/tree/>. Although a few notes about it.. * The code is old and since Bernhard wrote it I don't know at the moment the state of closely it follows the TR. Since most of it was written before the TR. * I've been updating the docs rather slowly (too many other projects going on). But I'm hoping to get an update of the TR done by in the Spring. And work again on the implementation after that. -- -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Vadim Stadnik

6 Jan 6 Jan

11:30 a.m.

On Thu, Jan 3, 2013 at 7:46 AM, Szymon Wojciechowski <sw10@interia.pl>wrote:

...

I would like to propose to add new data structure - order statistics tree. Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010.

Is there anybody interested in?

Yes, I am interested. What kind of tree did you implement? Did you measured performance of basic container operations and compared with the standard set and/or multiset? Regards, Vadim Stadnik

Francisco José Tapia

7 Jan 7 Jan

5:08 p.m.

Hi Szymon Please, could you explain me more details of your idea, and mode examples ? The case propose in your message can be done with a countertree::set,. This is a sorted structure with additional access by position like in a vector. The insertion, deletion and access to the elements are O(logN). You can find the code and documentation in http://dl.dropbox.com/u/8437476/works/countertree/doc/index.html As commented in previous messages about the augmented trees. “The countertrees are augmented data structures. I redesign the insertion deletion and balanced algorithms. The actual version is around 10% - 15% slower than the GCC version of red-black trees. I have designed on paper a new algorithms for to insert, delete and balance the countertree, in order to improve the speed. In the new version you have a catalog of small operations with the nodes ( insertion, deletion, rotations, swap nodes... ) All the operations of the tree can be done combining operations of this catalog. The node operations have two parts, the pointers and node color, and the augmented information. In the countertree the augmented information are the node counters. If you design the process to do with your augmented information for all the operations of the catalog, you can create your augmented balanced tree mixing the pointer and color operations with your code for each operation. You can have augmented trees with several concepts, like the position, accumulators and statistical operations as are defined in the Policy Based Data Structures and in the Boost Accumulator library. In the node operation you mix the pointers and color operations with the process of the catalog for all the augmented information added. I am near to finish the concurrent version of the countertree data structures. My intention is to begin with the augmented data trees when finish this. The solution exposed can be combined with the concurrent process, which can improve a lot the speed of these accumulators, statistical operations or any other idea.” I don't know if we are talking about the same idea or if there are differences. Sincerely yours Francisco Tapia

Szymon Wojciechowski

8:50 p.m.

"Francisco José Tapia" <fjtapia@gmail.com> pisze:

...

Hi Szymon

Please, could you explain me more details of your idea, and mode examples ?

The case propose in your message can be done with a countertree::set,. This is a sorted structure with additional access by position like in a vector. The insertion, deletion and access to the elements are O(logN).

You can find the code and documentation in http://dl.dropbox.com/u/8437476/works/countertree/doc/index.html

As commented in previous messages about the augmented trees.

“The countertrees are augmented data structures. I redesign the insertion deletion and balanced algorithms. The actual version is around 10% - 15% slower than the GCC version of red-black trees.

I have designed on paper a new algorithms for to insert, delete and balance the countertree, in order to improve the speed. In the new version you have a catalog of small operations with the nodes ( insertion, deletion, rotations, swap nodes... ) All the operations of the tree can be done combining operations of this catalog.

The node operations have two parts, the pointers and node color, and the augmented information. In the countertree the augmented information are the node counters. If you design the process to do with your augmented information for all the operations of the catalog, you can create your augmented balanced tree mixing the pointer and color operations with your code for each operation.

You can have augmented trees with several concepts, like the position, accumulators and statistical operations as are defined in the Policy Based Data Structures and in the Boost Accumulator library. In the node operation you mix the pointers and color operations with the process of the catalog for all the augmented information added.

I am near to finish the concurrent version of the countertree data structures. My intention is to begin with the augmented data trees when finish this. The solution exposed can be combined with the concurrent process, which can improve a lot the speed of these accumulators, statistical operations or any other idea.”

I don't know if we are talking about the same idea or if there are differences.

My solution solves the same problem as your - insert, erase and find in O(log n) and additionally access to element by it's order. It's newly implemented red-black tree. Each node has pointers to parent and sons, colour and value which represents subtree size. Unlike your, this solution still hasn't allocators. As in previous messages it seems to be faster than VS's set. Generally I will try to publish my sources at the end of current week. Best regards, Szymon Wojciechowski

Szymon Wojciechowski

8:47 p.m.

"Vadim Stadnik" <vadimstdk@gmail.com> pisze:

...

On Thu, Jan 3, 2013 at 7:46 AM, Szymon Wojciechowski wrote:

...
I would like to propose to add new data structure - order statistics tree. Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010.

Is there anybody interested in?

Yes, I am interested. What kind of tree did you implement? Did you measured performance of basic container operations and compared with the standard set and/or multiset?

I implemented augmented red-black tree which doesn't check the uniqueness of values. I have conducted a simple survey: I prepared 5 input files with unique 2,7*10e7 integers. The first data set has sorted values as other has randomized. I repeated 7 times on every input. A simple program reads from input integers and puts it to set (maybe I should use multiset) from VisualStudio or my order statistics tree. After that I measured time. Then program erases values from data structure in ordered way (from 0, 1, 2...). After that I also measured how much time it takes. I uploaded the results here: http://wyslijto.pl/plik/q5om39aoua - it's a little bit strange. What kind of performance test would you think about? Best regards, Szymon Wojciechowski

Vadim Stadnik

8 Jan 8 Jan

9:13 a.m.

On Tue, Jan 8, 2013 at 7:47 AM, Szymon Wojciechowski <sw10@interia.pl>wrote:

...

I implemented augmented red-black tree which doesn't check the uniqueness of values. I have conducted a simple survey: I prepared 5 input files with unique 2,7*10e7 integers. The first data set has sorted values as other has randomized. I repeated 7 times on every input. A simple program reads from input integers and puts it to set (maybe I should use multiset) from VisualStudio or my order statistics tree. After that I measured time. Then program erases values from data structure in ordered way (from 0, 1, 2...). After that I also measured how much time it takes. I uploaded the results here: http://wyslijto.pl/plik/q5om39aoua - it's a little bit strange. What kind of performance test would you think about?

Did you implement or is it possible to implement for this augmented red black tree split, join and splice operations with logarithmic computational complexity? As for tests, there are a number of performance tests of basic container operations in the following projects: 1. For augmented dynamically allocated B+ trees: https://github.com/vstadnik/stl_ext_adv I have added this link, since this project has results of performance tests for Boost.MultiIndex. 2. For augmented array based B+ trees: https://github.com/vstadnik/stl_ext_adv_review This project is prepared for Boost review. It compares performance of new containers with standard containers and Boost.Container. The file "boost_performance.cpp" uses Boost.Chrono timer. If your containers are STL compliant you should not have any problem with this test code. Regards, Vadim Stadnik

Szymon Wojciechowski

13 Jan 13 Jan

9:10 a.m.

"Vadim Stadnik" <vadimstdk@gmail.com> pisze:

...

On Tue, Jan 8, 2013 at 7:47 AM, Szymon Wojciechowski wrote:

...
I implemented augmented red-black tree which doesn't check the uniqueness of values. I have conducted a simple survey: I prepared 5 input files with unique 2,7*10e7 integers. The first data set has sorted values as other has randomized. I repeated 7 times on every input. A simple program reads from input integers and puts it to set (maybe I should use multiset) from VisualStudio or my order statistics tree. After that I measured time. Then program erases values from data structure in ordered way (from 0, 1, 2...). After that I also measured how much time it takes. I uploaded the results here: http://wyslijto.pl/plik/q5om39aoua - it's a little bit strange. What kind of performance test would you think about?

Did you implement or is it possible to implement for this augmented red black tree split, join and splice operations with logarithmic computational complexity?

As for tests, there are a number of performance tests of basic container operations in the following projects:

1. For augmented dynamically allocated B+ trees: https://github.com/vstadnik/stl_ext_adv

I have added this link, since this project has results of performance tests for Boost.MultiIndex.

2. For augmented array based B+ trees: https://github.com/vstadnik/stl_ext_adv_review

This project is prepared for Boost review. It compares performance of new containers with standard containers and Boost.Container. The file "boost_performance.cpp" uses Boost.Chrono timer. If your containers are STL compliant you should not have any problem with this test code.

Regards, Vadim Stadnik

I have performed such test for 6 data containers (including boost::multi_index described by Tony and Vadim): the comparison is uploaded here: http://wyslijto.pl/plik/mxxpjbpmfn I prepared 5 datasets (first one was sorted). I repeated 5 times whole test for each container and dataset. The test inserts and accumulates as in Vadim's tests. However I added the erasure performance. I haven't thought about possiblility to implement such operations like splice, join, split. However, it seems to me to be impossible. When adding treeA + treeB, the elements of treeB should be considered independently where to place them in result. For example if thereA contains 1, 3, 5, 7, 9 and treeB contains 2, 4, 6, 8, 10 all treeB nodes will be independently located and inserted in O(n). Best regards, Szymon Wojciechowski

Vadim Stadnik

9:31 p.m.

On Sun, Jan 13, 2013 at 7:10 PM, Szymon Wojciechowski <sw10@interia.pl>wrote:

...

...
...
I implemented augmented red-black tree which doesn't check the uniqueness of values. I have conducted a simple survey: I prepared 5 input files with unique 2,7*10e7 integers. The first data set has sorted values as other has randomized. I repeated 7 times on every input. A simple program reads from input integers and puts it to set (maybe I should use multiset) from VisualStudio or my order statistics tree. After that I measured time. Then program erases values from data structure in ordered way (from 0, 1, 2...). After that I also measured how much time it takes. I uploaded the results here: http://wyslijto.pl/plik/q5om39aoua - it's a little bit strange. What kind of performance test would you think about?

Did you implement or is it possible to implement for this augmented red black tree split, join and splice operations with logarithmic computational complexity?

As for tests, there are a number of performance tests of basic container operations in the following projects:

1. For augmented dynamically allocated B+ trees: https://github.com/vstadnik/stl_ext_adv

I have added this link, since this project has results of performance tests for Boost.MultiIndex.

2. For augmented array based B+ trees: https://github.com/vstadnik/stl_ext_adv_review

This project is prepared for Boost review. It compares performance of new containers with standard containers and Boost.Container. The file "boost_performance.cpp" uses Boost.Chrono timer. If your containers are STL compliant you should not have any problem with this test code.

Regards, Vadim Stadnik

I have performed such test for 6 data containers (including boost::multi_index described by Tony and Vadim): the comparison is uploaded here: http://wyslijto.pl/plik/mxxpjbpmfn I prepared 5 datasets (first one was sorted). I repeated 5 times whole test for each container and dataset. The test inserts and accumulates as in Vadim's tests. However I added the erasure performance.

The performance of your trees looks quite promising. What are the sizes of the data sets used in the tests?

...

I haven't thought about possiblility to implement such operations like splice, join, split. However, it seems to me to be impossible. When adding treeA + treeB, the elements of treeB should be considered independently where to place them in result. For example if thereA contains 1, 3, 5, 7, 9 and treeB contains 2, 4, 6, 8, 10 all treeB nodes will be independently located and inserted in O(n).

The support of these operations with logarithmic running time would be quite beneficial for your variant of augmented trees. The augmented trees can support not only STL associative containers, but also STL sequences, such as std::vector and std::list. String algorithms can also take advantage of these efficient operations. The split operation is general. The join and splice operations are general for STL sequences. For associative containers they can be used only in special cases. You can find more details in documentation to augmented B+ trees. The CGAL library http://www.cgal.org/Manual/latest/doc_html/cgal_manual/STL_Extension_ref/Cla... implements split and join (as function catenate()) operations for Multiset. However, these trees are not augmented, this is why container sizes must be re-counted in linear time, which creates a performance problem for STL variants of containers. Note also that this is a problem of all dynamically allocated data structures that can support bidirectional iterators only, including std::list. With augmented red-black trees the cost of these operations and corresponding member functions of STL containers should be logarithmic. Regards, Vadim Stadnik

Szymon Wojciechowski

15 Jan 15 Jan

8:43 p.m.

"Vadim Stadnik" <vadimstdk@gmail.com> pisze:

...

The performance of your trees looks quite promising. What are the sizes of the data sets used in the tests?

I forgot to add this. The data sets had 30 000 elements.

...

The support of these operations with logarithmic running time would be quite beneficial for your variant of augmented trees.

The augmented trees can support not only STL associative containers, but also STL sequences, such as std::vector and std::list. String algorithms can also take advantage of these efficient operations. The split operation is general. The join and splice operations are general for STL sequences. For associative containers they can be used only in special cases. You can find more details in documentation to augmented B+ trees.

The CGAL library http://www.cgal.org/Manual/latest/doc_html/cgal_manual/STL_Extension_ref/Cla...

implements split and join (as function catenate()) operations for Multiset. However, these trees are not augmented, this is why container sizes must be re-counted in linear time, which creates a performance problem for STL variants of containers. Note also that this is a problem of all dynamically allocated data structures that can support bidirectional iterators only, including std::list.

With augmented red-black trees the cost of these operations and corresponding member functions of STL containers should be logarithmic.

If it is possible as you said, it is worth to consider and do. However, now I would rather publish the code, so I am preparing it.

Szymon Wojciechowski

19 Jan 19 Jan

8:08 p.m.

Hi Everyone, I have just uploaded my source code here: https://sourceforge.net/projects/orderstatistics/ Of course, the project isn't finished, so I encourage Everyone to comment and critizize. I hope that it will help to improve it. I am still adding new overloading forms of some functions. Best regards, Szymon Wojciechowski

Szymon Wojciechowski

3 Feb 3 Feb

9:22 a.m.

Hi All, As the discussion fall silent, I would like to ask if is still anyone interested in this project or there is any reason for continuing it? Best regards, Szymon Wojciechowski

Gottlob Frege

7 Jan 7 Jan

5:53 p.m.

On Wed, Jan 2, 2013 at 4:46 PM, Szymon Wojciechowski <sw10@interia.pl>wrote:

...

Hi,

I would like to propose to add new data structure - order statistics tree. Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010.

Is there anybody interested in?

Best regards, Szymon Wojciechowski

This should be compared against using boost multi-index. I suspect it has significant differences, but might still be enlightening. Tony

Szymon Wojciechowski

8:57 p.m.

"Gottlob Frege" <gottlobfrege@gmail.com> pisze:

...

On Wed, Jan 2, 2013 at 4:46 PM, Szymon Wojciechowski wrote:

...
Hi,

I would like to propose to add new data structure - order statistics tree. Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010.

Is there anybody interested in?

Best regards, Szymon Wojciechowski

This should be compared against using boost multi-index. I suspect it has significant differences, but might still be enlightening. Tony

Could you explain to me, what do you mean, please? Best regards, Szymon Wojciechowski

Gottlob Frege

9:31 p.m.

On Mon, Jan 7, 2013 at 3:57 PM, Szymon Wojciechowski <sw10@interia.pl>wrote:

...

...
On Wed, Jan 2, 2013 at 4:46 PM, Szymon Wojciechowski wrote:

...
Hi,

I would like to propose to add new data structure - order statistics

"Gottlob Frege" <gottlobfrege@gmail.com> pisze: tree.

...
...
Briefly, it is container which allows logarithmic inserting, searching and erasing as in set, but additionally it permits to access elements via numerical value - key order. I have some ready implementation with all appropriate functions (without their all overloaded forms) with gtest regression list, but the code isn't boostified. The regression passes on gcc 4.6.3 and vs2010.

Is there anybody interested in?

Best regards, Szymon Wojciechowski

This should be compared against using boost multi-index. I suspect it has significant differences, but might still be enlightening. Tony

Could you explain to me, what do you mean, please?

Best regards, Szymon Wojciechowski

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Basically both an order statistics tree and boost multi-index are examples of containers with multiple indices. I suspect there may be interesting results from a compare/contrast type of study. Boost MultiIndex allows indexing of a container via any number/type of indices. So you could have a container indexed via key, and also indexed via integer. Now the difference, I think, is that MultiIndex uses independent indexes - changing a entry's key from "alpha" to "beta" doesn't change its integer index from 1 to 2. So that is a significant difference. Might be worth mentioning in docs or somewhere, just to avoid confusion. But furthermore, I could probably implement the same behaviour as a order statistics tree by using Boost.MultiIndex internally (and maintaining the interrelationships of the indices as needed). Would that be drastically less efficient? And also, how do the interfaces compare? Should there be any alignmen/consistency t in the interfaces, so that some template code could use either? Not sure how much template code exists that requires "any container with 2 indices, one of which is an int". Probably consistency of interface is more important for learning/understanding/remembering than for template code. Tony

Vadim Stadnik

8 Jan 8 Jan

1:55 p.m.

On Tue, Jan 8, 2013 at 7:31 AM, Gottlob Frege <gottlobfrege@gmail.com>wrote: Basically both an order statistics tree and boost multi-index are examples

...

of containers with multiple indices. I suspect there may be interesting results from a compare/contrast type of study.

Boost MultiIndex allows indexing of a container via any number/type of indices. So you could have a container indexed via key, and also indexed via integer. Now the difference, I think, is that MultiIndex uses independent indexes - changing a entry's key from "alpha" to "beta" doesn't change its integer index from 1 to 2.

So that is a significant difference. Might be worth mentioning in docs or somewhere, just to avoid confusion.

But furthermore, I could probably implement the same behaviour as a order statistics tree by using Boost.MultiIndex internally (and maintaining the interrelationships of the indices as needed). Would that be drastically less efficient?

+1. This is a very good comment and I think there is no simple answer. I was asked similar questions during Boost discussion of augmented dynamically allocated B+ trees, this is why I would like to make a few comments based on my analysis. The fact that creates an overlap with Boost.MultIndex and can cause confusion is that augmented data structures support multiple augmenting. For example, basic balanced tree provides efficient search of elements by keys. This tree requires a comparison object. Augmenting of such tree by counter of elements adds efficient access to elements by rank, which is equivalent to index value in the subscript operator of an array or std::vector. This variant of tree has extended functionality and can work both with and without a comparison object. In other words, it offers equally efficient operations for both ordered and unordered containers. Yet another augmenting by the sum of values in children nodes improves computational complexity of calculation of sum of any number of consecutive elements from linear to logarithmic. As a result of the double augmenting, the calculation of the mean value and standard deviation can be done in O(log N) time only. Such variants of augmented data structures support STL sequences and associative containers with improved efficiency of specific member functions. In applications that require multiple views of a data set these augmented data structures and containers cannot replace Boost.MultiIndex functionality. On the other hand, replacing functionality of augmented data structures with Boost.MultiIndex should be generally possible. However, this solution will be less efficient in the case when an augmented data structure offers a set of more efficient algorithms compared to data structures of Boost.MultiIndex. The disadvantage of this approach is that one less efficient operation can cause a performance bottleneck. Normally, augmented data structures offer the improvement of the running time from O(N) down to O(log N) for a single operation or an algorithm. This performance benefit of augmented data structures certainly should not be ignored by Boost.MultiIndex. Unfortunately, at this stage it is not possible for a user algorithm to take advantage of Boost.MultiIndex with advanced data structures. This example shows declaration of a multiset: typedef multi_index_container<int, indexed_by<ordered_non_unique<identity<int>>>> multi_set_int ; The obvious idea of a parameterization of containers used by views of Boost.MultiIndex is potentially useful, but deeper analysis shows that it is quite difficult and costly to implement. Boost.MultiIndex has special non-standard requirements to the view containers. For example, std::vector cannot be used in place of random access indices, since it does not remain stable after insertion or erasure operations. Hope that the discussion helps better understand the complexity of these issues. Regards, Vadim Stadnik

4555

Age (days ago)

4587

Last active (days ago)

List overview

Download

21 comments

7 participants

participants (7)

Francisco José Tapia
Gottlob Frege
Klaim - Joël Lamotte
Rahul Sr
Rene Rivera
Szymon Wojciechowski
Vadim Stadnik