[Boost Graph Library] Possible succint data structure addition

newer
[Boost Real] GSoC 2020 -...

Joana Modesto Hrotko

8 Feb 2020 8 Feb '20

4:44 p.m.

Dear Boost community, I am a Master's student and currently working on my Master thesis. My Master Thesis consists in implementing the k2-tree data structure in C++ and one of the additional proposals for my thesis would be to integrate my code with the boost graph library.I believe this integration would bring great value to the boost library and also it would provide a better validation for my work. So my question is if this would be something of interest for the library to integrate with boost graph library. INFO ABOUT K2-TREE: The k2-tree have been proved successful to represent in a very compact way different kinds of binary relations, such as web graphs, RDFs or raster data. Binary relations can be an abstraction to represent the relation between the objects of two collections of different nature: graphs, trees, strings, among others. They can be used in several low-level structures within a more complex information retrieval system, or even replace one of the most used one: an inverted index can be regarded as a binary relation between the vocabulary of terms and the documents where they appear. Moreover, it can represent the relation between terms and web pages, or even social networks or the connection between the web pages in the Web, which is normally represented as a web graph. k2-trees have been also used in other scenarios, such as geographical and RDF data, or images. Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary relations with applications." Information Systems 69 (2017): 106-123. Coimbra, Miguel E., et al. "On dynamic succinct graph representations." arXiv preprint arXiv:1911.03195 (2019). Thank you in advance for your attention Best regards, Joana Hrotkó

Show replies by date

degski

9 Feb 9 Feb

1:18 p.m.

On Sat, 8 Feb 2020 at 11:43, Joana Modesto Hrotko via Boost < boost@lists.boost.org> wrote:

...

Dear Boost community,

I am a Master's student and currently working on my Master thesis. My Master Thesis consists in implementing the k2-tree data structure in C++ and one of the additional proposals for my thesis would be to integrate my code with the boost graph library.I believe this integration would bring great value to the boost library and also it would provide a better validation for my work. So my question is if this would be something of interest for the library to integrate with boost graph library.

INFO ABOUT K2-TREE: The k2-tree have been proved successful to represent in a very compact way different kinds of binary relations, such as web graphs, RDFs or raster data. Binary relations can be an abstraction to represent the relation between the objects of two collections of different nature: graphs, trees, strings, among others. They can be used in several low-level structures within a more complex information retrieval system, or even replace one of the most used one: an inverted index can be regarded as a binary relation between the vocabulary of terms and the documents where they appear. Moreover, it can represent the relation between terms and web pages, or even social networks or the connection between the web pages in the Web, which is normally represented as a web graph. k2-trees have been also used in other scenarios, such as geographical and RDF data, or images.

Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary relations with applications." Information Systems 69 (2017): 106-123. Coimbra, Miguel E., et al. "On dynamic succinct graph representations." arXiv preprint arXiv:1911.03195 (2019).

Great subject for a thesis. Munro, the man of the beap <https://github.com/pfalcon/beap> (a dynamic implicit data structure <https://github.com/pfalcon/awesome-implicit-data-structures>). I don't really understand why you would like a dynamic k2-tree, you can just as well have a static one <https://github.com/degski/KD-Tree/blob/master/KD-Tree/ikdtree.hpp> [Eytzinger-tree, the levels are implicit and the sorting is recursively done, partially ( with std::nth_element, (in-place) sorting around the median (the nth-element) ) along the way, getting more sorted on lower levels, on my laptop I can easily run the recursion, without exhausting stack-space, till my heap space starts to run out (with default stack size settings)] and copy the data on re-balancing (or re-balance the tree on copying), as, in general, one insertion will completely change the tree, due to the alternating dimensions, and the mechanics of updating is out-performed very quickly by the recursive (in place) construction. Having said that, I think there is still some (maybe a lot of) mileage into exploring how these things can/should be well implemented in C++17. Best of luck with your thesis. degski -- @realdegski https://brave.com/google-gdpr-workaround/ "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey

degski

1:27 p.m.

On Sun, 9 Feb 2020 at 07:18, degski <degski@gmail.com> wrote:

...

On Sat, 8 Feb 2020 at 11:43, Joana Modesto Hrotko via Boost < boost@lists.boost.org> wrote:

...
Dear Boost community,

I am a Master's student and currently working on my Master thesis. My Master Thesis consists in implementing the k2-tree data structure in C++ and one of the additional proposals for my thesis would be to integrate my code with the boost graph library.I believe this integration would bring great value to the boost library and also it would provide a better validation for my work. So my question is if this would be something of interest for the library to integrate with boost graph library.

INFO ABOUT K2-TREE: The k2-tree have been proved successful to represent in a very compact way different kinds of binary relations, such as web graphs, RDFs or raster data. Binary relations can be an abstraction to represent the relation between the objects of two collections of different nature: graphs, trees, strings, among others. They can be used in several low-level structures within a more complex information retrieval system, or even replace one of the most used one: an inverted index can be regarded as a binary relation between the vocabulary of terms and the documents where they appear. Moreover, it can represent the relation between terms and web pages, or even social networks or the connection between the web pages in the Web, which is normally represented as a web graph. k2-trees have been also used in other scenarios, such as geographical and RDF data, or images.

Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary relations with applications." Information Systems 69 (2017): 106-123. Coimbra, Miguel E., et al. "On dynamic succinct graph representations." arXiv preprint arXiv:1911.03195 (2019).

Great subject for a thesis. Munro, the man of the beap <https://github.com/pfalcon/beap> (a dynamic implicit data structure <https://github.com/pfalcon/awesome-implicit-data-structures>). I don't really understand why you would like a dynamic k2-tree, you can just as well have a static one <https://github.com/degski/KD-Tree/blob/master/KD-Tree/ikdtree.hpp> [Eytzinger-tree, the levels are implicit and the sorting is recursively done, partially ( with std::nth_element, (in-place) sorting around the median (the nth-element) ) along the way, getting more sorted on lower levels, on my laptop I can easily run the recursion, without exhausting stack-space, till my heap space starts to run out (with default stack size settings)] and copy the data on re-balancing (or re-balance the tree on copying), as, in general, one insertion will completely change the tree, due to the alternating dimensions, and the mechanics of updating is out-performed very quickly by the recursive (in place) construction. Having said that, I think there is still some (maybe a lot of) mileage into exploring how these things can/should be well implemented in C++17.

Best of luck with your thesis.

degski -- @realdegski https://brave.com/google-gdpr-workaround/ "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey

I just realized, one can completely (and simply) hide the copying/re-construction behind a dynamic (vector-like) interface, and the dynamic implicit kd-tree is a fact. This is now on my TODO-list. KD-trees of dimensions 1, 2 and 3 seem to be the norm (at 4 dimensions one runs already in the curse of dimensionality, and linear search does better). degski -- @realdegski https://brave.com/google-gdpr-workaround/ "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey

Mateusz Loskot

8:09 p.m.

On Sun, 9 Feb 2020 at 14:18, degski via Boost <boost@lists.boost.org> wrote:

...

On Sat, 8 Feb 2020 at 11:43, Joana Modesto Hrotko via Boost <boost@lists.boost.org> wrote:

...
Dear Boost community,

I am a Master's student and currently working on my Master thesis. My Master Thesis consists in implementing the k2-tree data structure in C++ and one of the additional proposals for my thesis would be to integrate my code with the boost graph library.I believe this integration would bring great value to the boost library and also it would provide a better validation for my work. So my question is if this would be something of interest for the library to integrate with boost graph library.

INFO ABOUT K2-TREE: The k2-tree have been proved successful to represent in a very compact way different kinds of binary relations, such as web graphs, RDFs or raster data. Binary relations can be an abstraction to represent the relation between the objects of two collections of different nature: graphs, trees, strings, among others. They can be used in several low-level structures within a more complex information retrieval system, or even replace one of the most used one: an inverted index can be regarded as a binary relation between the vocabulary of terms and the documents where they appear. Moreover, it can represent the relation between terms and web pages, or even social networks or the connection between the web pages in the Web, which is normally represented as a web graph. k2-trees have been also used in other scenarios, such as geographical and RDF data, or images.

Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary relations with applications." Information Systems 69 (2017): 106-123. Coimbra, Miguel E., et al. "On dynamic succinct graph representations." arXiv preprint arXiv:1911.03195 (2019).

Great subject for a thesis. Munro, the man of the beap <https://github.com/pfalcon/beap> (a dynamic implicit data structure <https://github.com/pfalcon/awesome-implicit-data-structures>). I don't really understand why you would like a dynamic k2-tree, you can just as well have a static one <https://github.com/degski/KD-Tree/blob/master/KD-Tree/ikdtree.hpp>

By the way, I think k2-tree which is of Joana interest may be one of the special cases of the k-ary https://github.com/boostorg/graph/pull/139 Best regards, -- Mateusz Loskot, http://mateusz.loskot.net

Joana Modesto Hrotko

9:53 p.m.

A 2020-02-09 20:09, Mateusz Loskot via Boost escreveu:

...

On Sun, 9 Feb 2020 at 14:18, degski via Boost <boost@lists.boost.org> wrote:

...
On Sat, 8 Feb 2020 at 11:43, Joana Modesto Hrotko via Boost <boost@lists.boost.org> wrote:

...
Dear Boost community,

I am a Master's student and currently working on my Master thesis. My Master Thesis consists in implementing the k2-tree data structure in C++ and one of the additional proposals for my thesis would be to integrate my code with the boost graph library.I believe this integration would bring great value to the boost library and also it would provide a better validation for my work. So my question is if this would be something of interest for the library to integrate with boost graph library.

INFO ABOUT K2-TREE: The k2-tree have been proved successful to represent in a very compact way different kinds of binary relations, such as web graphs, RDFs or raster data. Binary relations can be an abstraction to represent the relation between the objects of two collections of different nature: graphs, trees, strings, among others. They can be used in several low-level structures within a more complex information retrieval system, or even replace one of the most used one: an inverted index can be regarded as a binary relation between the vocabulary of terms and the documents where they appear. Moreover, it can represent the relation between terms and web pages, or even social networks or the connection between the web pages in the Web, which is normally represented as a web graph. k2-trees have been also used in other scenarios, such as geographical and RDF data, or images.

Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary relations with applications." Information Systems 69 (2017): 106-123. Coimbra, Miguel E., et al. "On dynamic succinct graph representations." arXiv preprint arXiv:1911.03195 (2019).

Great subject for a thesis. Munro, the man of the beap <https://github.com/pfalcon/beap> (a dynamic implicit data structure <https://github.com/pfalcon/awesome-implicit-data-structures>). I don't really understand why you would like a dynamic k2-tree, you can just as well have a static one <https://github.com/degski/KD-Tree/blob/master/KD-Tree/ikdtree.hpp>

By the way, I think k2-tree which is of Joana interest may be one of the special cases of the k-ary https://github.com/boostorg/graph/pull/139

Best regards,

Hi first of all thank you so much for the reply of all of you. So the k2-tree is a succint data structure and it is equivalent to a web graph, so conceptually is somehow similar to the k-ary tree however, implementation wise it is not the same. In the k2-tree it is meant to represent the relation between nodes in a graph, based on the matrix representation of the graph. In order to compress the information in the matrix (hopefully sparse), the method consists in subdividing the binary matrix into k rows and k columns, thus producing k2 submatrices of size n/k by n/k. Each of these submatriceswill be a child of the root node and its value will be 1 iff there is at least one 1 in the cells of the submatrix. A 0 child means that the submatrix has all 0s and hence the tree decomposition ends there. Once the level 1, with the children of the root, has been built, the method proceeds recursively for each child with value 1, until we reach submatrices full of 0s, or we reach the cells of the original matrix. The k2-ary tree described is represented in a compact way with just two bit arrays: T (tree) and L(leaves). T stores all the bits of the k2-tree except those in the last level following a level-wise traversal. Bitmap L stores the last level of the tree. From the pull requests it seems there is no similar data structure to the k2-tree being implemented right now. However how hard is it to be able to complete the pull request? And how is the decision taken? Best regards, Joana

2007

Age (days ago)

2008

Last active (days ago)

List overview

Download

4 comments

3 participants

participants (3)

degski
Joana Modesto Hrotko
Mateusz Loskot