[gsoc] Boost graph library - Hypergraphs

Hi, I'm a Computer Science undergrad from India applying for GSoC 2009. This mail is regarding the hypergraph project for the BGL on the ideas list. I have taken courses in graph theory and algorithms and have the C++ background required. I have gone through the idea and done a little 'research' on the topic. I have also read the boost graph library documentation and found the concept to be quite interesting and somewhat tough to implement. I have certain ideas and doubts on the hypergraph project which I would like to discuss with a mentor. First of all, what about the assumptions. Can we assume the hypergraph to be directed ro undirected or do we have to try to find a generic solution for both? "This project minimally involves the definition of one hypergraph data structure and a generic traversal algorithm". Now, for the data structure, we can use an incidence matrix with vertices as rows and edges as columns. This incidence matrix structure will have to be implemented. As an alternative, we can have a 'bipartite' kind of data structure with two groups of vertices V and E, where V represents the group of vertices and E the group of edges. Identification and definition of other concepts related to hypergraphs, like hyperedges etc. can then be done accordingly. As for the traversal algorithm, is it to be decided now (before applying) which algorithm is to be used ? I searched for and found a few on the internet like http://www.icsi.berkeley.edu/cgi-bin/pubs/publication.pl?ID=000778 . I would be happy if you could provide me links of some traversal algorithms I should look at and which would help me understand the concept better. I'm new to Boost and I hope I have made an humble beginning. :) Hoping for some replies and suggestions, Ciao, Om P Patri, Sophomore, Computer Science and Engineering Indian Institute of Technology, Guwahati, India --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- A hypergraph (http://en.wikipedia.org/wiki/Hypergraph) is a generalization of graphs where an edge can connect any number of vertices (instead of exactly two). Hypergraphs can also be directed or undirected, and have their own algorithms (e.g., traversal and partitioning). This project minimally involves the definition of one hypergraph data structure and a generic traversal algorithm. This project would also involve the identification and definition of generic concepts related to hypergraphs. Note: This is a non-trivial project that requires substantial knowledge of graph theory, algorithms, and generic programming.

Om Prasad Patri skrev:
Hi,
I'm a Computer Science undergrad from India applying for GSoC 2009. This mail is regarding the hypergraph project for the BGL on the ideas list. I have taken courses in graph theory and algorithms and have the C++ background required. I have gone through the idea and done a little 'research' on the topic. I have also read the boost graph library documentation and found the concept to be quite interesting and somewhat tough to implement. I have certain ideas and doubts on the hypergraph project which I would like to discuss with a mentor.
First of all, what about the assumptions. Can we assume the hypergraph to be directed ro undirected or do we have to try to find a generic solution for both?
IIRC, hypergraphs are only "undirected". An edge is represented as a set of vertices, that is, as an element of the powerset of vertices. -Thorsten

IIRC, hypergraphs are only "undirected". An edge is represented as a set of vertices, that is, as an element of the powerset of vertices.
This is incorrect. http://www.cis.upenn.edu/~lhuang3/wpe2/papers/gallo92directed.pdf Andrew Sutton andrew.n.sutton@gmail.com

Hello, it is (obviously) correct that hypergraphs are "undirected", since those structures mentioned below are "directed hypergraphs". More substantially, I would strongly warn against considering "directed hypergraphs", if the people involved are not actually doing research in this direction (for some time, that is). There are only a few people considering directed hypergraphs, and it is a research topic (already that the document is from 1992 says something; I'm actually working on the topic, on the connections between hypergraphs and SAT (as in that paper), and thus I would be glad to see something in this direction, but my impression is that the people involved here are not really specialists, but just want to implement something "standard"). Hypergraphs themselves are already such complicated beasts, that I would predict that no useful library can come out of a concept combining "directed hypergraphs" with hypergraphs. Oliver On Thu, Mar 26, 2009 at 06:00:17PM -0400, Andrew Sutton wrote:
IIRC, hypergraphs are only "undirected". An edge is represented as a set of vertices, that is, as an element of the powerset of vertices.
This is incorrect.
http://www.cis.upenn.edu/~lhuang3/wpe2/papers/gallo92directed.pdf
Andrew Sutton andrew.n.sutton@gmail.com _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Dr. Oliver Kullmann Computer Science Department Swansea University Faraday Building, Singleton Park Swansea SA2 8PP, UK http://cs.swan.ac.uk/~csoliver/

it is (obviously) correct that hypergraphs are "undirected", since those structures mentioned below are "directed hypergraphs".
Since an incidence matrix can be used to implement either, it doesn't make sense to (to me) discuss them exclusively. The fact that the term "hypergraph" denotes to its "undirected" property does not exclude the data structure from being used to implement both. This is directly analogous to the definitions of the adjacency list and matrix in the BGL. Hypergraphs themselves are already such complicated beasts, that I would
predict that no useful library can come out of a concept combining "directed hypergraphs" with hypergraphs.
I don't think that "It's challenging, therefore it's not useful" strikes the right tone here. Andrew Sutton andrew.n.sutton@gmail.com

On 03/26/09 17:00, Andrew Sutton wrote:
IIRC, hypergraphs are only "undirected". An edge is represented as a set of vertices, that is, as an element of the powerset of vertices.
This is incorrect.
http://www.cis.upenn.edu/~lhuang3/wpe2/papers/gallo92directed.pdf
Hi Andrew and Om. This is a little off-topic, but I was wondering about an application. The application is representing Makefile dependencies. It seems that A target, Target (some vertex), would have a set of dependencies, Dep(Target) (a subset of the vertices), which could be represented by a hypergraph arc, E, (as defined on page 3 of gallo92directed.pdf, and where H(E) == {Target} and T(E) = Dep(Target)). Now if that hypergraph arc were labelled, then the label would represent the actions to produce the target from the dependencies. Now one added complication is that some labels may require "refinement". For example, boost build, when creating executables from object files, requires (AFAICT) all object files be produced with the same compiler. So, maybe for those labels requiring refinement, there could be some morphism to map each node in the hypergraph ending at an exectuable node to a "similar" hypergraph where each label has the same compiler or toolset in boost build terms. Maybe this could a future project building on this hypergraph project. -regards, Larry

Maybe this could a future project building on this hypergraph project.
That's an interesting idea. It actually sounds like a viable research project - "Hypergraph transformations, morphisms, and/or rewrite rules for software build management". Feel free to borrow the title when you submit the paper :) Andrew Sutton andrew.n.sutton@gmail.com

I have certain ideas and doubts on the hypergraph project which I would like to discuss with a mentor.
I posted the project idea, so I would be more than happy to address your questions.
First of all, what about the assumptions. Can we assume the hypergraph to be directed ro undirected or do we have to try to find a generic solution for both?
My guess is that whatever data structure you choose to implement hypergraphs will actually support both directed and undirected graphs. For example, the BGL adjacency_list supports both directed and undirected graphs (via parameterization over a tag class, actually). I think an ideal solution could use the same data structure to do both. I've read something, somewhere on how an incidence matrix can be used to represent directed hypergraphs (maybe wikipedia).
"This project minimally involves the definition of one hypergraph data structure and a generic traversal algorithm". Now, for the data structure, we can use an incidence matrix with vertices as rows and edges as columns. This incidence matrix structure will have to be implemented. As an alternative, we can have a 'bipartite' kind of data structure with two groups of vertices V and E, where V represents the group of vertices and E the group of edges. Identification and definition of other concepts related to hypergraphs, like hyperedges etc. can then be done accordingly.
You could choose to implement a hypergraph using either approach - or both :) Just remember that resizing matrices can be a very expensive operation, so the bipartite approach may be more flexible. Defining hypergraph *concepts* and a generic hypergraph interface would best be done using both.
As for the traversal algorithm, is it to be decided now (before applying) which algorithm is to be used ? I searched for and found a few on the internet like http://www.icsi.berkeley.edu/cgi-bin/pubs/publication.pl?ID=000778 . I would be happy if you could provide me links of some traversal algorithms I should look at and which would help me understand the concept better.
That's probably the best place to start. As far writing a proposal goes, citing this and trying to dig up a couple of other algorithms would be sufficient. I don't know of any off the top of my head, and a quick search just turned up a couple of variations on this algorithm.
I'm new to Boost and I hope I have made an humble beginning. :) Hoping for some replies and suggestions,
Welcome :) Let us know if you have any questions. Andrew Sutton andrew.n.sutton@gmail.com

Hi, On Fri, Mar 27, 2009 at 3:51 AM, Andrew Sutton <andrew.n.sutton@gmail.com>wrote:
I have certain ideas and doubts on the hypergraph project which I would like to discuss with a mentor.
I posted the project idea, so I would be more than happy to address your
questions.
Thanks for your reply. I appreciate it :)
First of all, what about the assumptions. Can we assume the hypergraph to be directed ro undirected or do we have to try to find a generic solution for both?
My guess is that whatever data structure you choose to implement hypergraphs will actually support both directed and undirected graphs. For example, the BGL adjacency_list supports both directed and undirected graphs (via parameterization over a tag class, actually). I think an ideal solution could use the same data structure to do both. I've read something, somewhere on how an incidence matrix can be used to represent directed hypergraphs (maybe wikipedia).
We will try to implement such a data structure which works for both. This should work since anything which works for directed graphs should work for undirected graphs (I read something like this in the BGL docs).
"This project minimally involves the definition of one hypergraph data structure and a generic traversal algorithm". Now, for the data structure, we can use an incidence matrix with vertices as rows and edges as
columns. > This incidence matrix structure will have to be implemented. As an > alternative, we can have a 'bipartite' kind of data structure with two > groups of vertices V and E, where V represents the group of vertices and E > the group of edges. Identification and definition of other concepts related > to hypergraphs, like hyperedges etc. can then be done accordingly.
You could choose to implement a hypergraph using either approach - or both :) Just remember that resizing matrices can be a very expensive operation, so the bipartite approach may be more flexible. Defining hypergraph *concepts* and a generic hypergraph interface would best be done using both.
The ideal solution to use both because:
for some algorithms will be easier to use through the incidence matrix (but as you said, resizing the matrix for every operation is an expensive operation of O(V*E) ). For some other algorithms, it is easier to use the bipartite approach. We should leave to the user who adds an algorithm to BGL to choose whichever suits his needs the most. Defining hypergraph concepts again, can be done using the best of both.
As for the traversal algorithm, is it to be decided now (before applying) which algorithm is to be used ? I searched for and found a few on the internet like http://www.icsi.berkeley.edu/cgi-bin/pubs/publication.pl?ID=000778 . I would be happy if you could provide me links of some traversal algorithms I should look at and which would help me understand the concept better.
That's probably the best place to start. As far writing a proposal goes, citing this and trying to dig up a couple of other algorithms would be sufficient. I don't know of any off the top of my head, and a quick search just turned up a couple of variations on this algorithm.
Thanks for telling me that. There are also some directed hypergraph
'visit' algorithms in the paper you mentioned. We could implement those. As an afterthought, and being frank, I'm not very sure as of now as to how the exact implementation will be done using the BGL and this is appearing to be a tough (but tremendously exciting) problem! I'm looking at the other proposed project idea on *incidence matrices* and will post on it soon. Finally, thanks for all the support and the warm welcome. :) Ciao, Regards, Om Patri.

The ideal solution to use both because: for some algorithms will be easier to use through the incidence matrix (but as you said, resizing the matrix for every operation is an expensive operation of O(V*E) ). For some other algorithms, it is easier to use the bipartite approach. We should leave to the user who adds an algorithm to BGL to choose whichever suits his needs the most. Defining hypergraph concepts again, can be done using the best of both.
This is the same argument for supporting adjacency lists and matrices in the graph library. Some algorithms work better with matrices. The graph concepts are really derived from observations about the structure and similar use of different implementations. You basically have to ask yourself, "How are the similar? How are they different? How can we make dissimilar structures appear similar?" This may not be worth attempting until late in the summer, and may not even be possible without both implementations. Maybe. As an afterthought, and being frank, I'm not very sure as of now as to how
the exact implementation will be done using the BGL and this is appearing to be a tough (but tremendously exciting) problem! I'm looking at the other proposed project idea on *incidence matrices* and will post on it soon.
That's probably a better place to start. Good luck. Andrew Sutton andrew.n.sutton@gmail.com
participants (5)
-
Andrew Sutton
-
Larry Evans
-
Oliver Kullmann
-
Om Prasad Patri
-
Thorsten Ottosen