[graph_parallel] Is a distributed directed edge list possible?

Is it currently possible to represent a highly connected graph where any given vertex may contain more directed edges than can be held in a single process' OutEdgeListS? I see that the distributed version of adjacency_list allows vertices to be distributed across multiple processes in a process group, but I don't see the ability to allow directed edges to be distributed across multiple processes. Is this an intentional limitation? caustik

On Mon, 3 Jan 2011, caustik wrote:
Is it currently possible to represent a highly connected graph where any given vertex may contain more directed edges than can be held in a single process' OutEdgeListS?
I see that the distributed version of adjacency_list allows vertices to be distributed across multiple processes in a process group, but I don't see the ability to allow directed edges to be distributed across multiple processes. Is this an intentional limitation?
No, it is not possible. PBGL currently distributes graph vertices (a so-called 1-D distribution) and keeps each edge with its source vertex. To do what you are describing, we would need to implement 2-D (edge-based) distributions, which do not fit into the current PBGL design. What do you need to do with the graphs? Combinatorial BLAS (http://gauss.cs.ucsb.edu/~aydin/doc/html/index.html) supports 2-D distributions and might be useful for you to try instead of PBGL. -- Jeremiah Willcock

At Wed, 5 Jan 2011 13:35:59 -0500 (EST), Jeremiah Willcock wrote:
On Mon, 3 Jan 2011, caustik wrote:
Is it currently possible to represent a highly connected graph where any given vertex may contain more directed edges than can be held in a single process' OutEdgeListS?
I see that the distributed version of adjacency_list allows vertices to be distributed across multiple processes in a process group, but I don't see the ability to allow directed edges to be distributed across multiple processes. Is this an intentional limitation?
No, it is not possible. PBGL currently distributes graph vertices (a so-called 1-D distribution) and keeps each edge with its source vertex. To do what you are describing, we would need to implement 2-D (edge-based) distributions, which do not fit into the current PBGL design.
Another thought I had, if graph connectivity is really super-high, is that you might be able to use the inverse graph representation. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Dave Abrahams <dave <at> boostpro.com> writes:
At Wed, 5 Jan 2011 13:35:59 -0500 (EST), Jeremiah Willcock wrote:
On Mon, 3 Jan 2011, caustik wrote:
Is it currently possible to represent a highly connected graph where any given vertex may contain more directed edges than can be held in a single process' OutEdgeListS?
I see that the distributed version of adjacency_list allows vertices to be distributed across multiple processes in a process group, but I don't see the ability to allow directed edges to be distributed across multiple processes. Is this an intentional limitation?
No, it is not possible. PBGL currently distributes graph vertices (a so-called 1-D distribution) and keeps each edge with its source vertex. To do what you are describing, we would need to implement 2-D (edge-based) distributions, which do not fit into the current PBGL design.
Another thought I had, if graph connectivity is really super-high, is that you might be able to use the inverse graph representation.
Caustik, what kind of kernels and/or algorithms you want to run on the distributed graph? Combinatorial BLAS has evolved quite a bit (with many more primitives, more flexible vector distributions, a few more example algorithms, and numerous bug fixes) from the 1.0 alpha version that is on the web. Until the next release (likely to happen in a month or two), I can share the active branch with you if your application seems to fit well. If it doesn't, it'll still be very useful for us to see what kind of primitives we do not support so that we can include them in the future. Cheers, - Aydin

On Wed, Jan 5, 2011 at 12:44 PM, Aydin Buluc <abuluc@lbl.gov> wrote:
Dave Abrahams <dave <at> boostpro.com> writes:
At Wed, 5 Jan 2011 13:35:59 -0500 (EST), Jeremiah Willcock wrote:
On Mon, 3 Jan 2011, caustik wrote:
Is it currently possible to represent a highly connected graph where
any
given vertex may contain more directed edges than can be held in a single process' OutEdgeListS?
I see that the distributed version of adjacency_list allows vertices to be distributed across multiple processes in a process group, but I don't see the ability to allow directed edges to be distributed across multiple processes. Is this an intentional limitation?
No, it is not possible. PBGL currently distributes graph vertices (a so-called 1-D distribution) and keeps each edge with its source vertex. To do what you are describing, we would need to implement 2-D (edge-based) distributions, which do not fit into the current PBGL design.
Another thought I had, if graph connectivity is really super-high, is that you might be able to use the inverse graph representation.
Caustik, what kind of kernels and/or algorithms you want to run on the distributed graph?
Combinatorial BLAS has evolved quite a bit (with many more primitives, more flexible vector distributions, a few more example algorithms, and numerous bug fixes) from the 1.0 alpha version that is on the web.
Until the next release (likely to happen in a month or two), I can share the active branch with you if your application seems to fit well. If it doesn't, it'll still be very useful for us to see what kind of primitives we do not support so that we can include them in the future.
Cheers, - Aydin
I'm not familiar with an 'inverse graph' -- I take it that is where each vertex holds a list of the edges which do not exist, and assumes all others to exist? Unfortunately that would still be a problem for these graphs, because not every vertex will be highly connected (there are some with no connections, many with a few connections, any a few with many connections). The graph I'm trying to build is one where there are leaf vertices which hold data, and other vertices which hold subgraphs of the overall graph. So a vertex may be a leaf, a subgraph containing leafs, a subgraph containing other subgraphs, or a subgraph containing both leafs and other subgraphs. The algorithms that it needs to run take any of those subgraphs, enumerate all of it's vertices recursively (a vertex holding a subgraph can contain other vertices which hold other subgraphs, etc), and run various operations on the data. Combinatorial BLAS sounds fascinating, I'll have to take a deeper look. Not quite sure yet if even a 2-D distribution will be the right data model. I'm beginning to lean more toward a map/reduce mechanism where each process holds only local data which can then be modified and eventually merged in the map/reduce functions. caustik

At Wed, 5 Jan 2011 19:58:01 -0800, caustik wrote:
I'm not familiar with an 'inverse graph' -- I take it that is where each vertex holds a list of the edges which do not exist, and assumes all others to exist?
That's what I had in mind, but...
Unfortunately that would still be a problem for these graphs, because not every vertex will be highly connected (there are some with no connections, many with a few connections, any a few with many connections).
...it sounds like that might not work too well for you. -- Dave Abrahams BoostPro Computing http://www.boostpro.com
participants (4)
-
Aydin Buluc
-
caustik
-
Dave Abrahams
-
Jeremiah Willcock