space complexity of push_relabel_max_flow

Hi All, I am a new BOOST user and like it very much. I have a question regarding space complexity of push_relabel_max_flow. The push_relabel_max_flow is very efficient compared to Edmond's and Karp. In my application, I need to deal with half a million nodes and half a billion arcs. It requires lots of ram (>= 6gb). Two questions: 1) How do I find out space used per node and per arc? 2) My network does not need any reverse edge (currently I set the capacity of reverse edge to 0), is there anyway to not supply reverse edges to save space? 3) Does push_relabel_algorithm dynamically allocates lots of memory (besides the ram required by the input graph, i.e., adjacency_list)? 4) Is it possible to implement custom file-based edge lists so that I could explicitly page to disk when not enough ram is available? How slower would it be? Thanks, Dan Jiang _________________________________________________________________ Hotmail & Messenger. Get them on your phone now. http://go.microsoft.com/?linkid=9724463

On Tue, 13 Apr 2010, Dan Jiang wrote:
Hi All, I am a new BOOST user and like it very much. I have a question regarding space complexity of push_relabel_max_flow. The push_relabel_max_flow is very efficient compared to Edmond's and Karp. In my application, I need to deal with half a million nodes and half a billion arcs. It requires lots of ram (>= 6gb). Two questions:
1) How do I find out space used per node and per arc?
I do not know off hand for adjacency_list; it is probably reasonable (one integer per edge, some per vertex, plus properties). However, remember that dynamically allocated vectors can have up to a 100% space overhead. I would recommend using compressed_sparse_row_graph and then changing the EdgeIndex and Vertex types to uint32_t (since you have fewer than 2^32 of each). That will use 4 * (nedges + nvertices + 1) bytes of space, plus properties; accesses will also likely be faster than for adjacency_list.
2) My network does not need any reverse edge (currently I set the capacity of reverse edge to 0), is there anyway to not supply reverse edges to save space?
It does not appear so, because of the residual capacities that need to be stored. If you can find an algorithm description that doesn't need reverse edges, though, it might be possible to change the algorithm to remove them.
3) Does push_relabel_algorithm dynamically allocates lots of memory (besides the ram required by the input graph, i.e., adjacency_list)?
It looks like there is a bunch of that. I see six vectors and a queue in there, and one of those vectors is a vector of std::lists.
4) Is it possible to implement custom file-based edge lists so that I could explicitly page to disk when not enough ram is available? How slower would it be?
It would probably be quite a bit slower. If you need to go beyond your system's memory, why not let the OS's swap mechanism do the disk accesses? That would take care of all of the caching and such that you would need, and I'm not sure writing your own would provide much benefit. How much memory do you have, and what type are you using for your edge capacities? I have some ideas for how to shrink things, but I would first like to see a baseline of how bad things are now. If you switch to compressed_sparse_row_graph and try to run the algorithm (possibly on scaled-down graphs), how much memory does it use? Running that for a few data sizes might give an idea of how much actual space per vertex and edge the algorithm uses. BTW, is there any regular structure in your graph? For example, is it a grid graph like some computer vision algorithms use? -- Jeremiah Willcock

Thanks for the reply. Please see my reply inline.
Date: Tue, 13 Apr 2010 12:53:45 -0400 From: jewillco@osl.iu.edu To: boost@lists.boost.org Subject: Re: [boost] space complexity of push_relabel_max_flow
On Tue, 13 Apr 2010, Dan Jiang wrote:
Hi All, I am a new BOOST user and like it very much. I have a question regarding space complexity of push_relabel_max_flow. The push_relabel_max_flow is very efficient compared to Edmond's and Karp. In my application, I need to deal with half a million nodes and half a billion arcs. It requires lots of ram (>= 6gb). Two questions:
1) How do I find out space used per node and per arc?
I do not know off hand for adjacency_list; it is probably reasonable (one integer per edge, some per vertex, plus properties). However, remember that dynamically allocated vectors can have up to a 100% space overhead. I would recommend using compressed_sparse_row_graph and then changing the EdgeIndex and Vertex types to uint32_t (since you have fewer than 2^32 of each). That will use 4 * (nedges + nvertices + 1) bytes of space, plus properties; accesses will also likely be faster than for adjacency_list.
I'll try that later on.
2) My network does not need any reverse edge (currently I set the capacity of reverse edge to 0), is there anyway to not supply reverse edges to save space?
It does not appear so, because of the residual capacities that need to be stored. If you can find an algorithm description that doesn't need reverse edges, though, it might be possible to change the algorithm to remove them.
I just read the push_relabel_max_flow algorithm, it requires reverse edge list and their capacities.
3) Does push_relabel_algorithm dynamically allocates lots of memory (besides the ram required by the input graph, i.e., adjacency_list)?
It looks like there is a bunch of that. I see six vectors and a queue in there, and one of those vectors is a vector of std::lists.
4) Is it possible to implement custom file-based edge lists so that I could explicitly page to disk when not enough ram is available? How slower would it be?
It would probably be quite a bit slower. If you need to go beyond your system's memory, why not let the OS's swap mechanism do the disk accesses? That would take care of all of the caching and such that you would need, and I'm not sure writing your own would provide much benefit. I tried OS paging on 64-bit OS using my application. The performance decreases from 100 to 1000times and plus the paging happens only when available physical ram runs out. I'd like to be able to do partial paging on demand. I am looking into using STLdb to hold capacity properties, one for forward edge, another for reverse edge. But saving is just 2*4*NumOfArcs (I am using float for capacity), which might not be dramatic saving... How much memory do you have, and what type are you using for your edge capacities? I have some ideas for how to shrink things, but I would first like to see a baseline of how bad things are now. If you switch to compressed_sparse_row_graph and try to run the algorithm (possibly on scaled-down graphs), how much memory does it use? Running that for a few data sizes might give an idea of how much actual space per vertex and edge the algorithm uses. I have 4gb physical ram, usually with 3.2gb free. I use float for edge capacity. I'll look into compressed_sparse_row_graph and see if there is much saving.
BTW, is there any regular structure in your graph? For example, is it a grid graph like some computer vision algorithms use?
My application is for mining. I construct a network from voxel model which is typically of size 200 * 200 * 125. For each "chosen" voxel (a voxel deemed to be used as a node in a graph), I create about 100 arcs to other relevant voxels. I my test case, I have 0.5million voxels and hence 500mil arcs. I need to find max flow from my source voxel to destination voxel. If I use STLdb, I can decide at run time when to page all property maps used in the push_relabel_max_flow algorithm. vertex and edge lists are not to be paged as STDdb supports only maps. -Your insights would be greatly appreciated. Dan _________________________________________________________________ Videos that have everyone talking! Now also in HD! http://go.microsoft.com/?linkid=9724465

On Tue, 13 Apr 2010, Dan Jiang wrote:
Thanks for the reply. Please see my reply inline.
Date: Tue, 13 Apr 2010 12:53:45 -0400 From: jewillco@osl.iu.edu To: boost@lists.boost.org Subject: Re: [boost] space complexity of push_relabel_max_flow
On Tue, 13 Apr 2010, Dan Jiang wrote:
Hi All, I am a new BOOST user and like it very much. I have a question regarding space complexity of push_relabel_max_flow. The push_relabel_max_flow is very efficient compared to Edmond's and Karp. In my application, I need to deal with half a million nodes and half a billion arcs. It requires lots of ram (>= 6gb). Two questions:
1) How do I find out space used per node and per arc?
I do not know off hand for adjacency_list; it is probably reasonable (one integer per edge, some per vertex, plus properties). However, remember that dynamically allocated vectors can have up to a 100% space overhead. I would recommend using compressed_sparse_row_graph and then changing the EdgeIndex and Vertex types to uint32_t (since you have fewer than 2^32 of each). That will use 4 * (nedges + nvertices + 1) bytes of space, plus properties; accesses will also likely be faster than for adjacency_list.
I'll try that later on.
OK. I saw your other email about that -- I'll answer it soon.
2) My network does not need any reverse edge (currently I set the capacity of reverse edge to 0), is there anyway to not supply reverse edges to save space?
It does not appear so, because of the residual capacities that need to be stored. If you can find an algorithm description that doesn't need reverse edges, though, it might be possible to change the algorithm to remove them.
I just read the push_relabel_max_flow algorithm, it requires reverse edge list and their capacities.
It actually doesn't need the edges AFAICT. It just uses them to look elements up in property maps, so it might be possible to work around that requirement. However, there are other memory uses in the algorithm that are likely to be bigger.
3) Does push_relabel_algorithm dynamically allocates lots of memory (besides the ram required by the input graph, i.e., adjacency_list)?
It looks like there is a bunch of that. I see six vectors and a queue in there, and one of those vectors is a vector of std::lists.
4) Is it possible to implement custom file-based edge lists so that I could explicitly page to disk when not enough ram is available? How slower would it be?
It would probably be quite a bit slower. If you need to go beyond your system's memory, why not let the OS's swap mechanism do the disk accesses? That would take care of all of the caching and such that you would need, and I'm not sure writing your own would provide much benefit.
I tried OS paging on 64-bit OS using my application. The performance decreases from 100 to 1000times and plus the paging happens only when available physical ram runs out. I'd like to be able to do partial paging on demand. I am looking into using STLdb to hold capacity properties, one for forward edge, another for reverse edge. But saving is just 2*4*NumOfArcs (I am using float for capacity), which might not be dramatic saving...
I'm surprised OS paging was so bad, but that can happen and the algorithm doesn't seem to use a lot of locality between related data structures. The 2*4*nedges thing is probably not worth doing too much work to get yet.
How much memory do you have, and what type are you using for your edge capacities? I have some ideas for how to shrink things, but I would first like to see a baseline of how bad things are now. If you switch to compressed_sparse_row_graph and try to run the algorithm (possibly on scaled-down graphs), how much memory does it use? Running that for a few data sizes might give an idea of how much actual space per vertex and edge the algorithm uses. I have 4gb physical ram, usually with 3.2gb free. I use float for edge capacity. I'll look into compressed_sparse_row_graph and see if there is much saving.
It will help you with the graph structure information and maybe the properties. Let's see what the actual usage is now and how that compares to what you have space for.
BTW, is there any regular structure in your graph? For example, is it a grid graph like some computer vision algorithms use?
My application is for mining. I construct a network from voxel model which is typically of size 200 * 200 * 125. For each "chosen" voxel (a voxel deemed to be used as a node in a graph), I create about 100 arcs to other relevant voxels. I my test case, I have 0.5million voxels and hence 500mil arcs. I need to find max flow from my source voxel to destination voxel. If I use STLdb, I can decide at run time when to page all property maps used in the push_relabel_max_flow algorithm. vertex and edge lists are not to be paged as STDdb supports only maps.
How do you define "relevant"? Are they local in terms of the grid of voxels? Is there any regular structure at all, such as linking to all neighbors within a certain distance that have a certain weight or something? What are you able to say about the conditions for which edges are put in? Can any of the data used to create edges be reused for computing capacities? -- Jeremiah Willcock

Date: Tue, 13 Apr 2010 22:31:17 -0400 From: jewillco@osl.iu.edu To: boost@lists.boost.org Subject: Re: [boost] space complexity of push_relabel_max_flow
On Tue, 13 Apr 2010, Dan Jiang wrote:
Thanks for the reply. Please see my reply inline.
Date: Tue, 13 Apr 2010 12:53:45 -0400 From: jewillco@osl.iu.edu To: boost@lists.boost.org Subject: Re: [boost] space complexity of push_relabel_max_flow
On Tue, 13 Apr 2010, Dan Jiang wrote:
Hi All, I am a new BOOST user and like it very much. I have a question regarding space complexity of push_relabel_max_flow. The push_relabel_max_flow is very efficient compared to Edmond's and Karp. In my application, I need to deal with half a million nodes and half a billion arcs. It requires lots of ram (>= 6gb). Two questions:
1) How do I find out space used per node and per arc?
I do not know off hand for adjacency_list; it is probably reasonable (one integer per edge, some per vertex, plus properties). However, remember that dynamically allocated vectors can have up to a 100% space overhead. I would recommend using compressed_sparse_row_graph and then changing the EdgeIndex and Vertex types to uint32_t (since you have fewer than 2^32 of each). That will use 4 * (nedges + nvertices + 1) bytes of space, plus properties; accesses will also likely be faster than for adjacency_list.
I'll try that later on.
OK. I saw your other email about that -- I'll answer it soon.
2) My network does not need any reverse edge (currently I set the capacity of reverse edge to 0), is there anyway to not supply reverse edges to save space?
It does not appear so, because of the residual capacities that need to be stored. If you can find an algorithm description that doesn't need reverse edges, though, it might be possible to change the algorithm to remove them.
I just read the push_relabel_max_flow algorithm, it requires reverse edge list and their capacities.
It actually doesn't need the edges AFAICT. It just uses them to look elements up in property maps, so it might be possible to work around that requirement. However, there are other memory uses in the algorithm that are likely to be bigger.
3) Does push_relabel_algorithm dynamically allocates lots of memory (besides the ram required by the input graph, i.e., adjacency_list)?
It looks like there is a bunch of that. I see six vectors and a queue in there, and one of those vectors is a vector of std::lists.
4) Is it possible to implement custom file-based edge lists so that I could explicitly page to disk when not enough ram is available? How slower would it be?
It would probably be quite a bit slower. If you need to go beyond your system's memory, why not let the OS's swap mechanism do the disk accesses? That would take care of all of the caching and such that you would need, and I'm not sure writing your own would provide much benefit.
I tried OS paging on 64-bit OS using my application. The performance decreases from 100 to 1000times and plus the paging happens only when available physical ram runs out. I'd like to be able to do partial paging on demand. I am looking into using STLdb to hold capacity properties, one for forward edge, another for reverse edge. But saving is just 2*4*NumOfArcs (I am using float for capacity), which might not be dramatic saving...
I'm surprised OS paging was so bad, but that can happen and the algorithm doesn't seem to use a lot of locality between related data structures. The 2*4*nedges thing is probably not worth doing too much work to get yet.
How much memory do you have, and what type are you using for your edge capacities? I have some ideas for how to shrink things, but I would first like to see a baseline of how bad things are now. If you switch to compressed_sparse_row_graph and try to run the algorithm (possibly on scaled-down graphs), how much memory does it use? Running that for a few data sizes might give an idea of how much actual space per vertex and edge the algorithm uses. I have 4gb physical ram, usually with 3.2gb free. I use float for edge capacity. I'll look into compressed_sparse_row_graph and see if there is much saving.
It will help you with the graph structure information and maybe the properties. Let's see what the actual usage is now and how that compares to what you have space for.
BTW, is there any regular structure in your graph? For example, is it a grid graph like some computer vision algorithms use?
My application is for mining. I construct a network from voxel model which is typically of size 200 * 200 * 125. For each "chosen" voxel (a voxel deemed to be used as a node in a graph), I create about 100 arcs to other relevant voxels. I my test case, I have 0.5million voxels and hence 500mil arcs. I need to find max flow from my source voxel to destination voxel. If I use STLdb, I can decide at run time when to page all property maps used in the push_relabel_max_flow algorithm. vertex and edge lists are not to be paged as STDdb supports only maps.
How do you define "relevant"? Are they local in terms of the grid of voxels? Is there any regular structure at all, such as linking to all neighbors within a certain distance that have a certain weight or something? What are you able to say about the conditions for which edges are put in? Can any of the data used to create edges be reused for computing capacities?
"relevant" is parameterized on a formula which means, # of arcs from a node is a variable whose value ranges from 5 to 150. So there is really no patten because the parameter used to compute "relavency" is user-defined. I think CSR would work better in terms of memory requirement. However, I do not know how much as push_relabel_max_flow (or anall max flow algorithms in BOOST) requires "capacity" map and "reverse_edge" map which seem not provided by the CSR graph (in fact, I do not even know how to get these two maps from a CSR if possible at all, see my reply in another email). -Thanks, Dan Jiang _________________________________________________________________ Live connected. Get Hotmail & Messenger on your phone. http://go.microsoft.com/?linkid=9724462

On Wed, 14 Apr 2010, Dan Jiang wrote: (snip)
How do you define "relevant"? Are they local in terms of the grid of voxels? Is there any regular structure at all, such as linking to all neighbors within a certain distance that have a certain weight or something? What are you able to say about the conditions for which edges are put in? Can any of the data used to create edges be reused for computing capacities?
"relevant" is parameterized on a formula which means, # of arcs from a node is a variable whose value ranges from 5 to 150. So there is really no patten because the parameter used to compute "relavency" is user-defined. I think CSR would work better in terms of memory requirement. However, I do not know how much as push_relabel_max_flow (or anall max flow algorithms in BOOST) requires "capacity" map and "reverse_edge" map which seem not provided by the CSR graph (in fact, I do not even know how to get these two maps from a CSR if possible at all, see my reply in another email).
The reason I'm asking these questions is to see if there is some property like "all relevant voxels to a particular voxel are within a certain distance", even if not all voxels within that distance are considered relevant by your criteria. That may allow some compression of the graph's topology, plus possibly make the reverse map easy to compute implicitly. Also, is there any simple formula to compute the capacities? Do those need to be stored explicitly? -- Jeremiah Willcock

Date: Thu, 15 Apr 2010 11:41:19 -0400 From: jewillco@osl.iu.edu To: boost@lists.boost.org Subject: Re: [boost] space complexity of push_relabel_max_flow
On Wed, 14 Apr 2010, Dan Jiang wrote:
(snip)
How do you define "relevant"? Are they local in terms of the grid of voxels? Is there any regular structure at all, such as linking to all neighbors within a certain distance that have a certain weight or something? What are you able to say about the conditions for which edges are put in? Can any of the data used to create edges be reused for computing capacities?
"relevant" is parameterized on a formula which means, # of arcs from a node is a variable whose value ranges from 5 to 150. So there is really no patten because the parameter used to compute "relavency" is user-defined. I think CSR would work better in terms of memory requirement. However, I do not know how much as push_relabel_max_flow (or anall max flow algorithms in BOOST) requires "capacity" map and "reverse_edge" map which seem not provided by the CSR graph (in fact, I do not even know how to get these two maps from a CSR if possible at all, see my reply in another email).
The reason I'm asking these questions is to see if there is some property like "all relevant voxels to a particular voxel are within a certain distance", No they are not. An arc can be constructed between neighbouring nodes and between distant nodes. It depends on a user-specified parameter. even if not all voxels within that distance are considered relevant by your criteria. That may allow some compression of the graph's topology, plus possibly make the reverse map easy to compute implicitly. Also, is there any simple formula to compute the capacities? Do those need to be stored explicitly?
Capacities for each arc are user-defined as well. I think CSR would save memory. But I do not know if the max flow algorithms can operate on them. It seems not based on the source code as CSR does not seem to provide reverse_edge map. I have another question on the max flow algorithms: why does BOOST use map to store edge capacities and reverse edges? Map requires lots of ram, cannot array of list do the trick? I also took a look at Golberg's original implementation of the push_relabel_max_flow algorithm (hipr), it seems he does not use maps at all. -Thanks, Dan Jiang _________________________________________________________________ Videos that have everyone talking! Now also in HD! http://go.microsoft.com/?linkid=9724465

On Thu, 15 Apr 2010, Dan Jiang wrote: (snip)
The reason I'm asking these questions is to see if there is some property like "all relevant voxels to a particular voxel are within a certain distance",
No they are not. An arc can be constructed between neighbouring nodes and between distant nodes. It depends on a user-specified parameter.
OK.
even if not all voxels within that distance are considered relevant by your criteria. That may allow some compression of the graph's topology, plus possibly make the reverse map easy to compute implicitly. Also, is there any simple formula to compute the capacities? Do those need to be stored explicitly?
Capacities for each arc are user-defined as well. I think CSR would save memory. But I do not know if the max flow algorithms can operate on them. It seems not based on the source code as CSR does not seem to provide reverse_edge map. I have another question on the max flow algorithms: why does BOOST use map to store edge capacities and reverse edges? Map requires lots of ram, cannot array of list do the trick? I also took a look at Golberg's original implementation of the push_relabel_max_flow algorithm (hipr), it seems he does not use maps at all.
I do not believe any of the graph types have a reverse edge map -- you need to create one. The Boost property maps are not necessarily (or usually) std::maps, which do have high overhead as you say. The usual property map created for internal use is an std::vector whose size is the number of vertices or edges in the graph, indexed using the graph's vertex or edge index maps. -- Jeremiah Willcock

Date: Thu, 15 Apr 2010 14:22:00 -0400 From: jewillco@osl.iu.edu To: boost@lists.boost.org Subject: Re: [boost] space complexity of push_relabel_max_flow
On Thu, 15 Apr 2010, Dan Jiang wrote:
(snip)
The reason I'm asking these questions is to see if there is some property like "all relevant voxels to a particular voxel are within a certain distance",
No they are not. An arc can be constructed between neighbouring nodes and between distant nodes. It depends on a user-specified parameter.
OK.
even if not all voxels within that distance are considered relevant by your criteria. That may allow some compression of the graph's topology, plus possibly make the reverse map easy to compute implicitly. Also, is there any simple formula to compute the capacities? Do those need to be stored explicitly?
Capacities for each arc are user-defined as well. I think CSR would save memory. But I do not know if the max flow algorithms can operate on them. It seems not based on the source code as CSR does not seem to provide reverse_edge map. I have another question on the max flow algorithms: why does BOOST use map to store edge capacities and reverse edges? Map requires lots of ram, cannot array of list do the trick? I also took a look at Golberg's original implementation of the push_relabel_max_flow algorithm (hipr), it seems he does not use maps at all.
I do not believe any of the graph types have a reverse edge map -- you need to create one. Ok, will do. The Boost property maps are not necessarily (or usually) std::maps, which do have high overhead as you say. The usual property map created for internal use is an std::vector whose size is the number of vertices or edges in the graph, indexed using the graph's vertex or edge index maps.
Wow, that is great to know about the maps. It calms me down as std::maps is really heavy weight. Dan _________________________________________________________________ Got a phone? Get Hotmail & Messenger for mobile! http://go.microsoft.com/?linkid=9724464

Hi, I am trying to use compressed_sparse_row_graph with push_relabel_max_flow. But I am stuck as a newbie to BOOST. I could not find any example using compressed_sparse_row_graph with the algorithm. Can anyone help? Here is my code with adjacency_list: typedef adjacency_list< vecS, vecS, directedS, no_property, property<edge_capacity_t, float, property<edge_residual_capacity_t, float, property<edge_reverse_t, Traits::edge_descriptor> > >, no_property, vecS > BOOST_Graph; property_map<BOOST_Graph, edge_capacity_t>::type capacity = get(edge_capacity, g);property_map<BOOST_Graph, edge_residual_capacity_t>::type residual_capacity = get(edge_residual_capacity, g);Traits::vertex_descriptor s, Traits::vertex_descriptor t; // Code to set up s, t, capacity and reverse_capacity is omited... // Now find max flowfloat flow = push_relabel_max_flow(g, s, t); -Thanks Dan Jiang
Date: Tue, 13 Apr 2010 12:53:45 -0400 From: jewillco@osl.iu.edu To: boost@lists.boost.org Subject: Re: [boost] space complexity of push_relabel_max_flow
On Tue, 13 Apr 2010, Dan Jiang wrote:
Hi All, I am a new BOOST user and like it very much. I have a question regarding space complexity of push_relabel_max_flow. The push_relabel_max_flow is very efficient compared to Edmond's and Karp. In my application, I need to deal with half a million nodes and half a billion arcs. It requires lots of ram (>= 6gb). Two questions:
1) How do I find out space used per node and per arc?
I do not know off hand for adjacency_list; it is probably reasonable (one integer per edge, some per vertex, plus properties). However, remember that dynamically allocated vectors can have up to a 100% space overhead. I would recommend using compressed_sparse_row_graph and then changing the EdgeIndex and Vertex types to uint32_t (since you have fewer than 2^32 of each). That will use 4 * (nedges + nvertices + 1) bytes of space, plus properties; accesses will also likely be faster than for adjacency_list.
2) My network does not need any reverse edge (currently I set the capacity of reverse edge to 0), is there anyway to not supply reverse edges to save space?
It does not appear so, because of the residual capacities that need to be stored. If you can find an algorithm description that doesn't need reverse edges, though, it might be possible to change the algorithm to remove them.
3) Does push_relabel_algorithm dynamically allocates lots of memory (besides the ram required by the input graph, i.e., adjacency_list)?
It looks like there is a bunch of that. I see six vectors and a queue in there, and one of those vectors is a vector of std::lists.
4) Is it possible to implement custom file-based edge lists so that I could explicitly page to disk when not enough ram is available? How slower would it be?
It would probably be quite a bit slower. If you need to go beyond your system's memory, why not let the OS's swap mechanism do the disk accesses? That would take care of all of the caching and such that you would need, and I'm not sure writing your own would provide much benefit.
How much memory do you have, and what type are you using for your edge capacities? I have some ideas for how to shrink things, but I would first like to see a baseline of how bad things are now. If you switch to compressed_sparse_row_graph and try to run the algorithm (possibly on scaled-down graphs), how much memory does it use? Running that for a few data sizes might give an idea of how much actual space per vertex and edge the algorithm uses.
BTW, is there any regular structure in your graph? For example, is it a grid graph like some computer vision algorithms use?
-- Jeremiah Willcock _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_________________________________________________________________ Got a phone? Get Hotmail & Messenger for mobile! http://go.microsoft.com/?linkid=9724464

On Tue, 13 Apr 2010, Dan Jiang wrote:
Hi, I am trying to use compressed_sparse_row_graph with push_relabel_max_flow. But I am stuck as a newbie to BOOST. I could not find any example using compressed_sparse_row_graph with the algorithm. Can anyone help? Here is my code with adjacency_list:
typedef adjacency_list< vecS, vecS, directedS, no_property, property<edge_capacity_t, float, property<edge_residual_capacity_t, float, property<edge_reverse_t, Traits::edge_descriptor> > >, no_property, vecS > BOOST_Graph;
To use the CSR graph, you will need to use bundled properties or external properties. Just create a struct for the three edge properties (one field for each property), following the instructions at <URL:https://svn.boost.org/svn/boost/trunk/libs/graph/doc/bundles.html>.
property_map<BOOST_Graph, edge_capacity_t>::type capacity = get(edge_capacity, g);property_map<BOOST_Graph, edge_residual_capacity_t>::type residual_capacity = get(edge_residual_capacity, g);Traits::vertex_descriptor s, Traits::vertex_descriptor t;
// Code to set up s, t, capacity and reverse_capacity is omited... // Now find max flowfloat flow = push_relabel_max_flow(g, s, t);
For CSR, you'll need to have your graph structure in advance (or be able to compute it on the fly). Look at the different constructors in the documentation and see which one matches your input data structures best. Prefer the ones that take sorted inputs first, then if you can't do that use edges_are_unsorted_multi_pass, and (least preferably) the edges_are_unsorted and construct_inplace_from_sources_and_targets versions. If you don't have sorting or the ability to do multiple passes over your input, though, you will need to use the later ones to save memory (don't sort the data or copy it into a temporary buffer for multi-pass capability yourself -- CSR has various tricks to do those things more efficiently). -- Jeremiah Willcock

Date: Tue, 13 Apr 2010 22:41:07 -0400 From: jewillco@osl.iu.edu To: boost@lists.boost.org Subject: Re: [boost] space complexity of push_relabel_max_flow
On Tue, 13 Apr 2010, Dan Jiang wrote:
Hi, I am trying to use compressed_sparse_row_graph with push_relabel_max_flow. But I am stuck as a newbie to BOOST. I could not find any example using compressed_sparse_row_graph with the algorithm. Can anyone help? Here is my code with adjacency_list:
typedef adjacency_list< vecS, vecS, directedS, no_property, property<edge_capacity_t, float, property<edge_residual_capacity_t, float, property<edge_reverse_t, Traits::edge_descriptor> > >, no_property, vecS > BOOST_Graph;
To use the CSR graph, you will need to use bundled properties or external properties. Just create a struct for the three edge properties (one field for each property), following the instructions at <URL:https://svn.boost.org/svn/boost/trunk/libs/graph/doc/bundles.html>.
Ok, I followed that, here is my code:struct EdgeProp{ float capacity; float residual_capacity;};typedef compressed_sparse_row_graph<directedS, void, //property<vertex_index_t, DWORD32>, do I need this? EdgeProp, no_property, // Graph Property DWORD32, // Vertex Type DWORD32> // EdgeIndex Type BOOST_Graph; then I could set the capacity like so: typedef std::pair<int, int> E; E *theEdges = new E [nEdges]; EdgeProp *edgeProps = new EdgeProp [nEdges]; for (int i=0; i<nEdges/2; i++){ E e = E(from, to); E re = E(to, from); theEdges[i-1] = e; theEdges[i] = re; edgeProps[i-1].capacity = temp.cap; edgeProps[i].capacity = 0; }
property_map<BOOST_Graph, edge_capacity_t>::type capacity = get(edge_capacity, g);property_map<BOOST_Graph, edge_residual_capacity_t>::type residual_capacity = get(edge_residual_capacity, g);Traits::vertex_descriptor s, Traits::vertex_descriptor t; Then need to get "capacity" map and "reverse_edge" map like below as max flow algorithm depends on them: BOOST_Graph g(edges_are_unsorted, &theEdges[0], &theEdges[0] + nEdges, &edgeProps[0], nVertices); property_map<BOOST_Graph, edge_capacity_t>::type capacity = get(edge_capacity, g); property_map<BOOST_Graph, edge_reverse_t>::type rev = get(edge_reverse, g); However, I get compile error: 'boost::get' : none of the 2 overloads could convert all the argument types Can you give me an example how to get these two property maps for CSR graph?
// Code to set up s, t, capacity and reverse_capacity is omited... // Now find max flowfloat flow = push_relabel_max_flow(g, s, t);
For CSR, you'll need to have your graph structure in advance (or be able to compute it on the fly). Look at the different constructors in the documentation and see which one matches your input data structures best. Prefer the ones that take sorted inputs first, then if you can't do that use edges_are_unsorted_multi_pass, and (least preferably) the edges_are_unsorted and construct_inplace_from_sources_and_targets versions. If you don't have sorting or the ability to do multiple passes over your input, though, you will need to use the later ones to save memory (don't sort the data or copy it into a temporary buffer for multi-pass capability yourself -- CSR has various tricks to do those things more efficiently).
As shown in the above code, I use edges_are_unsorted for now and will use edges_are_sorted later once I get those two property maps working. Dan Jiang _________________________________________________________________ Videos that have everyone talking! Now also in HD! http://go.microsoft.com/?linkid=9724465

On Wed, 14 Apr 2010, Dan Jiang wrote: (snip)
To use the CSR graph, you will need to use bundled properties or external properties. Just create a struct for the three edge properties (one field for each property), following the instructions at <URL:https://svn.boost.org/svn/boost/trunk/libs/graph/doc/bundles.html>.
Ok, I followed that, here is my code: struct EdgeProp{ float capacity; float residual_capacity; }; typedef compressed_sparse_row_graph<directedS, void, //property<vertex_index_t, DWORD32>, do I need this? EdgeProp, no_property, // Graph Property DWORD32, // Vertex Type DWORD32> // EdgeIndex Type BOOST_Graph;
then I could set the capacity like so: typedef std::pair<int, int> E; E *theEdges = new E [nEdges]; EdgeProp *edgeProps = new EdgeProp [nEdges]; for (int i=0; i<nEdges/2; i++){ E e = E(from, to); E re = E(to, from); theEdges[i-1] = e; theEdges[i] = re; edgeProps[i-1].capacity = temp.cap; edgeProps[i].capacity = 0; }
property_map<BOOST_Graph, edge_capacity_t>::type capacity = get(edge_capacity, g);property_map<BOOST_Graph, edge_residual_capacity_t>::type residual_capacity = get(edge_residual_capacity, g);Traits::vertex_descriptor s, Traits::vertex_descriptor t; Then need to get "capacity" map and "reverse_edge" map like below as max flow algorithm depends on them:
BOOST_Graph g(edges_are_unsorted, &theEdges[0], &theEdges[0] + nEdges, &edgeProps[0], nVertices); property_map<BOOST_Graph, edge_capacity_t>::type capacity = get(edge_capacity, g); property_map<BOOST_Graph, edge_reverse_t>::type rev = get(edge_reverse, g); However, I get compile error: 'boost::get' : none of the 2 overloads could convert all the argument types Can you give me an example how to get these two property maps for CSR graph?
Now that you're using bundled properties, you need to use the syntax in the URL I sent above. In particular, you need: property_map<BOOST_Graph, &EdgeProp::float>::type capacity = get(&EdgeProp::capacity, g); and similar code for the residual map. You will also need to add an edge reverse map to EdgeProp and use a similar statement to get it as a property map.
For CSR, you'll need to have your graph structure in advance (or be able to compute it on the fly). Look at the different constructors in the documentation and see which one matches your input data structures best. Prefer the ones that take sorted inputs first, then if you can't do that use edges_are_unsorted_multi_pass, and (least preferably) the edges_are_unsorted and construct_inplace_from_sources_and_targets versions. If you don't have sorting or the ability to do multiple passes over your input, though, you will need to use the later ones to save memory (don't sort the data or copy it into a temporary buffer for multi-pass capability yourself -- CSR has various tricks to do those things more efficiently).
As shown in the above code, I use edges_are_unsorted for now and will use edges_are_sorted later once I get those two property maps working.
OK. -- Jeremiah Willcock PS: Could you please format your code so that it is marked as keeping the literal spacing or something? When I get it everything is jammed onto one line making it hard to read. I am reformatting it in my reply here, but that is inconvenient.

On Tue, Apr 13, 2010 at 6:06 AM, Dan Jiang <danjiang@live.ca> wrote:
Hi All, I am a new BOOST user and like it very much. I have a question regarding space complexity of push_relabel_max_flow. The push_relabel_max_flow is very efficient compared to Edmond's and Karp. In my application, I need to deal with half a million nodes and half a billion arcs. It requires lots of ram (>= 6gb). Two questions: 1) How do I find out space used per node and per arc? 2) My network does not need any reverse edge (currently I set the capacity of reverse edge to 0), is there anyway to not supply reverse edges to save space? 3) Does push_relabel_algorithm dynamically allocates lots of memory (besides the ram required by the input graph, i.e., adjacency_list)? 4) Is it possible to implement custom file-based edge lists so that I could explicitly page to disk when not enough ram is available? How slower would it be?
Dear Dan, I have implemented a set of prototype graph classes that use PostgreSQL to store edge lists. The graph classes do not store data, only prepared queries. The code is at https://code.launchpad.net/postgraph I would actually love some help with this code. Implementing your own graph classes in the BGL is I found non-trivial. I never got push-relabel max-flow to compile with my classes. I probably need to implement concept checking. Unfortunately, I've largely run out of time to work on this. If anyone is interested in the code, I will be happy to assist their efforts in anyway I can. I've looked at push-relabel max-flow a lot and I can say the implementation is not particularly space efficient. One of the issues is that you have to store both forward and backward edges in the graph. That is how the algorithm was designed, but I think one could modify it to only use a single edge with forward and backward capacity maps. Cheers, Tim
Thanks, Dan Jiang
_________________________________________________________________ Hotmail & Messenger. Get them on your phone now. http://go.microsoft.com/?linkid=9724463 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Timothy H. Keitt http://www.keittlab.org/

I've looked at push-relabel max-flow a lot and I can say the implementation is not particularly space efficient. One of the issues is that you have to store both forward and backward edges in the graph. That is how the algorithm was designed, but I think one could modify it to only use a single edge with forward and backward capacity maps.
Actually, you probably don't even need two maps -- the flow is skew-symmetric, so the flow itself doesn't need to be stored in both directions. The capacities might differ, though (although one of them is zero in Dan's code), and the residuals may need to be stored for both directions. -- Jeremiah Willcock

On Thu, 15 Apr 2010, Tim Keitt wrote: (snip)
Dear Dan,
I have implemented a set of prototype graph classes that use PostgreSQL to store edge lists. The graph classes do not store data, only prepared queries. The code is at https://code.launchpad.net/postgraph
I would actually love some help with this code. Implementing your own graph classes in the BGL is I found non-trivial. I never got push-relabel max-flow to compile with my classes. I probably need to implement concept checking. Unfortunately, I've largely run out of time to work on this. If anyone is interested in the code, I will be happy to assist their efforts in anyway I can.
Is there a particular test program that you have that doesn't compile? I can probably take a look at it and see what the issue is. -- Jeremiah Willcock

On Thu, Apr 15, 2010 at 10:53 AM, Jeremiah Willcock <jewillco@osl.iu.edu> wrote:
On Thu, 15 Apr 2010, Tim Keitt wrote:
(snip)
Dear Dan,
I have implemented a set of prototype graph classes that use PostgreSQL to store edge lists. The graph classes do not store data, only prepared queries. The code is at https://code.launchpad.net/postgraph
I would actually love some help with this code. Implementing your own graph classes in the BGL is I found non-trivial. I never got push-relabel max-flow to compile with my classes. I probably need to implement concept checking. Unfortunately, I've largely run out of time to work on this. If anyone is interested in the code, I will be happy to assist their efforts in anyway I can.
Is there a particular test program that you have that doesn't compile? I can probably take a look at it and see what the issue is.
OK. 30 pages of gcc template errors coming your way... he he Seriously, though _thanks_ for the offer. I'll see if I can find a moment to reconstruct the error. Cheers, Tim
-- Jeremiah Willcock _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Timothy H. Keitt http://www.keittlab.org/

Hello Tim, Thanks for the offering. You project looks very interesting. For now, I am trying to get CSR graph working with push_relabel_max_flow. Failing that, I'll look into using your library. I have a feeling, even if I get it working, I may still need to go with a database based approach eventually. You said you never got push_relabel_max_flow working for your custom graph. Is your graph based on CSR or adjaceny_list? The problem I am facing with CSR is lack of a simple working example for setting with push_relabel_max_flow. I have been at it for 4 days straight and still fighting with it. I have been a long time C++ developer with ATL/COM and hence template to me is not a new thing, but I found learning curve with BOOST is still pretty steep due to lack of examples in some areas. (Do not get me wrong, it is an excellent library) Cheers, Dan
Date: Thu, 15 Apr 2010 10:48:48 -0500 From: tkeitt@gmail.com To: boost@lists.boost.org Subject: Re: [boost] space complexity of push_relabel_max_flow
On Tue, Apr 13, 2010 at 6:06 AM, Dan Jiang <danjiang@live.ca> wrote:
Hi All, I am a new BOOST user and like it very much. I have a question regarding space complexity of push_relabel_max_flow. The push_relabel_max_flow is very efficient compared to Edmond's and Karp. In my application, I need to deal with half a million nodes and half a billion arcs. It requires lots of ram (>= 6gb). Two questions: 1) How do I find out space used per node and per arc? 2) My network does not need any reverse edge (currently I set the capacity of reverse edge to 0), is there anyway to not supply reverse edges to save space? 3) Does push_relabel_algorithm dynamically allocates lots of memory (besides the ram required by the input graph, i.e., adjacency_list)? 4) Is it possible to implement custom file-based edge lists so that I could explicitly page to disk when not enough ram is available? How slower would it be?
Dear Dan,
I have implemented a set of prototype graph classes that use PostgreSQL to store edge lists. The graph classes do not store data, only prepared queries. The code is at https://code.launchpad.net/postgraph
I would actually love some help with this code. Implementing your own graph classes in the BGL is I found non-trivial. I never got push-relabel max-flow to compile with my classes. I probably need to implement concept checking. Unfortunately, I've largely run out of time to work on this. If anyone is interested in the code, I will be happy to assist their efforts in anyway I can.
I've looked at push-relabel max-flow a lot and I can say the implementation is not particularly space efficient. One of the issues is that you have to store both forward and backward edges in the graph. That is how the algorithm was designed, but I think one could modify it to only use a single edge with forward and backward capacity maps.
Cheers, Tim
Thanks, Dan Jiang
_________________________________________________________________ Hotmail & Messenger. Get them on your phone now. http://go.microsoft.com/?linkid=9724463 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Timothy H. Keitt http://www.keittlab.org/ _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_________________________________________________________________ Videos that have everyone talking! Now also in HD! http://go.microsoft.com/?linkid=9724465
participants (3)
-
Dan Jiang
-
Jeremiah Willcock
-
Tim Keitt