CSR graph and Dijkstra's to lower graph memory
Hello, This is my first post and I'm very new to BOOST, so please let me know if I have missed some of the mailing list etiquette :) When I run Dijkstra's on my graph, I quickly run out of memory and I'm trying to find ways to lower the memory needed to store the graph. My graphs has around ~3,600,000 nodes and ~580,000,000 edges (I will eventually need a graph at least 5x this number - more if possible). I only store a weight (double) at each edge (I don't need to store any information at each node) and find that while this only takes about 1.5 hrs to run, this takes up around ~95GB (I have access to a cluster so I can run these high memory jobs). I use an adjacency_list to store this graph. typedef adjacency_list < setS, vecS, undirectedS, no_property, property < edge_weight_t, double > > graph_t; My two questions are: 1) Is this a reasonable amount of memory to expect a graph of this size using an adjacency_list? Or am I doing some terribly wrong? This is much more memory needed than I expected. 2) From what I understand, the compressed sparse row (CSR) graph is a possible solution to lowering the memory requirements as I know the number of edges and the number of nodes prior to forming the graph. I've been going through the documentation and I followed this tread <http://boost.2283326.n4.nabble.com/Graph-On-Dijkstra-SSSP-performance-td4446749.html> which had an example of using a CSR instead of an adjacency_list and running Dijkstra's. However I can't seem to get the last step (last reply by Jeremiah Willcock) of
"plus adding the weight map parameter explicitly to the call to dijkstra_shortest_paths_no_color_map" to work successfully.
I guess I'm not sure how to add the weigh map parameter to the Dijkstra's call. The following is the relevant code dealing with edge weights. ======= ... struct Edge_Cost { double weight;}; // Needed for bundled properties. Edge_Cost *edgeCosts = new Edge_Cost[2]; // Add 3 edges costs. edgeCosts[0].weight = (2.0); edgeCosts[1].weight = (5.2); edgeCosts[2].weight =(3.33); ... typedef compressed_sparse_row_graph<directedS, WebPage, // From the example docs on CSR Edge_Cost // The struct.
graph_t;
graph_t g(boost::edges_are_unsorted, &the_edges[0], &the_edges[0] + numEdges, edgeCosts, // Add the weights here??? numNodes); ... boost::dijkstra_shortest_paths(g, // The graph. s, // VertexDescriptor - the source vertex. &p[0], &d[0], edgeCosts, // <--- WHAT DO I PUT HERE?? indexmap, std::less<int>(), closed_plus<int>(), (std::numeric_limits<int>::max)(), 0, default_dijkstra_visitor() ); ======= When I run this, I get several errors, the first is: error C2664: 'boost::get' : cannot convert parameter 2 from 'boost::detail::csr_edge_descriptor<Vertex,EdgeIndex>' to 'ptrdiff_t' c:\boost_1_49_0\boost_1_49_0\boost\graph\dijkstra_shortest_paths.hpp 162 I might be totally off here so please let me know :) Thanks! - Jeremy -- View this message in context: http://boost.2283326.n4.nabble.com/CSR-graph-and-Dijkstra-s-to-lower-graph-m... Sent from the Boost - Users mailing list archive at Nabble.com.
hi Jeremy, I'm the OP of that topic. With a great help of Jeremiah WillCock (thank you Jeremiah, you are a great and very patient man :) ), I could finally obtain an working case with CSR graphs (a great amount of messages were sent in private, so they do not appear in the list). Here's an example. Note that I don't provide the the class obj_Mesh, since the only variables I use from it are used to compute the number of edges and the cost of each edge. You can replace these info by your own. Note that I'm a newbie boost user, so don't take it as the best of the use cases, it is just an working case. I hope it helps you. cheers, leo ----------------------------------------------------------------------------------------------------------------- #include <boost/graph/adjacency_list.hpp> #include <boost/graph/compressed_sparse_row_graph.hpp> #include <boost/graph/dijkstra_shortest_paths_no_color_map.hpp> class obj_Mesh; class BoostGraph { typedef std::pair < int, int > Edge; struct Edge_Cost { double weight; }; struct vertex_data { boost::graph_traits< boost::compressed_sparse_row_graph< boost::directedS >
::vertex_descriptor p;
double d; }; typedef boost::compressed_sparse_row_graph< boost::directedS, vertex_data, Edge_Cost > graph_t; typedef boost::graph_traits < graph_t >::vertex_descriptor vertex_descriptor; graph_t* Graph; public: std::vector<vertex_descriptor> p;//path std::vector<double> d; //distance BoostGraph(obj_Mesh* mesh) { int numb_edges = mesh->NbOfVertices + mesh->NbOfTriangles - 2; numb_edges *= 2; //for directed graph we add the edge twice (both senses) Edge* edge_list = new Edge [numb_edges]; Edge* ptr_edge_list = edge_list; Edge_Cost* edge_weight_list = new Edge_Cost[numb_edges]; Edge_Cost* ptr_edge_weight_list = edge_weight_list; //compute the cost of the edges for (int i = 0; i < mesh->NbOfVertices; ++i) { for (int j = 0; j < mesh->nbOf_Neigh_Vertices[i]; ++j) { int i_neigh = mesh->Neigh_Vertices[i][j]; if (i < i_neigh)//this if is to prevent an edge to be added twice, { //since i_neigh is presented into the list of i and vice-versa *ptr_edge_list++ = Edge(i, i_neigh); (ptr_edge_weight_list++)->weight = sqrt(obj_Mesh::Distance_Sqr_V3(mesh->vertices+3*i, mesh->vertices+3*i_neigh)); *ptr_edge_list++ = Edge(i_neigh, i); (ptr_edge_weight_list++)->weight = sqrt(obj_Mesh::Distance_Sqr_V3(mesh->vertices+3*i, mesh->vertices+3*i_neigh)); } } } Graph = new graph_t( boost::edges_are_unsorted, edge_list, ptr_edge_list, edge_weight_list, mesh->NbOfVertices); delete [] edge_list; delete [] edge_weight_list; this->p.assign(mesh->NbOfVertices, 0); this->d.assign(mesh->NbOfVertices, 0); } void SSSP(int source_idx) { boost::dijkstra_shortest_paths_no_color_map (*Graph, source_idx, boost::predecessor_map(&p[0]). distance_map(&d[0]). weight_map(boost::get(&Edge_Cost::weight, *Graph)) ); } ~BoostGraph() { if (Graph) delete Graph; } }; Le 21/09/2012 23:19, jer.k a écrit :
Hello,
This is my first post and I'm very new to BOOST, so please let me know if I have missed some of the mailing list etiquette :)
When I run Dijkstra's on my graph, I quickly run out of memory and I'm trying to find ways to lower the memory needed to store the graph.
My graphs has around ~3,600,000 nodes and ~580,000,000 edges (I will eventually need a graph at least 5x this number - more if possible). I only store a weight (double) at each edge (I don't need to store any information at each node) and find that while this only takes about 1.5 hrs to run, this takes up around ~95GB (I have access to a cluster so I can run these high memory jobs). I use an adjacency_list to store this graph.
typedef adjacency_list < setS, vecS, undirectedS, no_property, property < edge_weight_t, double > > graph_t;
My two questions are: 1) Is this a reasonable amount of memory to expect a graph of this size using an adjacency_list? Or am I doing some terribly wrong? This is much more memory needed than I expected.
2) From what I understand, the compressed sparse row (CSR) graph is a possible solution to lowering the memory requirements as I know the number of edges and the number of nodes prior to forming the graph.
I've been going through the documentation and I followed this tread <http://boost.2283326.n4.nabble.com/Graph-On-Dijkstra-SSSP-performance-td4446749.html> which had an example of using a CSR instead of an adjacency_list and running Dijkstra's. However I can't seem to get the last step (last reply by Jeremiah Willcock) of
"plus adding the weight map parameter explicitly to the call to dijkstra_shortest_paths_no_color_map" to work successfully.
I guess I'm not sure how to add the weigh map parameter to the Dijkstra's call. The following is the relevant code dealing with edge weights. ======= ... struct Edge_Cost { double weight;}; // Needed for bundled properties.
Edge_Cost *edgeCosts = new Edge_Cost[2]; // Add 3 edges costs. edgeCosts[0].weight = (2.0); edgeCosts[1].weight = (5.2); edgeCosts[2].weight =(3.33); ... typedef compressed_sparse_row_graph<directedS, WebPage, // From the example docs on CSR Edge_Cost // The struct.
graph_t; graph_t g(boost::edges_are_unsorted, &the_edges[0], &the_edges[0] + numEdges, edgeCosts, // Add the weights here??? numNodes); ... boost::dijkstra_shortest_paths(g, // The graph. s, // VertexDescriptor - the source vertex. &p[0], &d[0], edgeCosts, // <--- WHAT DO I PUT HERE?? indexmap, std::less<int>(), closed_plus<int>(), (std::numeric_limits<int>::max)(), 0, default_dijkstra_visitor() ); ======= When I run this, I get several errors, the first is: error C2664: 'boost::get' : cannot convert parameter 2 from 'boost::detail::csr_edge_descriptor<Vertex,EdgeIndex>' to 'ptrdiff_t' c:\boost_1_49_0\boost_1_49_0\boost\graph\dijkstra_shortest_paths.hpp 162
I might be totally off here so please let me know :)
Thanks! - Jeremy
-- View this message in context: http://boost.2283326.n4.nabble.com/CSR-graph-and-Dijkstra-s-to-lower-graph-m... Sent from the Boost - Users mailing list archive at Nabble.com. _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
On Sat, 22 Sep 2012, Leo Hidd wrote:
hi Jeremy,
I'm the OP of that topic. With a great help of Jeremiah WillCock (thank you Jeremiah, you are a great and very patient man :) ), I could finally obtain an working case with CSR graphs (a great amount of messages were sent in private, so they do not appear in the list). Here's an example. Note that I don't provide the the class obj_Mesh, since the only variables I use from it are used to compute the number of edges and the cost of each edge. You can replace these info by your own.
The one thing I would change about the example is not to use &d[0] and &p[0] but to use an iterator_property_map instead (see <URL:http://comments.gmane.org/gmane.comp.lib.boost.user/67769>); using raw pointers has caused problems for various users in the past. -- Jeremiah Willcock
On Sat, Sep 22, 2012 at 11:51 AM, Jeremiah Willcock <jewillco@osl.iu.edu>wrote:
On Sat, 22 Sep 2012, Leo Hidd wrote:
I'm the OP of that topic. With a great help of Jeremiah WillCock (thank
you Jeremiah, you are a great and very patient man :) ), I could finally obtain an working case with CSR graphs (a great amount of messages were sent in private, so they do not appear in the list). Here's an example. Note that I don't provide the the class obj_Mesh, since the only variables I use from it are used to compute the number of edges and the cost of each edge. You can replace these info by your own.
The one thing I would change about the example is not to use &d[0] and &p[0] but to use an iterator_property_map instead (see <URL: http://comments.gmane.**org/gmane.comp.lib.boost.user/**67769<http://comments.gmane.org/gmane.comp.lib.boost.user/67769>>); using raw pointers has caused problems for various users in the past.
Thank you so much Jeremiah and Leo! I still need to implement Jeremiah's suggestion but the initial results look very positive. In case it is of interest to others, after I changed to a CSR graph instead of an adjacency_list, and used the "boost::dijkstra_shortest_paths _no_color_map" instead of the "boost::dijkstra_shortest_paths", both my running time and space requirements significantly dropped. For a graph with ~3,600,000 nodes and ~580,000,000 edges, after making these changes, my running time went down from about 1.5 hrs to 20 mins and my space requirements went from 95GB to 35 GB. Thanks again! - Jeremy
On Sat, 22 Sep 2012, Jeremy Kawahara wrote:
On Sat, Sep 22, 2012 at 11:51 AM, Jeremiah Willcock <jewillco@osl.iu.edu> wrote: On Sat, 22 Sep 2012, Leo Hidd wrote:
I'm the OP of that topic. With a great help of Jeremiah WillCock (thank you Jeremiah, you are a great and very patient man :) ), I could finally obtain an working case with CSR graphs (a great amount of messages were sent in private, so they do not appear in the list). Here's an example. Note that I don't provide the the class obj_Mesh, since the only variables I use from it are used to compute the number of edges and the cost of each edge. You can replace these info by your own.
The one thing I would change about the example is not to use &d[0] and &p[0] but to use an iterator_property_map instead (see <URL:http://comments.gmane.org/gmane.comp.lib.boost.user/67769>); using raw pointers has caused problems for various users in the past.
Thank you so much Jeremiah and Leo! I still need to implement Jeremiah's suggestion but the initial results look very positive.
In case it is of interest to others, after I changed to a CSR graph instead of an adjacency_list, and used the "boost::dijkstra_shortest_paths _no_color_map" instead of the "boost::dijkstra_shortest_paths", both my running time and space requirements significantly dropped.
For a graph with ~3,600,000 nodes and ~580,000,000 edges, after making these changes, my running time went down from about 1.5 hrs to 20 mins and my space requirements went from 95GB to 35 GB.
If you know that you are going to have fewer than 2^32 vertices and/or 2^32 edges, you can also turn down the sizes of the integers used as indices in the graph's representation; see the documentation for details. On a 64-bit machine, they both default to 64 bits, but you can save a lot of storage by reducing their sizes. -- Jeremiah Willcock
and, since you are in a cluster, you could also try the distributed (parallel) version of SSSP. It will probably boost your running time. About your memory usage, let's see. You have ~580,000,000 edges, 2 fixed point of 4 bytes (vertex indices) plus one floating point of 8 bytes (edge cost) per edge, totalizing ~9.3 GB. If your graph is undirected, but if you use a directed CSR, than you multiply this by 2 (to include both senses), totalizing ~19GB. Now if your 35GB takes into account only the Boost graph size, it would seem a little too much, however if you also consider your own structure to load your information on RAM, it is just fine. leo Le 23/09/2012 18:23, Jeremiah Willcock a écrit :
On Sat, 22 Sep 2012, Jeremy Kawahara wrote:
On Sat, Sep 22, 2012 at 11:51 AM, Jeremiah Willcock <jewillco@osl.iu.edu> wrote: On Sat, 22 Sep 2012, Leo Hidd wrote:
I'm the OP of that topic. With a great help of Jeremiah WillCock (thank you Jeremiah, you are a great and very patient man :) ), I could finally obtain an working case with CSR graphs (a great amount of messages were sent in private, so they do not appear in the list). Here's an example. Note that I don't provide the the class obj_Mesh, since the only variables I use from it are used to compute the number of edges and the cost of each edge. You can replace these info by your own.
The one thing I would change about the example is not to use &d[0] and &p[0] but to use an iterator_property_map instead (see <URL:http://comments.gmane.org/gmane.comp.lib.boost.user/67769>); using raw pointers has caused problems for various users in the past.
Thank you so much Jeremiah and Leo! I still need to implement Jeremiah's suggestion but the initial results look very positive.
In case it is of interest to others, after I changed to a CSR graph instead of an adjacency_list, and used the "boost::dijkstra_shortest_paths _no_color_map" instead of the "boost::dijkstra_shortest_paths", both my running time and space requirements significantly dropped.
For a graph with ~3,600,000 nodes and ~580,000,000 edges, after making these changes, my running time went down from about 1.5 hrs to 20 mins and my space requirements went from 95GB to 35 GB.
If you know that you are going to have fewer than 2^32 vertices and/or 2^32 edges, you can also turn down the sizes of the integers used as indices in the graph's representation; see the documentation for details. On a 64-bit machine, they both default to 64 bits, but you can save a lot of storage by reducing their sizes.
-- Jeremiah Willcock _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Hello, Sorry for such a late response - I was pulled to some other tasks. The suggestions to lower the needed memory helped immensely. Thank you :) I am wondering if there are still more optimizations I could look at to further lower the required memory. A better description can be found below... On Sun, Sep 23, 2012 at 12:29 PM, Leo Hidd <leohidd@gmail.com> wrote:
and, since you are in a cluster, you could also try the distributed (parallel) version of SSSP. It will probably boost your running time.
Yes thanks! This is a good point. I might try this later once I get all the space issues sorted out.
About your memory usage, let's see. You have ~580,000,000 edges, 2 fixed point of 4 bytes (vertex indices) plus one floating point of 8 bytes (edge cost) per edge, totalizing ~9.3 GB. If your graph is undirected, but if you use a directed CSR, than you multiply this by 2 (to include both senses), totalizing ~19GB. Now if your 35GB takes into account only the Boost graph size, it would seem a little too much, however if you also consider your own structure to load your information on RAM, it is just fine.
Sorry I made a bit of an error in my original calculations here. However I'm curious about your estimate of the amount of needed space. It seems to me that we need to store the following arrays: (here I assume that the total number of vertices is less than 2^32 (4 bytes) but the total number of edges will likely be greater than 2^32 (8 bytes). An "edge array" that stores the vertex indices. 2 vertices per edge at 4 bytes each An "edge cost" array the stores the cost at each edge. 1 edge cost per edge at 4 bytes (say we use a float) each These arguments are than passed to the CSR constructor. I THINK the CSR then sorts and copies these values into an internal structure (I'm basing this off the documentation<http://www.boost.org/doc/libs/1_51_0/libs/graph/doc/compressed_sparse_row.html>which states that "The CSR format stores vertices and edges in separate arrays" - but perhaps I misunderstood it). So I think we need to have enough memory to store the "edge array", the "edge cost array" AND the newly created CSR (the_number_of_edges x 4 bytes + the_number_of_nodes x 8 bytes) all at the same time. Is my understanding of this correct? If so, I'm wondering if there is a way we can somehow lower this memory usage where we only store these structures once? I looked at using the "edges_are_unsorted_multi_pass" parameter but the memory differences did not seem to make a significant difference.
Le 23/09/2012 18:23, Jeremiah Willcock a écrit :
If you know that you are going to have fewer than 2^32 vertices and/or 2^32 edges, you can also turn down the sizes of the integers used as indices in the graph's representation; see the documentation for details. On a 64-bit machine, they both default to 64 bits, but you can save a lot of storage by reducing their sizes.
Thanks! This did help quite a bit with the space requirements. For the reference of others, I added the last "unsigned int" parameter to the call below since I expect the number of nodes to be less than 2^32 but the number of edges will likely be more. typedef compressed_sparse_row_graph<directedS, void, Edge_Cost,boost::no_property, unsigned int> graph_t; -- Jeremy
On Mon, 8 Oct 2012, Jeremy Kawahara wrote:
Hello,
Sorry for such a late response - I was pulled to some other tasks. The suggestions to lower the needed memory helped immensely. Thank you :) I am wondering if there are still more optimizations I could look at to further lower the required memory.
A better description can be found below...
On Sun, Sep 23, 2012 at 12:29 PM, Leo Hidd <leohidd@gmail.com> wrote: and, since you are in a cluster, you could also try the distributed (parallel) version of SSSP. It will probably boost your running time.
Yes thanks! This is a good point. I might try this later once I get all the space issues sorted out.
About your memory usage, let's see. You have ~580,000,000 edges, 2 fixed point of 4 bytes (vertex indices) plus one floating point of 8 bytes (edge cost) per edge, totalizing ~9.3 GB. If your graph is undirected, but if you use a directed CSR, than you multiply this by 2 (to include both senses), totalizing ~19GB. Now if your 35GB takes into account only the Boost graph size, it would seem a little too much, however if you also consider your own structure to load your information on RAM, it is just fine.
Sorry I made a bit of an error in my original calculations here. However I'm curious about your estimate of the amount of needed space.
It seems to me that we need to store the following arrays: (here I assume that the total number of vertices is less than 2^32 (4 bytes) but the total number of edges will likely be greater than 2^32 (8 bytes).
An "edge array" that stores the vertex indices. 2 vertices per edge at 4 bytes each
Usually (for the directed case where incoming edges aren't stored) there is only one vertex (the target) stored per edge.
An "edge cost" array the stores the cost at each edge. 1 edge cost per edge at 4 bytes (say we use a float) each
Yes.
These arguments are than passed to the CSR constructor. I THINK the CSR then sorts and copies these values into an internal structure (I'm basing this off the documentation which states that "The CSR format stores vertices and edges in separate arrays" - but perhaps I misunderstood it).
The CSR graph takes roughly (nvertices + 1) * sizeof(EdgeIndex) + nedges * sizeof(VertexIndex) bytes for a directed graph, excluding properties.
So I think we need to have enough memory to store the "edge array", the "edge cost array" AND the newly created CSR (the_number_of_edges x 4 bytes + the_number_of_nodes x 8 bytes) all at the same time.
Is my understanding of this correct?
Yes, except that you need another copy of the edge properties that are sorted along with the edges.
If so, I'm wondering if there is a way we can somehow lower this memory usage where we only store these structures once? I looked at using the "edges_are_unsorted_multi_pass" parameter but the memory differences did not seem to make a significant difference.
There is an in-place version (construct_inplace_from_sources_and_targets_t) that saves memory by requiring particular input layouts, but it is also slower. You should always use the _multi_pass version when you can since the other one does a copy of the data first. How much more memory are you looking to save? Are you hitting an out-of-memory condition on graphs above a certain size?
Le 23/09/2012 18:23, Jeremiah Willcock a écrit : If you know that you are going to have fewer than 2^32 vertices and/or 2^32 edges, you can also turn down the sizes of the integers used as indices in the graph's representation; see the documentation for details. On a 64-bit machine, they both default to 64 bits, but you can save a lot of storage by reducing their sizes.
Thanks! This did help quite a bit with the space requirements.
For the reference of others, I added the last "unsigned int" parameter to the call below since I expect the number of nodes to be less than 2^32 but the number of edges will likely be more.
typedef compressed_sparse_row_graph<directedS, void, Edge_Cost,boost::no_property, unsigned int> graph_t;
Yes, that is how you want to do it in that case. You might want to use uint32_t instead to get more control over the exact data size. -- Jeremiah Willcock
Hello, On Mon, Oct 8, 2012 at 1:55 PM, Jeremiah Willcock <jewillco@osl.iu.edu>wrote:
On Mon, 8 Oct 2012, Jeremy Kawahara wrote:
It seems to me that we need to store the following arrays: (here I assume
that the total number of vertices is less than 2^32 (4 bytes) but the total number of edges will likely be greater than 2^32 (8 bytes).
An "edge array" that stores the vertex indices. 2 vertices per edge at 4 bytes each
Usually (for the directed case where incoming edges aren't stored) there is only one vertex (the target) stored per edge.
I see. I was working from Leo Hidd's earlier example where we create a list of edges... *ptr_edge_list++ = Edge(i_neigh, i); and from the CSR documentation<http://www.boost.org/doc/libs/1_51_0/libs/graph/example/csr-example.cpp> : E the_edges[] = { E(0, 1), E(0, 2), ... I'm not quite sure how to change the edge list to store only the target vertex but I'll look into this some more as this would result in quite a bit more space savings.
An "edge cost" array the stores the cost at each edge.
1 edge cost per edge at 4 bytes (say we use a float) each
Yes.
These arguments are than passed to the CSR constructor. I THINK the CSR
then sorts and copies these values into an internal structure (I'm basing this off the documentation which states that "The CSR format stores vertices and edges in separate arrays" - but perhaps I misunderstood it).
The CSR graph takes roughly (nvertices + 1) * sizeof(EdgeIndex) + nedges * sizeof(VertexIndex) bytes for a directed graph, excluding properties.
So I think we need to have enough memory to store the "edge array", the
"edge cost array" AND the newly created CSR (the_number_of_edges x 4 bytes + the_number_of_nodes x 8 bytes) all at the same time.
Is my understanding of this correct?
Yes, except that you need another copy of the edge properties that are sorted along with the edges.
Right, so I think this would be the memory needed to store the "edge cost array", which would equal the_number_of_edges x 4 bytes.
If so, I'm wondering if there is a way we can somehow lower this memory
usage where we only store these structures once? I looked at using the "edges_are_unsorted_multi_**pass" parameter but the memory differences did not seem to make a significant difference.
There is an in-place version (construct_inplace_from_**sources_and_targets_t) that saves memory by requiring particular input layouts, but it is also slower. You should always use the _multi_pass version when you can since the other one does a copy of the data first.
Alright thanks! The construct_inplace_from _sources_and_targets_t sounds like it might do the trick.
How much more memory are you looking to save? Are you hitting an out-of-memory condition on graphs above a certain size?
Yes on some of my graphs I'm running out of memory. I'm really looking at getting it down to as low as possible since the lower I get it the more data I can run my graphs on. For example, one graph this crashed on had: 37,044,000 nodes 6,328,936,480 edges I thought that if I can use the same memory to load the data and create the CSR, then I would approximately half my memory requirements. I think if I find this not be enough savings, then I am probably hitting some theoretical limits and would have to look at taking a different approach. For the reference of others, I added the last "unsigned int" parameter to
the call below since I expect the number of nodes to be less than 2^32 but the number of edges will likely be more.
typedef compressed_sparse_row_graph<**directedS, void, Edge_Cost,boost::no_property, unsigned int> graph_t;
Yes, that is how you want to do it in that case. You might want to use uint32_t instead to get more control over the exact data size.
Alright thanks! - Jeremy
On Mon, 8 Oct 2012, Jeremy Kawahara wrote:
Hello,
On Mon, Oct 8, 2012 at 1:55 PM, Jeremiah Willcock <jewillco@osl.iu.edu> wrote: On Mon, 8 Oct 2012, Jeremy Kawahara wrote:
It seems to me that we need to store the following arrays: (here I assume that the total number of vertices is less than 2^32 (4 bytes) but the total number of edges will likely be greater than 2^32 (8 bytes).
An "edge array" that stores the vertex indices. 2 vertices per edge at 4 bytes each
Usually (for the directed case where incoming edges aren't stored) there is only one vertex (the target) stored per edge.
I see. I was working from Leo Hidd's earlier example where we create a list of edges... *ptr_edge_list++ = Edge(i_neigh, i);
and from the CSR documentation: E the_edges[] = { E(0, 1), E(0, 2), ... I'm not quite sure how to change the edge list to store only the target vertex but I'll look into this some more as this would result in quite a bit more space savings.
I meant that the CSR graph representation stores only targets explicitly, not that your input data should do that.
Right, so I think this would be the memory needed to store the "edge cost array", which would equal the_number_of_edges x 4 bytes.
Yes.
If so, I'm wondering if there is a way we can somehow lower this memory usage where we only store these structures once? I looked at using the "edges_are_unsorted_multi_pass" parameter but the memory differences did not seem to make a significant difference.
There is an in-place version (construct_inplace_from_sources_and_targets_t) that saves memory by requiring particular input layouts, but it is also slower. You should always use the _multi_pass version when you can since the other one does a copy of the data first.
Alright thanks! The construct_inplace_from _sources_and_targets_t sounds like it might do the trick.
How much more memory are you looking to save? Are you hitting an out-of-memory condition on graphs above a certain size?
Yes on some of my graphs I'm running out of memory. I'm really looking at getting it down to as low as possible since the lower I get it the more data I can run my graphs on. For example, one graph this crashed on had:
37,044,000 nodes 6,328,936,480 edges
I thought that if I can use the same memory to load the data and create the CSR, then I would approximately half my memory requirements. I think if I find this not be enough savings, then I am probably hitting some theoretical limits and would have to look at taking a different approach.
How are you generating/loading the data? Can you get the data in sorted order by source? The best you can do without pre-sorting the data is 12 bytes per edge + 8 bytes per vertex (for in-place construction, you need the lists of sources, targets, and edge properties, the second of which will be destroyed after graph construction, plus the temporary array of edge indices created by the CSR construction algorithm). -- Jeremiah Willcock
participants (4)
-
jer.k
-
Jeremiah Willcock
-
Jeremy Kawahara
-
Leo Hidd