[Graph] component_index crash on big graph

Hi, I am using the incremental connected component + component_index to find connected components in an undirected graph. However the program works fine on small small dataset but segfaults on large input. I tried to do some debugging and the best I could come up was that it fails on array_push_front cal at the boost/graph/detail/incremental_component.hpp:78. I have attached the source code and the accompanying small files that contain edgelist. The TrivialEdges.txt works fine but on ManyEdges.txt* the program segfaults. To the best of my understanding the input is sane. Would appreciate any help. * ManyEdges.txt is 320K so please download it from (* http://tinyurl.com/m9blyy). * THis is the dataset I stumbled upon. I apologies for not being able to trim down to smaller size. thanks sandeep

I am using the incremental connected component + component_index to find connected components in an undirected graph. However the program works fine on small small dataset but segfaults on large input. I tried to do some debugging and the best I could come up was that it fails on array_push_front cal at the boost/graph/detail/incremental_component.hpp:78.
It doesn't appear as though you have allocated any vertices for the graph. I'm pretty sure that add_edge does not allocate new vertices. I would guess that every time you call add_vertex, you are referencing data that doesn't actually exist. The successful run may be attributed to a) that the default constructor for the graph (or vector) pre-allocates a small number of elements, or b) pure luck. Andrew Sutton andrew.n.sutton@gmail.com

I am using the incremental connected component + component_index to find connected components in an undirected graph. However the program works
fine
on small small dataset but segfaults on large input. I tried to do some debugging and the best I could come up was that it fails on array_push_front cal at the boost/graph/detail/incremental_component.hpp:78.
It doesn't appear as though you have allocated any vertices for the graph. I'm pretty sure that add_edge does not allocate new vertices.
I would guess that every time you call add_vertex, you are referencing data that doesn't actually exist. The successful run may be attributed to a) that the default constructor for the graph (or vector) pre-allocates a small number of elements, or b) pure luck.
Thanks Andrew, I got confused with add_vertex and add_edge allocation
On Tue, Jul 7, 2009 at 4:40 AM, Andrew Sutton <andrew.n.sutton@gmail.com>wrote: properties. I have modified the code create numEdges. However the problem occurs while building component_index which does not access graph data structure but rather only the parent map. Attached is the modified file (also at http://codepad.org/5dZM6YBo). P.S. The command to run on ManyEdges would now be ./a.out ManyEdges.txt 21174

On Tue, 7 Jul 2009, Sandeep Gupta wrote:
On Tue, Jul 7, 2009 at 4:40 AM, Andrew Sutton <andrew.n.sutton@gmail.com>wrote:
I am using the incremental connected component + component_index to find connected components in an undirected graph. However the program works
fine
on small small dataset but segfaults on large input. I tried to do some debugging and the best I could come up was that it fails on array_push_front cal at the boost/graph/detail/incremental_component.hpp:78.
It doesn't appear as though you have allocated any vertices for the graph. I'm pretty sure that add_edge does not allocate new vertices.
I would guess that every time you call add_vertex, you are referencing data that doesn't actually exist. The successful run may be attributed to a) that the default constructor for the graph (or vector) pre-allocates a small number of elements, or b) pure luck.
Thanks Andrew, I got confused with add_vertex and add_edge allocation properties. I have modified the code create numEdges. However the problem occurs while building component_index which does not access graph data structure but rather only the parent map. Attached is the modified file (also at http://codepad.org/5dZM6YBo).
P.S. The command to run on ManyEdges would now be ./a.out ManyEdges.txt 21174
You appear to be mixing up the vector used as the data for the parent map with the disjoint_set structure. Try using an iterator property map created on the vector (or a shared_array_property_map in newer versions of Boost) as the parent map for the disjoint set structure. -- Jeremiah Willcock

On Tue, Jul 7, 2009 at 7:27 AM, Jeremiah Willcock <jewillco@osl.iu.edu>wrote:
On Tue, 7 Jul 2009, Sandeep Gupta wrote:
On Tue, Jul 7, 2009 at 4:40 AM, Andrew Sutton <andrew.n.sutton@gmail.com
wrote:
I am using the incremental connected component + component_index to find connected components in an undirected graph. However the program works
fine
on small small dataset but segfaults on large input. I tried to do some debugging and the best I could come up was that it fails on array_push_front cal at the boost/graph/detail/incremental_component.hpp:78.
It doesn't appear as though you have allocated any vertices for the graph. I'm pretty sure that add_edge does not allocate new vertices.
I would guess that every time you call add_vertex, you are referencing data that doesn't actually exist. The successful run may be attributed to a) that the default constructor for the graph (or vector) pre-allocates a small number of elements, or b) pure luck.
Thanks Andrew, I got confused with add_vertex and add_edge allocation
properties. I have modified the code create numEdges. However the problem occurs while building component_index which does not access graph data structure but rather only the parent map. Attached is the modified file (also at http://codepad.org/5dZM6YBo).
P.S. The command to run on ManyEdges would now be ./a.out ManyEdges.txt 21174
You appear to be mixing up the vector used as the data for the parent map with the disjoint_set structure. Try using an iterator property map created on the vector (or a shared_array_property_map in newer versions of Boost) as the parent map for the disjoint set structure.
Yes the parent map access is what causing the segfault. As you suggested I did switch to iterator_property_map but unfortunately it still segfaults at the previous location. Not sure if I am declaring Components_t at line 58 correctly. The modified code can be found at http://codepad.org/1GAHpitK Thanks for taking a look, Jeremiah. -sandeep

Thanks Andrew, I got confused with add_vertex and add_edge allocation properties. I have modified the code create numEdges. However the problem occurs while building component_index which does not access graph data structure but rather only the parent map. Attached is the modified file (also at http://codepad.org/5dZM6YBo).
I get confused too. I didn't write the class, so it's hard to remember. Just a heads up, though. The adjacency class has a constructor that takes a number of vertices and pre-allocates them. For example: graph g(20); cout << num_vertices(g); // prints 20. Andrew Sutton andrew.n.sutton@gmail.com

On Tue, Jul 7, 2009 at 8:21 AM, Andrew Sutton <andrew.n.sutton@gmail.com>wrote:
Thanks Andrew, I got confused with add_vertex and add_edge allocation properties. I have modified the code create numEdges. However the problem occurs while building component_index which does not access graph data structure but rather only the parent map. Attached is the modified file (also at http://codepad.org/5dZM6YBo).
I get confused too. I didn't write the class, so it's hard to remember. Just a heads up, though. The adjacency class has a constructor that takes a number of vertices and pre-allocates them. For example:
graph g(20); cout << num_vertices(g); // prints 20.
Hi Andrew, Got it. Back of my mind, I think I was uncomfortable about conversion from int to vertex_descriptor. The final status is that Jeremiah verified that the code runs fine on gcc over the large input. Hence its quite possible a bug, and should be filed as such. He suggested a workaround i.e. use ds.compress_sets() and then collect the component. So my problem is resolved.
I also have issues with the documentation of incremental component which I shall post separately. Appreciate the help. -Sandeep

On Tue, 7 Jul 2009, Sandeep Gupta wrote:
On Tue, Jul 7, 2009 at 8:21 AM, Andrew Sutton <andrew.n.sutton@gmail.com>wrote:
Thanks Andrew, I got confused with add_vertex and add_edge allocation properties. I have modified the code create numEdges. However the problem occurs while building component_index which does not access graph data structure but rather only the parent map. Attached is the modified file (also at http://codepad.org/5dZM6YBo).
I get confused too. I didn't write the class, so it's hard to remember. Just a heads up, though. The adjacency class has a constructor that takes a number of vertices and pre-allocates them. For example:
graph g(20); cout << num_vertices(g); // prints 20.
Hi Andrew, Got it. Back of my mind, I think I was uncomfortable about conversion from int to vertex_descriptor. The final status is that Jeremiah verified that the code runs fine on gcc over the large input. Hence its quite possible a bug, and should be filed as such.
Just a minor thing -- Valgrind did show a buffer overrun in the code, and I added asserts and showed that it writes beyond the end of a temporary array, so it is definitely a bug in component_index.
He suggested a workaround i.e. use ds.compress_sets() and then collect the component. So my problem is resolved.
I also have issues with the documentation of incremental component which I shall post separately.
I would appreciate any comments on that you have. They should be a separate ticket from the component_index one, though. -- Jeremiah Willcock
participants (3)
-
Andrew Sutton
-
Jeremiah Willcock
-
Sandeep Gupta