[BGL] brandes_betweenness_centrality failed on medium-size graphs


Yu Huang wrote:
Hi all,
I coded a program recursively cutting graphs based on brandes_betweenness_centrality.
But the weird thing is that it succeeds on small graphs (<=8 vertices, <=15 edges) and large graphs (~3500 vertices, ~138,000 edges), but fails on medium-sized ones (~37-200 vertices, ~42-350 edges).
The output from a failed run looks like this:

no_of_vertices: 203
no_of_edges: 368
connectivity: 0.0179486
running brandes_betweenness_centrality...
*** glibc detected *** double free or corruption (out): 0x080942d8
*** Aborted
Actually, I can't say that it succeeds on the large graph: I stopped it after ~45 brandes_betweenness_centrality() calls because it took too long (it ran overnight). On the medium-sized ones, the first brandes_betweenness_centrality() call is always OK, but it fails quickly after that: mostly on the 2nd iteration, sometimes the 3rd, occasionally even the 5th. Only the address 0x080942d8 changes from run to run; everything else stays the same.
Well, I've pasted my core function below. The main idea: remove the edge with maximum centrality; if the cut leaves the graph with >=2 components, form a vector of clusters (graphs) from those components and cut each one separately; otherwise keep cutting the same graph.
void ClusterByEBC::cut_by_betweenness_centrality(Graph &graph, const int &size_cutoff, const float &conn_cutoff)
/*
 * 09-04-05
 * cut the graph with the edge of maximum betweenness_centrality
 */
{
    int no_of_vertices = num_vertices(graph);
    int no_of_edges = num_edges(graph);
    #ifdef DEBUG
    std::cerr<<"no_of_vertices: "<<no_of_vertices<<std::endl;
    std::cerr<<"no_of_edges: "<<no_of_edges<<std::endl;
    #endif
    if (no_of_vertices>=size_cutoff)
    {
        float connectivity = 2.0*no_of_edges/(no_of_vertices*(no_of_vertices-1));
        #ifdef DEBUG
        std::cerr<<"connectivity: "<<connectivity<<std::endl;
        #endif
        if (connectivity>=conn_cutoff)
        {
            #ifdef DEBUG
            std::cerr<<"good subgraph"<<std::endl;
            #endif
            good_subgraph_vector.push_back(graph);
        }
        else
        {
            vector<double> edge_centrality(no_of_edges);
            EdgeCentralityMap ec_map(edge_centrality.begin(), get(edge_index, graph));
            //"make_iterator_property_map(edge_centrality.begin(), get(edge_index, subgraph), double())" also works.
            indirect_cmp<EdgeCentralityMap, std::less<centrality_type> > cmp(ec_map);
            #ifdef DEBUG
            std::cerr<<"running brandes_betweenness_centrality... ";
            #endif
            brandes_betweenness_centrality(graph, edge_centrality_map(ec_map));
            #ifdef DEBUG
            std::cerr<<"done."<<std::endl;
            #endif
            edgeDescriptor e = *max_element(edges(graph).first, edges(graph).second, cmp);
            centrality_type max_centrality = get(ec_map, e);
            #ifdef DEBUG
            std::cerr<<"max_centrality is "<<max_centrality<<std::endl;
            #endif
            remove_edge(e, graph);
            #ifdef DEBUG
            std::cerr<<"after removal the subgraph has "<<num_edges(graph)<<" edges."<<std::endl;
            #endif
            std::vector<int> component(num_vertices(graph));
            int no_of_components = connected_components(graph, &component[0]);
            if (no_of_components==1)    //keep cutting
            {
                #ifdef DEBUG
                std::cerr<<"only one component, keep cutting"<<std::endl;
                #endif
                cut_by_betweenness_centrality(graph, size_cutoff, conn_cutoff);
            }
            else    //first get the connected components into a vector of graphs and cut them separately
            {
                #ifdef DEBUG
                std::cerr<<"get the connected_components into a vector_graph and cut them separately"<<std::endl;
                #endif
                std::vector<Graph> vector_graph = graph_components(graph, component, no_of_components);
                std::vector<Graph>::iterator g_iterator;
                for (g_iterator=vector_graph.begin(); g_iterator!=vector_graph.end(); ++g_iterator)
                    cut_by_betweenness_centrality(*g_iterator, size_cutoff, conn_cutoff);
            }
        }
    }
    #ifdef DEBUG
    else
        std::cerr<<"Bad graph, too small, "<<no_of_vertices<<" vertices"<<std::endl;
    #endif
}
This looks like a hard issue. I searched the mailing list and found one similar message, http://lists.boost.org/MailArchives/ublas/2005/08/0657.php (no one replied).
Thanks, Yu
I forgot: if any of you wants the complete source code and the graph data, I'll email it to you directly. The graph data is too large to be an attachment.

On Sep 7, 2005, at 2:11 AM, Yu Huang wrote:
> Actually, I can't say the it succeeds on large graph. I stopped it after ~45 brandes_betweenness_centrality() calls (because it took too long, over-night).
Make sure to compile with full optimization (e.g., -O3 on GCC), because it makes a *huge* difference in the performance.
> But on medium-size ones, the first brandes_betweenness_centrality() is always ok. But failed quickly, mostly 2nd loop, some 3rd, even 5th. Only 0x080942d8 changed from run to run. Others remain same.
I'm guessing that you will need to renumber the edges after removing an edge. [snip]
> vector<double> edge_centrality(no_of_edges);
> EdgeCentralityMap ec_map(edge_centrality.begin(), get(edge_index, graph));
At this point, we need to be 100% sure that no_of_edges > the index of every edge in the graph, otherwise we'll be reading/writing past the end of the edge_centrality vector.
> //"make_iterator_property_map(edge_centrality.begin(), get(edge_index, subgraph), double())" also works.
> indirect_cmp<EdgeCentralityMap, std::less<centrality_type> > cmp(ec_map);
> #ifdef DEBUG
> std::cerr<<"running brandes_betweenness_centrality... ";
> #endif
> brandes_betweenness_centrality(graph, edge_centrality_map(ec_map));
> #ifdef DEBUG
> std::cerr<<"done."<<std::endl;
> #endif
> edgeDescriptor e = *max_element(edges(graph).first, edges(graph).second, cmp);
> centrality_type max_centrality = get(ec_map, e);
> #ifdef DEBUG
> std::cerr<<"max_centrality is "<<max_centrality<<std::endl;
> #endif
> remove_edge(e, graph);
> #ifdef DEBUG
> std::cerr<<"after removal the subgraph has "<<num_edges(graph)<<" edges."<<std::endl;
> #endif
I believe this is the problem. When you remove an edge, num_edges(graph) is reduced by one. However, there may still exist an edge with edge_index == num_edges(graph). There are two ways to fix the problem:

1) After calling remove_edge, loop over the edges in the graph and reindex them to be sure they are in the range [0, num_edges(graph)) after removal.

2) Allocate the edge_centrality vector for the total number of edges *outside* of the cut_by_betweenness_centrality function, then pass in a reference to the vector (or its property map). That way, even without changing the edge indices you'll still have enough space in the vector, even though its usage will be sparse.

#1 is the smallest fix, whereas #2 might result in slightly faster code.

Doug

Yu Huang replied:

Yeah! Both of them worked out. I went with option #2 in the end, since Doug said it's faster. I almost gave up when I saw the "glibc detected" error last night. Thanks!

Yu