[BGL] parallel BGL does not scale

Good Morning, I am trying to use the parallel BGL to compute the weakly connected components of a graph. I have managed to obtain the correct results with the code pasted below, however I am getting the following times: - Sequential BGL 0.01 seconds - Parallel BGL on 1 core 0.04 seconds - Parallel BGL on 2 cores 0.64 seconds That doesn’t look right to me. What am I doing wrong? using namespace boost; using boost::graph::distributed::mpi_process_group; typedef adjacency_list<vecS, distributedS<mpi_process_group, vecS>, undirectedS> Graph; typedef iterator_property_map<std::vector<int>::iterator, property_map<Graph, vertex_index_t>::type> LocalMap; int getConnectedComponents(std::vector< std::pair<int,int> > &arcs, const std::vector<int> &isActive) { // Allocate structures Graph G(nV+1); std::vector<int> localComponent(nV+1); std::vector<int> globalComponent(nV+1); // Only rank 0 has got the arc data structure if( rank == 0 ) { // Add dummy edges to each vertex (required to avoid BGL crash for a single node with no edges) for( int i=0; i<nV; i++ ) add_edge(vertex(i,G),vertex(i,G),G); // Add edges to the graph for(std::vector<std::pair<int,int> >::iterator it = arcs.begin(); it != arcs.end(); ++it) if( isActive[arcCnt++]==0 ) add_edge(vertex(it->first,G),vertex(it->second,G),G); } synchronize(G); // Compute number of connected components LocalMap components(localComponent.begin(),get(vertex_index, G)); int num = connected_components_ps(G, components); // Let rank 0 collect the global component vector if( rank == 0 ) { components.set_max_ghost_cells(0); for( PetscInt i = 0; i < nV+1; i++ ) globalComponent[i] = get(components, vertex(i, G)); } synchronize(components); } Regards, ------------------------------------------------- Alessio Quaglino Postdoctoral Researcher University of Lugano

I have found out the following in the meantime: I have been running the program with mpirun -np 2. This seems to MASSIVELY slow down the parallel BGL. Why is this the case? Timing connected_components_ps with mpirun -np 1 is in fact competitive with the non-parallel connected_components function. Then I am losing time in the synchronize() calls and end up being slightly slower for the parallel case. How can I use mpirun -np 2 with the parallel BGL? Thanks, Alessio On 14 Dec 2015, at 11:29, Quaglino Alessio <quagla@usi.ch<mailto:quagla@usi.ch>> wrote: Good Morning, I am trying to use the parallel BGL to compute the weakly connected components of a graph. I have managed to obtain the correct results with the code pasted below, however I am getting the following times: - Sequential BGL 0.01 seconds - Parallel BGL on 1 core 0.04 seconds - Parallel BGL on 2 cores 0.64 seconds That doesn’t look right to me. What am I doing wrong? using namespace boost; using boost::graph::distributed::mpi_process_group; typedef adjacency_list<vecS, distributedS<mpi_process_group, vecS>, undirectedS> Graph; typedef iterator_property_map<std::vector<int>::iterator, property_map<Graph, vertex_index_t>::type> LocalMap; int getConnectedComponents(std::vector< std::pair<int,int> > &arcs, const std::vector<int> &isActive) { // Allocate structures Graph G(nV+1); std::vector<int> localComponent(nV+1); std::vector<int> globalComponent(nV+1); // Only rank 0 has got the arc data structure if( rank == 0 ) { // Add dummy edges to each vertex (required to avoid BGL crash for a single node with no edges) for( int i=0; i<nV; i++ ) add_edge(vertex(i,G),vertex(i,G),G); // Add edges to the graph for(std::vector<std::pair<int,int> >::iterator it = arcs.begin(); it != arcs.end(); ++it) if( isActive[arcCnt++]==0 ) add_edge(vertex(it->first,G),vertex(it->second,G),G); } synchronize(G); // Compute number of connected components LocalMap components(localComponent.begin(),get(vertex_index, G)); int num = connected_components_ps(G, components); // Let rank 0 collect the global component vector if( rank == 0 ) { components.set_max_ghost_cells(0); for( PetscInt i = 0; i < nV+1; i++ ) globalComponent[i] = get(components, vertex(i, G)); } synchronize(components); } Regards, ------------------------------------------------- Alessio Quaglino Postdoctoral Researcher University of Lugano

Can you try a problem size that takes longer than the sub-second workload your posted? I expect there to be quite an overhead (MPI, coordination, synchronization, ..) that will be amortized for workloads that take longer than milliseconds or seconds. Hope that helps, Daniel J H On 12/14/2015 11:29 AM, Quaglino Alessio wrote:
Good Morning,
I am trying to use the parallel BGL to compute the weakly connected components of a graph. I have managed to obtain the correct results with the code pasted below, however I am getting the following times:
- Sequential BGL 0.01 seconds - Parallel BGL on 1 core 0.04 seconds - Parallel BGL on 2 cores 0.64 seconds
That doesn’t look right to me. What am I doing wrong?
usingnamespaceboost; using boost::graph::distributed::mpi_process_group; typedef adjacency_list<vecS, distributedS<mpi_process_group, vecS>, undirectedS> Graph; typedef iterator_property_map<std::vector<int>::iterator, property_map<Graph, vertex_index_t>::type> LocalMap;
int getConnectedComponents(std::vector< std::pair<int,int> > &arcs, const std::vector<int> &isActive) {
// Allocate structures Graph G(nV+1); std::vector<int> localComponent(nV+1); std::vector<int> globalComponent(nV+1);
// Only rank 0 has got the arc data structure if( rank == 0 ) {
// Add dummy edges to each vertex (required to avoid BGL crash for a single node with no edges) for( int i=0; i<nV; i++ ) add_edge(vertex(i,G),vertex(i,G),G);
// Add edges to the graph for(std::vector<std::pair<int,int> >::iterator it = arcs.begin(); it != arcs.end(); ++it) if( isActive[arcCnt++]==0 ) add_edge(vertex(it->first,G),vertex(it->second,G),G); }
synchronize(G);
// Compute number of connected components LocalMap components(localComponent.begin(),get(vertex_index, G)); int num = connected_components_ps(G, components);
// Let rank 0 collect the global component vector if( rank == 0 ) { components.set_max_ghost_cells(0); for( PetscInt i = 0; i < nV+1; i++ ) globalComponent[i] = get(components, vertex(i, G)); }
synchronize(components); }
Regards, ------------------------------------------------- Alessio Quaglino Postdoctoral Researcher University of Lugano
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (2)
-
Daniel Hofmann
-
Quaglino Alessio