Hello, I'm testing PBGL on my cluster. I have 4 nodes and each node has a dual-core processor. I changed the BFS example a little bit. The code generates a graph with RMAT, and it does BFS 10 times, starting from different nodes. I first run the program in a sequential mode, and the graph has 1,000,000 vertices. Then I run it in the cluster. The result is shown below, but I don't see much speedup. The running time is reduced from 14 seconds to 9.5 seconds. Can the result be right? I see very good speedup in the PBGL document. Another problem is that BFS starting from some vertices only needs very little time. I can imagine that it traverses only very few vertices. Is there any way I can count how many vertices are visited from the specific root vertex? My code is also shown below. Thanks, Da [zhengda@amdahl36 pbgl]$ ./bfs ~/bfs.out 1000000 generate graph takes 162.336 seconds to generate graph start BFS random v: 1395111516 takes 14.3496 seconds for BFS search random v: 488302563 takes 0.029389 seconds for BFS search random v: 72509101 takes 14.1103 seconds for BFS search random v: 1835953408 takes 0.028544 seconds for BFS search random v: 951813172 takes 14.7907 seconds for BFS search random v: 1708481694 takes 14.2254 seconds for BFS search random v: 305355333 takes 14.1587 seconds for BFS search random v: 1627009882 takes 0.02749 seconds for BFS search random v: 1769559585 takes 0.027495 seconds for BFS search random v: 101184463 takes 0.02756 seconds for BFS search [zhengda@amdahl36 pbgl]$ mpirun -np 8 --hostfile ~/.mpi_hostfile ./bfs ~/bfs.out 1000000 generate graph generate graph generate graph generate graph generate graph generate graph generate graph generate graph takes 136.623 seconds to generate graph start BFS random v: 1395111516 takes 9.6362 seconds for BFS search random v: 488302563 takes 0.017129 seconds for BFS search random v: 72509101 takes 9.49864 seconds for BFS search random v: 1835953408 takes 0.031777 seconds for BFS search random v: 951813172 takes 9.53063 seconds for BFS search random v: 1708481694 takes 9.45427 seconds for BFS search random v: 305355333 takes 9.58886 seconds for BFS search random v: 1627009882 takes 0.018723 seconds for BFS search random v: 1769559585 takes 0.020844 seconds for BFS search random v: 101184463 takes 0.016004 seconds for BFS search boost::minstd_rand gen; std::cout << "generate graph" << std::endl; struct timeval start_time; struct timeval end_time; gettimeofday (&start_time, NULL); Graph g(RMATGen(gen, nvertices, 16 * nvertices, 0.57, 0.19, 0.19, 0.05), RMATGen(), nvertices); gettimeofday (&end_time, NULL); if (process_id(process_group(g)) == 0) { std::cout << "takes " << (end_time.tv_sec - start_time.tv_sec) + ((double) end_time.tv_usec - start_time.tv_usec) / 1000000 << " seconds to generate graph" << std::endl; std::cout << "start BFS" << std::endl; } for (int i = 0; i < 10; i++) { int rv = gen(); if (process_id(process_group(g)) == 0) { std::cout << "random v: " << rv << std::endl; } graph_traits<Graph>::vertex_descriptor start = vertex(rv % nvertices, g); // Compute BFS levels from vertex 0 property_map<Graph, vertex_distance_t>::type distance = get(vertex_distance, g); put(distance, start, 0); gettimeofday (&start_time, NULL); breadth_first_search (g, start, visitor(make_bfs_visitor(record_distances(distance, on_tree_edge())))); gettimeofday (&end_time, NULL); if (process_id(process_group(g)) == 0) { std::cout << "takes " << (end_time.tv_sec - start_time.tv_sec) + ((double) end_time.tv_usec - start_time.tv_usec) / 1000000 << " seconds for BFS search" << std::endl; } }