
On Oct 13, 2011, at 9:53 AM, Yan wrote:
Hi all,

I'm learning to use the distributed adjacency list and the distributed Dijkstra shortest-paths algorithm. My test data contains about 3,800,000 edges and 2,800,000 vertices. I pass the data to the distributed adjacency list through a pair of iterators, like this:

    Graph g(reader.begin(), reader.end(), reader.weight_begin(),
            reader.num_vertices(), pg, dist);

"reader" is a user-defined class that supplies the required data (edges, vertices, weights). "pg" is defined as "mpi_process_group pg;" and "dist" is supplied by the reader as "graph::metis_distribution dist = reader.distribution();", based on a METIS partition file corresponding to the test data. I run 2 processes on my notebook computer (Intel Core 2 2.0 GHz, 2 GB RAM, Windows XP SP3).
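A stripped-down sketch of what I'm doing is below. It is only an outline, not my exact code: the file names are placeholders, I substitute the library's own graph::metis_reader for my custom reader class, and the interior vertex_distance property is just one way to hold the results.

    #include <boost/graph/use_mpi.hpp>
    #include <boost/mpi/environment.hpp>
    #include <boost/graph/distributed/mpi_process_group.hpp>
    #include <boost/graph/distributed/adjacency_list.hpp>
    #include <boost/graph/dijkstra_shortest_paths.hpp>
    #include <boost/graph/metis.hpp>
    #include <fstream>

    using namespace boost;
    using boost::graph::distributed::mpi_process_group;

    // Distributed graph with an interior distance property on vertices
    // and an interior weight property on edges.
    typedef adjacency_list<vecS,
                           distributedS<mpi_process_group, vecS>,
                           directedS,
                           property<vertex_distance_t, int>,
                           property<edge_weight_t, int> >
      Graph;

    int main(int argc, char* argv[])
    {
      boost::mpi::environment env(argc, argv);
      mpi_process_group pg;

      // METIS graph file and the matching partition file
      // (file names here are placeholders).
      std::ifstream in_graph("weighted_graph.metis");
      std::ifstream in_partitions("weighted_graph.metis.part.2");
      graph::metis_reader reader(in_graph);
      graph::metis_distribution dist(in_partitions, process_id(pg));

      // Same constructor form as above.
      Graph g(reader.begin(), reader.end(), reader.weight_begin(),
              reader.num_vertices(), pg, dist);

      // Run Dijkstra from global vertex 0; the interior vertex_distance
      // property map of a distributed adjacency list is already distributed.
      dijkstra_shortest_paths(g, vertex(0, g),
                              distance_map(get(vertex_distance, g)));
      return 0;
    }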
I also tested the same data with the sequential BGL Dijkstra shortest-paths algorithm on this computer.
The results:

  Parallel BGL:   creating the graph takes *10080 seconds*; the Dijkstra computation takes 28.281 seconds.
  Sequential BGL: creating the graph takes *4.563 seconds*; the Dijkstra computation takes 1.703 seconds.
As you can see, constructing the distributed adjacency list takes an enormous amount of time. Is this normal?
I have also done some more comparisons between the 2-process parallel Dijkstra on one Core 2 computer and the sequential Dijkstra on the same computer, using data sets of different scales. In every case the sequential BGL Dijkstra takes less time than the 2-process parallel Dijkstra. What is the problem? I have read the Boost documentation at "boost_1_46_1\libs\graph_parallel\doc\html\dijkstra_shortest_paths.html", whose charts show good performance and speedup. Doesn't that mean it should take less time than the sequential program?
Any advice would be appreciated.
Thanks,
Yan
While I can't offer categorical answers, my first reaction is that you are very likely paging in the example you describe. With only 2 GB of RAM, the memory consumed by the mpi_process_group (discussed in another thread on this list; look there for a description of how to reduce it) plus whatever else is running on your laptop, it's entirely possible that you're running out of memory. Once the PBGL begins to page between mpi_process_group buffers and graph data, the runtime becomes entirely unpredictable, and if the degree of paging is high, very, very long.

It's also worth noting that the parallel speedup of the PBGL vs. the sequential BGL depends on the graph structure and size; it is entirely possible, and not uncommon, to find graphs for which no speedup or negative speedup is observed. The distributed communication mechanisms in the PBGL incur non-trivial copying and serialization overhead. The serialization overhead is so high that I have seldom found it beneficial to run multiple MPI ranks within a single address space (as you are doing on your two-core laptop). There are a number of ways to speed up serialization, such as defining BOOST_MPI_HOMOGENEOUS to avoid MPI_Pack()/MPI_Unpack() overhead, amongst other optimizations.

I considered rewriting PBGL to make it possible to avoid serialization entirely, and in fact the new prototype active-message-based version does this, but it's quite a bit of work and I likely won't have the time to make similar changes to the Boost version of the library. The new version of the library also incorporates parallelism within address spaces via threading or (potentially) off-loading to accelerators, rather than running multiple MPI processes.

Hope that answers your question and gives you some ideas about where you can expect to find (known) performance issues in PBGL, how to ameliorate them to some degree... as well as what you could do to improve PBGL if you're so inclined!

Cheers,
Nick
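P.S. A rough sketch of what I mean by the BOOST_MPI_HOMOGENEOUS route. The exact build setup will vary; the important point is that the macro is seen consistently everywhere Boost.MPI code is compiled (your application and the Boost.MPI library build itself), so in practice you would normally pass it on the compiler command line rather than #define it in one file as shown here.

    // Illustrative only: on a homogeneous set of machines, Boost.MPI can skip
    // the MPI_Pack()/MPI_Unpack() path and ship data as raw bytes when
    // BOOST_MPI_HOMOGENEOUS is defined. Normally passed as
    // -DBOOST_MPI_HOMOGENEOUS to every translation unit (and when building
    // the Boost.MPI library) rather than defined in a single source file.
    #define BOOST_MPI_HOMOGENEOUS

    #include <boost/graph/use_mpi.hpp>
    #include <boost/mpi/environment.hpp>
    #include <boost/graph/distributed/mpi_process_group.hpp>

    int main(int argc, char* argv[])
    {
      boost::mpi::environment env(argc, argv);
      boost::graph::distributed::mpi_process_group pg;
      // ... build the distributed graph and run dijkstra_shortest_paths
      //     exactly as before; only the serialization path changes ...
      return 0;
    }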