
In addition to my previous answer: your observed behavior seems increasingly reasonable to me. It may look incorrect, but IMHO that is a question of the point of view. Solutions to some problems change when going from sequential to distributed computing, and the concept chosen by the PBGL implies that the "best" value for the distance should be stored here.

Now to explain why I further support the observed behavior. The algorithms in the (P)BGL are based on visitors that define event points; one of these is the initialization of a vertex. Because of the distributed execution, you have two (or more) processes that can start independently, so these processes will usually move from initializing to discovering (another event point) at different moments. If you look at the code of the distributed BFS visitor, you will see the tree_edge() function definition, which relies on the stored distance of the source vertex in order to assign the appropriate distance to the target of an edge.

Because of the distributed execution, vertex n4 can (and will) start discovering while vertex n0 is _still_ initializing (the formulation is probably not 100% correct; I just want to illustrate the general idea). Since n4 has not yet been discovered by n0 (it is only initialized, or perhaps even default-initialized), it will discover vertex n3 and assign it the discovery value of n4 (which is 0) plus 1. When n0 then discovers n4, it would like to assign it the value 1, but this is prohibited by the set_property_map_role(). The same holds for n3.

What further supports this is the label (distance value) of vertex n1. In your single-process run it gets the value 2, while in the multi-process run this changes to 1. Because this is a BFS, n1 is discovered at the same level as n3, via n4. It therefore gets the same value as n3 and will not be assigned a value by the discovery via n0 (where it seems to be one level deeper in the underlying graph).
The only thing for which I have not come up with a satisfying explanation is why the second BFS starts at n4 (I mean, it is the "last" vertex). I suspect that this is due to an initialization by vertex n0, and only then can a second process start. After all, n4 is one of the vertices that will be initialized first, probably even the first vertex to be initialized besides n0. If the second (or third, etc.) process were started arbitrarily, then the results of the multi-process run should differ at least between some consecutive executions (if it were truly random). In this context it would also be interesting to see how the whole approach behaves when it is run on more than two processes, or with a different graph, perhaps one whose initial vertex (the start) has degree > 2 (possibly 3, for simplicity).

Best,
Cedric

On Friday, 10. December 2010 17:06:34 Mattia Lambertini wrote:
Perhaps you could attach the code of a minimal working example?
Of course. I can give you a link to the example I've tested [0], and below is the graph I used (METIS format):
5 9 1 3 1 2 2 4 1 5 2 2 7 4 3 5 1 1 1 2 1
[0] http://www.boost.org/doc/libs/1_44_0/libs/graph_parallel/example/breadth_first_search.cpp
To compile the example you have to link against:
boost_graph_parallel-mt boost_mpi-mt boost_system-mt
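For reference, a possible compile-and-run sequence; the compiler wrapper and library suffixes are assumptions that depend on how Boost and MPI were built on your machine, so adjust them to your installation. Rerunning with different process counts would also let you test the determinism question raised above.

```shell
# Assumed wrapper and library names; adjust to your Boost/MPI installation.
mpic++ -O2 breadth_first_search.cpp \
    -lboost_graph_parallel-mt -lboost_mpi-mt -lboost_system-mt \
    -o breadth_first_search

# Run with two processes, then with more; repeating the two-process run
# several times shows whether the labeling is deterministic.
mpirun -np 2 ./breadth_first_search
mpirun -np 4 ./breadth_first_search
```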
Thanks for your time.