
On Nov 3, 2009, at 10:44 PM, Sandeep Gupta wrote:
On Mon, Nov 2, 2009 at 12:43 PM, Nick Edmonds <ngedmond@cs.indiana.edu> wrote:
On Oct 31, 2009, at 1:00 AM, Sandeep Gupta wrote:
On Fri, Oct 30, 2009 at 6:41 AM, Jeremiah Willcock <jewillco@osl.iu.edu> wrote:
On Thu, 29 Oct 2009, Sandeep Gupta wrote:
On Thu, Oct 29, 2009 at 1:12 PM, Jeremiah Willcock <jewillco@osl.iu.edu> wrote:
On Thu, 29 Oct 2009, Sandeep Gupta wrote:
On Wed, Oct 28, 2009 at 5:49 PM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:

On Tue, Oct 27, 2009 at 9:31 PM, Jeremiah Willcock <jewillco@osl.iu.edu> wrote:
On Tue, 27 Oct 2009, Sandeep Gupta wrote:

The error snippet regarding the predecessor map:

error: no match for call to '(boost::property_reduce<boost::vertex_predecessor_t>::apply<boost::detail::error_property_not_found>) (const boost::detail::parallel::global_descriptor<unsigned int>&)'
../../../boost/property_map/parallel/impl/distributed_property_map.ipp:141: error: no match for call to '(boost::property_reduce<boost::vertex_predecessor_t>::apply<boost::detail::error_property_not_found>) (const boost::detail::parallel::global_descriptor<unsigned int>&, boost::detail::error_property_not_found&, const boost::detail::error_property_not_found&)'
../../../boost/graph/parallel/properties.hpp:95: note: candidates are: T boost::property_reduce<boost::vertex_predecessor_t>::apply<T>::operator()(T) const [with T = boost::detail::error_property_not_found]
../../../boost/graph/parallel/properties.hpp:96: note: T boost::property_reduce<boost::vertex_predecessor_t>::apply<T>::operator()(T, T, T) const [with T = boost::detail::error_property_not_found]

Jeremiah, do you think that I should file a bug report for this? Although I was hoping (and actually need it quite urgently) that it would be a minor issue and get fixed quickly.

You need to pass a predecessor map to the algorithm as one of the parameters (or put it as a part of your graph, but making an external property map to give to the algorithm is easier).

Hi Jeremiah,

It took me a while, but I finally figured out how to pass the predecessor map. Unfortunately it doesn't have any effect. Also, I might be wrong, but I don't see any logical reason why the predecessor map should have any bearing on the correctness of the depth. I have attached the new code. I am not able to print out the predecessor map because I am not able to figure out its type. Maybe you could help me resolve this.

It would be nice to have this code running; I need to profile graph performance on a new machine by tomorrow. Again, thanks for your patience and time. I really appreciate you looking into this.

Hi Jeremiah,

I was just wondering whether you have had time to look into this, or have any suggestion on how to proceed further. I can get you the output of the predecessor map if that would help. It's just that I haven't been able to figure out the type of the property map returned by make_distributed_property_map. Please let me know your thoughts.

The person who really knows this library is not around as far as I know; he might be on vacation. The type returned from make_distributed_property_map is documented at http://www.osl.iu.edu/research/pbgl/documentation/graph/distributed_property_map.html
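For concreteness, a minimal sketch of the "put it as a part of your graph" variant described above (untested; assumes a Boost 1.40-era PBGL distributed adjacency_list over an MPI process group, and the graph construction, source vertex, and the run_bfs wrapper are placeholders):

#include <boost/graph/use_mpi.hpp>
#include <boost/graph/distributed/mpi_process_group.hpp>
#include <boost/graph/distributed/adjacency_list.hpp>
#include <boost/graph/breadth_first_search.hpp>
#include <boost/graph/visitors.hpp>

using namespace boost;
using boost::graph::distributed::mpi_process_group;

// The vertex descriptor type is needed before the graph type is
// complete, hence adjacency_list_traits.
typedef adjacency_list_traits<vecS, distributedS<mpi_process_group, vecS>,
                              undirectedS>::vertex_descriptor Vertex;

// Storing the predecessor as an internal vertex property means
// get(vertex_predecessor, g) yields a real distributed property map
// instead of error_property_not_found.
typedef adjacency_list<vecS, distributedS<mpi_process_group, vecS>,
                       undirectedS,
                       property<vertex_distance_t, std::size_t,
                                property<vertex_predecessor_t, Vertex> > >
        Graph;

void run_bfs(Graph& g)
{
  breadth_first_search
    (g, vertex(0, g),
     visitor(make_bfs_visitor(record_predecessors(get(vertex_predecessor, g),
                                                  on_tree_edge()))));
}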
One thing that you could do to debug things is to put in your own color map and look at what values it ends up with at the end of the algorithm, and possibly even to put in a local color map that prints out when elements are changed. That might be a lot of work for what could end up being an obvious problem (that I just don't know how to diagnose), though. BTW, does the example in libs/graph_parallel/example/breadth_first_search.cpp work for you? It appears to be doing the same thing you want to do. If that works, it could be a problem in your visitor.
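Along those lines, a sketch of a color map adaptor that reports every put(), so you can watch the algorithm recolor vertices (untested; PBGL vertex descriptors may not be streamable, so only the new color value is printed, and the cast assumes an enum color type such as default_color_type):

#include <iostream>
#include <boost/property_map/property_map.hpp>

// Wraps any read/write color map and logs each put().
template <typename ColorMap>
struct logging_color_map
{
  typedef typename boost::property_traits<ColorMap>::key_type   key_type;
  typedef typename boost::property_traits<ColorMap>::value_type value_type;
  typedef value_type                                            reference;
  typedef boost::read_write_property_map_tag                    category;

  ColorMap base;
  explicit logging_color_map(ColorMap m) : base(m) {}
};

template <typename ColorMap>
typename boost::property_traits<ColorMap>::value_type
get(logging_color_map<ColorMap> m,
    typename boost::property_traits<ColorMap>::key_type k)
{
  return get(m.base, k);
}

template <typename ColorMap>
void put(logging_color_map<ColorMap> m,
         typename boost::property_traits<ColorMap>::key_type k,
         typename boost::property_traits<ColorMap>::value_type v)
{
  std::cerr << "color <- " << static_cast<int>(v) << std::endl;
  put(m.base, k, v);
}

You would wrap the local map in this before handing it to make_distributed_property_map (or pass the wrapper directly via the color_map named parameter in the sequential case).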
I derived this code from graph_parallel/example/breadth_first_search.cpp.
The problem is that I don't understand the input and the output of the example code. It would be great if I could get an explanation of the format of the input file. Then I can transform my graph into that format and pass it to the algorithm. That would be enough for my current purpose.
As time permits I will try out your suggestions and let you know of any updates. In the meantime I am hoping to get input from the other relevant authors.
This is what I could tell from the code.
Input: METIS graph format, described at http://people.sc.fsu.edu/%7Eburkardt/data/metis_graph/metis_graph.html
Output: Graphviz DOT format, described at http://www.graphviz.org/doc/info/lang.html
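For example, an undirected 4-vertex path (edges 1-2, 2-3, 3-4) looks like this in the METIS format:

% first line: <num vertices> <num edges>; line i lists the 1-based neighbors of vertex i
4 3
2
1 3
2 4
3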
Thanks, Jeremiah. The link was helpful, and I now understand the input graph format (it's for undirected graphs only). However, the BFS output is still coming out incorrect. I tried with a sample line graph of 4 nodes and it exhibits the same problem as before: the distances of nodes on process 2 are not getting updated. On the other hand, the Dijkstra shortest paths example is working correctly.
Thanks,
Sandeep
I rewrote most of your code before I realized that you're outputting all your distances on process 0, even for vertices process 0 doesn't own. The large value you see is std::numeric_limits<distance_type>::max(). When you call get(distance, vertex(i, g)) for some i not owned by the current process, a ghost cell is created for the requested value and a request is sent to the owner of that vertex for the correct value. get() then returns whatever the default value is for the distance property map, which in this case is infinity (i.e. std::numeric_limits<distance_type>::max()).
In some cases you may get the correct distance value because process 0 has previously requested that value (and it may or may not be up to date). If process 0 has never requested that distance value and it doesn't own the vertex, then you'll get infinity.
You need to gather all your data before you output it from a single node (i.e. issue a get(), or a request(), which is a get() with no return value, for each data element). If you want your code to be scalable, you should output your data in a distributed fashion from all nodes at once. Remember that updated values aren't guaranteed to arrive until the next synchronization step.
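A sketch of that gather-then-print pattern (untested; print_distances is a hypothetical helper, distance is the distributed distance map, and n is the global vertex count; note that synchronize() is collective, so every process must reach both calls):

#include <iostream>
#include <cstddef>

template <typename Graph, typename DistanceMap>
void print_distances(Graph& g, DistanceMap distance, std::size_t n)
{
  // Flush any updates still in flight from the algorithm itself.
  synchronize(process_group(g));

  // Process 0 asks each owner for the values it doesn't hold locally.
  if (process_id(process_group(g)) == 0)
    for (std::size_t i = 0; i < n; ++i)
      request(distance, vertex(i, g));

  // Collective step: the owners' replies are delivered here.
  synchronize(process_group(g));

  // Ghost cells are now up to date, so get() returns real values.
  if (process_id(process_group(g)) == 0)
    for (std::size_t i = 0; i < n; ++i)
      std::cout << i << " : " << get(distance, vertex(i, g)) << std::endl;
}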
If you want the modified version of your code let me know and I'll finish it and ship it your way.
-Nick
Hi Nick,
Thanks so much for looking into this. Sorry I couldn't get to it earlier due to other concerns. I corrected the mistake you mentioned, and I don't get numeric_limits<>::max() for any nodes anymore. I believe graph/distributed/graphviz.hpp was already outputting the distances in the correct fashion.
Unfortunately, the output is still coming out incorrect. That was also the case with the output from graphviz.hpp.
For the test graph that we mentioned before, essentially

0 --> 1 --> 2 --> 3
0 --> 4 --> 3

the distance output is as follows:

global node-id : distance
0 : 0
1 : 1
2 : 2
3 : 1    (wrong: should be 2)
4 : 0    (wrong: should be 1)
If possible, can you send me your version of the code? Let me know if you need mine (the updated version).
Given the description of your problem above, and assuming you're using a block distribution with 8 vertices, it sounds like you're initializing the distance to the first vertex on *each* process to 0, as opposed to only the distance to the source. The vertex with global index 4 is actually local index 0 on process 1 using the above distribution. If you initialize the distances to the vertices with global indices 0 and 4 to 0, then your results make sense. Let me know if that's not the case, but I surmise that it is.

Also, there's a problem in your depth-labeling visitor. In the case of cross-processor edges, the process that owns the target of the edge doesn't necessarily have the distance to the source of the edge available locally. This will lead to incorrect distance labels, even though tree and non-tree edges will still be correctly recognized, since the BFS is level-synchronized.

The fastest way to handle this is to modify the BFS queue to also pass distance data (which is how the shortest paths algorithms do it). You could also send the data separately and probe for it on the receiving side. Check out the distributed betweenness centrality code for how to create a distributed queue which contains tuples of data, one element of which is a vertex (in order to know where to route the data). Integrating this type of vertex queue (i.e. one that contains a data payload) with BFS basically just requires handling the data payload on the receiving side.

Cheers,
Nick
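To make the queue-payload idea concrete, here is a sequential sketch of BFS in which the depth travels with the vertex in the queue entry, so labeling a target never requires reading the source's label (bfs_depths is a hypothetical name; the distributed version would replace std::queue with a distributed queue that routes each pair to the owner of the vertex, as in the betweenness centrality code):

#include <cstddef>
#include <queue>
#include <utility>
#include <vector>
#include <boost/graph/graph_traits.hpp>
#include <boost/tuple/tuple.hpp>

template <typename Graph>
std::vector<std::size_t>
bfs_depths(const Graph& g,
           typename boost::graph_traits<Graph>::vertex_descriptor s)
{
  typedef typename boost::graph_traits<Graph>::vertex_descriptor Vertex;
  typedef typename boost::graph_traits<Graph>::adjacency_iterator AdjIter;
  const std::size_t unset = static_cast<std::size_t>(-1);

  std::vector<std::size_t> depth(num_vertices(g), unset);
  std::queue<std::pair<Vertex, std::size_t> > Q;

  depth[get(boost::vertex_index, g, s)] = 0;
  Q.push(std::make_pair(s, 0));

  while (!Q.empty()) {
    std::pair<Vertex, std::size_t> x = Q.front();
    Q.pop();
    AdjIter ai, ai_end;
    for (boost::tie(ai, ai_end) = adjacent_vertices(x.first, g);
         ai != ai_end; ++ai) {
      std::size_t i = get(boost::vertex_index, g, *ai);
      if (depth[i] == unset) {
        depth[i] = x.second + 1;    // depth comes from the queue payload,
                                    // not from a lookup of the source's label
        Q.push(std::make_pair(*ai, x.second + 1));
      }
    }
  }
  return depth;
}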