On Apr 22, 2011, at 11:01 AM, Cedric Laczny wrote:
On Friday, 22. April 2011 16:31:20 Steven Watanabe wrote:
AMDG
On 04/22/2011 05:59 AM, Cedric Laczny wrote:
it recently came to my mind again that the distributed BFS in the PBGL seems to produce different results when applied to multiple processors as to when applied to a single processor (s. http://lists.boost.org/boost- users/2010/12/64783.php) If this is true, I really would like to know what the reasoning is behind this and it should be noted in the documentation of course to help prevent surprises as in the discussion above. Unfortunately, at that time, no definite answer was given and this question puzzled me again now... Any ideas on that?
Why would you expect the results to be exactly the same? Parallelism usually introduces a certain amount of non-determinism.
I agree with you on that, generally. And it is good to note this explicitly again. However, maybe I just got a weird kind of perception/expectation here, but especially for search-algorithms, I would expect parallel algorithms to return the same results as otherwise the question of correct interpretation of the parallel results arises to me. I mean, if you get different distance-results depending on the number of processors, what is then the use of distributed BFS, e.g. if you want to know the BFS-distance from source to a specific vertex x?
The current PBGL (and any other *correct* parallel BFS implementation) will explore one (of a family of valid) breadth-first vertex ordering. For a variety of reasons the sequential visitor semantics do not map well to parallel execution however. Confusion about the semantics of visitors in a parallel context in combination with the fact that the ordering of vertex discovery and edge traversal within levels is necessarily non-deterministic often leads to confusion by users about the results returned by their BFS visitors. A correctly written visitor will return the same distance labels as the sequential implementation, but may not explore the vertices/edges in the same order. There is also the question of what the *output* of a breadth-first search is. In sequential execution the cost of finding a breadth-first ordering and exploring the graph breadth-first are equivalent. In the parallel case it is sometimes desirable to compute a BFS ordering using an asymptotically more expensive algorithm such as single-source shortest paths because it may expose more parallelism depending on the input graph. In the sequential BGL case the breadth-first exploration of the graph and exposure of this ordering via visitor events is the output. In possible parallel implementations the output may be a BFS tree expressed using a predecessor map, or distance labels from the source of that tree. The point here is that the assumption of breadth-first exploration may or may not be valid/necessary in the parallel case and relaxing this requirement to either some sort of relaxed topological traversal, or computation of a BFS-ordering using a non-breadth-first traversal is often more appropriate. In the rewrite of the PBGL I'm working on it's looking like the visitor concept will be mostly (or completely) absent, and a richer set of traversal primitives will replace/augment the general purpose BFS algorithm in the current version (since it proved inefficient to compose other algorithms on parallel BFS anyway). Hope that helps, and my apologies for not always having time to help answer the "my code using the PBGL returns results that don't make sense to me" questions. Cheers, Nick