[graph] File Dependency Example with updates

Hi all - I am toying around with the file dependency example that is part of the boost graph documentation as a way of adapting the concepts to my own use cases. What I want to do is update the graph over time (add more files with more dependencies), and incrementally decide what needs to be built, and in what order. It makes sense that doing a breadth-first search from a changed node will give me everything that might be affected; however, if I change many nodes, is there something better? I could imagine that I might mark nodes as they are encountered so that other BFS from other changed nodes won't traverse those as well. With boost graph, how can I control the BFS to stop recursion if a vertex has already been visited? I guess then once I have seen all the stale vertices, I can make a smaller graph that only has the stale vertices and their connections, and I can topologically sort the sub-graph to find the parallel build order? Sorry, I'm a bit out of practice with my graph algorithms in general, and I'm very new to the boost graph library, so I hope I haven't said anything dumb :) Any design suggestions welcome. Thanks, Brian

On Thu, 15 Nov 2012, Brian Budge wrote:
Hi all - I am toying around with the file dependency example that is part of the boost graph documentation as a way of adapting the concepts to my own use cases.
What I want to do is update the graph over time (add more files with more dependencies), and incrementally decide what needs to be built, and in what order.
Does that mean that you are only adding dependencies, not removing them?
It makes sense that doing a breadth-first search from a changed node will give me everything that might be affected; however, if I change many nodes, is there something better?
It depends on if you think a lot of the BFS levels will be changed by your updates. If you think they will be, just re-do the BFS from scratch. If not, you might want to keep a level map, then stop on any vertices whose level values don't change when running the BFS from the updated sources. What I think will work (not tested) is to: 1. Start with source vertices of newly added edges marked as white and in the list of start vertices (see below); color the other vertices black. 2. In examine_edge, if the target's level is larger than the source's level + 1, set the target's color to white.
I could imagine that I might mark nodes as they are encountered so that other BFS from other changed nodes won't traverse those as well. With boost graph, how can I control the BFS to stop recursion if a vertex has already been visited?
There are multi-source BFS versions in Boost.Graph now, although they are
not documented; the version to use is:
template
I guess then once I have seen all the stale vertices, I can make a smaller graph that only has the stale vertices and their connections, and I can topologically sort the sub-graph to find the parallel build order?
An efficient incremental topological sort is apparently hard to do; BFS isn't as fast as a specialized algorithm for that, but should work. -- Jeremiah Willcock

Thanks for the ideas Jeremiah. I'll upgrade to 1.52 and try it out.
Brian
On Fri, Nov 16, 2012 at 7:57 AM, Jeremiah Willcock
On Thu, 15 Nov 2012, Brian Budge wrote:
Hi all -
I am toying around with the file dependency example that is part of the boost graph documentation as a way of adapting the concepts to my own use cases.
What I want to do is update the graph over time (add more files with more dependencies), and incrementally decide what needs to be built, and in what order.
Does that mean that you are only adding dependencies, not removing them?
It makes sense that doing a breadth-first search from a changed node will
give me everything that might be affected; however, if I change many nodes, is there something better?
It depends on if you think a lot of the BFS levels will be changed by your updates. If you think they will be, just re-do the BFS from scratch. If not, you might want to keep a level map, then stop on any vertices whose level values don't change when running the BFS from the updated sources. What I think will work (not tested) is to:
1. Start with source vertices of newly added edges marked as white and in the list of start vertices (see below); color the other vertices black.
2. In examine_edge, if the target's level is larger than the source's level + 1, set the target's color to white.
I could imagine that I might mark nodes as they are
encountered so that other BFS from other changed nodes won't traverse those as well. With boost graph, how can I control the BFS to stop recursion if a vertex has already been visited?
There are multi-source BFS versions in Boost.Graph now, although they are not documented; the version to use is:
template
void breadth_first_search (const VertexListGraph& g, SourceIterator sources_begin, SourceIterator sources_end, Buffer& Q, BFSVisitor vis, ColorMap color); I guess then once I have seen all the stale vertices, I can make a
smaller graph that only has the stale vertices and their connections, and I can topologically sort the sub-graph to find the parallel build order?
An efficient incremental topological sort is apparently hard to do; BFS isn't as fast as a specialized algorithm for that, but should work.
-- Jeremiah Willcock _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

On Fri, Nov 16, 2012 at 7:57 AM, Jeremiah Willcock
Does that mean that you are only adding dependencies, not removing them?
I may rarely remove, rarely enough that I can do something less efficient in those case.
It depends on if you think a lot of the BFS levels will be changed by your updates. If you think they will be, just re-do the BFS from scratch. If not, you might want to keep a level map, then stop on any vertices whose level values don't change when running the BFS from the updated sources. What I think will work (not tested) is to:
I'm not 100% sure I understand about the level map. I can see that if I just mark all vertices white, and then mark black as vertices are completed, common subpaths will not be traversed when BFSing from multiple vertices. What is the level map, and how is it computed? Are the levels similar to the dependency levels given by a topological sort?
1. Start with source vertices of newly added edges marked as white and in the list of start vertices (see below); color the other vertices black.
2. In examine_edge, if the target's level is larger than the source's level + 1, set the target's color to white.
How are the levels initialized in the first place? And once I had performed the multi-source BFS, would these levels correspond to dependency levels given by the topo sort?
I could imagine that I might mark nodes as they are
encountered so that other BFS from other changed nodes won't traverse those as well. With boost graph, how can I control the BFS to stop recursion if a vertex has already been visited?
There are multi-source BFS versions in Boost.Graph now, although they are not documented; the version to use is:
template
void breadth_first_search (const VertexListGraph& g, SourceIterator sources_begin, SourceIterator sources_end, Buffer& Q, BFSVisitor vis, ColorMap color); I guess then once I have seen all the stale vertices, I can make a
smaller graph that only has the stale vertices and their connections, and I can topologically sort the sub-graph to find the parallel build order?
An efficient incremental topological sort is apparently hard to do; BFS isn't as fast as a specialized algorithm for that, but should work.
What I was thinking I would do (since I don't quite grok the level map idea) was to use the multi-source BFS to simply grab out the nodes that need to be operated on (with the edges that exist between those nodes, haven't considered yet how to do this). Once I had that list, I was thinking that I could build a smaller graph and topo sort that graph. However, it seems that I would need a re-mapping table from vertices in one graph to vertices in the other, which might be a pain. I much prefer the idea of your solution, assuming that the levels correspond to independent sets, but I don't quite get all the details. Thanks for your help, Brian

On Tue, 20 Nov 2012, Brian Budge wrote:
On Fri, Nov 16, 2012 at 7:57 AM, Jeremiah Willcock
wrote: Hopefully I am far enough along to ask some reasonable questions now. Does that mean that you are only adding dependencies, not removing them? I may rarely remove, rarely enough that I can do something less efficient in those case.
OK.
It depends on if you think a lot of the BFS levels will be changed by your updates. If you think they will be, just re-do the BFS from scratch. If not, you might want to keep a level map, then stop on any vertices whose level values don't change when running the BFS from the updated sources. What I think will work (not tested) is to:
I'm not 100% sure I understand about the level map. I can see that if I just mark all vertices white, and then mark black as vertices are completed, common subpaths will not be traversed when BFSing from multiple vertices. What is the level map, and how is it computed? Are the levels similar to the dependency levels given by a topological sort?
It's similar in some ways. The level/distance map gives the distance from the source vertex to each vertex in the graph (where each edge has length 1).
1. Start with source vertices of newly added edges marked as white and in the list of start vertices (see below); color the other vertices black.
2. In examine_edge, if the target's level is larger than the source's level + 1, set the target's color to white.
How are the levels initialized in the first place?
By a visitor in the original (non-incremental) BFS; see http://www.boost.org/doc/libs/1_52_0/libs/graph/doc/distance_recorder.html for details.
And once I had performed the multi-source BFS, would these levels correspond to dependency levels given by the topo sort?
I don't think so, although you can probably use them for similar purposes.
I could imagine that I might mark nodes as they are encountered so that other BFS from other changed nodes won't traverse those as well. With boost graph, how can I control the BFS to stop recursion if a vertex has already been visited?
There are multi-source BFS versions in Boost.Graph now, although they are not documented; the version to use is:
template
void breadth_first_search (const VertexListGraph& g, SourceIterator sources_begin, SourceIterator sources_end, Buffer& Q, BFSVisitor vis, ColorMap color); I guess then once I have seen all the stale vertices, I can make a smaller graph that only has the stale vertices and their connections, and I can topologically sort the sub-graph to find the parallel build order?
An efficient incremental topological sort is apparently hard to do; BFS isn't as fast as a specialized algorithm for that, but should work.
What I was thinking I would do (since I don't quite grok the level map idea) was to use the multi-source BFS to simply grab out the nodes that need to be operated on (with the edges that exist between those nodes, haven't considered yet how to do this). Once I had that list, I was thinking that I could build a smaller graph and topo sort that graph.
That should work, but the incremental BFS would probably be faster. If you want to sort part of a graph, just use filtered_graph rather than doing a copy.
However, it seems that I would need a re-mapping table from vertices in one graph to vertices in the other, which might be a pain.
filtered_graph should take care of that.
I much prefer the idea of your solution, assuming that the levels correspond to independent sets, but I don't quite get all the details.
Levels don't correspond to independent sets (there can be edges within a level, such as in the DAG a -> b -> c, a -> c). The explanation above might be helpful for more understanding; otherwise, please ask more questions. -- Jeremiah Willcock

And once I had performed the multi-source BFS, would these
levels correspond to dependency levels given by the topo sort?
I don't think so, although you can probably use them for similar purposes.
Thanks for the help so far. This is quite close, bit not quite the right answer. I need the distance at each vertex to be the max of the distance of its predecessors. This requires all source vertices from in-edges to be visited before the target vertex. Is there another construction in bgl that will automatically give that visitation order? Thanks. Brian

On Tue, 27 Nov 2012, Brian Budge wrote:
And once I had performed the multi-source BFS, would these
levels correspond to dependency levels given by the topo sort?
I don't think so, although you can probably use them for similar purposes.
Thanks for the help so far. This is quite close, bit not quite the right answer. I need the distance at each vertex to be the max of the distance of its predecessors. This requires all source vertices from in-edges to be visited before the target vertex.
Is there another construction in bgl that will automatically give that visitation order?
Look at dag_shortest_paths (using 1 as edge length and std::greater as comparison function). -- Jeremiah Willcock
participants (2)
-
Brian Budge
-
Jeremiah Willcock