[BFL] Efficient neighbourhood search in trees
Hi all,

I'm trying to calculate the average edge weight within a certain neighbourhood for each vertex in a tree. Basically, I have a (minimum spanning) tree and I want to remove all edges whose weight is much larger than the weight of the edges close by. The result would be a forest of several connected components (I'm working on a clustering algorithm).

Currently I naively iterate over each edge. For each edge, I find the source and target vertex. Then, for both the source and the target vertex, I calculate the average weight in a certain neighbourhood using a recursive function, as the following simplified pseudocode indicates:

    E: list of edges in the MST

    getNeighbourhoodStats(depth, startvertex, edge, totweight) {
        out_edges = getIncidentEdges(startvertex)
        foreach out_edges as out {
            if (out != edge) {
                totweight += getWeight(out)
                if (depth > 0)
                    getNeighbourhoodStats(depth - 1, target(out), out, totweight)
            }
        }
    }

    main() {
        foreach E as e {
            // Get the total weight on the left of the edge
            totweight_left = 0
            getNeighbourhoodStats(5, source(e), e, totweight_left)

            // Get the total weight on the right of the edge
            totweight_right = 0
            getNeighbourhoodStats(5, target(e), e, totweight_right)
        }
    }

However, this approach is rather inefficient, as a lot of the calculations are redundant (it takes about 10ms for 300 edges, which is way too slow for the purpose I had in mind). Any ideas for a better solution?

Thanks a lot,
Kind regards,
Vincent
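(For concreteness, a minimal BGL sketch of the naive scheme described above; this is a reconstruction under assumed type and function names, not the original code. The Tree typedef, neighbourhoodWeight, and the tiny example graph in main are illustrative only.)

    #include <utility>
    #include <boost/graph/adjacency_list.hpp>

    typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS,
                                  boost::no_property,
                                  boost::property<boost::edge_weight_t, double> > Tree;
    typedef boost::graph_traits<Tree>::vertex_descriptor Vertex;

    // Sum (and count) the edge weights within `depth` hops of `start`, never
    // walking back towards the vertex we came from; in a tree this is enough
    // to avoid revisiting any edge, including the edge under test.
    void neighbourhoodWeight(const Tree& g, int depth, Vertex start, Vertex from,
                             double& total, int& count) {
        typedef boost::graph_traits<Tree>::out_edge_iterator OutIt;
        std::pair<OutIt, OutIt> out = boost::out_edges(start, g);
        for (OutIt oi = out.first; oi != out.second; ++oi) {
            Vertex next = boost::target(*oi, g);
            if (next == from) continue;
            total += boost::get(boost::edge_weight, g, *oi);
            ++count;
            if (depth > 0)
                neighbourhoodWeight(g, depth - 1, next, start, total, count);
        }
    }

    int main() {
        Tree mst(4);                       // tiny example: path 0-1-2-3
        boost::add_edge(0, 1, 1.0, mst);
        boost::add_edge(1, 2, 10.0, mst);
        boost::add_edge(2, 3, 1.0, mst);
        typedef boost::graph_traits<Tree>::edge_iterator EdgeIt;
        std::pair<EdgeIt, EdgeIt> es = boost::edges(mst);
        for (EdgeIt ei = es.first; ei != es.second; ++ei) {
            double total = 0.0; int count = 0;
            neighbourhoodWeight(mst, 5, boost::source(*ei, mst), boost::target(*ei, mst), total, count);
            neighbourhoodWeight(mst, 5, boost::target(*ei, mst), boost::source(*ei, mst), total, count);
            double avg = count ? total / count : 0.0;
            // an edge with weight much larger than `avg` is a removal candidate
            (void)avg;
        }
        return 0;
    }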
On Wed, 3 Nov 2010, Vincent Spruyt wrote:
Hi all,
I'm trying to calculate the average edge weight within a certain neighbourhood for each vertex in a tree. [...] However, this approach is rather inefficient as a lot of calculations are redundant (takes about 10ms for 300 edges which is way too slow for the purpose I had in mind). Any ideas for a better solution?
I would suggest using dynamic programming. Keep a map from <vertex, depth> pairs to total weight. First, fill in <v, 0> with zero for all v, then recursively fill in <v, d> with the sum of <w, d-1> + weight(v, w) over all neighbors w of each v. This will not exclude the edge you're currently testing from the sum, though.
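(A rough sketch of this memoised recurrence, using a plain adjacency list and illustrative names rather than the real graph type; as the follow-up below notes, it neither excludes the edge under test nor avoids counting some edge weights more than once.)

    #include <map>
    #include <utility>
    #include <vector>

    struct Edge { int to; double w; };

    std::vector<std::vector<Edge>> adj;          // undirected tree, both directions stored
    std::map<std::pair<int, int>, double> memo;  // (vertex, depth) -> total weight

    // <v, d>: fill in <v, 0> with zero, otherwise sum weight(v, w) + <w, d-1>
    // over all neighbours w, memoised so each (vertex, depth) pair is
    // computed only once.
    double weightAround(int v, int depth) {
        if (depth == 0) return 0.0;
        auto key = std::make_pair(v, depth);
        auto it = memo.find(key);
        if (it != memo.end()) return it->second;
        double total = 0.0;
        for (const Edge& e : adj[v])
            total += e.w + weightAround(e.to, depth - 1);
        memo[key] = total;
        return total;
    }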
On Wed, 3 Nov 2010, Jeremiah Willcock wrote:
On Wed, 3 Nov 2010, Vincent Spruyt wrote:
I'm trying to calculate the average edge weight within a certain neighbourhood for each vertex in a tree. [...] Any ideas for a better solution?
I would suggest using dynamic programming. Keep a map from <vertex, depth> pairs to total weight. [...] This will not exclude the edge you're currently testing from the sum, though.
The way I suggested can also count some edge weights multiple times, so the update formula may need to be <v, d> = sum(w in neighbors(v), <w, d-1> + weight(v, w)) - degree(v) * <v, d-2>, or something like that. You might want to do some small examples by hand to make sure you get the formula right.
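(As a quick hand check of the uncorrected recurrence, my own illustration: on a path a-b-c with weights 1 and 2, <a, 1> = 1 and <c, 1> = 2, so <b, 2> = (1 + <a, 1>) + (2 + <c, 1>) = 2 + 4 = 6, while the true total weight within two hops of b is only 3; each edge is counted once from each end, which is the multiple counting referred to above.)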
Jeremiah Willcock wrote:
On Wed, 3 Nov 2010, Jeremiah Willcock wrote:
On Wed, 3 Nov 2010, Vincent Spruyt wrote:
[...] Currently I naively iterate over each edge. For each edge, I find the source and target vertex. [...] However, this approach is rather inefficient as a lot of calculations are redundant (takes about 10ms for 300 edges which is way too slow for the purpose I had in mind). Any ideas for a better solution?
I would suggest using dynamic programming. Keep a map from <vertex, depth> pairs to total weight. [...] The way I suggested can also count some edge weights multiple times, so the update formula may need to be <v, d> = sum(w in neighbors(v), <w, d-1> + weight(v, w)) - degree(v) * <v, d-2>, or something like that. You might want to do some small examples by hand to make sure you get the formula right.
He said it was a tree, so there shouldn't be multiple paths from any vertex to another vertex in its neighborhood. You should be able to accumulate total weight with a bottom-up traversal in topological order starting at the leaves, then do the reverse topological order and sum the two. Don't count your own weight twice. Store the total weight as a vector indexed by depth: shift off the last element, merge the vectors of the out edges, and add your own weight at the beginning of the vector for each node as you traverse up the tree. Delete the vectors on the out edges as you go up to save memory. Even if the tree has no "root" it has leaves that we can start from, and no cycles, so it should be simple.

However, my understanding of the problem at hand could be completely wrong. I'm guessing the graph is bidirectional and that the in and out edges have the same weight because they are really just one edge. Perhaps if we had a clear specification of what exactly the desired result of the clustering algorithm is and what exactly the input is, we could solve the problem instead of solving the solution?

In any case, it should be O(n * depth), where n is the number of nodes, to cache all the total neighborhood weights on all the nodes. If you want to remove edges iteratively, you can update the weights stored on all the nodes in the removed edge's neighborhood for each edge removed; each update is as expensive as summing those weights, but it is O(edges removed) instead of O(edges) neighborhood traversals, which should at least be better. I can't see how the algorithm could be much worse than O(n^2) if you remove O(n) edges, have O(n) nodes, and on average O(n) nodes within the neighborhood of any given node throughout. In normal usage I would expect << O(n) edges removed and << O(n) nodes in the neighborhood, but then I have no idea what the trees or depths that would be considered "normal" might look like.

You'll probably want a priority queue of edges that are candidates for removal, and to remove and replace them as their nodes' total weights get updated, so you don't have to do a linear scan over the edges at each iteration to find the next edge to remove, assuming that is what you have in mind.

Regards,
Luke
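(A compact sketch of this two-pass idea, in plain C++ rather than BGL and under my own assumptions about the representation: a first pass accumulates, for every vertex, per-depth weight totals looking down into its subtree, and a second pass pushes the remaining "upward" contribution through the parent, so that total[v][d] ends up holding the weight of all edges within d hops of v. The adjacency representation, the choice of vertex 0 as root, and the tiny path example are illustrative only.)

    #include <cstdio>
    #include <vector>

    struct Edge { int to; double w; };

    const int D = 5;                         // neighbourhood depth
    std::vector<std::vector<Edge>> adj;      // undirected tree, both directions stored
    std::vector<std::vector<double>> down;   // down[v][d]: weight within d hops below v
    std::vector<std::vector<double>> total;  // total[v][d]: weight within d hops of v

    // Bottom-up pass: per-depth weight totals restricted to v's subtree.
    void downPass(int v, int parent) {
        down[v].assign(D + 1, 0.0);
        for (const Edge& e : adj[v]) {
            if (e.to == parent) continue;
            downPass(e.to, v);
            for (int d = 1; d <= D; ++d)
                down[v][d] += e.w + down[e.to][d - 1];
        }
    }

    // Top-down pass: `up` holds the weight within d hops of v that is not in
    // v's subtree; combine, then derive the corresponding vector for each child.
    void upPass(int v, int parent, const std::vector<double>& up) {
        total[v].resize(D + 1);
        for (int d = 0; d <= D; ++d) total[v][d] = down[v][d] + up[d];
        for (const Edge& e : adj[v]) {
            if (e.to == parent) continue;
            std::vector<double> upChild(D + 1, 0.0);
            for (int d = 1; d <= D; ++d) {
                // everything within d-1 hops of v, minus the part reached back
                // through this child, plus the connecting edge itself
                double viaChild = (d >= 2) ? e.w + down[e.to][d - 2] : 0.0;
                upChild[d] = e.w + up[d - 1] + down[v][d - 1] - viaChild;
            }
            upPass(e.to, v, upChild);
        }
    }

    int main() {
        // tiny example: path 0-1-2-3 with weights 1, 10, 1
        adj.resize(4); down.resize(4); total.resize(4);
        auto addEdge = [](int a, int b, double w) {
            adj[a].push_back({b, w});
            adj[b].push_back({a, w});
        };
        addEdge(0, 1, 1.0); addEdge(1, 2, 10.0); addEdge(2, 3, 1.0);
        downPass(0, -1);
        upPass(0, -1, std::vector<double>(D + 1, 0.0));
        for (int v = 0; v < 4; ++v)
            std::printf("vertex %d: weight within %d hops = %g\n", v, D, total[v][D]);
        return 0;
    }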
On Thu, 4 Nov 2010, Simonson, Lucanus J wrote:
On Wed, 3 Nov 2010, Jeremiah Willcock wrote:
On Wed, 3 Nov 2010, Jeremiah Willcock wrote:
On Wed, 3 Nov 2010, Vincent Spruyt wrote:
[...] Currently I naively iterate over each edge. For each edge, I find the source and target vertex. Then, for both the source and the target vertex, I calculate the average weight in a certain neighbourhood using a recursive function, as the following simplified pseudocode indicates: [...] However, this approach is rather inefficient as a lot of calculations are redundant (takes about 10ms for 300 edges which is way too slow for the purpose I had in mind). Any ideas for a better solution?
[...] The way I suggested can also count some edge weights multiple times, so the update formula may need to be <v, d> = sum(w in neighbors(v), <w, d-1> + weight(v, w)) - degree(v) * <v, d-2>, or something like that. You might want to do some small examples by hand to make sure you get the formula right.
He said it was a tree, so there shouldn't be multiple paths from any vertex to another vertex in its neighborhood. You should be able to accumulate total weight with a bottom-up traversal in topological order starting at the leaves, then do the reverse topological order and sum the two. Don't count your own weight twice. Store the total weight as a vector indexed by depth: shift off the last element, merge the vectors of the out edges, and add your own weight at the beginning of the vector for each node as you traverse up the tree. Delete the vectors on the out edges as you go up to save memory. [...]
Thank you both for your responses. I eventually decided to use the naive approach anyway. Actually, I did implement a dynamic programming version of the algorithm, but tracking the weight of the 'depth' neighbouring edges and merging them when paths come together actually slows down the method a lot. In the end, the dynamic programming based implementation is faster than the naive one only if a depth of more than 130 is used, because it is linear in complexity. However, in my application I only need a depth of 5 or so, in which case the naive approach is way faster.

Actually, the naive implementation does what I need in about 0.7ms, not the 10ms I reported earlier. The 9.3ms difference was only because I used a filtered_graph as a view into a larger graph... which obviously was not a good idea :) So just copying the tree into a new graph and then naively solving the problem seems to be the best solution here.

Kind regards,
Vincent
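(Presumably "copying the tree into a new graph" means something along these lines; this is my own illustration, with a made-up edge predicate, showing boost::copy_graph materialising a filtered_graph view into a plain adjacency_list before the naive neighbourhood pass is run on the copy.)

    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/copy.hpp>
    #include <boost/graph/filtered_graph.hpp>

    typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS,
                                  boost::no_property,
                                  boost::property<boost::edge_weight_t, double> > Graph;

    // Hypothetical edge predicate: keep only edges below some weight cut-off
    // (a stand-in for "is this edge part of the MST?").
    struct KeepEdge {
        KeepEdge() : g(0), cutoff(0.0) {}
        KeepEdge(const Graph* g_, double c) : g(g_), cutoff(c) {}
        bool operator()(boost::graph_traits<Graph>::edge_descriptor e) const {
            return boost::get(boost::edge_weight, *g, e) < cutoff;
        }
        const Graph* g;
        double cutoff;
    };

    // Materialise the filtered view once, then work on the plain copy.
    void extractTree(const Graph& big, double cutoff, Graph& tree_copy) {
        KeepEdge keep(&big, cutoff);
        boost::filtered_graph<Graph, KeepEdge> view(big, keep);
        boost::copy_graph(view, tree_copy);
    }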
participants (3)
- Jeremiah Willcock
- Simonson, Lucanus J
- Vincent Spruyt