
On Aug 24, 2006, at 4:49 AM, Magnus Ekdahl wrote:
A while ago I had a problem with the MST algorithm in boost. Essentially profiling reviled that the MST algorithm was a bottleneck for the computation I was performing. Thus I had to investigate the performance bottlenecks in boost mst and with this mail I want to investigate the potential for improving current boost implementation.
What is the proposed way of getting the prim algorithm updated in boost?
Just sent a note to the Boost mailing list, as you've done.
First let me argue that there is room for improvement. I have made a test program that constructs 1000 random complete graphs with 1000 nodes and the execution time is as follows
actually implementing jarniks algorithm prim_minimum_spanning_tree 47.585s
current prim (djikstra wrapper) 3m2638s (338% slower)
kruskal_minimum_spanning_tree 40m57s (5063% slower)
Those times seem... horrible. What compiler are you using? With which optimization flags? Can you post the sample code, so we can see what is going wrong?
I also want to take the opportunity to ask about the meaning of the minimum spanning tree algorithm as such, since efficient implementations in the future could be helped a lot by knowing exactly what the problem that should be solved is.
Wikipedia has a decent definition: http://en.wikipedia.org/wiki/Minimum_spanning_tree
Given the hardness to understand the actual generality of the MST problem the current implementation tries to solve, I find it difficult to see how efficient algorithms can be heuristically selected. I.e. can you use num_edges(g) as an indicator of sparseness?
We would need to run some benchmarks to determine what heuristic to use to dynamically select the most efficient algorithm. We haven't really done this yet for the BGL, but we should. MST is one good example; all-pairs shortest-paths is another good example, because, e.g., the Floyd-Warshall algorithm is much better for dense graphs while Johnson's algorithm is much better for sparse graphs. I imagine that we could use some kind of threshold for num_edges(g)/ (num_vertices(g)*num_vertices(g)) to decide whether a graph is sparse or dense. Doug