[Boost-users] [BGL] Stoer–Wagner min-cut algorithm

Hello Jeremiah, I would very much like to contribute my implementation of the Stoer–Wagner min-cut algorithm to BGL. In preparation for this, I have attached a "pre-release" version to this e-mail which has everything needed to build the test case with g++ and GNU Make. Before I add in all of the template code to support "default" template parameters and other niceties, I figured that it would be good to have a review of the code to make sure that I am headed in the right direction. Also, I have tested my code against the example from Stoer & Wagner (1997). It appears to be working correctly, but there might be bugs. I think that it would be good to have more testing. Would someone please review my code? Also, is anyone aware of good examples for testing min-cut implementations? Daniel

On Sun, 20 Jun 2010, Daniel Trebbien wrote:
One thing you might want to do in the test case is to use the adjacency_list constructor that takes a list of edges (expressed using pairs of vertex numbers) -- see <URL:http://www.boost.org/doc/libs/1_43_0/libs/graph/doc/adjacency_list.html #sec:iterator-constructor>. You can then use the vertex(n, g) function to access individual vertices by number -- see the "Structure Access" section of the page I linked to above for information on this. -- Jeremiah Willcock

On Wed, 23 Jun 2010, Daniel Trebbien wrote:
Sorry -- I hadn't gotten around to looking through your code yet since I was on vacation through Monday. A mail attachment is fine; I will look through your code later today. Look at <URL:http://www.boost.org/users/license.html> for license information -- the directions are about 2/3 of the way down the page. -- Jeremiah Willcock

On Sat, 19 Jun 2010, Daniel Trebbien wrote:
I skimmed through your code, and saw couple of little things to change: - Boost code doesn't use tabs (<URL:http://www.boost.org/community/policy.html#tabs>) - I don't think you need to use BOOST_DEDUCED_TYPENAME; I just use "typename" myself and haven't had any problems. I think it might be a workaround for compilers that are not used anymore. - Your header guard should start with BOOST_GRAPH_ to avoid conflicts with other header files. - You can replace "c.size() > 0" by "!c.empty()"; this is likely to be faster for many containers. - Your non-detail code should be in the "boost" namespace. - From a superficial look, it seems like maxAdjSearchList might be implementable as a heap (or some other priority queue) rather than as a multiset. Am I mistaken on that? - Operations on graphs and property maps (but not on tuples) need to use unqualified names (such as "get" and "target") in order for ADL to work on user-defined graph types that are not in the boost namespace. Calls to "tie", however, must be qualified (I had to fix many of those yesterday because unqualified calls to tie break C++0x compilers). - Is there a reason that you do not handle parallel edges? Is it just a simplification in the code? - It looks like the code handling sOutEdges and tOutEdges is unnecessarily complicated. Are you just intersecting the two lists of out edges? If so, you can use lookup_edge (from <boost/graph/lookup_edge.hpp>) to allow edge lookups in constant time for adjacency matrices (however, unlike the map-based implementation, lookup_edge takes linear time when it must actually do a search). - You may want to use the macros in <boost/graph/iteration_macros.hpp> to simplify your code. Those macros replace explicit declarations of iterators and tie operations when you are doing a simple iteration through vertices, out edges, etc. If you use them, please include <boost/graph/iteration_macros_undef.hpp> at the end of your file. - Property maps are passed by copy; they are intended to be lightweight and using copies allows temporary property maps to be passed in. - Are users always going to pass in all of the parameters you have in your algorithm? If not, overloads or named parameters may be worthwhile to add. - As you stated, your code needs a license banner. - You will eventually need a documentation file, preferably in HTML that is styled like the existing BGL documentation. - I think you can just attach the header file to your emails, since the test code is in a separate email anyway. Also, it would make my life easier if you wrote your code/tests as if your header was in boost/graph/, since that is where it will end up eventually. If you do that, I will test it in place there as well. -- Jeremiah Willcock

Hi Jeremiah, Thank you for your feedback. This gives me a lot to work on, which I will be implementing soon. I have addressed some of your points below. On 2010-06-23, Jeremiah Willcock <jewillco@osl.iu.edu> wrote:
You are correct that I should use a maximum priority queue. The algorithm is best implemented with a Fibonacci heap, but I couldn't get the C++ implementation that appeared in Dr. Dobb's Journal to work. Granted, this was early on and the segmentation fault that I was getting could have been caused by my own code. I also looked at `std::priority_queue`, but it does not implement `increasekey`. For this algorithm, I need to associate each vertex with a value w(A), which is the sum of weights of edges containing the vertex and a point in a set A of vertices. Each iteration until the priority queue is empty, I need to select and remove the vertex u with the greatest w(A) to be added to A, thus requiring an update (increasekey) of the w(A) values of vertices v in the priority queue such that {u, v} is in E, the set of edges. How do other BGL algorithms implement priority queues having the `increasekey` operation?
I am not sure what you mean by "need to use unqualified names". Do you mean without the `boost::` namespace qualification? Also, I do not understand what "calls to 'tie' ... must be qualified" means.
Named parameters would be good. What Boost libraries can I use for named parameters?
Daniel

On Wed, 23 Jun 2010, Daniel Trebbien wrote:
We have several heap implementations in BGL that support the increasekey operation. The one you may want to use is the d-ary heap (<boost/graph/detail/d_ary_heap.hpp>) since it seems to be the fastest in practice (although not in theoretical complexity); there are comments in that file on how to use it. Look at <boost/graph/dijkstra_shortest_paths.hpp> for example uses of several of the heaps. Note that the d-ary heap includes an indirect comparison (comparing values from a weight or distance map) natively, while some of the other heaps need the indirect_cmp function object (from <boost/pending/indirect_cmp.hpp>) to implement that. Dijkstra's algorithm uses all of the things I mentioned so you can look there to see sample code.
Yes.
Also, I do not understand what "calls to 'tie' ... must be qualified" means.
Those must have a namespace qualification.
Although we have wanted to move to Boost.Parameter for a while, BGL currently uses its own named parameter mechanism. I don't know if it is documented anywhere; your best hope would probably be to look through another algorithm in BGL that uses named parameters and see what the calls look like. Some of the handling of property maps is complicated, though, so you may want to wait on adding named parameters until the rest of the algorithm is finished and ready to submit. -- Jeremiah Willcock

Attached is version 0.2 of my Stoer–Wagner algorithm implementation and `test.cpp`. `test.cpp` now depends on three data files that were generated by a random min-cut problem generator. These data files have been attached as well. I am currently working on a documentation page, but other than that and "named parameters", I believe that I have addressed all of the feedback items. Daniel

On 2010-07-01, Daniel Trebbien <dtrebbien@gmail.com> wrote:
Attached is version 0.2 of my Stoer–Wagner algorithm implementation and `test.cpp`.
Also, in case anyone is wondering, version 0.1 has a bug, so it should not be used. The interesting thing about it, though, was that the bug does not manifest even if one steps through the entire algorithm with `gdb`, verifying each number for the two, original test cases. It took a specially-crafted example (now `test3` but without the zero edge) to identify it. So maddening :)

(Resending due to 75 KB message size limit. PNGs converted to GIFs) Attached is version 0.3 of my Stoer–Wagner algorithm implementation, which now has support for named parameters, and `test.cpp`. In implementing named parameters, I had to make some (very minor) changes to `boost/graph/named_function_params.hpp`. Attached is a unified diff of this file from the version that was distributed with Boost v. 1.42 to my version. In `stoer_wagner_mincut.hpp`, lines 199 through 268 should probably be merged into `boost/graph/named_function_params.hpp` if this approach to creating default UTIL property maps is okay. I wasn't sure about it, so someone should review this. Also attached is a ZIP archive of my documentation. It has the images and an example, `stoer_wagner.cpp`, which needs to be moved into the appropriate directory. `index.html` contains content for a new sub-section of the "Review of Elementary Graph Theory" section as well as content for a section on the `stoer_wagner_mincut` function to appear in the "Algorithms" chapter. Daniel

Attached is version 0.3 of my Stoer–Wagner algorithm implementation
I cleaned up the list of includes in `stoer_wagner_mincut.hpp`, made a few, minor changes to comments, and replaced `*vertices(g).first` as the initial value of the default assignments map with `vertex_descriptor()`. Attached is a unified diff from v. 0.3 `stoer_wagner_mincut.hpp` to what I am calling v. 0.3.1 `stoer_wagner_mincut.hpp`. Lines 199 through 268 of v. 0.3 are now lines 194 through 263 of v. 0.3.1.

On Sun, 4 Jul 2010, Daniel Trebbien wrote:
Here are my comments: I appreciate that you changed to Boost naming conventions. I was going to comment on that in the last version but you changed it before I commented on it. Why do you have IndexInHeapMap as an input parameter to the algorithm? Do you expect that a user would ever pass it in directly? What is the difference between parity_map and vertex_parity_map, especially considering that you do not appear to use the second in your algorithm? Will you ever use vertex_parity_map as a named parameter name? That is the only reason it should appear in the list of duplicate parameter names. I think just having vertex_parity_map as an alias to parity_map is not enough of a reason to have it as a parameter name. The code in <boost/graph/bipartite.hpp> uses a partition map to record its result, either passed in by the user or created internally. Is it possible for you to use that name as well? Or does it not make sense for your algorithm? There is currently on named parameter for that, since the code in bipartite.hpp does not use named parameters; that could be added, though. Your default for parity_map is dummy_property_map; did you mean instead to use the vertex_parity_map from the graph if it has one? If vertex_assignment_map is a temporary for the user-exposed parts of your algorithm, it should not be an explicit parameter; it should be generated instead (like index_in_heap_map is for dijkstra_shortest_paths()). BOOST_GRAPH_DECLARE_CONVERTED_PARAMETERS does not have a semicolon after the argument list. I refactored the color_map_maker code in named_function_params.hpp so it would be more useful for you. You should now just be able to write objects like make_color_map_from_arg_pack for the property maps that you need to create. Note that if you have a different value type for each map that you are creating, you can leave off the second template argument to make_property_map_from_arg_pack_gen (and the default value from its constructor) and then pass the default value that you want into the generated class's operator()(). Switching to two-space indentation and K&R-style formatting in your code will make the lines shorter and make it fit better with existing BGL code. Could you please use a Boost directory layout for your code? In particular, the names test.cpp and index.html are not going to be the names that your files have eventually, and having spaces in file names is not allowed in Boost. Is the .cpp file in your documentation directory meant to be an example? If so, please move it into libs/graph/example in your tarball. You do not need to create Jamfiles, but please note in a README or somewhere if any of your tests or examples need to be linked to any libraries or need any command line arguments. Where do you think your function should go in the BGL table of contents? What name should I use for it and what section should it be under? -- Jeremiah Willcock

Hello Jeremiah, I commented below.
Why do you have IndexInHeapMap as an input parameter to the algorithm? Do you expect that a user would ever pass it in directly?
I figured that I would allow it as an input parameter for the same reason as why `assignments` and `wAs` are possible input parameters: all three are utility maps that have meaningless values at algorithm completion, so the temporary, underlying storage of the maps can be created if not specified, but the user might not need to create a `std::vector` backing for them because perhaps the per-vertex utility values were made into vertex properties, or perhaps the user wishes to use a container that is backed by a custom allocator. Following similar reasoning, I was thinking about a HeapContainer input parameter. What do you think?
Oh, my mistake. I meant to alias `vertex_assignment_map` and `assignment_map` rather than `vertex_parity_map` and `parity_map`. I was thinking about writing an implementation of a randomized min-cut algorithm by Karger that merges edges rather than vertices. This might require an `edge_assignment_map` and `vertex_assignment_map`. I'm not sure. Perhaps this change can come later if I decide to implement this algorithm.
I could use "black" and "white". That will work.
Your default for parity_map is dummy_property_map; did you mean instead to use the vertex_parity_map from the graph if it has one?
No, I figured that the user might not be interested in the vertex parities, so the default `dummy_property_map` would throw away this information.
Okay
I see it in Subversion, and will switch to using it. Thank you.
Switching to two-space indentation and K&R-style formatting in your code will make the lines shorter and make it fit better with existing BGL code.
Sure, I'll switch the code to two spaces. What style changes do I need to make to make it K&R style?
Could you please use a Boost directory layout for your code?
Sure
Where do you think your function should go in the BGL table of contents? What name should I use for it and what section should it be under?
In the "Algorithms" chapter, I think that a section titled "Min-Cut Algorithms" should be added between "Maximum Flow and Matching Algorithms" and "Sparse Matrix Ordering Algorithms" because min-cuts are related to max-flow algorithms: http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem

On Sun, 4 Jul 2010, Daniel Trebbien wrote:
Right now, Dijkstra's algorithm creates those structures internally in the normal case. I'm fine with making the heap an optional parameter, but not the underlying storage. I guess it makes sense to allow the index_in_heap_map to be a parameter; that could then be added to Dijkstra's algorithm as well.
OK. Is there a reason that you need two names for that? It is much simpler to have only one name for each parameter; that avoids having to use the duplicate name list (which is for parts of BGL that have aliased parameter names for backwards compatibility).
We can just add both of those now.
It's fine if the partition map has bool as its value type, so you don't need to change your code other than the name.
I thought that the parities were the (only) output of the algorithm. Is there ever a case where the user would not want that?
It turns out that we actually use a two-space variant of the Java standard coding style (that you can find online). I think <URL:http://osl.iu.edu/download/research/mtl/papers/code_stds.pdf> (see pages starting at 25) is still reasonably accurate, except for the formatting of the opening braces of functions. Of course, anything that's explicitly specified in the Boost requirements takes precedence.
Could you please use a Boost directory layout for your code?
Sure
Thanks. That will make testing and integrating it much easier.
Are you sure that you don't want me to add "min-cut" to the maximum flow section title and then put your code in there? How related is your code to standard max-flow algorithms? BTW, what is the timeframe that you expect to have this finished? Are you targeting getting it into 1.44.0? That would require a working version of the code by tomorrow (July 5), but that would not need documentation and would probably only require minimal tests. It would be more pressure, though, and it is probably not too important to get it into the absolute next version. -- Jeremiah Willcock

There's no particular reason. I was just trying to emulate what I perceived to be the convention of creating an alias for parameter names that might be confused.
I don't know for sure. Personally I think that I would always want to know, but maybe someone does not. In my view it's kind of like the `comp` map of `connected_components`. It is a required parameter, but there were a couple of cases where I just needed to know whether a graph was connected, so I was only interested in the return value.
I don't think that a section on min-cuts should go *in* the section on maximum flow algorithms because although the two are related, a min-cut algorithm does not need to exploit the duality of maximum flow/weight of minimum cut. In fact, the Stoer–Wagner algorithm does not, and a number of recent algorithms do not. It makes sense to me that a user would expect the section on min-cut algorithms to immediately follow the section on maximum flow algorithms because historically, maximum flow was seen as *the* solution to finding min-cuts for a long time. (Karger & Stein (1996). "A new approach to the minimum cut problem".) Today, the relationship of minimum cuts to maximum flow problems is mentioned more for historical context.
I think that 1.44 is too soon. Perhaps the next release :)

Hi Daniel, I wonder if your min-cut algorithm can be used to find max flow, since min-cut and max-flow are dual? I am currently using two max-flow implementations in BOOST (i.e, Goldberg and Kolmogorov). What is the complexity compared to the two max-flow implementations in BOOST. I am looking for fastest max flow implementation available. Another related question: does BOOST offer parallel implementation of any max-flow algorithm? Thanks Dan _________________________________________________________________ Look 'em in the eye: FREE Messenger video chat http://go.microsoft.com/?linkid=9734386

Hi Dan, I should have been more precise. Given two vertices s and t where s is called the source and t is called the sink, the maximum flow from s to t is equal to the weight of a minimum s-t cut. Thus, a maximum flow algorithm can compute a minimum s-t cut of a graph. The Stoer–Wagner algorithm which I implemented calculates a min-cut of the graph, which is a slightly different problem. A cut of an undirected graph G = (V, E) is a partition of the vertices into two, non-empty sets X and Y = V - X. The weight w(X, Y) of a cut is the number of edges between X and Y if G is unweighted, or the sum of the weights of edges between X and Y if G is weighted. Given two vertices s and t, an s-t cut is a cut (X, Y) that separates s and t; that is, either s is in X and t is in Y or t is in X and s is in Y. A minimum s-t cut is an s-t cut of minimal weight. Historically, one way of computing a min-cut of a graph was to calculate the max-flow for every pair of vertices (s, t) and simply pick a minimum s-t cut of minimal weight. This would be a min-cut. I think that it is correct to say that if a min-cut is (X, Y), then for all x in X and y in Y, (X, Y) is also a minimum x-y cut with the max-flow from x to y and vice versa (because G is undirected) being the min-cut weight w(X, Y). I think that it is also correct to say that the least maximum flow between any two vertices of a graph is the weight of a min-cut. Intuitively speaking, because maximum flow is calculated between a given source and sink, then a maximum flow algorithm will be better than an min-cut algorithm, which has to consider all pairs of vertices. So, if you are given two vertices s and t and you want to know the maximum flow from s to t, you should use a maximum flow algorithm. Also, a min-cut (X, Y) of G as returned by the Stoer–Wagner algorithm might not be a minimum s-t cut; both s and t could be in X or Y. As far as parallel implementations of maximum flow algorithms, I do not know whether Parallel BGL offers parallel maximum flow algorithm implementations.

Hi Daniel, Thanks for the clarification. What I really need to is to find source set of a min-cut between s(source) and t(target) in a directed graph, and hence a maximum closure of the graph. So, I need to find a s-t min-cut, not just any min-cut. Can Stoer-Wager's min-cut be forced to find s-t min-cut only (and thus has reduced time complexity)? If not then, I guess I'll have to stick with a max-flow algorithm. Thanks, Dan
_________________________________________________________________ Turn down-time into play-time with Messenger games http://go.microsoft.com/?linkid=9734385

On Mon, 5 Jul 2010, Daniel Trebbien wrote:
Yes -- look at <URL:http://www.boost.org/doc/libs/1_43_0/libs/graph/doc/Buffer.html> for the basic queue concept. If you need updating, there does not appear to be a defined concept for that, but you should use the interface that dijkstra_shortest_paths() uses for its queue. -- Jeremiah Willcock

I was trying to figure out how I could pass in a priority queue to a utility function of my Stoer–Wagner implementation that would also give me access to the "distances" that are associated to values in the priority queue. In short, I don't think that this is currently possible. But, I only need one method: something like `keys() const` on an updatable priority queue could return a readable property map with the key type being the value type of the queue and the value type being a `key_type` that is the distance type. It seems like I got this backward, I know, but in the papers that I have been reading, the `pop` method of a priority queue returns the value with the highest key. In my utility function, I have: const vertex_descriptor u = pq.top(); // u = extractmax(PQ) w = get(distances, u); pq.pop(); What I would like is: const vertex_descriptor u = pq.top(); // u = extractmax(PQ) w = get(pq.keys(), u); pq.pop(); This way, I could pass in the heap without worrying about not having access to the distances map. Attached is a file containing concept definitions for `Buffer` and two, new concepts that I am proposing: `UpdatableQueue` and `KeyedUpdatableQueue`. Currently `boost::d_ary_heap_indirect` satisfies `UpdatableQueue` (I have checked this), and I need it to satisfy `KeyedUpdatableQueue`. The way that it could satisfy `KeyedUpdatableQueue` rather easily is `typedef` `typename boost::property_traits<DistanceMap>::value_type` as `key_type` and a new public member: DistanceMap keys() const { return distance; }

A small update: I had added another method `clear()` to the `UpdatableQueue` concept, but I don't think that I will need it. Attached is an updated version of `buffer_concepts.hpp`. Also attached is a patch for `boost/graph/detail/d_ary_heap.hpp`. After applying the patch, `boost::d_ary_heap_indirect` models `KeyedUpdatableQueue`. I am working on documentation pages for the `UpdatableQueue` and `KeyedUpdatableQueue` concepts.

On Mon, 5 Jul 2010, Daniel Trebbien wrote:
Why do you need to get the "distances" (values) from the heap itself rather than from a separate distance map? The heap itself does not store distances, after all; it needs a separate data structure (or function) to get the distance to any given vertex. In fact, the fact that there is a distance map at all is a convenience/performance optimization in d_ary_heap; the Boost mutable_queue doesn't use a property map for distances at all (other than through indirect_cmp). What are you using the keys() map for? Maybe there is another way to do the same thing. -- Jeremiah Willcock

I liked your idea of making the heap/max-priority queue an optional parameter, so I started work on allowing this. The "phase" routine of the Stoer–Wagner algorithm is very similar to the algorithms of Dijkstra and Prim. There is a priority queue that is used to iteratively retrieve the vertex with the highest key/"distance"/wA. I need access to the wA values of vertices because the wA of the vertex that is removed last is the weight of a minimum s-t cut that was computed by the phase. I just finished a batch of changes that allow the optional max-priority queue, and I uploaded the archive of this new version to the Vault: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=stoer_wagner_min_cut-0.5.tar.bz2&directory=BGL It requires a patch to `named_function_params.hpp` which I have attached to this e-mail. The v. 0.5 archive includes a modified `libs/graph/doc/graph_theory_review.html`. (BTW, I skipped to 0.5 as the version number because I have a version which I am calling "version 0.4", but this is an intermediary version.)

I had thought that adding `inline` to members was necessary for older compilers, but apparently not. Attached is a new patch for `boost/graph/named_function_params.hpp`.

I have uploaded a new version, v. 0.5.1, of my Stoer–Wagner algorithm implementation to the Vault: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=stoer_wagner_min_cut-0.5.1.tar.bz2&directory=Algorithms/graph Changes in this revision include: 1. In `boost::detail::stoer_wagner_phase`, the simple change to obtaining `pq.keys()` once from obtaining `pq.keys()` each time has dramatically improved the performance. On a randomly-generated graph with 500 vertices and 62,864 edges, for example, the runtime went from 5.56 seconds to 1.82 seconds on a Pentium Dual Core. 2. Documentation for the `UpdatableQueue` and `KeyedUpdatableQueue` concepts. 3. Modification of a unit test to pass in a "custom" max-priority queue. Attached are two patches for the current trunk sources, one to `boost/graph/detail/d_ary_heap.hpp` and the other to `boost/graph/named_function_params.hpp`. I changed both files for this version, so if you have applied any of my previous patches, please revert your local copy of these files and apply the new patches. I found the `Algorithms/graph` directory in the Vault, which is probably a better place for BGL archives. Thus, I have uploaded v. 0.5.1 to `Algorithms/graph` and will delete the v. 0.5 archive from the `BGL` directory which I created. Let me know if there are further changes that I should make. Daniel

I uploaded an ever-so-slightly-different new release: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=stoer_wagner_min_cut-0.5.2.zip&directory=Algorithms/graph& Changes: 1. Switched the example program, `stoer_wagner.cpp`, to using `boost::one_bit_color_map`. 2. Documented the case when `stoer_wagner_min_cut` can throw a `std::invalid_argument` object. 3. Slightly re-formatted the whitespace in `boost/graph/stoer_wagner_min_cut.hpp`

On Fri, 9 Jul 2010, Daniel Trebbien wrote:
I think I'm going to put this in basically the way it is. Do you have a script or something that compiles the .dot files into figures, or at least human-readable directions (script preferred of course)? It would be nice to have that saved for posterity. Also, what do you want me to do with the Buffer concept that is in <boost/graph/graph_concepts.hpp>? -- Jeremiah Willcock

I used Graphviz, but nothing fancy. The undirected graphs were generated with: fdp -Tpng -ograph.png graph.dot and then the resulting `graph.png` was converted to a GIF with GIMP. To draw the directed graph, I used: circo -Tpng -odigraph.png digraph.dot
Also, what do you want me to do with the Buffer concept that is in <boost/graph/graph_concepts.hpp>?
Oh, I removed that from my local copy. If I recall correctly, that `BufferConcept` declaration seemed older in style and did not fully test the given type for conformity to the Buffer concept.

Could you please write this up into a shell script, listing the names of your dot files and what rendering program you used for each?
Hello Jeremiah, I apologize for the delay. Attached is a shell script that will generate GIF images using Graphviz. Unfortunately, it seems that when I tried this script using version 2.26.3 (20100126.1600) of Graphviz on Debian Squeeze, the generated images were different than the images that I generated with Graphviz version 2.24.0 (20090616.2323) on Windows. Currently I do not see a way to use the newer versions of Graphviz to generate the same, or similar, images because I do not think that the DOT format allows fine-tuned control over where nodes are drawn in the images. Also, I experimented with using <boost/graph/dimacs.hpp> and <boost/graph/read_dimacs.hpp> instead of `parse_noigen.hpp`, but I think that the "NOIGEN format" is a modification of the DIMACS format. As explained on http://www.avglab.com/andrew/CATS/maxflow_formats.htm , DIMACS uses "n" lines, whereas NOIGEN does not. This is why the BGL DIMACS parsing utilities kept throwing exceptions. Daniel

On Thu, 15 Jul 2010, Daniel Trebbien wrote:
Does any of the stuff at <URL:http://www.graphviz.org/doc/FAQ.html#Q22> help for that? It would require coordinates, but you can get those by running dot on your graph on a system that you believe produces the desired layout.
How are the source and target of the flow specified in NOIGEN? I.e., what is the equivalent to a DIMACS "n" line? -- Jeremiah Willcock

The `pos` attribute seems like it will work. I will experiment with it to see...
The NOIGEN format specifies a min-cut problem, which does not use source and target vertices in the definition of the problem. So, there is no equivalent to the DIMACS "n" line.

I uploaded a new archive, version 0.5.3, to: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=stoer_wagner_min_cut-0.5.3.zip&directory=Algorithms/graph& The main differences are: 1. The DOT files have been edited to make use of the `pos` attribute and other, layout-specific attributes. 2. The `make_min_cut_images.sh` script has been updated. A new Windows version, `make_min_cut_images.bat`, has been added. 3. `graph_theory_review.html` and `stoer_wagner_min_cut.html` have been updated ever-so-slightly as a result of the graph images changing ever-so-slightly. 4. I have (hopefully) fixed the issues with line endings. All line endings of text files in the archive should be Windows-style except for the `make_min_cut_images.sh` script, which is Unix-style. Daniel

On Fri, 16 Jul 2010, Daniel Trebbien wrote:
OK.
2. The `make_min_cut_images.sh` script has been updated. A new Windows version, `make_min_cut_images.bat`, has been added.
I just need some kind of script for my records. I'll change the shell script to output PNG files, though, and re-run it (or you can).
OK.
I'll have to convert them on extracting the zip file so I can mark them as text (with native line endings on checkout) in SVN. Having everything Unix-style would be easier for me, but it's just one flag to do the conversion automatically on all text files in the zip archive. BTW, I added a read_dimacs_min_cut() function to read_dimacs.hpp; see if that works for your test cases. I haven't tested it, but it should use the same code as read_dimacs_max_flow() but with the source and sink features turned off (and so removing the "n" line requirements). I've been kind of busy recently, so I'll look through your files in more detail soon. You might want to update your test code and upload a new version, though, to use the new reading function. -- Jeremiah Willcock

I'll change the shell script to output PNG files, though, and re-run it (or you can).
You might want to keep the GIFs. If I recall correctly, the GIF images were smaller files. Plus, none of the HTML files would need to be modified.
BTW, I added a read_dimacs_min_cut() function to read_dimacs.hpp; see if that works for your test cases. I haven't tested it, but it should use the same code as read_dimacs_max_flow() but with the source and sink features turned off (and so removing the "n" line requirements).
I've been kind of busy recently, so I'll look through your files in more detail soon. You might want to update your test code and upload a new version, though, to use the new reading function.
The changes to `read_dimacs.hpp` basically worked. I just needed to modify it so that the parsing would not fail at the 'p' line because the problem type is "cut" rather than "max". Attached is a patch for these changes. I have uploaded a new version, 0.5.4, to the Vault: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=stoer_wagner_min_cut-0.5.4.zip&directory=Algorithms/graph& Changes include: 1. Switched to using `boost::read_dimacs_min_cut` instead of `parse_noigen`. 2. Removed `parse_noigen.hpp` 3. Added comments to the example program. 4. Created a new graphic of the min-cut in the example program. Appropriately modified `make_min_cut_images.bat` and `make_min_cut_images.sh`.

On Thu, 12 Aug 2010, Daniel Trebbien wrote:
I believe I fixed that, but using a different approach; please check.
Here are my comments: - I still don't like the KeyedUpdatablePriorityQueue concept in there, and would prefer if you used whichever map directly. - Is the code on line 87 (only update the key if the entry is already in the queue) right, or should it be to always update (based on a default of 0 or whatever)? - What property map is the default priority queue built from? I don't see that in the code. - Have you tried using a custom priority queue in the test case? I'm not sure it will work properly. - Your test case does not need to define its own constant property map; boost::constant_property_map (from <boost/graph/property_maps/constant_property_map.hpp>) should work instead. - Should line 82 be iterating through the entire graph, or would it be faster if all vertices with a given assignment are kept in a data structure for faster access? -- Jeremiah Willcock

Your version of `read_dimacs.hpp` compiles and succeeds with my tests.
Specifically do you not like the `keys` member?
It's correct. The loop on line 76 runs until the priority queue is empty. Each iteration causes one value (vertex_descriptor) to be removed from the priority queue and once a vertex_descriptor is no longer in the queue, then the algorithm does not need to update its key. Because the loop on line 84 is iterating over all out-edges, some target vertices may have already been removed. So, I check.
- What property map is the default priority queue built from? I don't see that in the code.
In the patch for `named_function_params.hpp` that I submitted in a message from July 8th (http://lists.boost.org/Archives/boost/2010/07/168703.php), I adapted your technique for `map_maker` to create `priority_queue_maker`. After applying the patch, lines 574 and 575 use `map_maker` to create a "keys" map and "index in heap" map, respectively.
- Have you tried using a custom priority queue in the test case? I'm not sure it will work properly.
`test_prgen_20_70_2` uses a custom d-ary heap.
Indeed. I will switch to using that.
That's an excellent idea. If I maintained a map from vertex descriptors to lists of descriptors of vertices that were assigned to the vertex, then I would be able to decrease the number of iterations of not only the loop on line 82, but the loop on line 66 as well. Would it be alright if I used a `std::map<vertex_descriptor, std::vector<vertex_descriptor> >` internally or does something like that need to be another parameter? I am thinking that some users might want more control over temporary variables that utilize the heap.

On Fri, 13 Aug 2010, Daniel Trebbien wrote:
OK.
Yes.
Do you have a property map for what is in the queue? Some kind of color map? I think it might be better if there was one; that might simplify the algorithm and allow a wider range of heaps to be used.
I'm not seeing quickly what property map the actual priorities come from. Is it the distance map, like it seems from the documentation? I think that should be passed explicitly into the stoer_wagner_phase() function. Is there ever a case where the priority queue won't be some kind of indirect priority queue on the distance map?
OK -- I didn't see that.
OK.
I think you can use an iterator_property_map or shared_array_property_map for this rather than an std::map. I don't think it needs to be customizable but you can if you want (you might need a new key in named_function_params.hpp for that). Just be careful that the complexity of maintaining the map doesn't cancel out the benefit of not iterating over as many edges. -- Jeremiah Willcock

Specifically do you not like the `keys` member?
Yes.
I don't think that I ever explained my reasoning behind it, but essentially the idea is to reduce the amount of data that the user would have to pass to the function while ensuring that they passed in the correct map. The d-ary heap, or any priority queue, would need access to the map which I call the "keys" map (in the `d_ary_heap.hpp` source it is called the "distances" map). If the priority queue did not have a member to furnish this property map, then in addition to receiving the priority queue to use, the function would need to receive the property map. This means that the user would not only have to pass the priority queue object but the corresponding property map as well. Having to pass in the property map might lead to mistakes, such as accidentally passing in the wrong property map. Also, I think that it would be more convenient for the user if they do not have to retype the name of the property map or even keep a copy in a local variable to be passed to the function later.
I am not sure that a property map for what is in the queue would be helpful. At the start and end of each call to `stoer_wagner_phase`, the queue must be empty.
By default, the priorities are stored in the vertex distances map because I was following the model of `dijkstra_shortest_paths`. `dijkstra_shortest_paths` uses the vertex distances map by default to store "distances", which are temporary values. One of the first operations of `stoer_wagner_phase` is to set these "distances" to 0, so it does not matter what the value of "distance" is for each vertex at the start of `stoer_wagner_phase`. The "distance" values are temporary data only.
Yes, if a custom priority queue does not use the vertex distances map for the "keys"/priorities map. For example, the `test_prgen_20_70_2` test, which passes in a custom d-ary heap object, uses a shared array property map to store the priorities/"distances".
Looking at these property map classes again, I do not see a way to iterate over the entries in the property map. For each vertex that is not assigned to another vertex, I would need to know the list of vertices that are assigned to it. For now I think that I will just use a `std::map<vertex_descriptor, std::vector<vertex_descriptor> >` to see if it improves the speed of the algorithm on a large test instance that I generated.

A quick update: I made the changes, but the performance improvement was slight (around 1 percent). I have identified, however, that the `BGL_FORALL_OUTEDGES_T(uPrime, e, g, UndirectedGraph)` loop is even worse when it comes to iterating through the entire map; toward the latter part of the computation, this loop iterates over more and more of the input graph's edges until the loop covers close to 100% of the edges. I used that loop because with it, it is possible to run the algorithm without modifying or copying the input graph. I am now thinking that I really want to copy the input graph and make changes to the copy. So, I will be adding a "destructive" version of the `stoer_wagner_min_cut` function that will be used on copies. Users could also use the destructive version on a graph that they do not care about.

Version 0.5.5 has been uploaded to the Vault: http://www.boostpro.com/vault/index.php?action=downloadfile&filename=stoer_wagner_min_cut-0.5.5.zip&directory=Algorithms/graph& The only substantive difference in this version is that it includes the code to build up a set of "assigned vertices", leading to the slight performance improvement that I was talking about on August 18. As far as producing a "destructive version", I tried out two variants. One utilized `clear_vertex` and the other utilized a different method of removing edges that needed to be removed as the algorithm encountered them. Interestingly, a benchmark program that I have been using takes 11 times longer to run when `clear_vertex` is used, and I don't know how much longer on the other version because I Ctrl+C it each time. These were unexpected results, but I guess that iterating through more and more of the edges as the computation progresses is a lot more efficient than removing edges from the input graph. Anyway, I believe that this new version has addressed all feedback that I have received so far.

This is a friendly two-week reminder. If there is no additional feedback for improving my implementation, then I am formally requesting that version 0.5.5 be merged into trunk. The patches that were attached to http://lists.boost.org/Archives/boost/2010/07/168703.php will also need to be applied.

I have merged it in now; please look and see if r65590 is what you want.
Hi Jeremiah, There are a couple of corrections: 1.) The Table of Contents entry for `stoer_wagner_min_cut` needs to be under a separate section ("Minimum Cut Algorithms") immediately following the "Maximum Flow and Matching Algorithms" section. 2.) A comment in the example program needed to be updated slightly because the documentation images were moved into the `stoer_wagner_imgs` folder. I have attached a patch for `libs/graph` that modifies `doc/table_of_contents.html` and `example/stoer_wagner.cpp`. In `example/stoer_wagner.cpp` I also added a line to #include <cstdlib> because `EXIT_SUCCESS` is declared in that header. Everything else looks good, though. Thank you!

On Fri, 9 Jul 2010, Daniel Trebbien wrote:
Actually, your graphs appear to be in DIMACS format (<URL:ftp://dimacs.rutgers.edu/pub/netflow/>, see general-info/specs.tex in there for the file format). BGL already has a reader for some variant of that format, and it appears to handle "a" lines; could you please try to use/update that for your tests? -- Jeremiah Willcock

Also, is parse_noigen.hpp something that should be in boost/graph rather than the test directory? Or is it specific to your algorithm?
I wrote `parse_noigen` to have a quick and easy way to read the prgen output, but I feel that it is not generic or clean enough for use in non-test code.
Sure. I'll take a look.

Hi Jeremiah, I react on this point only:
I don't understand this. If I have a graph of one million vertices and even more edges, the property map (e.g. defining something for each vertex) is certainly not lightweight. Or do you mean other property maps? -- ------------------------------------- Barend Gehrels, Geodan Holding b.v., Amsterdam, The Netherlands, www.geodan.nl

On Thu, 24 Jun 2010, Barend Gehrels wrote:
The property map is not intended to own the data; thus, copying a property map is only a shallow copy and thus cheap. That is why classes such as iterator_property_map require a separate object for the actual data storage; that object is not copied when the property map is copied. -- Jeremiah Willcock

OK, I didn't know this and it is AFAIS not written in the property map documentation. /(e.g.: Implementing your own exterior property maps is not very difficult. You simply need to overload the functions required by the property map concept <http://www.boost.org/doc/libs/1_43_0/libs/graph/doc/property_map.html> that you want your class to model. At most, this means overloading the put() and get() functions and implementing operator[]. Also, your property map class will need to have nested typedefs for all the types defined in property_traits, or you can create a specialization of property_traits for your new property map type. ) /A shallow copy of non-owned data will be lightweight indeed. What about maps using std::map (e.g. using boost::make_assoc_property_map(component)), are they implemented inefficient then? I'm interested in these details because I'm using BGL and implementing a graph algorithm on it (where I currently use std::map). Thanks, Barend

On Thu, 24 Jun 2010, Barend Gehrels wrote:
It is, but it's hard to find. It's the last sentence of <URL:http://www.boost.org/doc/libs/1_43_0/libs/property_map/ doc/property_map.html>.
The associative_property_map class works just like iterator_property_map in that it just has a reference to the std::map or other underlying storage.
I'm interested in these details because I'm using BGL and implementing a graph algorithm on it (where I currently use std::map).
You might want to use iterator_property_map or shared_array_property_map since those are faster for graphs with fixed sets of vertices. -- Jeremiah Willcock
participants (4)
-
Barend Gehrels
-
Dan Jiang
-
Daniel Trebbien
-
Jeremiah Willcock