Re: [boost] Boost Graph library - r_c_shortest_paths

I checked the files I sent you and they were the correct versions (all three code files: the test file, the example file, and r_c_shortest_paths.hpp). The test and example files both run without any problems on my system (WIN XP 32 SP 3, MS Visual C++ 2008).
I've finally had some time to go through the resubmission. There are/were a number of issues that need to be addressed. First, property maps, functors, and visitors are virtually always passed by value. If those objects need to maintain complex state between calls, it's typically done by building a functor (visitor, etc) that has a reference to an external state object. I fixed this. Second, I removed the ks_smart_ptr class, which turned out to be not very smart. It only wrapped the pointer, but didn't manage it in any way. I also cleaned up some of the allocation/deallocation code by moving the calls to two new functions create and destroy. Inidentally, valgrind shows that this implementation leaks memory. I put cout's in create and destroy to count allocations and deletions (16646 allocs to 15994 deallocs). My guess is that one of the rarer code paths in the dispatch doesn't actually delete. This needs to be investigated. Third... just a style point... some of the identifiers are a little long. I think the readability could be improved by shortening them. Fourth, and I'm hoping to get Jeremiah's input on this... I'm not a fan of passing vector<vector<>>& or vector<>& as top-level parameters. It might be better to require a template parameter that is essentially a matrix- or vector-like object. That would simplify the call interface and allow a little more freedom w.r.t. parameter types. Thoughts? Fifth... and Final... The interface to the algorithm is changing which can break backwards compatibility. Maybe we need to give the new module a different name. Thoughts? Here's a tarball with the code ready to be dropped into Boost. You should refer to this version since its what I'm going to be merging. http://www.cs.kent.edu/~asutton/rcsp.tar.bz2 Andrew Sutton andrew.n.sutton@gmail.com

On Mon, 22 Feb 2010, Andrew Sutton wrote:
I checked the files I sent you and they were the correct versions (all three code files: the test file, the example file, and r_c_shortest_paths.hpp). The test and example files both run without any problems on my system (WIN XP 32 SP 3, MS Visual C++ 2008).
I've finally had some time to go through the resubmission. There are/were a number of issues that need to be addressed.
First, property maps, functors, and visitors are virtually always passed by value. If those objects need to maintain complex state between calls, it's typically done by building a functor (visitor, etc) that has a reference to an external state object. I fixed this.
I agree.
Second, I removed the ks_smart_ptr class, which turned out to be not very smart. It only wrapped the pointer, but didn't manage it in any way. I also cleaned up some of the allocation/deallocation code by moving the calls to two new functions create and destroy.
Inidentally, valgrind shows that this implementation leaks memory. I put cout's in create and destroy to count allocations and deletions (16646 allocs to 15994 deallocs). My guess is that one of the rarer code paths in the dispatch doesn't actually delete. This needs to be investigated.
See my comment below about getting rid of heap-allocated labels entirely.
Third... just a style point... some of the identifiers are a little long. I think the readability could be improved by shortening them.
Fourth, and I'm hoping to get Jeremiah's input on this... I'm not a fan of passing vector<vector<>>& or vector<>& as top-level parameters. It might be better to require a template parameter that is essentially a matrix- or vector-like object. That would simplify the call interface and allow a little more freedom w.r.t. parameter types. Thoughts?
I am not a fan of them either. It might be better for the user to provide an output iterator for each output, which the algorithm writes to. Also, what if the user does not want one of the vector outputs? Look at libs/graph/doc/mcgregor_common_subgraphs.html for one possible approach to this problem; there, a user-supplied property map is updated with each solution (there, a partial isomorphism between graphs) and a user function is called after the solution is written. I have not gone through this tarball before, so I have some other comments based on the documentation; I have not looked through the code: The function description should start with the prototype, not just the parameter names. It also only lists one overload, while other parts of the documentation hint at others with different parameter combinations. What is the point of a label allocator? Do they need to be full allocators? Do labels need to be heap-allocated objects at all, or would a value class make more sense (in combination with property maps to express paths)? It looks like labels may not make sense at all; perhaps there can be property maps for cumulated_resource_consumption, pred_edge, and is_dominated, each indexed by graph vertices? That would likely improve performance and simplify the algorithm's interface? Also, what does the Resource Container concept represent? The documentation gives no meaningful information that I can see; it has no valid expressions or associated types. Is it actually a container of some sort, or just an abstract object defined by the user? It appears that the intent is that it is an arbitrary object with a partial order and an extend operation (both given as separate function objects); is that correct? It might also be good to separate the extend operation from the enforcement of the resource constraints on the edge target. Can extend be a weight map and a combine function object (like for shortest path algorithms), plus a feasibility function object or property map? The requirements on the dominance function don't make sense; if you require that it returns true iff rc1 <= rc2, why not just use the expression rc1 <= rc2? I think the intent is that it define a partial order; are there any other properties required (such as upper or lower bounds for all values, a compatibility criterion with the extend operation, ...)? For example, is there a constraint such as non-negative edge weights (i.e., rc must dominate extend(rc, x) or vice versa for any edge weight x)? It appears not, but there are likely some undocumented compatibility criteria there. Also, is there a reason that it acts like <= rather than < like other orders in STL do? Is a <= b a domain convention for "a dominates b"? In the precondition section, there is a mention of < and == on resource containers; why do you not just use the dominance function for <? Having a function object makes relatively little sense when it must match the behavior of a fixed operator. If you do decide to keep label objects, the Label concept should just be a prototype of the structure if there is only one type that is allowed to be used in that position; a concept that can only ever have one model is usually unnecessary.
Fifth... and Final... The interface to the algorithm is changing which can break backwards compatibility. Maybe we need to give the new module a different name. Thoughts?
Since it is an existing BGL function, it probably does. How different are the interfaces between the two versions? -- Jeremiah Willcock

I'll leave most of the responses for Michael to address, but will try to add a few clarifications and comments. Fourth, and I'm hoping to get Jeremiah's input on this... I'm not a fan of
passing vector<vector<>>& or vector<>& as top-level parameters. It might be better to require a template parameter that is essentially a matrix- or vector-like object. That would simplify the call interface and allow a little more freedom w.r.t. parameter types. Thoughts?
I am not a fan of them either. It might be better for the user to provide an output iterator for each output, which the algorithm writes to. Also, what if the user does not want one of the vector outputs? Look at libs/graph/doc/mcgregor_common_subgraphs.html for one possible approach to this problem; there, a user-supplied property map is updated with each solution (there, a partial isomorphism between graphs) and a user function is called after the solution is written.
It may be the case that output iterators aren't sufficient for the algorithm. It may need random access to the output vectors/matrices. I need to look at the implementation more closely at the implementation. Property maps may be a viable option in their place, however. What is the point of a label allocator? Do they need to be full allocators?
Do labels need to be heap-allocated objects at all, or would a value class make more sense (in combination with property maps to express paths)? It looks like labels may not make sense at all; perhaps there can be property maps for cumulated_resource_consumption, pred_edge, and is_dominated, each indexed by graph vertices? That would likely improve performance and simplify the algorithm's interface?
Allocators are used to allocate r_c path labels, which have Resource_Containers as subobjects, so I'm thinking that value types may not be the best solution. On the other hand... If only we had reliable move semantics :) I'm not sure if there's a data structure that naturally supports the order of construction and destruction, if you get my meaning. If you do decide to keep label objects, the Label concept should just be a
prototype of the structure if there is only one type that is allowed to be used in that position; a concept that can only ever have one model is usually unnecessary.
I don't think the labels ever appear in the output, but I could be missing a parameter's intent.
Since it is an existing BGL function, it probably does. How different are the interfaces between the two versions?
There's one less function and template parameter in the new version. I'm pretty sure we'd break something without a new name or module. Andrew Sutton andrew.n.sutton@gmail.com

On Mon, 22 Feb 2010, Andrew Sutton wrote:
I'll leave most of the responses for Michael to address, but will try to add a few clarifications and comments.
Fourth, and I'm hoping to get Jeremiah's input on this... I'm not a fan of
passing vector<vector<>>& or vector<>& as top-level parameters. It might be better to require a template parameter that is essentially a matrix- or vector-like object. That would simplify the call interface and allow a little more freedom w.r.t. parameter types. Thoughts?
I am not a fan of them either. It might be better for the user to provide an output iterator for each output, which the algorithm writes to. Also, what if the user does not want one of the vector outputs? Look at libs/graph/doc/mcgregor_common_subgraphs.html for one possible approach to this problem; there, a user-supplied property map is updated with each solution (there, a partial isomorphism between graphs) and a user function is called after the solution is written.
It may be the case that output iterators aren't sufficient for the algorithm. It may need random access to the output vectors/matrices. I need to look at the implementation more closely at the implementation. Property maps may be a viable option in their place, however.
I looked at the code; the only operation done to the two vectors is push_back, so an iterator or callback would work.
What is the point of a label allocator? Do they need to be full allocators?
Do labels need to be heap-allocated objects at all, or would a value class make more sense (in combination with property maps to express paths)? It looks like labels may not make sense at all; perhaps there can be property maps for cumulated_resource_consumption, pred_edge, and is_dominated, each indexed by graph vertices? That would likely improve performance and simplify the algorithm's interface?
Allocators are used to allocate r_c path labels, which have Resource_Containers as subobjects, so I'm thinking that value types may not be the best solution. On the other hand... If only we had reliable move semantics :) I'm not sure if there's a data structure that naturally supports the order of construction and destruction, if you get my meaning.
I don't know exactly what the reason for heap allocation is. If you need ordered construction and destruction, wouldn't a stack (or recursion on the execution stack) work? Is the goal to have a bunch of lists with shared tails?
If you do decide to keep label objects, the Label concept should just be a
prototype of the structure if there is only one type that is allowed to be used in that position; a concept that can only ever have one model is usually unnecessary.
I don't think the labels ever appear in the output, but I could be missing a parameter's intent.
They do get passed to visitors, but I do not see them going into the output either. Do labels even need numbers or any kind of unique identifier? They appear to just be information about paths and partial paths (effectively a linked list). BTW, what is the point of a priority_queue on pointers? That ordering is not generally useful for algorithms. The pseudocode in the docs shows EXTRACTMIN as well; perhaps there is some other distance metric that is intended as the comparison in the priority_queue?
Since it is an existing BGL function, it probably does. How different are the interfaces between the two versions?
There's one less function and template parameter in the new version. I'm pretty sure we'd break something without a new name or module.
Yes, most likely. The full name resource_constrained_shortest_paths would work, though, and fits with BGL conventions. One other issue I saw is that there are a bunch of loops like: for(int i = 0; i < static_cast<int>(num_vertices(g)); ++i) Why not just use an iterator loop or BGL_FORALL_VERTICES_T? The code would even be somewhat cleaner by just switching to size_t as the counter. There are other tests that seem to have static_cast<int> just to quiet compiler warnings; some of them (such as static_cast<int>(list_labels_cur_vertex.size()) >= 2) are probably unnecessary. Can vec_vertex_labels, vec_last_valid_positions_for_dominance, vec_last_valid_index_for_dominance, or b_vec_vertex_already_checked_for_dominance be property maps instead of vectors? Another style issue: using c.size() as a condition should be replaced by !c.empty(), as the latter is often faster (especially for std::list, which does not guarantee that size() takes constant time). -- Jeremiah Willcock

I don't know exactly what the reason for heap allocation is. If you need ordered construction and destruction, wouldn't a stack (or recursion on the execution stack) work? Is the goal to have a bunch of lists with shared tails?
If you do decide to keep label objects, the Label concept should just be a
prototype of the structure if there is only one type that is allowed to be used in that position; a concept that can only ever have one model is usually unnecessary.
They do get passed to visitors, but I do not see them going into the output either. Do labels even need numbers or any kind of unique identifier? They appear to just be information about paths and partial paths (effectively a linked list). BTW, what is the point of a priority_queue on pointers? That ordering is not generally useful for algorithms. The pseudocode in the docs shows EXTRACTMIN as well; perhaps there is some other distance metric that is intended as the comparison in the priority_queue?
I just realized that I messed up a change to the implementation by removing the smart_pointer class. It was delegating comparisons to its dereferenced object (*ptr). That priority queue should be parameterized over (minimally) indirect_less. I'll have to go through and make sure that I haven't removed any other comparisons. Somehow, the tests still passed with the ptr comparisons....
There's one less function and template parameter in the new version. I'm
pretty sure we'd break something without a new name or module.
Yes, most likely. The full name resource_constrained_shortest_paths would work, though, and fits with BGL conventions.
I like it, but I think the documentation sites an algorithm, which suggests to me that there might be others. Should we consider using the author's name in a module called resource_constrained_shortest_paths.hpp?
One other issue I saw is that there are a bunch of loops like:
for(int i = 0; i < static_cast<int>(num_vertices(g)); ++i)
I had planned to address these in a later pass. I'll go thru and and back out the smart pointer change (sort of) and start taking a closer looks at static_casts. Andrew Sutton andrew.n.sutton@gmail.com

On Mon, 22 Feb 2010, Andrew Sutton wrote: (snip)
There's one less function and template parameter in the new version. I'm
pretty sure we'd break something without a new name or module.
Yes, most likely. The full name resource_constrained_shortest_paths would work, though, and fits with BGL conventions.
I like it, but I think the documentation sites an algorithm, which suggests to me that there might be others. Should we consider using the author's name in a module called resource_constrained_shortest_paths.hpp?
In that case, I think the author's name should be included.
One other issue I saw is that there are a bunch of loops like:
for(int i = 0; i < static_cast<int>(num_vertices(g)); ++i)
I had planned to address these in a later pass. I'll go thru and and back out the smart pointer change (sort of) and start taking a closer looks at static_casts.
OK. -- Jeremiah Willcock
participants (2)
-
Andrew Sutton
-
Jeremiah Willcock