[BGL] auto index property maps in the boost vault

Hi all, I've put some files in the boost vault under algorithms/graph in a package I'm calling "auto index property maps." I developed these in response to a thread on boost-users a few months back (http://tinyurl.com/bezjr) and a steady stream of questions on this mailing list and boost-users about how to use/create vertex and edge index maps when they aren't provided for you. Vertex index maps are property maps that map the vertices of a graph bijectively into the range [0..num_vertices(g)). Edge index maps map edges bijectively into [0..num_edges(g)). Many BGL algorithms require a vertex or edge index map, and one can be supplied either in the form of an interior property or an external property map. Auto index property maps are external property maps that were created with ease of use in mind. In short, whenever a vertex index map is expected for the graph g, you can substitute the expression "make_auto_vertex_index(g)" and everything will work as expected. Similarly, if an edge index property map is expected, you can use the expression "make_auto_edge_index(g)" instead. It's my hope that the auto index property maps will provide a quick solution to those looking to get started with the BGL who aren't yet worried about delving into the details of working with internal properties, which seems to be a difficult issue for a lot of beginners. Auto index property maps can be used while prototyping and replaced later with interior properties if efficiency is an issue. Any feedback would be most appreciated. I've tested everything with gcc 3.4.4 on cygwin. Aaron

On Nov 19, 2005, at 12:12 PM, Aaron Windsor wrote:
Auto index property maps are external property maps that were created with ease of use in mind. In short, whenever a vertex index map is expected for the graph g, you can substitute the expression "make_auto_vertex_index(g)" and everything will work as expected. Similarly, if an edge index property map is expected, you can use the expression "make_auto_edge_index(g)" instead.
Cool!
It's my hope that the auto index property maps will provide a quick solution to those looking to get started with the BGL who aren't yet worried about delving into the details of working with internal properties, which seems to be a difficult issue for a lot of beginners. Auto index property maps can be used while prototyping and replaced later with interior properties if efficiency is an issue.
This will definitely be very helpful. I've only had a few minutes to look over this, so I only have two questions on the code itself: 1) edge_less_than seems more complicated than it needs to be. Instead of creating an integer inside edge_to_index_dispatch, then comparing the integers for two edges to order them, why not just have edge_less_than produce an ordering itself? That would avoid having to store the number of vertices in the edge_less_than predicate. 2) For the auto_index_property_map that allows duplicate keys, was there any particular reason to use a map of vectors instead of a multimap? I also have some higher-level, non-code comments. The primary concern I have is that we're adding more functionality to the BGL to make it easier. It's the same thing we did with bundled properties: add a new, easier-to-use mechanism on top of what we already have. Unfortunately, features interact and I'm not entirely sure that we've managed to make life easier overall. Auto-indexing maps are a great feature, but overall will they make it easier to learn and use the BGL? They will make some things easier: when users ask "why can't I call this algorithm with my adjacency_list?" we'll have the simple answer of "add edge_index(auto_edge_index_map(g))", followed by the obligatory "if you find that it's too slow, do this other thing" comment. It seems that the way to make the BGL easier to use would be to make auto-indexing property maps automatic. When a BGL algorithm tries to pull out a vertex_index map, it checks the parameter list, then the graph itself, then falls back to generating an auto-indexing map. This would be convenient, but it also means that there are hidden performance penalties, which we've tried to avoid in the BGL. What to do? It's becoming more and more important to make the BGL easier to use out-of-the-box. I even think that most users will understand if at a later point in time they need to do a little work to get their code to give maximum performance, but we need to give them the tools to do so. For instance, I can imagine a macro BOOST_GRAPH_PERFORMANCE_WARNINGS that produces run-time warnings when the library is secretly building an auto-indexing map behind-the-scenes and a macro BOOST_GRAPH_PERFORMANCE_WARNINGS_ARE_ERRORS that causes compile-time errors instead. But for now, I think once I understand (1) and (2), the auto-indexing property maps should go into CVS HEAD and we can discuss just how automatic we want to make them. Doug

On 11/21/05, Doug Gregor <dgregor@cs.indiana.edu> wrote: <snip>
I've only had a few minutes to look over this, so I only have two questions on the code itself:
1) edge_less_than seems more complicated than it needs to be. Instead of creating an integer inside edge_to_index_dispatch, then comparing the integers for two edges to order them, why not just have edge_less_than produce an ordering itself? That would avoid having to store the number of vertices in the edge_less_than predicate.
Do you mean that edge_less_than should be a stateful predicate, creating an ordering as it goes along (first edge it sees is ranked 1, second edge is ranked 2, etc.?) Because this scheme would make lookup in a map an log^2 n operation - at each node in the tree, a rank lookup costing log n needs to be made in order to figure out where to go next. Maybe I'm misunderstanding you?
2) For the auto_index_property_map that allows duplicate keys, was there any particular reason to use a map of vectors instead of a multimap?
No, you're right - multimap is better here. I'll change this and put a new version in the vault.
I also have some higher-level, non-code comments. The primary concern I have is that we're adding more functionality to the BGL to make it easier. It's the same thing we did with bundled properties: add a new, easier-to-use mechanism on top of what we already have. Unfortunately, features interact and I'm not entirely sure that we've managed to make life easier overall.
Auto-indexing maps are a great feature, but overall will they make it easier to learn and use the BGL? They will make some things easier: when users ask "why can't I call this algorithm with my adjacency_list?" we'll have the simple answer of "add edge_index(auto_edge_index_map(g))", followed by the obligatory "if you find that it's too slow, do this other thing" comment.
It seems that the way to make the BGL easier to use would be to make auto-indexing property maps automatic. When a BGL algorithm tries to pull out a vertex_index map, it checks the parameter list, then the graph itself, then falls back to generating an auto-indexing map. This would be convenient, but it also means that there are hidden performance penalties, which we've tried to avoid in the BGL.
What to do?
In the boost-users thread that I linked to in the original post, I suggested that get(vertex_index, g) return an auto index if there was no interior index. You thought this was a bad idea at the time, too misleading for the user, and I tend to agree with your earlier self. I came to like the idea of saying "make_auto_vertex_index" and "make_auto_edge_index" as an acknowledgement of the fact that you, as a user, realize that the algorithm needs an index on the vertices or edges, and realize that you don't have one, but still want the algorithm to work.
It's becoming more and more important to make the BGL easier to use out-of-the-box. I even think that most users will understand if at a later point in time they need to do a little work to get their code to give maximum performance, but we need to give them the tools to do so. For instance, I can imagine a macro BOOST_GRAPH_PERFORMANCE_WARNINGS that produces run-time warnings when the library is secretly building an auto-indexing map behind-the-scenes and a macro BOOST_GRAPH_PERFORMANCE_WARNINGS_ARE_ERRORS that causes compile-time errors instead.
Interesting. I'll think some more about this...
But for now, I think once I understand (1) and (2), the auto-indexing property maps should go into CVS HEAD and we can discuss just how automatic we want to make them.
Thanks for taking the time to look at this and give some feedback, Doug. I'll update with the multimap fix, probably after the holidays. Aaron

On Nov 22, 2005, at 7:23 AM, Aaron Windsor wrote:
On 11/21/05, Doug Gregor <dgregor@cs.indiana.edu> wrote:
<snip>
I've only had a few minutes to look over this, so I only have two questions on the code itself:
1) edge_less_than seems more complicated than it needs to be. Instead of creating an integer inside edge_to_index_dispatch, then comparing the integers for two edges to order them, why not just have edge_less_than produce an ordering itself? That would avoid having to store the number of vertices in the edge_less_than predicate.
Do you mean that edge_less_than should be a stateful predicate, creating an ordering as it goes along (first edge it sees is ranked 1, second edge is ranked 2, etc.?) Because this scheme would make lookup in a map an log^2 n operation - at each node in the tree, a rank lookup costing log n needs to be made in order to figure out where to go next. Maybe I'm misunderstanding you?
I believe I was actually thinking of a requirement for operator< on the key type of the auto-index property map. Users have been asking for the ability to compare vertex and edge descriptors with <for a long time, and I believe we can provide it for all of the graph types in the BGL. It would simplify the auto-index property maps a bit, and help users overall. Granted, the problem remains that we essentially need to know whether the less-than operator for the key type is a total order (so we can use std::map) or only a partial order (we need to use std::multimap): perhaps assume it's a partial order, but have a trait is_totally_ordered<Key> that will be specialized to tell us when we can use the std::map.
In the boost-users thread that I linked to in the original post, I suggested that get(vertex_index, g) return an auto index if there was no interior index. You thought this was a bad idea at the time, too misleading for the user, and I tend to agree with your earlier self. I came to like the idea of saying "make_auto_vertex_index" and "make_auto_edge_index" as an acknowledgement of the fact that you, as a user, realize that the algorithm needs an index on the vertices or edges, and realize that you don't have one, but still want the algorithm to work.
My earlier self hadn't received quite as many complaints about the usability of the BGL :( My concern is that we lose people when they try an algorithm and get some big error message. If they ask on the list, we'll just tell them to us make_auto_vertex_index or make_auto_edge_index and everything will work perfectly; but if they don't ask, we'll probably have lost them completely. Granted, I think adjacency_list is a bigger initial problem: if we had an easy-to-use graph type (or two) that had built- in vertex and edge index property maps, users wouldn't see these problems. Then, when they decide to tweak a bit more, they could move to adjacency_list<> and use auto-index property maps where necessary. In any case: as soon as we figure out whether we should use/require operator< on keys to the auto-index property map, please go ahead and integrate your code and documentation into CVS HEAD. Doug

On 11/29/05, Douglas Gregor <doug.gregor@gmail.com> wrote:
On Nov 22, 2005, at 7:23 AM, Aaron Windsor wrote:
On 11/21/05, Doug Gregor <dgregor@cs.indiana.edu> wrote:
<snip>
I've only had a few minutes to look over this, so I only have two questions on the code itself:
1) edge_less_than seems more complicated than it needs to be. Instead of creating an integer inside edge_to_index_dispatch, then comparing the integers for two edges to order them, why not just have edge_less_than produce an ordering itself? That would avoid having to store the number of vertices in the edge_less_than predicate.
Do you mean that edge_less_than should be a stateful predicate, creating an ordering as it goes along (first edge it sees is ranked 1, second edge is ranked 2, etc.?) Because this scheme would make lookup in a map an log^2 n operation - at each node in the tree, a rank lookup costing log n needs to be made in order to figure out where to go next. Maybe I'm misunderstanding you?
I believe I was actually thinking of a requirement for operator< on the key type of the auto-index property map. Users have been asking for the ability to compare vertex and edge descriptors with <for a long time, and I believe we can provide it for all of the graph types in the BGL. It would simplify the auto-index property maps a bit, and help users overall.
Granted, the problem remains that we essentially need to know whether the less-than operator for the key type is a total order (so we can use std::map) or only a partial order (we need to use std::multimap): perhaps assume it's a partial order, but have a trait is_totally_ordered<Key> that will be specialized to tell us when we can use the std::map.
Well, the code in the vault does provide a generic < operator for edges (edge_less_than, in the file edge_to_index.hpp), which provides a total order on edges for graphs without parallel edges and a partial order otherwise. It requires a vertex index map, which I think is the best way to do this. The way to tell whether the order is total or partial is to check the edge_parallel_category tag in graph_traits. As for a vertex descriptor ordering, I don't know of a way to provide this generically without (1) using a vertex index map or (2) taking O(n) time per call to <. (1) is ridiculous since we're trying to provide a vertex index map, and (2) is unacceptably slow. All the vertex descriptors used by the BGL standard graph types are comparable with < anyway, since they're either pointers or of type std::size_t, so I don't know what else to do besides assume that if a user has a strange vertex descriptor type, they provide their own < for the vertex descriptor. For now, I think I'll just add a form of make_vertex_index that takes a < functor as an argument (although, arguably, if you can write your own < functor for vertex descriptors, you can probably figure out internal properties.)
My concern is that we lose people when they try an algorithm and get some big error message. If they ask on the list, we'll just tell them to us make_auto_vertex_index or make_auto_edge_index and everything will work perfectly; but if they don't ask, we'll probably have lost them completely. Granted, I think adjacency_list is a bigger initial problem: if we had an easy-to-use graph type (or two) that had built- in vertex and edge index property maps, users wouldn't see these problems. Then, when they decide to tweak a bit more, they could move to adjacency_list<> and use auto-index property maps where necessary.
In any case: as soon as we figure out whether we should use/require operator< on keys to the auto-index property map, please go ahead and integrate your code and documentation into CVS HEAD.
I agree that the best way to provide auto-indexing is internally - even with existing graph types, we could provide optional auto-indexing with little space overhead. For instance, giving each vertex an index and a timestamp with timestamp intialized to 0. The graph keeps a global_timestamp initialized to 1 and a next_index initialized to 0. If you ask for a vertex's index, first check if timestamp == global_timestamp. If not, set timestamp = global_timestamp and index = next_index, and increment next_index. If so, return index. Whenever you remove a vertex, increment global_timestamp and set next_index to 0. Of course, the same scheme works for edges, too. So, I have a few misgivings about external auto index property maps, but I think they're easy to use, which is a start. I'll put the code into HEAD after I've thought about this a little more. Aaron

On Dec 1, 2005, at 7:12 AM, Aaron Windsor wrote:
On 11/29/05, Douglas Gregor <doug.gregor@gmail.com> wrote: All the vertex descriptors used by the BGL standard graph types are comparable with < anyway, since they're either pointers or of type std::size_t, so I don't know what else to do besides assume that if a user has a strange vertex descriptor type, they provide their own < for the vertex descriptor.
Yep. For all of the edge descriptor types in the BGL that can have a <, we should provide it just for convenience. It still won't be a requirement on edge descriptors, but it's a good convenience.
For now, I think I'll just add a form of make_vertex_index that takes a < functor as an argument (although, arguably, if you can write your own < functor for vertex descriptors, you can probably figure out internal properties.)
Yep, that seems reasonable.
I agree that the best way to provide auto-indexing is internally - even with existing graph types, we could provide optional auto-indexing with little space overhead. For instance, giving each vertex an index and a timestamp with timestamp intialized to 0. The graph keeps a global_timestamp initialized to 1 and a next_index initialized to 0. If you ask for a vertex's index, first check if timestamp == global_timestamp. If not, set timestamp = global_timestamp and index = next_index, and increment next_index. If so, return index. Whenever you remove a vertex, increment global_timestamp and set next_index to 0. Of course, the same scheme works for edges, too.
Yep, that would work perfectly well. At one point I was imagining magic "vertex_autoindex_t" and "edge_autoindex_t" properties that one could put into the list of properties, e.g., typedef adjacency_list<listS, listS, directedS, property<vertex_autoindex_t, std::size_t>, property<edge_autoindex_t, std::size_t> > Graph; Your suggestion would be a very reasonable implementation of that. We'd need to make it so that property_map<Graph, vertex_index_t> and get(vertex_index, g) did the right thing by grabbing the auto-index property map. Granted, I don't want to make the user do anything with property<...> ever again. We'd be better off with a schedule based on bundled properties, either by providing an "auto_indexed" base class for bundled properties or a wrapper. *Sigh*; so many tricky interface decisions. Doug
participants (3)
-
Aaron Windsor
-
Doug Gregor
-
Douglas Gregor