[graph] managed vertex indexing

All Interested, For my Summer of Code proposal, I have been working on the usability of the Boost.Graph library, basically adding a couple wrappers to make it easier to work with undirected and directed graphs. One aspect of these classes is that they do not use the typical vector storage for vertices, edges, or the out edge list. In this case, list storage provides more iterator and descriptor stability so we (mentor Jeremy Siek and myself) decided go that direction. This decision has some interesting fallout since almost every example for Boost.Graph, including that in the documentation is written using vecS for most of the storage facilities. This lead some interesting discoveries like, for example, that num_vertices(g) is O(n) with listS vertex storage under GCC - who knew?. also that most graphs declared using `listS` will fail to run correctly with default parameters on just about every algorithm. And finally, that most of the examples given in the documentation and libs directory will also fail if you stick this line of code just before a call to any core algorithm: remove_vertex(*vertices(g).first, g); Except for the linear time num_vertices(g) call, these problems are caused by indiscriminate use of vertex descriptors as indices - in fact, that's exactly the reason that the above line of code will cause most examples to fail. I've written down a solution to this and added it to the documentation I've been working for my project. It essentially involves three functions that can be used to help manage vertex indices. Essentially they are: get_vertex_index(v, g) max_vertex_index(g) renumber_vertex_indidces(g) Correct use of these functions will solve a number of problems. The more complete proposal is in subversion at: http://svn.boost.org/trac/boost/browser/sandbox/SOC/2007/graphs/libs/ graph/doc/quickbook/indexing.qbk Part the related implementation is here: http://svn.boost.org/trac/boost/browser/sandbox/SOC/2007/graphs/boost/ graph/undirected_graph.hpp The three functions in question are the last in the file, although they do call some member functions of the graph class itself. I'd be grateful to some suggestions for improvement or other comments. Andrew Sutton asutton@cs.kent.edu

On 6/15/07, Andrew Sutton <asutton@cs.kent.edu> wrote:
All Interested,
For my Summer of Code proposal, I have been working on the usability of the Boost.Graph library, basically adding a couple wrappers to make it easier to work with undirected and directed graphs. One aspect of these classes is that they do not use the typical vector storage for vertices, edges, or the out edge list. In this case, list storage provides more iterator and descriptor stability so we (mentor Jeremy Siek and myself) decided go that direction.
This decision has some interesting fallout since almost every example for Boost.Graph, including that in the documentation is written using vecS for most of the storage facilities. This lead some interesting discoveries like, for example, that num_vertices(g) is O(n) with listS vertex storage under GCC - who knew?. also that most graphs declared using `listS` will fail to run correctly with default parameters on just about every algorithm. And finally, that most of the examples given in the documentation and libs directory will also fail if you stick this line of code just before a call to any core algorithm:
remove_vertex(*vertices(g).first, g);
Except for the linear time num_vertices(g) call, these problems are caused by indiscriminate use of vertex descriptors as indices - in fact, that's exactly the reason that the above line of code will cause most examples to fail.
I've written down a solution to this and added it to the documentation I've been working for my project. It essentially involves three functions that can be used to help manage vertex indices. Essentially they are:
get_vertex_index(v, g) max_vertex_index(g) renumber_vertex_indidces(g)
Correct use of these functions will solve a number of problems. The more complete proposal is in subversion at:
http://svn.boost.org/trac/boost/browser/sandbox/SOC/2007/graphs/libs/ graph/doc/quickbook/indexing.qbk
Part the related implementation is here:
http://svn.boost.org/trac/boost/browser/sandbox/SOC/2007/graphs/boost/ graph/undirected_graph.hpp
The three functions in question are the last in the file, although they do call some member functions of the graph class itself.
I'd be grateful to some suggestions for improvement or other comments.
Andrew Sutton asutton@cs.kent.edu
Hi Andrew, Glad to see that you are making progress on this problem - I think improving the usability of the BGL for the casual user is a very worthwhile project. Your implementation makes it easier for users to manage a vertex/edge index, but I'd like to suggest a scheme where the user could be blissfully unaware of the existence of a vertex/edge index entirely unless they explicitly needed it. To me, this is the big usability problem with the BGL - the fact that a user needs to understand the concept of a vertex/edge index map before even simply using a basic algorithm like breadth-first search. But almost all BGL algorithms make use of at least a vertex index, and a large percentage of the BGL-related questions on the boost-users list deal with misunderstanding/misuse of vertex and edge index maps. I think it would be nice if the undirected_graph and directed_graph class came equipped with a ready-to-use vertex_index and edge_index, so that when a user with no prior knowledge of the BGL codes up an example like the one in your proposal: vertex_descriptor u = add_vertex(g); vertex_descriptor v = add_vertex(g); remove_vertex(u, g); breadth_first_search(g, v, make_bfs_visitor(null_visitor())); everything just works. It's possible to do this lazily with very little overhead using the following scheme (I'll describe the scheme for a vertex index, but the same scheme works for an edge index as well): give the graph class two additional members: global_vertex_timestamp and next_vertex_index, and give each vertex two interior properties: raw_vertex_index and local_vertex_timestamp. All of these variables are of type graph_traits<graph_type>::vertices_size_type. Upon construction of the graph, global_vertex_timestamp is initialized to 1, next_vertex_index is initialized to 0, and each vertex's local_vertex_timestamp is initialized to 0. The property map p returned by get(vertex_index, g) is a read-only property map that does the following: when get(p, v) is called for some vertex v, first check to see if v's local_vertex_timestamp is equal to the graph's global_vertex_timestamp. If so, return raw_vertex_index. If not, set local_vertex_timestamp = global_vertex_timestamp, set raw_vertex_index = next_available_index, and increment next_available_index. This scheme indexes the vertices "on-demand", and allows you to lazily reset *all* vertex indices at once in constant time (for example, whenever remove_vertex is called) just by incrementing global_vertex_timestamp and setting next_vertex_index to 0. One wrinkle in the above scheme is that you'd need to do something special in the rare case that the global_vertex_timestamp overflowed back to 0. In this case, you could just check after each increment of global_vertex_timestamp to see if it's equal to the max() value defined in std::numeric_limits, and if so, do a full reset (global_vertex_timestamp = 1, next_vertex_index = 0, each local_vertex_timestamp = 0). It would also have to be documented that the indices are invalidated when a vertex is removed, since the vertex index can be used to create exterior property maps (for example, a vector indexed by the vertex index). This could be a very confusing error - all of the sudden, the mapping from vertex -> external property would be permuted (since the vertices are re-indexed) if a vertex is removed. Just a different philosophy - whether you want to make vertex/edge indices easier to use, or whether you want to try to sweep them under the rug entirely for casual users of the BGL. If a user is going to do anything advanced with the BGL, they're going to have to understand vertex/edge indices eventually, but anything we can do to make everything compile and run for the first time/casual user is an improvement! Regards, Aaron

Glad to see that you are making progress on this problem - I think improving the usability of the BGL for the casual user is a very worthwhile project.
<snip>
Aaron, Interesting ideas, but maybe a little overkill for the scope of my work :) - I'm not touching any of the existing classes anyways. I think however, that your suggestions make vertex/edge indexing integral to the BGL - which I'm not convinced it should be. The more I think about it, the more it feels like the indexing stuff is there because it's declaratively harder to do without it. Conceptually, graph algorithms need to access properties of vertices and edges in constant time. When we use vecS as a storage selector, this is trivially done since the descriptors are literally the indices. This makes the accessors *really* trivial and plainly constant time - unfortunately also highly unstable under destructive operations. To be honest, the indexing really feels like more of a crutch than anything else. I think an ultimate goal - and this is also well beyond the scope of my work this summer - might be to completely eliminate any dependence on indexing for property accessors. We might be able to use hash_maps as a viable alternative since realistically, descriptors are integral types and assigned in such a way that hash collisions should be very rare. There are, unfortunately as I see it, two problems with this. First, there's substantial momentum built around the current model (indexing, etc.). The second, probably more impractical, is the commitment of users to the current design (i.e., backwards compatibility). I don't know how practical it would be to introduce and implement such a large change into the current codebase - even if people did agree that it would be a good idea. Ironically, this may be a great argument for independent library versioning. Andrew Sutton asutton@cs.kent.edu
participants (2)
-
Aaron Windsor
-
Andrew Sutton