Boost Graph: preventing duplicate vertices when constructing graph incrementally
Hi, The vast majority of Boost Graph's sample code has pre-defined source data. For example, from city_visitor.cpp: enum { SanJose, SanFran, LA, SanDiego, Fresno, LasVegas, Reno, Sacramento, SaltLake, Phoenix, N }; E edge_array[] = { E(Sacramento, Reno), E(Sacramento, SanFran), E(Reno, SaltLake), E(SanFran, SanJose), E(SanJose, Fresno), E(SanJose, LA), E(LA, LasVegas), E(LA, SanDiego), E(LasVegas, Phoenix) }; Unfortunately, static source data with enums as the vertices isn't very typical in real-world programs. It's much more common to construct the graph incrementally from a streaming data set, such as a file containing a list of edge pairs. But this presents a problem: As the list of edges is read in, a vertex may be encountered more than once, so add_vertex can't simply be called on every vertex found because that will result in redundant vertices. An obvious fix is to check whether a vertex exists before calling add_vertex. If it is found, it can be used for the edge; otherwise add_vertex can be called and the returned value used for the edge. The only question is, How do I check whether a vertex exists? I could iterate through all vertices in the graph, searching for one that equals the one I'm trying to add, but this approach doesn't scale well. It has a time complexity of O(V). A faster solution to this problem appears to be in the boost_web_graph.cpp sample. It deals with the same issue: It reads in graph data from a file called boost_web.dat, and this file contains the same vertex value in multiple places. Rather than call add_vertex multiple times (creating redundancies), the code puts every vertex it finds into a map. The map associates the vertex's unique value (a string in this case) with its corresponding vertex descriptor. Then, before calling add_vertex, the code checks the map for the unique value; if found, it uses the corresponding vertex descriptor; otherwise it calls add_vertex and adds the new ID/descriptor pair to the map. I assume that this is faster (compared to my naive approach of searching linearly through the vertices) because std::map's insert function is supposed to have logarithmic time complexity. It probably achieves this with a red-black tree, so as long as the keys are sortable, O(log(V)) complexity is guaranteed. Strings (used in the sample code) are certainly sortable, although in my program I'm using pointers as the vertex property. But I guess pointers are sortable too, since they're just numbers really. My initial tests show that this approach works properly with pointers as the values, though I've not done any performance testing, only correctness testing. Any feedback on this would be appreciated. Thanks, Trevor
participants (1)
-
Trevor Harmon