Re: [boost] [BGL] Multithread problem inserting edges and exploring the same graph

23 Sep 2009


      On Wed, 23 Sep 2009, Cosimo Calabrese wrote:
...
...
It looks like (from your description that I snipped out) that you are 
really creating a new graph for every thread that is a copy of the original 
graph plus some new, thread-specific edges.  I would not recommend having a 
single graph for all of this because of the issues you are having.  What 
about the following:
Well, I've really a single, unique, huge istance of an adjacency_list in my 
application. I've called the initial form of that graph: 'original graph'. 
There aren't other graphs in memory. Every thread creates a filtered_graph 
object, that takes a reference of the original graph, but it's not a memory 
copy of it, it's only an adaptor. The thread adds the new edges and vertices 
using a reference of the original graph, and explores the filtered graph. The 
original graph has about 5 million of vertices and 10 million of edges. It 
takes about 1 Gb of RAM, I can't replicate it.
I know you were not doing that literally, but it seemed like the combined 
effect of your graph addition, filtering, and deletion meant that each run 
of Dijkstra's algorithm was in effect on a graph built this way.
...
...
1. Create a union_graph adapter that takes two graphs with the same number 
of vertices and merges their edge sets (concatenating the out_edges and 
in_edges for each vertex implicitly).  Properties would also need to be 
handled (or indexes for external properties).
2. For every thread, create a new graph with just that thread's new edges 
and run Dijkstra's algorithm on the union_graph of that piece and the 
original graph.
3. Throw away the temporary, thread-specific graph when the thread is done.
It seems a very, very good idea. I like it very much. If it works, I always 
access to the 'original' graph in read-only modality, that is the only shared 
object; I don't need to add any vertex or edge to any existing graph, but 
each thread only needs to create another, completly new graph, which isn't 
shared with other threads. So the thread explore the union adaptor in 
read-only modality. Exciting.
Now I must take a look on some 'graph arithmetic'... This adaptor needs only 
to model the traversal graph concepts. The two graph type in input to the 
union adaptor must be the same. So all the adaptor types required by the 
concepts are the same of the input graphs. Now I need a moment to think about 
the concept funcions (out_edges(), out_degree(), ... ). I'll tell you about 
the state of the adaptor very soon.
Those should be fairly simple, except that I do not believe Boost.Iterator 
has a concatenation iterator like the old View Template Library had.  If 
you write a generic concatenate-two-sequences iterator, you might want to 
consider contributing it to Boost.Iterator.  Otherwise, you should just be 
able to concatenate edge sets (in edge sets, out edge sets) and add 
degrees together.  Various properties and vertex/edge index computation 
might be tougher.
...
...
That should remove all of your locking (except in the memory allocator),
Yes.
...
and the filtered_graph,
Mmmmh, no, not yet. There is one thing of my project I've not mentioned, it's 
difficult to explain all of it... The graph elements have many properties. 
Each thread needs to explore not all the graph, but only a particular view of 
it, that must change dependently by the specific request. So, I need the 
filter_adaptor. But if you think that I haven't understood something about 
this point, please... tell me.
OK.  So you mean that you're filtering on more than just the thread ID? 
In that case, you might still need a filter.
...
...
and greatly simplify (and probably speed up) the code.  Do you think this 
will work?
Yes, I'm hopeful. I'll tell you soon. Thanks for the great idea.
But beyond the Jeremiah's idea, no one can explain what happens in my 
application? I'm referring to the memory corruption. Surely it depends on the 
contemporary reading and writing by different thread at the same moment, but 
I would to know why it happens, even if I've locked all the functions.
I do not know -- adjacency_list I believe has out edge (and possibly in 
edge) sets for each vertex, plus I think there's a total edge list.  That 
second one might be getting corrupted even if you change unrelated 
vertices.  Also, anytime the documentation says that some operation 
corrupts iterators into the graph, that means it might move the location 
of some internal data structure and thus cause conflicts with multiple 
threads.  I have not looked specifically at the behavior you are seeing, 
though.

-- Jeremiah Willcock