
On Thu, May 20, 2010 at 5:50 PM, Jeremiah Willcock
On Thu, 20 May 2010, Adam Spargo wrote:
Hi, thanks for your reply, the fact that you have run something with two billion vertices tells me that it's worth looking into. I will probably come back to when I have done some runs. At the moment I'm mostly writing the code that gets the information to build my graph.
I would recommend using compressed_sparse_row_graph if you are working with directed (or bidirectional) graphs; the graph structure is read-only but it is memory-efficient and designed to scale to large graphs. There are several constructors that are also designed to minimize memory usage during graph construction.
I have 500GB RAM on a single node for prototyping, plus a cluster of 4000 nodes with 4GB to 64GB RAM each so it should be cool.
You should be able to use sequential BGL for your testing at first, then; you might even be able to use your 500GB machine for production runs. We have not tried to scale Parallel BGL up to the 4000-node level yet, but are working on performance improvements to get it to scale that far. We can give you prerelease versions you need scalability beyond what you are able to get with the released code. However, with your graph sizes, you might only need 8-16 machines with 64G each so you may not need the scalability improvements. It also depends on the complexity (as in big O) of your algorithms, though. We are also working on taking advantage of shared-memory systems; specific support for them is not in the released code yet (other than that you can run multiple MPI processes on the same machine).
I'm still very much in the proof of concept stage. I'm one of the first to explicitly construct the overlap graph in a genome assembly program, most have something like it but can't really play with it before they barf the answer. So hopefully the BGL will give be the power to explore my graph, but like I said - early days.
OK. Please keep in touch and report any problems (or successes) you have. We always appreciate users who are using BGL and PBGL at large scale.
-- Jeremiah Willcock _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Jeremiah, Can you say a bit more about the work you mention on BGL for shared-memory ? Has some implementation been done ? I ask because I am considering to use BGL for parallel computing on shared-memory machines. Thanks Bo