
On Thu, 20 May 2010, Adam Spargo wrote:
> Hi, I am working on genome assembly software and hope that the BGL can
> save me a lot of development time. Before I make the investment in
> learning the library, can somebody advise me on whether it is appropriate?
>
> My initial test sets will be quite small; however, in the end I will want
> to scale up to on the order of a billion nodes, quite sparsely connected.
> We have the RAM and many CPUs, but will the code scale up this far?
For this level of scalability we have the Parallel BGL (mostly in boost/graph/distributed and libs/graph_parallel; more information at http://www.osl.iu.edu/research/pbgl/), which runs on distributed-memory systems using MPI. We have successfully run tests up to two billion or so vertices (16G undirected edges) on 96 machines (4 GiB of memory each).

How much RAM and how many CPUs do you have? PBGL works on clusters or SMP systems, but keep in mind that the usual limit on how many vertices fit on a single machine is RAM, not CPU speed. How many edges do you have? Directed or undirected? How much data do you need to attach to each vertex or edge? What kinds of algorithms do you want to run?

-- Jeremiah Willcock
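
P.S. If it helps to see what the API looks like, a minimal distributed graph setup is roughly the sketch below. It is untested as written and assumes a Boost build with MPI support (typically linking boost_mpi, boost_serialization, and boost_graph_parallel); the vertex count and edge list are just placeholders, not real assembly data.

#include <boost/graph/use_mpi.hpp>   // must come before the other PBGL headers
#include <boost/mpi.hpp>
#include <boost/graph/distributed/mpi_process_group.hpp>
#include <boost/graph/distributed/adjacency_list.hpp>
#include <utility>

int main(int argc, char* argv[])
{
  boost::mpi::environment env(argc, argv);   // initialize MPI

  using boost::graph::distributed::mpi_process_group;

  // Undirected graph whose vertices are distributed across the MPI
  // processes; vecS keeps per-vertex storage compact for sparse graphs.
  typedef boost::adjacency_list<
      boost::vecS,
      boost::distributedS<mpi_process_group, boost::vecS>,
      boost::undirectedS>
    Graph;

  // Placeholder edge list -- in practice this would come from your own data.
  typedef std::pair<int, int> E;
  E edges[] = { E(0, 1), E(1, 2), E(2, 3), E(3, 4), E(4, 0) };
  const int num_vertices = 5;

  // Every process runs the same code (SPMD); the library assigns each
  // vertex, and the edges stored with it, to one owning process.
  Graph g(&edges[0], &edges[0] + sizeof(edges) / sizeof(E), num_vertices);

  return 0;
}

Compiled with the MPI compiler wrapper (e.g. mpic++) and launched under mpirun, each process then holds only its share of the graph.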