
On Thu, 20 May 2010, Adam Spargo wrote:
> Hi, I am working on genome assembly software and hope that the BGL can
> save me a lot of development time. Before I make the investment in
> learning the library, can somebody advise me on whether it is appropriate?
>
> My initial test sets will be quite small; however, in the end I will want
> to scale up to on the order of a billion nodes, quite sparsely connected.
> We have the RAM and many CPUs, but will the code scale up this far?
For this level of scalability we have the Parallel BGL (mostly in boost/graph/distributed and libs/graph_parallel; more information at http://www.osl.iu.edu/research/pbgl/), which runs on distributed-memory systems using MPI. We have successfully run tests up to two billion or so vertices (16G undirected edges) on 96 machines (4 GiB of memory each).

How much RAM and how many CPUs do you have? PBGL works on clusters or SMP systems, but keep in mind that the usual limit on how many vertices fit on a single machine is RAM, not CPU speed. How many edges do you have? Directed or undirected? How much data do you need to attach to each vertex or edge? What kinds of algorithms do you want to run?

-- Jeremiah Willcock
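
P.S. If it helps to see what the API looks like, a minimal distributed graph setup is roughly the sketch below. It is untested as written and assumes a Boost build with MPI support (typically linking boost_mpi, boost_serialization, and boost_graph_parallel); the vertex count and edge list are just placeholders, not real assembly data.

#include <boost/graph/use_mpi.hpp>   // must come before the other PBGL headers
#include <boost/mpi.hpp>
#include <boost/graph/distributed/mpi_process_group.hpp>
#include <boost/graph/distributed/adjacency_list.hpp>
#include <utility>

int main(int argc, char* argv[])
{
  boost::mpi::environment env(argc, argv);   // initialize MPI

  using boost::graph::distributed::mpi_process_group;

  // Undirected graph whose vertices are distributed across the MPI
  // processes; vecS keeps per-vertex storage compact for sparse graphs.
  typedef boost::adjacency_list<
      boost::vecS,
      boost::distributedS<mpi_process_group, boost::vecS>,
      boost::undirectedS>
    Graph;

  // Placeholder edge list -- in practice this would come from your own data.
  typedef std::pair<int, int> E;
  E edges[] = { E(0, 1), E(1, 2), E(2, 3), E(3, 4), E(4, 0) };
  const int num_vertices = 5;

  // Every process runs the same code (SPMD); the library assigns each
  // vertex, and the edges stored with it, to one owning process.
  Graph g(&edges[0], &edges[0] + sizeof(edges) / sizeof(E), num_vertices);

  return 0;
}

Compiled with the MPI compiler wrapper (e.g. mpic++) and launched under mpirun, each process then holds only its share of the graph.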