
Hi,

I am working on genome assembly software and hope that the BGL can save me a lot of development time, but before I make the investment in learning the library, can somebody advise me on whether it is appropriate?

My initial test sets will be quite small. However, in the end I will want to scale up to on the order of a billion nodes, quite sparsely connected. We have the RAM and many CPUs, but will the code scale up this far?

Thanks in advance,

Adam.

--
Dr Adam Spargo
High Performance Assembly Group    email: aws@sanger.ac.uk
Wellcome Trust Sanger Institute    Tel: +44 (0)1223 834244 x7728
Hinxton, Cambridge CB10 1SA        Fax: +44 (0)1223 494919

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

On Thu, 20 May 2010, Adam Spargo wrote:
Hi, I am working on genome assembly software and hope that the BGL can save me a lot of development time, but before I make the investment in learning the library can somebody advise me on whether it is appropriate.
My initial test sets will be quite small. However in the end I will want to scale up to on the order of a billion nodes, quite sparsely connected. We have the RAM and many CPUs, but will the code scale up this far?
For this level of scalability, we have the Parallel BGL (mostly in boost/graph/distributed and libs/graph_parallel; more info at http://www.osl.iu.edu/research/pbgl/), which runs on distributed-memory systems using MPI. We have successfully run tests up to two billion or so vertices (16G undirected edges) on 96 machines (4GiB of memory each).

How much RAM and how many CPUs do you have? PBGL works on clusters or SMP systems, but remember that RAM, not CPU speed, is the usual limit on how many vertices you can fit on a single machine. How many edges do you have? Directed or undirected? How much data do you need to attach to each vertex or edge? What kinds of algorithms do you want to run?

-- Jeremiah Willcock

Hi,

Thanks for your reply. The fact that you have run something with two billion vertices tells me that it's worth looking into. I will probably come back to you when I have done some runs. At the moment I'm mostly writing the code that gets the information to build my graph.

I have 500GB RAM on a single node for prototyping, plus a cluster of 4000 nodes with 4GB to 64GB RAM each, so it should be cool.

I'm still very much in the proof-of-concept stage. I'm one of the first to explicitly construct the overlap graph in a genome assembly program; most have something like it but can't really play with it before they barf the answer. So hopefully the BGL will give me the power to explore my graph, but like I said - early days.

Thanks again,

Adam.

--
Dr Adam Spargo
High Performance Assembly Group    email: aws@sanger.ac.uk
Wellcome Trust Sanger Institute    Tel: +44 (0)1223 834244 x7728
Hinxton, Cambridge CB10 1SA        Fax: +44 (0)1223 494919

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

On Thu, 20 May 2010, Adam Spargo wrote:
Hi, thanks for your reply, the fact that you have run something with two billion vertices tells me that it's worth looking into. I will probably come back to when I have done some runs. At the moment I'm mostly writing the code that gets the information to build my graph.
I would recommend using compressed_sparse_row_graph if you are working with directed (or bidirectional) graphs; the graph structure is read-only but it is memory-efficient and designed to scale to large graphs. There are several constructors that are also designed to minimize memory usage during graph construction.
I have 500GB RAM on a single node for prototyping, plus a cluster of 4000 nodes with 4GB to 64GB RAM each so it should be cool.
You should be able to use sequential BGL for your testing at first, then; you might even be able to use your 500GB machine for production runs. We have not tried to scale Parallel BGL up to the 4000-node level yet, but we are working on performance improvements to get it to scale that far. We can give you prerelease versions if you need scalability beyond what you are able to get with the released code. However, with your graph sizes, you might only need 8-16 machines with 64GB each, so you may not need the scalability improvements; it also depends on the complexity (in the big-O sense) of your algorithms. We are also working on taking advantage of shared-memory systems; specific support for them is not in the released code yet (other than that you can run multiple MPI processes on the same machine).
I'm still very much in the proof-of-concept stage. I'm one of the first to explicitly construct the overlap graph in a genome assembly program; most have something like it but can't really play with it before they barf the answer. So hopefully the BGL will give me the power to explore my graph, but like I said - early days.
OK. Please keep in touch and report any problems (or successes) you have. We always appreciate users who are using BGL and PBGL at large scale.

-- Jeremiah Willcock

Jeremiah,

Can you say a bit more about the work you mention on BGL for shared memory? Has some implementation been done? I ask because I am considering using BGL for parallel computing on shared-memory machines.

Thanks,
Bo

On Thu, 20 May 2010, Bo Jensen wrote:
Jeremiah, can you say a bit more about the work you mention on BGL for shared memory? Has some implementation been done? I ask because I am considering using BGL for parallel computing on shared-memory machines.
This is very much a work in progress; part of our experimental work handles threads in limited ways, but this is not in Boost's version of PBGL and is not ready for production use. You can run PBGL with multiple MPI processes on the same machine, but this is somewhat inefficient.

One general issue with shared-memory parallelism for graph algorithms is that memory (both size and performance) is often the limit, rather than CPU performance. In that case, using multiple cores that share memory is not worthwhile. More CPU-intensive algorithms do benefit from more cores, though.

-- Jeremiah Willcock

Thanks for the update. Is there a list I can join to follow the development?

On Thu, 20 May 2010, Bo Jensen wrote:
Thanks for the update. Is there a list I can join to follow the development?
There is a list, but it is not used for announcements. The best way to keep up with PBGL development is to monitor our SVN trunk (https://svn.osl.iu.edu/svn/pbgl/trunk) or to email Nicholas Edmonds (ngedmond@osl.iu.edu).

-- Jeremiah Willcock
participants (3)
- Adam Spargo
- Bo Jensen
- Jeremiah Willcock