
Hi!

I'm using the boost::mpi library for an HPC project. I really like the interface, but I'm currently getting very poor performance from the library. I started out by serializing my objects (which are full of pointers, allocated memory and whatnot), but that performed very badly, so I went for a more brute-force approach instead. No luck there either.

Essentially, what I want to do in a typical case is to send a set of indexes (4 integers) followed by an array of doubles. The array sizes are fixed at startup, and each array is between 10 and 60 kB. There are usually many of these arrays, and the total amount of data to be communicated at the end of a calculation is of the order of 1 GB.

Here is my current implementation:

1) Pack the indexes into an array of 4 integers and send (or broadcast) it to the receiver(s). The receiver figures out where to store the next packet based on the indexes (this takes next to no time).

2) Send the array to the receivers using:

    double *data = coefs->data();
    world.send(who, tag, data, nCoefs);

   where coefs is a pointer to an Eigen2 vector, and data is a pointer to a contiguous array of doubles.

I'm running the code on a big HPC cluster where each node has 8 cores and 16 GB of memory, all connected with Infiniband. Using this setup I achieve a maximum transfer rate of 66 MB/s doing all-to-one communication, which is roughly a tenth of what I'm supposed to get. I will not even mention how long a broadcast takes, but suffice it to say that it takes 20-25 times longer than doing the calculation. I get the same poor performance regardless of whether I'm communicating only over 127.0.0.1 or over the net.

Since our environment is homogeneous, I have compiled both the mpi library and my program with the BOOST_MPI_HOMOGENEOUS macro defined.

I will try to batch more packages into larger units, but earlier experience (with basic MPI) has shown that with 65 kB arrays, transfer rates of 1 GB/s are possible over our Infiniband switch.
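To make the batching idea concrete, here is a minimal sketch of folding the 4 indexes and the coefficient array into one contiguous buffer, so that each packet needs one message instead of two. This is an illustration only, not code from my project: pack_block, unpack_block and kHeader are hypothetical names, and storing the indexes as doubles is an assumption that is only safe because every 32-bit integer is exactly representable as a double.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of doubles reserved at the front of each message for the 4 indexes.
// (Hypothetical layout, chosen for this sketch.)
constexpr std::size_t kHeader = 4;

// Pack 4 integer indexes and nCoefs doubles into one contiguous buffer.
// The indexes are stored as doubles, which is exact for any 32-bit int.
std::vector<double> pack_block(const std::array<int, 4>& idx,
                               const double* data, std::size_t nCoefs) {
    std::vector<double> buf;
    buf.reserve(kHeader + nCoefs);
    for (int i : idx) buf.push_back(static_cast<double>(i));
    buf.insert(buf.end(), data, data + nCoefs);
    return buf;
}

// Recover the indexes and return a pointer to the coefficients, which
// start right after the header inside the same buffer.
const double* unpack_block(const std::vector<double>& buf,
                           std::array<int, 4>& idx) {
    assert(buf.size() >= kHeader);
    for (std::size_t i = 0; i < kHeader; ++i) {
        idx[i] = static_cast<int>(buf[i]);
    }
    return buf.data() + kHeader;
}
```

The packed buffer could then go out with the same pointer-and-count call as in step 2, e.g. world.send(who, tag, buf.data(), static_cast<int>(buf.size())), and the receiver, which knows nCoefs from startup, would post a single matching recv for 4 + nCoefs doubles. This costs one extra copy per array, and whether it actually improves throughput would need to be measured.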
Asynchronous transfer is an option, but that complicates the load balancing algorithms to a point where I really don't want to go unless at gunpoint.

Any suggestions?

Best regards,

-jonas-

-- 
________________________________________________________________________
Dr. Jonas Jusélius

Centre for Theoretical and           E-mail    : jonas.juselius@uit.no
Computational Chemistry              Telephone : +47 77644079
Department of Chemistry              Fax       : +47 77644765
University of Tromsø                 Mobile ph.: +47 47419869
N-9037 Tromsø, NORWAY                http://jonas.iki.fi
________________________________________________________________________
[ PGP key    : keyserver or http://jonas.iki.fi/pubkey.asc             ]
[ Fingerprint: 2516 A57A 3012 7962 287D B66E C1A9 157F 0A59 7A66       ]