Hello
There is a known performance problem with serializing a std::vector over MPI; it essentially prevents you from ever reaching the performance of C. The problem is on the receive side. When you receive a vector whose size you don't know, the receiver has to:

- get the number of elements of the vector
- resize the vector (which initializes every element)
- receive the data into the vector (reinitializing every element)

The C version of the idiom:

- gets the number of elements
- reserves (as opposed to resizes) the memory for the elements
- receives the data into the buffer (initializing each element only once)

This might make a small or a large performance difference, so profile!
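To make the difference concrete, here is a minimal sketch of the two receive idioms (my own illustration, not code from this thread), assuming the elements are doubles and the sender transmits the element count as a separate message first:

#include <mpi.h>
#include <vector>
#include <memory>

// Approach A: the common C++ idiom. resize() value-initializes every
// element, then MPI_Recv overwrites them, so the memory is written twice.
std::vector<double> recv_resize(int src, int tag) {
    unsigned long long n = 0;
    MPI_Recv(&n, 1, MPI_UNSIGNED_LONG_LONG, src, tag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::vector<double> v;
    v.resize(n);                                   // first pass over the memory
    MPI_Recv(v.data(), static_cast<int>(n), MPI_DOUBLE, src, tag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);   // second pass
    return v;
}

// Approach B: the C-style idiom. The buffer is allocated but left
// uninitialized, so MPI_Recv is the only pass over the memory.
std::unique_ptr<double[]> recv_raw(int src, int tag, unsigned long long& n) {
    MPI_Recv(&n, 1, MPI_UNSIGNED_LONG_LONG, src, tag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::unique_ptr<double[]> buf(new double[n]);  // uninitialized storage
    MPI_Recv(buf.get(), static_cast<int>(n), MPI_DOUBLE, src, tag,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    return buf;
}

(Counts larger than INT_MAX would need chunking or a custom datatype; that is left out of the sketch.)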
According to the attached program there seems to be a much larger performance problem than initializing vector elements. The program first sends a vector of doubles using plain MPI, then sends another identical vector with boost::mpi, and prints how long each took in seconds. Note that boost::mpi also sends two messages for run-time sized containers.

For vectors of 1e6 items the program prints (the MPI rank is the first number):

mpi
0 resize: 0.0126891, send: 0.00988925, recv: 0
1 resize: 0.0131643, send: 0, recv: 0.00955247
boost::mpi
0 resize: 0.0096425, send: 0.279135, recv: 0
1 resize: 0, send: 0, recv: 0.295702

For vectors of 1e7 items:

mpi
0 resize: 0.0974027, send: 0.0538886, recv: 0
1 resize: 0.105708, send: 0, recv: 0.0456324
boost::mpi
0 resize: 0.0517177, send: 2.70333, recv: 0
1 resize: 0, send: 0, recv: 2.82339

And for vectors of 5e7 items:

mpi
0 resize: 0.590099, send: 0.226269, recv: 0
1 resize: 0.440719, send: 0, recv: 0.375706
boost::mpi
0 resize: 0.198448, send: 13.5335, recv: 0
1 resize: 0, send: 0, recv: 14.0518

The boost::mpi version is always at least 10 times slower. It also seems to run out of memory with a smaller number of items, implying that unnecessary copies of the data are created somewhere. Based on experience with more complex programs (e.g. http://dx.doi.org/10.1016/j.jastp.2014.08.012) I wouldn't recommend boost::mpi for high performance computing. Or, if this is user error, then high performance is at least easier to get with pure MPI...

I used boost-1.57.0, g++ (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) and mpirun (Open MPI) 1.6.5.

Ilja
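P.S. For anyone who wants to try the comparison themselves, below is a rough, simplified sketch of the benchmark structure described above (my own reconstruction, not the attached program; it assumes the element count N is known on both ranks instead of being sent first, and only times the transfer itself):

#include <mpi.h>
#include <boost/mpi.hpp>
#include <boost/serialization/vector.hpp>
#include <vector>
#include <iostream>

int main(int argc, char* argv[]) {
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;
    const unsigned long long N = 1000000;   // 1e6 doubles, as in the first run

    // Plain MPI: rank 0 sends, rank 1 resizes and receives.
    if (world.rank() == 0) {
        std::vector<double> v(N, 1.0);
        double t = MPI_Wtime();
        MPI_Send(v.data(), static_cast<int>(N), MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        std::cout << "mpi 0 send: " << MPI_Wtime() - t << std::endl;
    } else if (world.rank() == 1) {
        std::vector<double> v;
        double t = MPI_Wtime();
        v.resize(N);
        double t_resize = MPI_Wtime() - t;
        t = MPI_Wtime();
        MPI_Recv(v.data(), static_cast<int>(N), MPI_DOUBLE, 0, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::cout << "mpi 1 resize: " << t_resize
                  << ", recv: " << MPI_Wtime() - t << std::endl;
    }

    // boost::mpi: the vector is serialized, and the receive side is resized
    // by the library itself.
    if (world.rank() == 0) {
        std::vector<double> v(N, 1.0);
        double t = MPI_Wtime();
        world.send(1, 1, v);
        std::cout << "boost::mpi 0 send: " << MPI_Wtime() - t << std::endl;
    } else if (world.rank() == 1) {
        std::vector<double> v;
        double t = MPI_Wtime();
        world.recv(0, 1, v);
        std::cout << "boost::mpi 1 recv: " << MPI_Wtime() - t << std::endl;
    }

    return 0;
}

With Open MPI this should build with something like mpic++, linking against boost_mpi and boost_serialization, and run with mpirun -n 2.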