[MPI] all_gather() missing functionality or bug?

Hello list,

I discovered that the all_gather function from Boost.MPI throws an exception if the local sizes of the vector to be gathered differ between processes:

    terminate called after throwing an instance of
    'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
      what():  MPI_Allgather: MPI_ERR_IN_STATUS: error code in status
    [thisch:14211] *** Process received signal ***

I used the following form of the all_gather function:

    all_gather(comm, myvec.data(), myvec.size(), totalvec);

where myvec is a std::vector<double>. As mentioned above, the size of myvec can differ on each processor. The documentation [0] says that all_gather supports most uses of MPI_Allgatherv, so I am not sure whether my use of all_gather is unsupported or whether this is a bug. all_gather works as expected if myvec.size() is the same on all processors. At the moment I use the following workaround:

    { // temporary all_gather fix: call MPI_Allgatherv directly
      // compute receive counts for a block distribution of N
      // coefficients over numprocs ranks
      std::vector<int> recvcts(numprocs);
      std::vector<int> displs(numprocs);
      const int nrealcoeffs = N;
      const int initialcoeffsperrank = nrealcoeffs / numprocs;
      const int remainder = nrealcoeffs % numprocs;
      int displtmp = 0;
      for (int i = 0; i < numprocs; i++) {
        recvcts[i] = initialcoeffsperrank;
        if (i < remainder)
          recvcts[i]++;
        displs[i] = displtmp;
        displtmp += recvcts[i];
      }
      totalvec.resize(N);
      MPI_Allgatherv(myvec.data(), static_cast<int>(myvec.size()), MPI_DOUBLE,
                     totalvec.data(), recvcts.data(), displs.data(),
                     MPI_DOUBLE, comm);
    }

Regards,
Thomas

[0] http://www.boost.org/doc/libs/1_50_0/doc/html/mpi/tutorial.html#mpi.c_mappin...
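P.S.: The recvcts/displs computation above is tied to the block distribution used in my program. A distribution-agnostic variant would first gather each rank's local count with MPI_Allgather and derive the displacements from those counts. A rough, untested sketch of that idea (reusing myvec, totalvec, numprocs, and comm from above):

    // sketch: gather every rank's local element count first, then
    // build the displacement table and call MPI_Allgatherv
    int localcount = static_cast<int>(myvec.size());
    std::vector<int> recvcts(numprocs);
    MPI_Allgather(&localcount, 1, MPI_INT,
                  recvcts.data(), 1, MPI_INT, comm);

    std::vector<int> displs(numprocs, 0);
    for (int i = 1; i < numprocs; i++)
      displs[i] = displs[i - 1] + recvcts[i - 1];

    // total size follows from the gathered counts, so N is not needed
    totalvec.resize(displs[numprocs - 1] + recvcts[numprocs - 1]);
    MPI_Allgatherv(myvec.data(), localcount, MPI_DOUBLE,
                   totalvec.data(), recvcts.data(), displs.data(),
                   MPI_DOUBLE, comm);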

2012/7/25 Thomas Hisch <t.hisch@gmail.com>
Hi Thomas,

I had a similar problem in the past, but with the scatterv/gatherv pair. I submitted a patch (https://svn.boost.org/trac/boost/ticket/5292) based entirely on the existing scatter/gather implementation, but the Boost.MPI devs didn't like it. If you read table 17.5 (http://www.boost.org/doc/libs/1_50_0/doc/html/mpi/tutorial.html#mpi.c_mappin...), you'll find their opinion about scatterv/gatherv.

I think you have two options:

1) Implement it yourself by adapting the existing Boost.MPI implementation, like I did.
2) Do the tedious and error-prone task of dealing with the different sizes by hand, i.e., redistribute your chunk of data and do "point-to-point" communication with the remainder (a simple variant is sketched below).

Good luck!
Júlio.
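A simple variant of option 2 is to funnel everything through rank 0 using Boost.MPI's serialized point-to-point calls, which carry the vector sizes with the message, and then broadcast the concatenation. A rough, untested sketch (the function name all_gather_varying is only illustrative):

    #include <boost/mpi.hpp>
    #include <vector>
    namespace mpi = boost::mpi;

    // sketch: collect variable-sized chunks on rank 0 via serialized
    // point-to-point messages, then broadcast the concatenated result
    std::vector<double> all_gather_varying(const mpi::communicator& comm,
                                           const std::vector<double>& myvec)
    {
      std::vector<double> total;
      if (comm.rank() == 0) {
        total = myvec;
        for (int src = 1; src < comm.size(); ++src) {
          std::vector<double> chunk;
          comm.recv(src, 0, chunk);  // vector size travels with the message
          total.insert(total.end(), chunk.begin(), chunk.end());
        }
      } else {
        comm.send(0, 0, myvec);
      }
      mpi::broadcast(comm, total, 0);  // every rank gets the full vector
      return total;
    }

Funneling through rank 0 trades performance for simplicity compared to a true MPI_Allgatherv, but it avoids any count/displacement bookkeeping.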
participants (2)
- Júlio Hoffimann
- Thomas Hisch