
On Jun 12, 2009, at 1:16 PM, Nick Collier wrote:
I'm running into an issue where an irecv followed by a send results in deadlock. A simple test case:

#include <boost/mpi.hpp>
#include <boost/serialization/access.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <vector>

namespace mpi = boost::mpi;
using namespace std;

class Item {
private:
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive& ar, const unsigned int version) { ar & val; }
public:
    int val;
    Item() : val(1) { }
};

struct Receipt {
    boost::mpi::request request;
    std::vector<Item> items;
};

int main(int argc, char **argv) {
    mpi::environment env(argc, argv);
    mpi::communicator world;
    Receipt receipt;
    vector<Item> msg(100000);

    int other = world.rank() == 0 ? 1 : 0;
    cout << world.rank() << " irecv from " << other << endl;
    receipt.request = world.irecv(other, 0, receipt.items);
    cout << world.rank() << " sending to " << other << endl;
    world.send(other, 0, msg);  // blocks here for large messages
    receipt.request.wait();
    cout << "Done" << endl;
}

Run with mpirun -np 2, this never completes. It does complete with vector<Item> msg(10), however.
Nick

Looking at this issue, the reason is probably that for a general Item type Boost.MPI uses Boost.Serialization to send and receive serialized data. For that, the receiving side has to resize its receive buffer after receiving the size of the serialized message. Boost.MPI currently sends the size of that buffer first and then the data in a second message. The irecv call only posts a receive for the first (size) message, since it cannot receive the buffer yet. The receive for the buffer is posted only in the request.wait() function, which we never get to because we are still stuck in the send call. (A small message such as msg(10) is presumably buffered by the MPI implementation, so send can return before a matching receive is posted, which would explain why that case completes.)

This is an unfortunate design problem of Boost.MPI, and there are two ways around it:

1) use the skeleton/content mechanism, or send fixed-size arrays with an MPI datatype for Item

2) one could change irecv to use a single receive call, but then we need to give irecv an upper bound for the buffer needed to receive the serialized data, and the receive will fail if the size was too small.

Would that behavior be preferred?

Matthias