I'm running into an issue where an irecv followed by a send results in deadlock. A simple test case:

#include <boost/mpi.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <vector>

namespace mpi = boost::mpi;
using namespace std;

class Item {
private:
    friend class boost::serialization::access;

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version) {
        ar & val;
    }

public:
    int val;
    Item() : val(1) { }
};

struct Receipt {
    boost::mpi::request request;
    std::vector<Item> items;
};

int main(int argc, char **argv) {
    mpi::environment env(argc, argv);
    mpi::communicator world;
    Receipt receipt;
    vector<Item> msg(100000);

    int other = world.rank() == 0 ? 1 : 0;
    cout << world.rank() << " irecv from " << other << endl;
    receipt.request = world.irecv(other, 0, receipt.items);
    cout << world.rank() << " sending to " << other << endl;
    world.send(other, 0, msg);
    receipt.request.wait();
    cout << "Done" << endl;
}

Run with mpirun -np 2, this never completes. It does complete with vector<Item> msg(10), however.

Nick
Hi, Nick,
# I am not an MPI expert.
On Sat, Jun 13, 2009 at 4:16 AM, Nick Collier wrote:
I'm running into an issue where an irecv followed by a send results in deadlock. A simple test case, [snip]
Run with mpirun -np 2, this never completes. It does complete with vector<Item> msg(10), however.
According to the MPI standard, MPI_Irecv() finishes when MPI_Wait() is called, and MPI_Send() never returns before the receive completes. So the deadlock is no surprise.

I think you shouldn't rely on the behavior with small objects: it is the buffering mechanism in the MPI implementation that avoids the deadlock in the small-object case.

See http://www.mpi-forum.org/docs/ for the specification.

Best regards,
--
Ryo IGARASHI, Ph.D.
rigarash@gmail.com
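[To make the buffering point concrete, here is a minimal sketch -- not code from the thread -- of the classic unsafe pattern being warned about: both ranks call MPI_Send before posting any receive. With a small count this usually completes because the implementation buffers the message eagerly; above the implementation's eager threshold both sends block and the program deadlocks.]

#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int other = rank == 0 ? 1 : 0;

    /* Large enough to exceed typical eager-send limits (assumption:
       the actual threshold is implementation dependent). */
    const int N = 1 << 20;
    std::vector<double> out(N), in(N);

    /* Both ranks send first: with small N this usually completes out of
       the implementation's internal buffers; with large N both MPI_Send
       calls block waiting for a matching receive -> deadlock. */
    MPI_Send(&out[0], N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
    MPI_Recv(&in[0], N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}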
Hmm, I think it should work. Thanks though. The example I posted was
modified from:

http://ci-tutor.ncsa.illinois.edu/content.php?cid=1137

Namely,

/* deadlock avoided */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int myrank;
    MPI_Request request;
    MPI_Status status;
    double a[100], b[100];

    MPI_Init(&argc, &argv);                  /* Initialize MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* Get rank */
    if (myrank == 0) {
        /* Post a receive, send a message, then wait */
        MPI_Irecv(b, 100, MPI_DOUBLE, 1, 19, MPI_COMM_WORLD, &request);
        MPI_Send(a, 100, MPI_DOUBLE, 1, 17, MPI_COMM_WORLD);
        MPI_Wait(&request, &status);
    } else if (myrank == 1) {
        /* Post a receive, send a message, then wait */
        MPI_Irecv(b, 100, MPI_DOUBLE, 0, 17, MPI_COMM_WORLD, &request);
        MPI_Send(a, 100, MPI_DOUBLE, 0, 19, MPI_COMM_WORLD);
        MPI_Wait(&request, &status);
    }

    MPI_Finalize();                          /* Terminate MPI */
    return 0;
}
Hi, Nick,
On Mon, Jun 15, 2009 at 11:03 PM, Nick Collier wrote:
Hmm, I think it should work. Thanks though. The example I posted was modified from:
http://ci-tutor.ncsa.illinois.edu/content.php?cid=1137
Namely, [snip]
This example is wrong. This causes deadlock: the MPI_Send() calls do not finish. You can avoid deadlock in any of the 4 ways below (sorry for the pseudocode; these work only for np=2):

1. Reverse the order of Send/Recv on one rank:

   if (myrank == 0) { MPI_Recv(a); MPI_Send(b); }
   else { MPI_Send(a); MPI_Recv(b); }

2. Use non-blocking calls (the Isend/Irecv order doesn't matter):

   if (myrank == 0) { MPI_Isend(a); MPI_Irecv(b); }
   else { MPI_Isend(b); MPI_Irecv(a); }
   MPI_Wait();

3. Use a sendrecv call:

   if (myrank == 0) { MPI_Sendrecv(a, b); }
   else { MPI_Sendrecv(b, a); }

4. Use one non-blocking call:

   if (myrank == 0) { MPI_Irecv(b); MPI_Send(a); MPI_Wait(); }
   else { MPI_Recv(a); MPI_Send(b); }

(Of course you may think of other ways to avoid deadlock.) You should be careful about the difference between your code and my 4th example.

Again, you should rely only on the standard, which can be downloaded from http://www.mpi-forum.org/docs/docs.html .

--
Ryo IGARASHI, Ph.D.
rigarash@gmail.com
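[For concreteness, here is a compilable version of the 3rd alternative -- a sketch, not code from the thread. MPI_Sendrecv pairs the send and the receive in a single call, so the library can schedule the two transfers without a circular wait.]

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int myrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    int other = myrank == 0 ? 1 : 0;

    double a[100] = { 0 };  /* outgoing data */
    double b[100];          /* incoming data */

    /* Send a and receive into b in one combined call; MPI matches the
       two operations internally, so neither rank can block the other. */
    MPI_Sendrecv(a, 100, MPI_DOUBLE, other, 0,
                 b, 100, MPI_DOUBLE, other, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}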
Nick Collier writes:

I'm running into an issue where an irecv followed by a send results in deadlock. A simple test case, [snip]

    int other = world.rank() == 0 ? 1 : 0;
    cout << world.rank() << " irecv from " << other << endl;
    receipt.request = world.irecv(other, 0, receipt.items);
    cout << world.rank() << " sending to " << other << endl;
    world.send(other, 0, msg);

[snip]
Does this even work with non-boost MPI? My reading (skimming) of the MPI spec does not suggest that synchronous sends can be asynchronously received. See:

http://www.mpi-forum.org/docs/mpi21-report/node58.htm#Node58

Note the emphasized part in:

The fields in a status object returned by a call to MPI_WAIT, MPI_TEST, or any of the other derived functions *(MPI_{TEST|WAIT}{ALL|SOME|ANY}), where the request corresponds to a send call, are undefined, with two exceptions: The error status field will contain valid information if the wait or test call returned with MPI_ERR_IN_STATUS; and the returned status can be queried by the call MPI_TEST_CANCELLED.*

_Wait and _Test not working *does* imply, to me, that synchronous and asynchronous communication may not be mixed for the same communication.

-tom
On Wed, 1 Jul 2009, tom fogal wrote:
Nick Collier writes: [snip]
Does this even work with non-boost MPI? My reading (skimming) of the MPI spec does not suggest that synchronous sends can be asynchronously received.
They can -- see the seventh paragraph of http://www.mpi-forum.org/docs/mpi21-report-bw/node55.htm just before "Advice to users".

--
Jeremiah Willcock
On Jun 12, 2009, at 1:16 PM, Nick Collier wrote:

I'm running into an issue where an irecv followed by a send results in deadlock. A simple test case, [snip]

Run with mpirun -np 2, this never completes. It does complete with vector<Item> msg(10), however.
Looking at this issue, the reason is probably that for a general Item type Boost.MPI uses Boost.Serialization to send and receive serialized data. For that, the receiving side has to resize its receive buffer after learning the size of the serialized message. Boost.MPI currently first sends the size of that buffer and then the data in a second message. The irecv call only posts a receive for the first (size) message, since it cannot receive into the buffer yet. The receive for the buffer is issued only in the request.wait() function, which we never get to because we are still stuck in the send call.

This is an unfortunate design problem of Boost.MPI, and there are two ways around it:

1) use the skeleton/content mechanism, or send fixed-size arrays and an MPI datatype for Item

2) one could change irecv to use a single receive call - but then we would need to give irecv an upper bound for the buffer needed to receive the serialized data, and the receive would fail if that size was too small. Would that behavior be preferred?

Matthias
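[As an illustration of the first workaround Matthias mentions -- a sketch, not code from the thread: if Item's layout permits, it can be declared an MPI datatype with BOOST_IS_MPI_DATATYPE, and the ranks can exchange fixed-size arrays through the pointer/count overloads of irecv and send. The transfer is then a single MPI message, so the posted irecv can match the incoming send directly, just as in the raw-MPI tutorial example. The count N below is an assumed, pre-agreed message size.]

#include <boost/mpi.hpp>
#include <iostream>
#include <vector>

namespace mpi = boost::mpi;

class Item {
public:
    int val;
    Item() : val(1) { }

private:
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive& ar, const unsigned int version) { ar & val; }
};

// Assumption: Item has a fixed layout, so Boost.MPI may treat it as an
// MPI datatype and ship it without the size-then-data handshake.
BOOST_IS_MPI_DATATYPE(Item)

int main(int argc, char **argv) {
    mpi::environment env(argc, argv);
    mpi::communicator world;
    const int N = 100000;  // pre-agreed count (assumption)
    std::vector<Item> items(N), msg(N);
    int other = world.rank() == 0 ? 1 : 0;

    // Pointer/count overloads: one message each way, so the pattern
    // matches the raw-MPI Irecv/Send/Wait example and does not deadlock.
    mpi::request req = world.irecv(other, 0, &items[0], N);
    world.send(other, 0, &msg[0], N);
    req.wait();
    std::cout << world.rank() << " done" << std::endl;
}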
Hi Nick, can you please file a ticket with your example code so that we do not forget about this? I understand the issue, but the fix will not be easy.

Matthias
participants (5)

- Jeremiah Willcock
- Matthias Troyer
- Nick Collier
- Ryo IGARASHI
- tom fogal