Hello Jack,
On Mon, Jun 28, 2010 at 7:46 PM, Jack Bryan wrote:
> This is the main part of my code, which may have a deadlock.
> Master:
>
>   for (iRank = 0; iRank < availableRank ; iRank++)
>   {
>     destRank = iRank+1;
>     for (taski = 1; taski <= TaskNumPerRank ; taski++)
>     {
>       resultSourceRank = destRank;
>       recvReqs[taskCounterT2] = world.irecv(resultSourceRank, upStreamTaskTag,
>                                             resultTaskPackageT2[iRank][taskCounterT3]);
>       reqs = world.isend(destRank, taskTag, myTaskPackage);
>       ++taskCounterT2;
>     }
>
>   // taskTotalNum = availableRank * TaskNumPerRank
>   // right now, availableRank =1, TaskNumPerRank =2
>   mpi::wait_all(recvReqs, recvReqs+(taskTotalNum));
>
> -----------------------------------------------
> worker:
>
>   while (1)
>   {
>     world.recv(managerRank, downStreamTaskTag, resultTaskPackageW);
>     do its local work on received task;
>     destRank = masterRank;
>     reqs = world.isend(destRank, taskTag, myTaskPackage);
>     if (recv end signal)
>       break;
>   }

1. I can't see where the outer for-loop in master is closed; is the
   wait_all() part of that loop? (I assume it is not.) Can you send a
   minimal program that I can feed to a compiler and test? That would
   help.

2. Are you sure there is no tag mismatch between master and worker?

  master: world.isend(destRank, taskTag, myTaskPackage);
                                ^^^^^^^
  worker: world.recv(managerRank, downStreamTaskTag, resultTaskPackageW);
                                  ^^^^^^^^^^^^^^^^^

   Unless master::taskTag == worker::downStreamTaskTag, the recv() will
   wait forever.

   Similarly, the following requires that
   master::upStreamTaskTag == worker::taskTag:

  master: ... = world.irecv(resultSourceRank, upStreamTaskTag, ...);
  worker: world.isend(destRank, taskTag, myTaskPackage); // destRank==masterRank

3. Do the source/destination ranks match? The master waits for messages
   from ranks 1..availableRank (inclusive range), and the worker waits
   for a message from "masterRank" (is this 0?).

4. Does the master work if you replace the main loop with the following?

  Master:

    for (iRank = 0; iRank < availableRank ; iRank++)
    {
      destRank = iRank+1;
      for (taski = 1; taski <= TaskNumPerRank ; taski++)
      {
        // XXX: the following code does not contain any reference to
        // "taski": it is sending "TaskNumPerRank" copies of the
        // same message ...
        reqs = world.isend(destRank, taskTag, myTaskPackage);
      };
    }; // I assume the outer loop does *not* include the wait_all()

    // expect a message from each Task
    int n = 0;
    while (n < taskTotalNum)
    {
      mpi::status status = world.probe();
      world.recv(status.source(), status.tag(),
                 resultTaskPackageT2[status.source()][taskCounterT3]);
      ++n;
    };

Best regards,
Riccardo
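
P.S. To make points 2 and 3 concrete, here is a toy model of how a receive is matched against pending messages. It is only a sketch in plain standard C++ and does not use Boost.MPI at all: the names Message, Mailbox, deliver and try_recv are made up for illustration, and a "false" return stands in for a recv() that would keep blocking. Wildcards such as any_source/any_tag are deliberately left out.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a message in flight (NOT a Boost.MPI type).
struct Message {
    int source;           // rank of the sender
    int tag;              // tag chosen by the sender
    std::string payload;  // whatever was sent
};

// Hypothetical per-rank inbox that mimics (source, tag) matching.
class Mailbox {
public:
    // Models a message arriving at this rank.
    void deliver(Message m) { queue_.push_back(std::move(m)); }

    // Removes the first queued message whose source AND tag both match
    // and stores it in `out`. Returns false if nothing matches, which
    // models a blocking recv() that never completes.
    bool try_recv(int source, int tag, Message& out) {
        for (std::deque<Message>::iterator it = queue_.begin();
             it != queue_.end(); ++it) {
            if (it->source == source && it->tag == tag) {
                out = std::move(*it);
                queue_.erase(it);
                return true;
            }
        }
        return false;
    }

    // Number of messages delivered but not yet received.
    std::size_t pending() const { return queue_.size(); }

private:
    std::deque<Message> queue_;
};
```

If the master "sends" with tag 1 and the worker calls try_recv() with tag 2, the message stays queued and the call returns false; with equal tags and the right source rank it is delivered. That is the situation I suspect with taskTag versus downStreamTaskTag, but I cannot tell without seeing the actual tag values.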