Thanks for your reply. 

I have checked the tags; the master and worker tags match. 

The deadlock happens when 2 tasks are scheduled on one processor. 

With only one task on each processor there is no deadlock; it works well.

The master is responsible for scheduling tasks to the workers, which run 
the assigned tasks and send the results back to the master. 

If I assign one task to each worker, it works well. 

But when I increase the number of tasks per worker node to 2, it deadlocks. 

The master only schedules 2 tasks to one worker, in order to simplify the analysis
of the potential deadlock. 

The worker can receive the 2 tasks and run them, but the master cannot get the 
results back from the worker.


The main idea: 

master (node 0):

counter = 0;
totalTaskNum = 2;
while (counter < totalTaskNum)
{
    TaskPackage myTaskPackage(world);

    world.isend(node1, downStreamTaskTag, myTaskPackage);
    recvReqs[counter] = world.irecv(node1, upStreamTaskTag, taskResultPackage[counter]);
    counter++;
}
mpi::wait_all(recvReqs, recvReqs + totalTaskNum);

worker (node 1):

while (1)
{
    TaskPackage workerTaskPackage(world);
    world.recv(node0, downStreamTaskTag, workerTaskPackage);

    // do its local work on the received task

    world.isend(node0, upStreamTaskTag, workerTaskPackage);

    if (no new task)
        break;
}

My code has many classes; I am trying to work out how to cut the main part out of it. 

Any help is appreciated. 


thanks

Jack

> Date: Mon, 28 Jun 2010 21:28:47 +0200
> From: riccardo.murri@gmail.com
> To: boost-users@lists.boost.org
> Subject: Re: [Boost-users] boostMPI asychronous communication
>
> Hello Jack,
>
> On Mon, Jun 28, 2010 at 7:46 PM, Jack Bryan <dtustudy68@hotmail.com> wrote:
> > This is the main part of me code, which may have deadlock.
> >
> > Master:
> > for (iRank = 0; iRank < availableRank ; iRank++)
> > {
> > destRank = iRank+1;
> > for (taski = 1; taski <=  TaskNumPerRank ; taski++)
> > {
> > resultSourceRank = destRank;
> > recvReqs[taskCounterT2] = world.irecv(resultSourceRank, upStreamTaskTag, resultTaskPackageT2[iRank][taskCounterT3]);
> > reqs = world.isend(destRank, taskTag, myTaskPackage);
> > ++taskCounterT2;
> > }
> >
> > // taskTotalNum = availableRank * TaskNumPerRank
> > // right now, availableRank =1, TaskNumPerRank =2
> > mpi::wait_all(recvReqs, recvReqs+(taskTotalNum));
> > -----------------------------------------------
> > worker:
> > while (1)
> > {
> > world.recv(managerRank, downStreamTaskTag, resultTaskPackageW);
> > do its local work on received task;
> > destRank = masterRank;
> > reqs = world.isend(destRank, taskTag, myTaskPackage);
> > if (recv end signal)
> >   break;
> > }
>
> 1. I can't see where the outer for-loop in master is closed; is the
> wait_all() part of that loop? (I assume it does not.) Can you send a
> minimal program that I can feed to a compiler and test? This could
> help.
>
> 2. Are you sure there is no tag mismatch between master and worker?
>
> master: world.isend(destRank, taskTag, myTaskPackage);
> ^^^^^^^
> worker: world.recv(managerRank, downStreamTaskTag, resultTaskPackageW);
> ^^^^^^^^^^^^^^^^^
>
> unless master::taskTag == worker::downStreamTaskTag, the recv() will
> wait forever.
>
> Similarly, the following requires that master::upStreamTaskTag ==
> worker::taskTag:
>
> master: ... = world.irecv(resultSourceRank, upStreamTaskTag, ...);
> worker: world.isend(destRank, taskTag, myTaskPackage); //
> destRank==masterRank
>
> 3. Do the source/destination ranks match? The master waits for messages from
> destinations 1..availableRank (inclusive range), and the worker waits
> for a message from "masterRank" (is this 0?)
>
> 4. Does the master work if you replace the main loop with the following?
>
> Master:
> for (iRank = 0; iRank < availableRank ; iRank++)
> {
> destRank = iRank+1;
> for (taski = 1; taski <=  TaskNumPerRank ; taski++)
> {
> // XXX: the following code does not contain any reference to
> // "taski": it is sending "TaskNumPerRank" copies of the
> // same message ...
> reqs = world.isend(destRank, taskTag, myTaskPackage);
> };
> }; // I assume the outer loop does *not* include the wait_all()
>
> // expect a message from each Task
> int n = 0;
> while (n < taskTotalNum) {
> mpi::status status = world.probe();
> world.recv(status.source(), status.tag(),
> resultTaskPackageT2[status.source()][taskCounterT3]);
> ++n;
> };
>
>
> Best regards,
> Riccardo
> _______________________________________________
> Boost-users mailing list
> Boost-users@lists.boost.org
> http://lists.boost.org/mailman/listinfo.cgi/boost-users

