On 30.12.2012, at 23:22, Andreas Schäfer wrote:
On 23:12 Sun 30 Dec, Philipp Kraus wrote:
Sorry, I forgot a piece of main information: these calls are a preexecution of the algorithm. The main algorithm is cycled and uses MPI blocking communication, so only the preexecution must be a little bit weak.
Sorry, I don't quite get it. Could you elaborate on this? What, in this context, is the "preexecution", what is the main algorithm, and what do you mean by "must be a little bit weak"?
The algorithm has an "initialization": core 0 must send an identifier to all other cores, and each core can then initialize all data for the algorithm locally, independently of the other cores. So I only need the exchange of the ID. After all cores have finished their local initialization, they must be "synchronized". That is why I named this initialization "preexecution". For the algorithm, the duration of this preexecution is irrelevant; I only need each core to get the ID from core 0, and after initialization the cores must be switched into "synchronized communication".
At the moment I do it with:
while (thread_is_running) {
    if (!l_mpicom.rank()) {
        for (std::size_t i = 1; i < l_mpicom.size(); ++i)
            l_mpicom.isend(i, 666, l_task.getID());
    } else if (boost::optional<mpi::status> l_status = l_mpicom.iprobe(0, 666)) {
        std::size_t l_taskid = 0;
        l_mpicom.recv(l_status->source(), l_status->tag(), l_taskid);
    }
}
You could rebuild the binary-tree communication scheme I've illustrated using Boost MPI (I'm too lazy). I'm pretty sure that calling isend() without ever waiting for completion is illegal according to the MPI standard and will result in a memory leak (plus the code won't scale, thanks to the for loop).
I don't see a problem!? The while loop checks on each iteration whether there is a message in the MPI queue; if not, I can do other things. If there is a message in the queue, I read it, store it in my local variable, and work with the value later (not shown here in the example). Core 0 sends the message via isend and later in the loop enters an mpi.barrier; every other core also enters the barrier once it has received the message. So all cores that have received the message stop at the barrier call, and all cores that haven't received it yet do other things. But core 0 sends the message to all other cores, so after some time every core must have reached the MPI barrier. The only indeterministic thing is the time until all cores reach the barrier command. In pseudo code:

while (thread_is_running) {
    // this is the preexecution
    if core == 0
        get task
        for i = 1 to mpi.size()
            mpi_isend( task.getID )
    else
        status = mpi.iprobe( 0 )
        if status
            receive id
    // end preexecution

    // this is the main algorithm
    if id exists on core
        mpi barrier
        { here can be a blocking & synchronized structure for the main algorithm }
    // end main algorithm

    do other things if there is no data
}

IMHO there cannot be a memory leak, because each core gets the data (except if the communication between the cores fails, but in that case an exception should be thrown). The barrier synchronizes all cores at a specific point and starts a synchronized block; after the block has finished, each core can work independently again.

Thanks for the great discussion
Phil