Re: [Boost-users] mpi isend to group

On 23:12 Sun 30 Dec , Philipp Kraus wrote:
Sorry, I forgot an important piece of information: these calls are a preexecution of the algorithm. The main algorithm runs in a cycle and uses blocking MPI communication, so only the preexecution must be a little bit weak.
Sorry, I don't quite get it. Could you elaborate on this? What, in this context, is the "preexecution", what is the main algorithm, and what do you mean by "must be a little bit weak"?
Thanks for the code, but this code is based on OpenMPI. My program must also work with MPICH2 (the MPI implementation on Windows-based systems), so I would like to create a Boost-only solution.
The code uses the standard MPI interface; it contains nothing specific to Open MPI (BTW: it's not called "OpenMPI", but that's just nitpicking). It thus works with MPICH2, too -- no matter whether the OS is Linux or Windows.
I do it at the moment with:
while (thread_is_running) {
    if (!l_mpicom.rank()) {
        for (std::size_t i = 1; i < l_mpicom.size(); ++i)
            l_mpicom.isend(i, 666, l_task.getID());
    } else if (boost::optional<mpi::status> l_status = l_mpicom.iprobe(0, 666)) {
        std::size_t l_taskid = 0;
        l_mpicom.recv(l_status->source(), l_status->tag(), l_taskid);
    }
}
You could rebuild the binary tree communication scheme I've illustrated using Boost MPI (I'm too lazy). I'm pretty sure calling isend() without ever waiting for completion is illegal according to the MPI standard and will result in a memory leak (plus the code won't scale thanks to the for loop).

Cheers
-Andreas
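PS: Untested, but a minimal sketch of that tree scheme with Boost MPI could look roughly like this (tag 666 and the size_t task ID are taken from your snippet; the function name is made up):

    #include <vector>
    #include <boost/mpi.hpp>

    namespace mpi = boost::mpi;

    // each rank receives the ID from its parent in a binary tree rooted at
    // rank 0 and forwards it to its (at most two) children; on rank 0 `id`
    // must already hold the value to distribute
    void propagate_id(mpi::communicator& comm, std::size_t& id)
    {
        const int rank = comm.rank();
        const int size = comm.size();

        if (rank != 0)
            comm.recv((rank - 1) / 2, 666, id);

        std::vector<mpi::request> requests;
        const int children[2] = { 2 * rank + 1, 2 * rank + 2 };
        for (int c = 0; c < 2; ++c)
            if (children[c] < size)
                requests.push_back(comm.isend(children[c], 666, id));

        // complete the sends so the request objects are released
        mpi::wait_all(requests.begin(), requests.end());
    }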

On 30.12.2012 at 23:22, Andreas Schäfer wrote:
On 23:12 Sun 30 Dec , Philipp Kraus wrote:
Sorry, I forgot an important piece of information: these calls are a preexecution of the algorithm. The main algorithm runs in a cycle and uses blocking MPI communication, so only the preexecution must be a little bit weak.
Sorry, I don't quite get it. Could you elaborate on this? What, in this context, is the "preexecution", what is the main algorithm, and what do you mean by "must be a little bit weak"?
The algorithm has an "initialization": core 0 must send an identifier to all other cores, and each core can then initialize all data for the algorithm locally and independently of the other cores. So I only need to exchange the ID. After all cores have finished their local initialization, the cores must be "synchronized". That is why I named this initialization "preexecution". For the algorithm the duration of this preexecution is irrelevant; I only need each core to get the ID from core 0, and after the initialization the core must be switched into a "synchronized communication".
I do it at the moment with:
while (thread_is_running) {
    if (!l_mpicom.rank()) {
        for (std::size_t i = 1; i < l_mpicom.size(); ++i)
            l_mpicom.isend(i, 666, l_task.getID());
    } else if (boost::optional<mpi::status> l_status = l_mpicom.iprobe(0, 666)) {
        std::size_t l_taskid = 0;
        l_mpicom.recv(l_status->source(), l_status->tag(), l_taskid);
    }
}
You could rebuild the binary tree communication scheme I've illustrated using Boost MPI (I'm too lazy). I'm pretty sure calling isend() without ever waiting for completion is illegal according to the MPI standard and will result in a memory leak (plus the code won't scale thanks to the for loop).
I don't see a problem!? The while loop checks on each iteration whether there is a message in the MPI queue; if not, I can do other things. If there is a message in the queue, I read it, store it in my local variable and work with the value later (not shown here in the example).

So core 0 sends the message via isend and later in the loop enters an mpi.barrier; every other core also enters the barrier once it has received the message. All cores that have received the message stop at the barrier call, and all cores that haven't received it yet do other things. But since core 0 sends the message to all other cores, after some time each core must have reached the MPI barrier. The only thing that is indeterministic is the time until all cores reach the barrier. In pseudo code:

    while (thread_is_running) {
        // this is the preexecution
        if core == 0
            get task
            for i = 1 to mpi.size()
                mpi_isend( task.getID )
        else
            status = mpi.iprobe( 0 )
            if status
                receive id
        // end preexecution

        // this is the main algorithm
        if id exists on core
            mpi barrier
            { here can be a blocking & synchronized structure for the main algorithm }
        // end main algorithm

        do other things if there is no data
    }

IMHO there cannot be a memory leak, because each core gets the data (unless the communication between the cores fails, but in that case an exception should be thrown). The barrier synchronizes all cores at a specific point and starts a synchronized block; after the block is finished, each core can work independently again.

Thanks for the great discussion

Phil

On 00:23 Mon 31 Dec , Philipp Kraus wrote:
The algorithm has an "initialization": core 0 must send an identifier to all other cores, and each core can then initialize all data for the algorithm locally and independently of the other cores. So I only need to exchange the ID. After all cores have finished their local initialization, the cores must be "synchronized". That is why I named this initialization "preexecution". For the algorithm the duration of this preexecution is irrelevant; I only need each core to get the ID from core 0, and after the initialization the core must be switched into a "synchronized communication".
Ah, I get it. Thanks!
I do it at the moment with:
while (thread_is_running) {
    if (!l_mpicom.rank()) {
        for (std::size_t i = 1; i < l_mpicom.size(); ++i)
            l_mpicom.isend(i, 666, l_task.getID());
    } else if (boost::optional<mpi::status> l_status = l_mpicom.iprobe(0, 666)) {
        std::size_t l_taskid = 0;
        l_mpicom.recv(l_status->source(), l_status->tag(), l_taskid);
    }
}
You could rebuild the binary tree communication scheme I've illustrated using Boost MPI (I'm too lazy). I'm pretty sure calling isend() without ever waiting for completion is illegal according to the MPI standard and will result in a memory leak (plus the code won't scale thanks to the for loop).
I don't see a problem!? The while loop checks on each iteration whether there is a message in the MPI queue; if not, I can do other things. If there is a message in the queue, I read it, store it in my local variable and work with the value later (not shown here in the example).
The problem is that you never seem to call MPI_Wait() for the requests generated by MPI_Isend (or Boost MPI's isend(), which is in fact "just" a convenient wrapper). Let me quote the man page:

"Nonblocking calls allocate a communication request object"

These requests are only freed after calling MPI_Wait for them. Hence the memory leak.
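Untested, but with Boost MPI the fix could look roughly like this (l_mpicom, l_task and tag 666 are taken from your snippet): keep the request objects that isend() returns and complete them before the preexecution ends:

    std::vector<boost::mpi::request> l_requests;

    if (!l_mpicom.rank())
        for (std::size_t i = 1; i < l_mpicom.size(); ++i)
            l_requests.push_back(l_mpicom.isend(i, 666, l_task.getID()));

    // ... iprobe()/recv() on the other ranks, other work ...

    // before the barrier at the end of the preexecution: completing the
    // sends also releases the request objects
    boost::mpi::wait_all(l_requests.begin(), l_requests.end());

wait_all() blocks, but since rank 0 enters the barrier right afterwards anyway, that should not hurt.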
Thanks for the great discussion
You're welcome. :-)

Cheers
-Andreas

The problem is that you never seem to call MPI_Wait() for the requests generated by MPI_Isend (or Boost MPI's isend(), which is in fact "just" a convenient wrapper). Let me quote the man page:
"Nonblocking calls allocate a communication request object"
These requests are only freed after calling MPI_Wait for them. Hence the memory leak.
Thanks for this information. I'm using this isend structure in a few more places, so I need to release the communication request objects there as well.

Phil
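PS: Something like this (untested) is what I have in mind, so that the preexecution stays non-blocking on core 0 and the requests are still released. l_requests is a std::vector<boost::mpi::request> filled with the return values of isend(); the function and parameter names are of course just placeholders:

    // called on each iteration of the while loop: returns true once all
    // pending sends have completed, without ever blocking
    bool sends_finished(std::vector<boost::mpi::request>& p_requests)
    {
        while (!p_requests.empty())
        {
            // request::test() returns an initialized optional<status>
            // only if the operation has completed
            if (!p_requests.back().test())
                return false;          // still in flight, try again later
            p_requests.pop_back();     // completed, request is released
        }
        return true;
    }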