
On Sep 16, 2006, at 10:22 AM, Markus Blatt wrote:
Just forget about it. I was missing the tags in the collective communication, where there definitely are none in the MPI standard. Probably I should have gotten more sleep. Sorry.
I would actually also love to have tags there :-)
I hope these answers address the issues you had in mind. I can elaborate if you want.
The question came up when I looked into mpi/collectives/broadcast.hpp:
// We're sending a type that does not have an associated MPI
// datatype, so we'll need to serialize it. Unfortunately, this
// means that we cannot use MPI_Bcast, so we'll just send from the
// root to everyone else.
template<typename T>
void broadcast_impl(const communicator& comm, T& value, int root, mpl::false_)
If this function gets called, the performance will definitely be suboptimal, as the root will send to all others. Is this just the case if no MPI_Datatype was constructed (like for the linked list), or is it called whenever Boost serialization is used?
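To make the question concrete, here is a toy example (my own, not from the library, and assuming the usual boost::mpi header and namespace names) of the two cases I have in mind:

#include <boost/mpi.hpp>
#include <boost/serialization/list.hpp>
#include <list>

namespace mpi = boost::mpi;

int main(int argc, char* argv[])
{
  mpi::environment env(argc, argv);
  mpi::communicator world;

  int i = 42;               // int has an associated MPI datatype
  std::list<int> l(10, 42); // a linked list has none

  mpi::broadcast(world, i, 0); // should map to a plain MPI_Bcast
  mpi::broadcast(world, l, 0); // ends up in the overload quoted above?

  return 0;
}

I would expect the first call to use MPI_Bcast directly and only the second one to fall back to the root-sends-to-everyone scheme.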
OK, I see your concern. This is actually only used when no MPI_Datatype can be constructed, that is, when no MPI_Datatype is possible, such as for a linked list, and when you do not use the skeleton&content mechanism either. Since this part of the code was written by Doug Gregor, I ask him to correct me if I say something wrong now or if I miss something.

When no MPI datatype exists, we need to pack the object into a buffer using MPI_Pack, and the buffer needs to be broadcast. So far we all seem to agree. The problem now is that the receiving side needs to know the size of the buffer in order to allocate enough memory, but there is no MPI_Probe for collectives that could be used to inquire about the message size. I believe that this was the reason for implementing the broadcast as a sequence of nonblocking sends and receives (Doug?).

Thinking about it, I realize that one could instead do two consecutive broadcasts: one to send the size of the buffer and another one to send the buffer itself (see the sketch in the PS below). This would definitely be faster on machines with special hardware for collectives. On Beowulf clusters, on the other hand, the current version is faster, since most MPI implementations just perform the broadcast as a sequence of N-1 send/receive operations from the root instead of optimizing it.

Matthias
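PS: A rough sketch of the two-broadcast idea, in plain MPI rather than the actual Boost.MPI code; the serialization step is glossed over and the packed data is treated as an opaque buffer that already exists on the root:

#include <mpi.h>
#include <vector>

void broadcast_packed(MPI_Comm comm, std::vector<char>& buffer, int root)
{
  int rank;
  MPI_Comm_rank(comm, &rank);

  // First broadcast: the size of the packed buffer, so that the
  // non-root ranks know how much memory to allocate.
  unsigned long size = (rank == root)
    ? static_cast<unsigned long>(buffer.size()) : 0;
  MPI_Bcast(&size, 1, MPI_UNSIGNED_LONG, root, comm);

  if (rank != root)
    buffer.resize(size);

  // Second broadcast: the packed bytes themselves.
  if (size > 0)
    MPI_Bcast(&buffer[0], static_cast<int>(size), MPI_PACKED, root, comm);
}

On hardware with a native broadcast both MPI_Bcast calls would benefit, whereas the current point-to-point scheme cannot.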