
On Sep 16, 2006, at 5:05 AM, Matthias Troyer wrote:
On Sep 16, 2006, at 10:22 AM, Markus Blatt wrote:
The question came up when I looked into mpi/collectives/broadcast.hpp:
// We're sending a type that does not have an associated MPI
// datatype, so we'll need to serialize it. Unfortunately, this
// means that we cannot use MPI_Bcast, so we'll just send from the
// root to everyone else.
template<typename T>
void broadcast_impl(const communicator& comm, T& value, int root,
                    mpl::false_)
If this function gets called, the performance will definitely be suboptimal, as the root will send to all others. Is this only the case if no MPI_Datatype was constructed (as for the linked list), or is it called whenever Boost serialization is used?
OK, I see your concern. This is actually only used when no MPI_Datatype can be constructed, that is, when no MPI_Datatype is possible (such as for a linked list) and you do not use the skeleton & content mechanism either.
Right. From the code standpoint, in addition to the broadcast_impl shown above, there is one that looks like this:

// We're sending a type that has an associated MPI datatype, so
// we'll use MPI_Bcast to do all of the work.
template<typename T>
void broadcast_impl(const communicator& comm, T& value, int root,
                    mpl::true_)

That last parameter decides which implementation to use, based on whether we have or can create an MPI_Datatype for the type T.
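(A quick aside for readers following the thread: below is a minimal sketch of how that third-parameter dispatch can be expressed, using the Boost.MPI traits is_mpi_datatype<T> and get_mpi_datatype(). The "sketch" namespace and the wrapper function are illustrative, not the library's actual source.)

#include <mpi.h>
#include <boost/mpi/communicator.hpp>
#include <boost/mpi/datatype.hpp>   // is_mpi_datatype, get_mpi_datatype
#include <boost/mpl/bool.hpp>

namespace sketch {
  using boost::mpi::communicator;

  // Fast path: T maps onto an MPI_Datatype, so a single MPI_Bcast
  // over the value's raw representation is enough.
  template<typename T>
  void broadcast_impl(const communicator& comm, T& value, int root,
                      boost::mpl::true_)
  {
    MPI_Bcast(&value, 1, boost::mpi::get_mpi_datatype(value), root,
              static_cast<MPI_Comm>(comm));
  }

  // Slow path: T has to be serialized; the root ships the packed
  // buffer to every other rank (see the sketches further down).
  template<typename T>
  void broadcast_impl(const communicator& comm, T& value, int root,
                      boost::mpl::false_);

  // The public entry point picks one overload at compile time.
  template<typename T>
  void broadcast(const communicator& comm, T& value, int root)
  {
    broadcast_impl(comm, value, root, boost::mpi::is_mpi_datatype<T>());
  }
}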
Since this part of the code was written by Doug Gregor, I ask him to correct me if I say something wrong or miss something. When no MPI datatype exists, we need to pack the object into a buffer using MPI_Pack, and that buffer needs to be broadcast. So far we all seem to agree. The problem is that the receiving side needs to know the size of the buffer in order to allocate enough memory, but there is no MPI_Probe for collectives that could be used to inquire about the message size. I believe this was the reason for implementing the broadcast as a sequence of nonblocking sends and receives (Doug?).
Yes, this was the reason for the sequence of nonblocking sends and receives.
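(For concreteness, a rough sketch of that scheme using the public Boost.MPI point-to-point calls rather than the library's internal packing code; the wrapper name and the tag value are made up for illustration. The point is that with point-to-point messages the receiver can learn the payload size before allocating, which MPI_Bcast does not allow.)

#include <vector>
#include <boost/mpi/communicator.hpp>
#include <boost/mpi/request.hpp>
#include <boost/mpi/nonblocking.hpp>   // boost::mpi::wait_all

// Root serializes `value` and sends it to every other rank with
// nonblocking sends; each non-root rank receives and deserializes.
template<typename T>
void broadcast_by_sends(const boost::mpi::communicator& comm,
                        T& value, int root, int tag = 0)
{
  if (comm.rank() == root) {
    std::vector<boost::mpi::request> reqs;
    for (int dest = 0; dest < comm.size(); ++dest)
      if (dest != root)
        reqs.push_back(comm.isend(dest, tag, value));
    boost::mpi::wait_all(reqs.begin(), reqs.end());
  } else {
    comm.recv(root, tag, value);
  }
}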
Thinking about it, I realize that one could instead do two consecutive broadcasts: one to send the size of the buffer and then another to send the buffer itself. This will definitely be faster on machines with special hardware for collectives. On Beowulf clusters, on the other hand, the current version is faster, since most MPI implementations just perform the broadcast as a sequence of N-1 send/receive operations from the root instead of optimizing it.
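(Sketched below, just to make the two-step idea concrete: the size and the bytes go through two plain MPI_Bcast calls, with serialization done here through Boost.Serialization binary archives rather than the library's internal packed archives; the helper name is made up.)

#include <sstream>
#include <string>
#include <vector>
#include <mpi.h>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/mpi/communicator.hpp>

// Two consecutive MPI_Bcasts: first the buffer size, then the buffer
// itself, so hardware-accelerated broadcasts can be used even for
// types without an MPI_Datatype.
template<typename T>
void broadcast_in_two_steps(const boost::mpi::communicator& comm,
                            T& value, int root)
{
  MPI_Comm c = comm;
  std::vector<char> buffer;

  if (comm.rank() == root) {
    std::ostringstream os;
    boost::archive::binary_oarchive oa(os);
    oa << value;
    const std::string& s = os.str();
    buffer.assign(s.begin(), s.end());
  }

  // Step 1: broadcast the size so the receivers can allocate the buffer.
  unsigned long size = static_cast<unsigned long>(buffer.size());
  MPI_Bcast(&size, 1, MPI_UNSIGNED_LONG, root, c);

  // Step 2: broadcast the serialized bytes.
  buffer.resize(size);
  MPI_Bcast(buffer.data(), static_cast<int>(size), MPI_BYTE, root, c);

  if (comm.rank() != root) {
    std::string bytes(buffer.begin(), buffer.end());
    std::istringstream is(bytes);
    boost::archive::binary_iarchive ia(is);
    ia >> value;
  }
}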
Right. I guess we could provide some kind of run-time configuration switch that decides between the two implementations, if someone runs into a case where it matters.

Doug
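(If it ever comes to that, such a switch could be as simple as the sketch below, which checks a made-up environment variable; the variable name and both helpers come from the earlier sketches in this thread, not from anything Boost.MPI actually provides.)

#include <cstdlib>

// Illustrative only: pick one of the two strategies at run time via an
// environment variable (the name BOOST_MPI_BCAST_TWO_STEPS is invented).
template<typename T>
void broadcast_serialized(const boost::mpi::communicator& comm,
                          T& value, int root)
{
  static const bool two_steps =
      std::getenv("BOOST_MPI_BCAST_TWO_STEPS") != 0;
  if (two_steps)
    broadcast_in_two_steps(comm, value, root);  // size + payload broadcasts
  else
    broadcast_by_sends(comm, value, root);      // root sends to each rank
}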