[Review][MPI] Boost.MPI review begins

Hi All, The view of the Boost.MPI (for Message Passing Interface) begins today, Sept. 6 and continues through Sept. 15. Description: The Message Passing Interface (MPI) is a standard interface for message passing in high-performance parallel applications. It defines a library interface, available from C, Fortran, and C++, for which there are many MPI implementations. Although there exist C++ bindings for MPI, they offer little functionality over the C bindings. The Boost.MPI library provides an alternative C++ interface to MPI that better supports modern C++ development styles, including complete support for user- defined data types and C++ Standard Library types, arbitrary function objects for colective algorithms, and the use of modern C++ library techniques to maintain maximal efficiency. As an example, one can concatenate the std::strings stored on each processor with a single reduce call and the function object std::plus<std::string>. However, if the call to reduce is merely computing the sum of integers, the implementation transforms into the appropriate call to MPI_Reduce. For more information about the design of Boost.MPI, see our design philosophy. The review tarball is here: http://www.generic-programming.org/~dgregor/boost.mpi/boost- mpi-20060906.tgz Online documentation is available here: http://www.generic-programming.org/~dgregor/boost.mpi/libs/ parallel/doc/html/ PDF documentation is available here: http://www.generic-programming.org/~dgregor/boost.mpi/mpi.pdf Review questions ================ Please always explicitly state in your review, whether you think the library should be accepted into Boost. You might want to comment on the following questions: - What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain? Cheers, Jeremy __________________________________ Jeremy Siek <siek@cs.colorado.edu> Visiting Assistant Professor Department of Computer Science University of Colorado at Boulder
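To make the string-concatenation example above concrete, here is a minimal sketch assuming the reduce overloads described in the Boost.MPI documentation (an out_value overload for the root and an overload without one for the other ranks); the program itself is illustrative only:

  #include <boost/mpi/environment.hpp>
  #include <boost/mpi/communicator.hpp>
  #include <boost/mpi/collectives.hpp>
  #include <boost/lexical_cast.hpp>
  #include <functional>
  #include <iostream>
  #include <string>

  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    // Each process contributes one string; reduce concatenates them at rank 0
    // using std::plus<std::string>.
    std::string mine = "proc " + boost::lexical_cast<std::string>(world.rank()) + "; ";

    if (world.rank() == 0) {
      std::string all;
      mpi::reduce(world, mine, all, std::plus<std::string>(), 0);
      std::cout << "concatenated: " << all << std::endl;
    } else {
      // Non-root ranks call the overload without an output argument.
      mpi::reduce(world, mine, std::plus<std::string>(), 0);
    }
    return 0;
  }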

Hi Jeremy, On 9/7/06, Jeremy Graham Siek <jsiek@osl.iu.edu> wrote:
The review tarball is here:
http://www.generic-programming.org/~dgregor/boost.mpi/boost-mpi-20060906.tgz
For the benefit of people like me who find the above link relatively hard to use: http://tinyurl.com/nlmxp
Online documentation is available here:
http://www.generic-programming.org/~dgregor/boost.mpi/libs/parallel/doc/html/
http://tinyurl.com/m78bb

Hope this helps!

--
Dean Michael C. Berris
C++ Software Architect
Orange and Bronze Software Labs, Ltd. Co.
web: http://software.orangeandbronze.com/
email: dean@orangeandbronze.com
mobile: +63 928 7291459
phone: +63 2 8943415
other: +1 408 4049532
blogs: http://mikhailberis.blogspot.com
       http://3w-agility.blogspot.com
       http://cplusplus-soup.blogspot.com

What is necessary to be able to test this on a Windows machine? It's my understanding that there exist MPI(CH?) implementations which permit one to run programs like those in the tutorial without having an actual cluster at one's disposal. I would appreciate some background/pointers or whatever as to the best way of doing this.

I realize it's not really on topic - but without something like this, the number of people who can effectively review this is very limited.

Robert Ramey

Jeremy Graham Siek wrote:
Review questions ================
Please always explicitly state in your review whether you think the library should be accepted into Boost.
You might want to comment on the following questions:
- What is your evaluation of the design?
- What is your evaluation of the implementation?
- What is your evaluation of the documentation?
- What is your evaluation of the potential usefulness of the library?
- Did you try to use the library? With what compiler? Did you have any problems?
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
- Are you knowledgeable about the problem domain?
Cheers, Jeremy
__________________________________ Jeremy Siek <siek@cs.colorado.edu> Visiting Assistant Professor Department of Computer Science University of Colorado at Boulder

Hi Robert, On 9/7/06, Robert Ramey <ramey@rrsd.com> wrote:
What is necessary to be able to test this on a Windows machine? It's my understanding that there exist MPI(CH?) implementations which permit one to run programs like those in the tutorial without having an actual cluster at one's disposal. I would appreciate some background/pointers or whatever as to the best way of doing this.
IIRC, there is a Windows-based version of LAM/MPI [1] which you can use. And yes, you're right that you don't need a "real-life cluster" to run an MPI toolset/implementation.
I realize it's not really on topic - but without something like this, the number of people who can effectively review this is very limited.
I agree. However, like most of the previous reviews that I have looked at, two very important aspects that can be scrutinized without building the library are the design and the interface/implementation. Reading the source code is one way, and reading the documentation is another. Granted, if you already have prior experience with an MPI implementation, you should be able to read through everything and understand that it's really a wrapper around the existing MPI implementations.

HTH!

--
Dean Michael C. Berris
C++ Software Architect
Orange and Bronze Software Labs, Ltd. Co.
web: http://software.orangeandbronze.com/
email: dean@orangeandbronze.com
mobile: +63 928 7291459
phone: +63 2 8943415
other: +1 408 4049532
blogs: http://mikhailberis.blogspot.com
       http://3w-agility.blogspot.com
       http://cplusplus-soup.blogspot.com

On 9/7/06, Dean Michael Berris <mikhailberis@gmail.com> wrote:
IIRC, there is a Windows-based version of LAM/MPI [1] which you can use. And yes, you're right that you don't need a "real-life cluster" to run an MPI toolset/implementation.
I recalled incorrectly: LAM/MPI doesn't have a version for Windows. However, NT-MPICH [1] is one you might want to get. A good list can be found at [2].

HTH!

[1] http://www.lfbs.rwth-aachen.de/~silke/projects/nt-mpich/
[2] http://www.cs.usfca.edu/mpi/

--
Dean Michael C. Berris
C++ Software Architect
Orange and Bronze Software Labs, Ltd. Co.
web: http://software.orangeandbronze.com/
email: dean@orangeandbronze.com
mobile: +63 928 7291459
phone: +63 2 8943415
other: +1 408 4049532
blogs: http://mikhailberis.blogspot.com
       http://3w-agility.blogspot.com
       http://cplusplus-soup.blogspot.com

On 07.09.2006, at 00:29, Robert Ramey wrote:
It's my understanding that there exist MPI(CH?) implementations which permit one to run programs like those in the tutorial without having an actual cluster at one's disposal. I would appreciate some background/pointers or whatever as to the best way of doing this.
I realize it's not really on topic - but without something like this, the number of people who can effectively review this is very limited.
I don't have experience with Windows and MPI, since most MPI applications run on Unix or Linux clusters. However, if you have a Unix (Linux, MacOS, ...) machine, you can also run MPI on a single machine; you do not need a cluster.

Matthias

Jeremy Graham Siek wrote:
Hi All,
Review questions ================
Please always explicitly state in your review whether you think the library should be accepted into Boost.
YES please accept into Boost.
You might want to comment on the following questions:
- What is your evaluation of the design?
I have not dug into the design, so these comments come from running some tests. I very much like the integration of Boost.Serialization, which in particular makes it easy to transfer strings between tasks of different rank.
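As a concrete illustration of that point, here is a minimal sketch of shipping a std::string between two ranks, assuming the communicator's send(dest, tag, value) and recv(source, tag, value) members described in the documentation; the program is illustrative only:

  #include <boost/mpi/environment.hpp>
  #include <boost/mpi/communicator.hpp>
  #include <iostream>
  #include <string>

  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0) {
      // The string is serialized automatically; no manual packing is needed.
      world.send(1, 0, std::string("greetings from rank 0"));
    } else if (world.rank() == 1) {
      std::string msg;
      world.recv(0, 0, msg);
      std::cout << "rank 1 received: " << msg << std::endl;
    }
    return 0;
  }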
- What is your evaluation of the implementation?
The implementation makes extensive use of other Boost components. In addition to Serialization, parts of archive and detail are used, I think as a consequence of the use of Serialization. This came to light because my base installation is Boost 1.33.1 and I installed only the required components from CVS.
- What is your evaluation of the documentation?
I found this fairly easy to follow. I think there is a typo in this line of the code on page 6:

  world << "I am process " << world.rank() << " of " << world.size() << "." << std::endl;

The example did not compile, as there does not seem to be an operator<< defined for the communicator.
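Presumably the intended line writes to std::cout rather than to the communicator; a minimal corrected sketch under that assumption:

  #include <boost/mpi/environment.hpp>
  #include <boost/mpi/communicator.hpp>
  #include <iostream>

  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    // Write to std::cout; the communicator itself has no operator<<.
    std::cout << "I am process " << world.rank()
              << " of " << world.size() << "." << std::endl;
    return 0;
  }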
- What is your evaluation of the potential usefulness of the library?
I think this is useful for those working in a multiprocessor environment using MPI.
- Did you try to use the library? With what compiler?
I have used the code on a single-processor system (AMD64) running 32-bit Linux (Fedora 4) with g++ 4.0.2 and a base of Boost 1.33.1. I used LAM/MPI. As commented above, I used CVS to obtain up-to-date copies of the Boost components serialization, archive and detail, and compiled only those source files which were needed.
Did you have any problems?
Only with the code typo above, once I had identified the need for the up-to-date archive version.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I have spent a couple of evenings and have run most of the examples from the manual and the examples directory. I very much liked the string-concatenation example, which can easily be adapted to gather output from all the tasks.
- Are you knowledgeable about the problem domain?
I have used mpich and OOMPI (http://www.osl.iu.edu/research/oompi/) which does something similar in a different way. I have also experimented with bsfcpp (http://f.loulergue.free.fr/research/bsfcpp/main.html) John Fletcher

On Sep 12, 2006, at 7:07 AM, John Fletcher wrote:
I think there is a typo in this line in the code on page 6.
world << "I am process " << world.rank() << " of " << world.size() << "." << std::endl;
The example did not compile as there does not seem to be operator<< defined for the communicator.
Oops, that's an unfortunate typo. Thank you for reporting the problem!
I have also experimented with bsfcpp (http://f.loulergue.free.fr/research/bsfcpp/main.html)
I had never heard of this library before. Very interesting. Doug

On Sep 6, 2006, at 1:16 PM, Jeremy Graham Siek wrote:
Review questions ================
Please always explicitly state in your review whether you think the library should be accepted into Boost.
I strongly recommend that this library be accepted into Boost.
You might want to comment on the following questions:
- What is your evaluation of the design?
The design reflects considerable effort to streamline the MPI interface (fewer arguments). I like the use of vector<T> in the collectives, although most of the legacy codes use C-style arrays. I also like the overloaded gather and scatter for root / non-root processes.
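To illustrate those root / non-root overloads, here is a minimal sketch assuming the gather overloads listed in the documentation, where only the root supplies an output vector; the program is illustrative only:

  #include <boost/mpi/environment.hpp>
  #include <boost/mpi/communicator.hpp>
  #include <boost/mpi/collectives.hpp>
  #include <iostream>
  #include <vector>

  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    int value = world.rank() * world.rank();  // each process contributes one int

    if (world.rank() == 0) {
      // Root overload: the gathered values land in a std::vector<int>.
      std::vector<int> all;
      mpi::gather(world, value, all, 0);
      for (std::size_t i = 0; i < all.size(); ++i)
        std::cout << "rank " << i << " sent " << all[i] << std::endl;
    } else {
      // Non-root overload: no output argument is needed.
      mpi::gather(world, value, 0);
    }
    return 0;
  }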
- What is your evaluation of the implementation?
I found the implementation to be lucid, well documented and ready for use. I was able to review the implementations of the broadcast, gather, scatter, and reduce functions which all call through to the corresponding MPI_ function. This is perfectly reasonable. But these, and other, functions can be implemented much more efficiently using sends and recvs. These less efficient implementations may adversely impact adoption of Boost.MPI by the larger high performance computing community. I would like the authors to consider these more efficient algorithms at some point in the future.
- What is your evaluation of the documentation?
Modulo a few typos, perfectly acceptable.
- What is your evaluation of the potential usefulness of the library?
This is a huge win over coding the C functions directly or using the very poor MPI C++ bindings as noted by the authors.
- Did you try to use the library? With what compiler? Did you have any problems?
I modified a small explicit dynamics code to use some calls from the Boost.MPI library. The conversion was straightforward and successful using the Sun 5.7 compilers with Sun's HPC library. There were no problems, but I was unable to test the skeleton and the function object capabilities, which interest me greatly.
- How much effort did you put into your evaluation?
A moderate amount of time. I spent 7 hours converting a program's MPI calls to use Boost.MPI and testing the changes, and a couple of hours reading the documentation and perusing the implementation code.
- Are you knowledgeable about the problem domain?
Yes. -- Noel Belcourt

On Sep 16, 2006, at 2:20 AM, K. Noel Belcourt wrote:
I was able to review the implementations of the broadcast, gather, scatter, and reduce functions which all call through to the corresponding MPI_ function. This is perfectly reasonable. But these, and other, functions can be implemented much more efficiently using sends and recvs. These less efficient implementations may adversely impact adoption of Boost.MPI by the larger high performance computing community. I would like the authors to consider these more efficient algorithms at some point in the future.
Performance is extremely important to us, so I want to make sure I understand exactly what you mean.

One of the biggest assumptions we make, particularly with collectives, is that using the most specialized MPI call gives the best performance. So if the user sums up integers with a reduce() call, we should call MPI_Reduce(..., MPI_INT, MPI_SUM, ...) to get the best performance, because it has probably been optimized by the MPI vendor, both in general (i.e., a better algorithm than ours) and for their specific hardware. Of course, if the underlying MPI has a poorly-optimized implementation of MPI_Reduce, it is conceivable that Boost.MPI's simple tree-based implementation could perform better. I haven't actually run into this problem yet, but it clearly can happen: I've peeked at one or two MPI implementations and have been appalled at how naively some of the collectives are implemented. I think this is the point you're making: it might be better not to specialize down to, e.g., the MPI_Reduce call, depending on the underlying MPI implementation.

There is at least one easy way to address this issue. We could introduce a set of global, compile-time flags that state whether the underlying implementation of a given collective is better than ours. These flags would vary depending on the underlying MPI. For instance, maybe Open MPI has a fast broadcast implementation, so we would have

  typedef mpl::true_ has_fast_bcast;

whereas LAM/MPI might not have a fast broadcast:

  typedef mpl::false_ has_fast_bcast;

These flags would be queried in the algorithm dispatch logic:

  template<typename T>
  void broadcast(const communicator& comm, T& value, int root = 0)
  {
    detail::broadcast_impl(comm, value, root,
                           mpl::and_<is_mpi_datatype<T>, has_fast_bcast>());
  }

The only tedious part of implementing this is determining which collectives are well-optimized in all of the common MPI implementations, although we could certainly assume the best and tweak the configuration as our understanding evolves.

Doug

On Sep 16, 2006, at 8:50 AM, Douglas Gregor wrote:
On Sep 16, 2006, at 2:20 AM, K. Noel Belcourt wrote:
I was able to review the implementations of the broadcast, gather, scatter, and reduce functions which all call through to the corresponding MPI_ function. This is perfectly reasonable. But these, and other, functions can be implemented much more efficiently using sends and recvs. These less efficient implementations may adversely impact adoption of Boost.MPI by the larger high performance computing community. I would like the authors to consider these more efficient algorithms at some point in the future.
Performance is extremely important to us, so I want to make sure I understand exactly what you mean.
One of the biggest assumptions we make, particularly with collectives, is that using the most specialized MPI call gives the best performance. So if the user sums up integers with a reduce() call, we should call MPI_Reduce(..., MPI_INT, MPI_SUM, ...) to get the best performance, because it has probably been optimized by the MPI vendor, both in general (i.e., a better algorithm than ours) and for their specific hardware. Of course, if the underlying MPI has a poorly-optimized implementation of MPI_Reduce, it is conceivable that Boost.MPI's simple tree-based implementation could perform better. I haven't actually run into this problem yet, but it clearly can happen: I've peeked at one or two MPI implementations and have been appalled at how naively some of the collectives are implemented. I think this is the point you're making: it might be better not to specialize down to, e.g., the MPI_Reduce call, depending on the underlying MPI implementation.
Precisely.
There is at least one easy way to address this issue. We could introduce a set of global, compile-time flags that state whether the underlying implementation of a given collective is better than ours. These flags would vary depending on the underlying MPI. For instance, maybe Open MPI has a fast broadcast implementation, so we would have
typedef mpl::true_ has_fast_bcast;
whereas LAM/MPI might not have a fast broadcast:
typedef mpl::false_ has_fast_bcast;
These flags would be queried in the algorithm dispatch logic:
template<typename T>
void broadcast(const communicator& comm, T& value, int root = 0)
{
  detail::broadcast_impl(comm, value, root,
                         mpl::and_<is_mpi_datatype<T>, has_fast_bcast>());
}
The only tedious part of implementing this is determining which collectives are well-optimized in all of the common MPI implementations, although we could certainly assume the best and tweak the configuration as our understanding evolves.
I think this is the best option: assume the native MPI implementations are efficient and then flip these flags as we find evidence to the contrary. I like this solution: very clean, no runtime overhead, easy to configure. I look forward to using your library.

-- Noel Belcourt
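For readers curious what the non-native fallback path under discussion might look like, below is a purely illustrative sketch of a binomial-tree broadcast built from point-to-point send/recv calls. The function name and structure are hypothetical and are not the actual Boost.MPI detail::broadcast_impl:

  #include <boost/mpi/communicator.hpp>

  // Hypothetical fallback broadcast built from point-to-point messages, for
  // use when the native MPI_Bcast is believed to be poorly optimized.
  // Purely illustrative; not the actual Boost.MPI implementation.
  template<typename T>
  void tree_broadcast(const boost::mpi::communicator& comm, T& value, int root)
  {
    const int size = comm.size();
    const int rank = comm.rank();
    const int relrank = (rank - root + size) % size;  // rotate so the root is 0
    const int tag = 0;

    // Phase 1: every non-root process receives the value from its parent
    // in the binomial tree.
    int mask = 1;
    while (mask < size) {
      if (relrank & mask) {
        int src = rank - mask;
        if (src < 0) src += size;
        comm.recv(src, tag, value);
        break;
      }
      mask <<= 1;
    }

    // Phase 2: forward the value to our children, i.e. processes whose
    // relative rank differs from ours by a single lower-order bit.
    mask >>= 1;
    while (mask > 0) {
      if (relrank + mask < size) {
        int dst = rank + mask;
        if (dst >= size) dst -= size;
        comm.send(dst, tag, value);
      }
      mask >>= 1;
    }
  }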

Hi All,

The Boost.MPI review is drawing to a close. If you haven't had a chance to look at it yet, please do so in the next day or two.

Cheers,
Jeremy

On Sep 6, 2006, at 1:16 PM, Jeremy Graham Siek wrote:
Hi All,
The review of Boost.MPI (Message Passing Interface) begins today, Sept. 6, and continues through Sept. 15.
Description:
The Message Passing Interface (MPI) is a standard interface for message passing in high-performance parallel applications. It defines a library interface, available from C, Fortran, and C++, for which there are many MPI implementations. Although there exist C++ bindings for MPI, they offer little functionality over the C bindings. The Boost.MPI library provides an alternative C++ interface to MPI that better supports modern C++ development styles, including complete support for user-defined data types and C++ Standard Library types, arbitrary function objects for collective algorithms, and the use of modern C++ library techniques to maintain maximal efficiency. As an example, one can concatenate the std::strings stored on each processor with a single reduce call and the function object std::plus<std::string>. However, if the call to reduce is merely computing the sum of integers, the implementation transforms into the appropriate call to MPI_Reduce. For more information about the design of Boost.MPI, see our design philosophy.
The review tarball is here:
http://www.generic-programming.org/~dgregor/boost.mpi/boost-mpi-20060906.tgz
Online documentation is available here:
http://www.generic-programming.org/~dgregor/boost.mpi/libs/parallel/doc/html/
PDF documentation is available here:
http://www.generic-programming.org/~dgregor/boost.mpi/mpi.pdf
Review questions ================
Please always explicitly state in your review whether you think the library should be accepted into Boost.
You might want to comment on the following questions:
- What is your evaluation of the design?
- What is your evaluation of the implementation?
- What is your evaluation of the documentation?
- What is your evaluation of the potential usefulness of the library?
- Did you try to use the library? With what compiler? Did you have any problems?
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
- Are you knowledgeable about the problem domain?
Cheers, Jeremy
__________________________________ Jeremy Siek <siek@cs.colorado.edu> Visiting Assistant Professor Department of Computer Science University of Colorado at Boulder
__________________________________
Jeremy Siek <siek@cs.colorado.edu>
Visiting Assistant Professor
Department of Computer Science
University of Colorado at Boulder
participants (9)
- Dean Michael Berris
- Doug Gregor
- Douglas Gregor
- Jeremy Graham Siek
- Jeremy Siek
- John Fletcher
- K. Noel Belcourt
- Matthias Troyer
- Robert Ramey