
Dean Michael Berris wrote:
Hi Guys,
I'm writing quite a number of parallel-distributed applications in my day job, and I'm missing a facility that lets me "automagically" nest my reductions. The reduction strategy I'm looking for is a tree-structured approach, where (if you can imagine a tree):
0
|-1
| |-2
| |-3
|-4
  |-5
  |-6
Nodes 2 and 3 send their data to node 1, nodes 5 and 6 send their data to node 4, and eventually nodes 1 and 4 send their (reduced) data to node 0. My idea is that it should be simple to implement this without too much fiddling with communicators -- or that it could be done automagically by hiding the communicator splits inside a special reduce implementation.
Would something like this be best implemented within Boost.MPI? Or is there a way of implementing this by composing the functionality already available in Boost.MPI?
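For illustration, here is a rough sketch of how such a two-level reduction might be composed from the existing Boost.MPI primitives (the group size of 3, the integer payload, and the std::plus operation are placeholder assumptions, not part of the original question):

#include <boost/mpi.hpp>
#include <functional>
#include <iostream>

namespace mpi = boost::mpi;

int main(int argc, char* argv[])
{
    mpi::environment env(argc, argv);
    mpi::communicator world;

    // Assumed fan-in: ranks 0-2 form one group, 3-5 the next, and so on.
    const int group_size = 3;
    const int color = world.rank() / group_size;

    // Stage 1: reduce within each group onto that group's local rank 0.
    mpi::communicator group = world.split(color);
    int local_value = world.rank();   // placeholder payload
    int group_sum = 0;
    if (group.rank() == 0)
        mpi::reduce(group, local_value, group_sum, std::plus<int>(), 0);
    else
        mpi::reduce(group, local_value, std::plus<int>(), 0);

    // Stage 2: the local roots reduce their partial sums onto world rank 0.
    // (Non-roots get their own throwaway communicator and never use it.)
    mpi::communicator roots = world.split(group.rank() == 0 ? 0 : 1);
    if (group.rank() == 0) {
        if (world.rank() == 0) {
            int total = 0;
            mpi::reduce(roots, group_sum, total, std::plus<int>(), 0);
            std::cout << "total = " << total << std::endl;
        } else {
            mpi::reduce(roots, group_sum, std::plus<int>(), 0);
        }
    }
    return 0;
}

The point of the sketch is only to show that the communicator splits can be hidden behind one call; a library routine would take the fan-in (and the tree depth) as parameters instead of hard-coding them.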
The reason I ask is that the reduction step sometimes dominates the application's running time, especially when there are quite a number of nodes (around 90 or so). Being able to parallelize (or nest) the reduction would definitely help, but the cost of maintaining such a routine across several applications seems to warrant a special implementation of the Boost.MPI reduce.
TIA
Dean,

What MPI implementation are you using? I ask because I believe that at least some of them already use optimizations of this sort when implementing reduce. That is really the right place for a communication scheme like this: Boost.MPI just uses the facilities provided by MPI itself, and in principle the MPI implementation should perform the reduce as efficiently as it can. What does yours currently do?

John