-----Original Message-----
From: "Brian Budge" [brian.budge@gmail.com]
Date: 18/11/2010 03:20 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Hybrid parallelism, no more + mpi+serialization, many questions

>> The large calculation that I currently do serially, and that I intend to
>> parallelize, is the maximum of the return values of a large number of
>> evaluations of a given "function" in the mathematical sense.
>> The number of arguments of the function is only known at runtime.
>> Say it is determined at runtime that the number of arguments is 10, i.e.
>> we have 10 arguments x0, ..., x9.
>> Each argument can take a different number of values: e.g. x0 can be
>> x0_0, x0_1, ..., x0_n0;
>> x1 can be x1_0, x1_1, ..., x1_n1; and so on. n0 and n1 are typically only
>> known at runtime and differ.
>>
>> So serially, I run
>> f(x0_0, x1_0, ..., x9_0)
>> f(x0_0, x1_0, ..., x9_1)
>> ...
>> f(x0_0, x1_0, ..., x9_n9)
>> then with all the x8, then all the x7, ..., then all the x0.
>> There are n0*n1*...*n9 runs.
>> Then I take the maximum of the return values.
>>
>> Imagining I have N MPI processes, ideally each process would run
>> n0*n1*...*n9/N function evaluations.
>> How do I split?
>>
>> In terms of the current implementation, each of the x is a boost::variant
>> over 4 types:
>> a double, a <min,max> pair, a <min,max,increment> triplet, or a
>> vector<double>.
>> A visitor is applied recursively to the variants in order to traverse the
>> whole parameter space:
>> apply_visitor on x0 => say if x0 is a triplet, then
>>   for (x0 from min to max with increment)
>>     apply_visitor on x1
>> and so on until x9, where we actually call f with all the arguments
>> collected so far.
>>
>> How can one parallelize such a beast?
>>
>> rds,
>
> This partly depends on how many processors/machines you have
> available. You need to find a way of partitioning your state space
> into tasks, and then doling those tasks out to processes/threads. How
> expensive is f()?
> How much memory is used?
>
> Brian

f() takes about 1ms. I have a couple of billion n-tuples in the parameter
space for now, but I expect to reach a couple of trillion easily.

In serial mode, only about 20 MB of RAM is used. The parameter space is not
generated ahead of time; it looks like a tree. The tree is generated on the
fly: when I reach a leaf, I evaluate f() and compare the return value to the
current maximum. So I'm not storing much.

rds,