
Craig Henderson wrote:
I have designed the class infrastructure to be as flexible as possible using templates. Job scheduling is a particular interest of mine, and is a policy that can be specified. The current library includes two schedulers: mapreduce::schedule_policy::cpu_parallel, as in the example, which maximises use of the CPU cores in the machine, and mapreduce::schedule_policy::sequential, which runs one Map task followed by one Reduce task. The latter is useful for debugging the algorithms.
So, to answer your question, I don't have specific performance metrics or comparisons that I can share with you at this time. The principle of the library is that everything is templated (policy-based), so it can be swapped around and re-implemented to best suit the needs of the application. The supplied implementations provide the framework and a decent implementation of the policies, but will not be optimal for all users.

Well, some figures could be nice, to at least check we don't go slower than on a single CPU ;) A simple scalability test could already be enough. Do you have different kinds of parallel scheduling like OpenMP has: static, dynamic, etc.?

The other quirks I have are:

* It seems there is a lot of work to be done to take one user function and turn it into something your library can manage.
* It seems we have to write some loops ourselves at some point in the mapper and reducer. Can't this be leveraged somehow? What an end-user may want to write is the single element->element sequential function for map, and the element->element->element fold function to be used on top of the element list.

--
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35