
Do you have different kinds of parallel scheduling like OpenMP has: static, dynamic, etc.?
I've already answered this in other threads... the scheduling is implemented in a policy class so that other threading approaches can be used. The current implementations are Sequential (a single thread runs the Map phase followed by the Reduce phase) and CPU Parallel, which maximizes CPU core utilization.
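To make the policy idea concrete, here is a rough sketch of how a scheduling policy can be selected as a template parameter. This is not the library's actual API; sequential_schedule, cpu_parallel_schedule, run_job and the task type are made-up names, and the job is reduced to a flat list of independent map tasks for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// A "job" here is boiled down to a list of independent map tasks.
using task = std::function<void()>;

// Policy 1: run every task on the calling thread, in order.
struct sequential_schedule {
    static void run(std::vector<task> const &tasks) {
        for (auto const &t : tasks)
            t();
    }
};

// Policy 2: spread the tasks across the available CPU cores.
struct cpu_parallel_schedule {
    static void run(std::vector<task> const &tasks) {
        std::size_t const workers =
            std::max<std::size_t>(1, std::thread::hardware_concurrency());
        std::vector<std::thread> pool;
        for (std::size_t w = 0; w < workers; ++w) {
            pool.emplace_back([&, w] {
                // Each worker takes every workers-th task (simple static split).
                for (std::size_t i = w; i < tasks.size(); i += workers)
                    tasks[i]();
            });
        }
        for (auto &th : pool)
            th.join();
    }
};

// The scheduling decision is a compile-time template parameter, so swapping in
// a different strategy needs no change to the job itself.
template <typename SchedulePolicy>
void run_job(std::vector<task> const &map_tasks) {
    SchedulePolicy::run(map_tasks);
}
```

A caller then picks the policy at compile time, e.g. run_job<cpu_parallel_schedule>(tasks); adding another strategy (say, dynamic work stealing) just means writing another policy class.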
So, to answer your question, I don't have specific performance metrics ... Well, some figures would be nice, at least to check we don't go slower than on a single CPU ;) A simple scalability test could already be enough.
I'm running some tests and will update the site with performance comparisons shortly.
* it seems there is a lot of work to be done to take a user function and turn it into something your library can manage.
Can you expand on this a bit? Sure, there is some scaffolding for defining types and constructing objects, but the WordCount example is just 5 lines for the Map and 4 lines for the Reduce - that sounds quite lightweight to me :) Seriously, though, I'd like to understand your concern about 'a lot of work', and to hear suggestions on reducing the overhead.
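For reference, here is roughly what those few lines amount to in a WordCount. This is a hedged sketch rather than the library's exact source; the runtime object with emit_intermediate/emit is a stand-in for whatever callback mechanism the library actually provides.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Map: split the input chunk into words and emit (word, 1) for each occurrence.
template <typename Runtime>
void word_count_map(Runtime &runtime, std::string const &/*key*/, std::string const &chunk) {
    std::istringstream words(chunk);
    std::string word;
    while (words >> word)
        runtime.emit_intermediate(word, 1u);
}

// Reduce: sum the 1s collected for one word and emit the total.
template <typename Runtime, typename It>
void word_count_reduce(Runtime &runtime, std::string const &word, It first, It last) {
    std::uintmax_t total = 0;
    for (; first != last; ++first)
        total += *first;
    runtime.emit(word, total);
}
```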
* it seems we have to write some loop ourselves at some point in the mapper and reducer. Can't this be factored out somehow? What an end user may want to write is the single element->element sequential function for the map and the element->element->element fold function to be applied over the element list.
The idea of MapReduce is to map (k1,v1) --> list(k2,v2) and then reduce (k2,list(v2)) --> list(v2). This inevitably requires iteration over collections. A generic Map & Reduce task could be written to delegate to sequential functions as you suggest, but I see this as an extension to the library rather than a core component. Thanks -- Craig
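For what it's worth, a rough sketch of what such generic adapters could look like - the loop lives in the adapter, and the user supplies only a per-element function and a binary fold. The Runtime/emit_intermediate/emit names are illustrative, not the library's API.

```cpp
#include <utility>

// Applies f : (k1,v1) -> (k2,v2) to every input record and emits the result;
// the iteration is owned by the adapter, not by user code.
template <typename Runtime, typename InIt, typename F>
void map_each(Runtime &runtime, InIt first, InIt last, F f) {
    for (; first != last; ++first) {
        auto kv = f(first->first, first->second);
        runtime.emit_intermediate(kv.first, kv.second);
    }
}

// Folds op : (v2,v2) -> v2 over the value list of one intermediate key and
// emits the single reduced value. Assumes the range is non-empty.
template <typename Runtime, typename Key, typename InIt, typename Op>
void reduce_fold(Runtime &runtime, Key const &key, InIt first, InIt last, Op op) {
    auto acc = *first++;
    for (; first != last; ++first)
        acc = op(acc, *first);
    runtime.emit(key, acc);
}
```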