
Craig Henderson wrote:
I've already answered this in other threads... the scheduling is implemented in a policy class so other threading approaches can be used. The current implementations are Sequential (a single thread running the Map phase followed by the Reduce phase) and CPU Parallel, which maximizes CPU core utilization.
I saw that; the question was: for your parallel scheduler, how do you generate the workload for each processor?
I'm running some tests and will update the site with performance comparisons shortly.
The idea of MapReduce is to map (k1,v1) --> list(k2,v2) and then reduce (k2,list(v2)) --> list(v2). This inevitably requires iteration over collections. A generic Map & Reduce task could be written to delegate to sequential functions as you suggest, but I see this as an extension to the library rather than a core component.

Well, canonically, running a map phase only requires the (k1,v1) --> (k2,v2) function; the iteration over the sequence is handled by the map skeleton. Similarly for Reduce, where only a fold-like function is strictly needed. Having to specify how to iterate over the sequence is unneeded IMHO and adds clutter to what you need to write. I don't see an actual improvement here.
Great point. If I still have to iterate over my data myself and only use your tool to generate the scheduling, I can do it by hand with a thread_pool and it won't be more verbose. An "optimal" interface would be: map_reduce<SomeSchedulingPolicy>(input_seq, output_seq, map_func, reduce_func), with xxx_seq conforming to some IterableSequence concept and xxx_func being function objects or PFOs conforming to the standard map/fold prototypes. Introspection on types and on the presence of given methods/functions (using type_traits and such) then helps determine how to iterate over the sequence and generate the appropriate, optimized iteration code, calling map and fold where it should.