
Do you have different kinds of parallel scheduling like OpenMP has: static, dynamic, etc.?
I've already answered this in other threads... the scheduling is implemented in a policy class so that other threading approaches can be used. The current implementations are Sequential (a single thread runs the Map phase followed by the Reduce phase) and CPU Parallel, which maximizes CPU core utilization.
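To make the policy idea concrete, here is a rough sketch of how a scheduling policy can be selected as a template parameter. This is not the library's actual API; sequential_schedule, cpu_parallel_schedule, run_job and the task type are made-up names, and the job is reduced to a flat list of independent map tasks for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// A "job" here is boiled down to a list of independent map tasks.
using task = std::function<void()>;

// Policy 1: run every task on the calling thread, in order.
struct sequential_schedule {
    static void run(std::vector<task> const &tasks) {
        for (auto const &t : tasks)
            t();
    }
};

// Policy 2: spread the tasks across the available CPU cores.
struct cpu_parallel_schedule {
    static void run(std::vector<task> const &tasks) {
        std::size_t const workers =
            std::max<std::size_t>(1, std::thread::hardware_concurrency());
        std::vector<std::thread> pool;
        for (std::size_t w = 0; w < workers; ++w) {
            pool.emplace_back([&, w] {
                // Each worker takes every workers-th task (simple static split).
                for (std::size_t i = w; i < tasks.size(); i += workers)
                    tasks[i]();
            });
        }
        for (auto &th : pool)
            th.join();
    }
};

// The scheduling decision is a compile-time template parameter, so swapping in
// a different strategy needs no change to the job itself.
template <typename SchedulePolicy>
void run_job(std::vector<task> const &map_tasks) {
    SchedulePolicy::run(map_tasks);
}
```

A caller then picks the policy at compile time, e.g. run_job<cpu_parallel_schedule>(tasks); adding another strategy (say, dynamic work stealing) just means writing another policy class.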
So, to answer your question, I don't have specific performance metrics ... Well, some figures would be nice, at least to check we don't go slower than on a single CPU ;) A simple scalability test could already be enough.
I'm running some tests and will update the site with performance comparisons shortly.
* it seems there is a lot of work to be done to take a user function and turn it into something your library can manage.
Can you expand on this a bit? Sure, there is some scaffolding for defining types and constructing objects, but the WordCount example is just 5 lines for the Map and 4 lines for the Reduce - that sounds quite lightweight to me :) Seriously, though, I'd like to understand your concern about 'a lot of work', and to hear suggestions on reducing the overhead.
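For reference, here is roughly what those few lines amount to in a WordCount. This is a hedged sketch rather than the library's exact source; the runtime object with emit_intermediate/emit is a stand-in for whatever callback mechanism the library actually provides.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Map: split the input chunk into words and emit (word, 1) for each occurrence.
template <typename Runtime>
void word_count_map(Runtime &runtime, std::string const &/*key*/, std::string const &chunk) {
    std::istringstream words(chunk);
    std::string word;
    while (words >> word)
        runtime.emit_intermediate(word, 1u);
}

// Reduce: sum the 1s collected for one word and emit the total.
template <typename Runtime, typename It>
void word_count_reduce(Runtime &runtime, std::string const &word, It first, It last) {
    std::uintmax_t total = 0;
    for (; first != last; ++first)
        total += *first;
    runtime.emit(word, total);
}
```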
* it seems we have to write some loop ourselves at some point in the mapper and reducer. Can't this be factored out somehow? What an end user may want to write is the single element->element sequential function for the map and the element->element->element fold function to be applied over the element list.
The idea of MapReduce is to map (k1,v1) --> list(k2,v2) and then reduce (k2,list(v2)) --> list(v2). This inevitably requires iteration over collections. A generic Map & Reduce task could be written to delegate to sequential functions as you suggest, but I see this as an extension to the library rather than a core component. Thanks -- Craig
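For what it's worth, a rough sketch of what such generic adapters could look like - the loop lives in the adapter, and the user supplies only a per-element function and a binary fold. The Runtime/emit_intermediate/emit names are illustrative, not the library's API.

```cpp
#include <utility>

// Applies f : (k1,v1) -> (k2,v2) to every input record and emits the result;
// the iteration is owned by the adapter, not by user code.
template <typename Runtime, typename InIt, typename F>
void map_each(Runtime &runtime, InIt first, InIt last, F f) {
    for (; first != last; ++first) {
        auto kv = f(first->first, first->second);
        runtime.emit_intermediate(kv.first, kv.second);
    }
}

// Folds op : (v2,v2) -> v2 over the value list of one intermediate key and
// emits the single reduced value. Assumes the range is non-empty.
template <typename Runtime, typename Key, typename InIt, typename Op>
void reduce_fold(Runtime &runtime, Key const &key, InIt first, InIt last, Op op) {
    auto acc = *first++;
    for (; first != last; ++first)
        acc = op(acc, *first);
    runtime.emit(key, acc);
}
```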