
Actual comment: Do you have actual performance data for non-trivial tasks? More precisely, I know from experience that this kind of implementation may suffer from some C++-induced overhead. How does it compare to hand-written pthread or boost::thread code?
I'm interested to see how this performs.
I have designed the class infrastructure to be as flexible as possible using templates. Job scheduling is a particular interest of mine, and is a policy that can be specified. The current 'library' includes two schedulers: mapreduce::schedule_policy::cpu_parallel, used in the example, which maximises the use of the CPU cores in the machine, and mapreduce::schedule_policy::sequential, which runs one Map task followed by one Reduce task. The latter is useful for debugging the algorithms.

What I haven't shown in the documentation is that intermediates::local_disk<> takes three template parameters, the last two of which are defaulted. These are for sorting and merging the intermediate results. The current implementation uses a crude system() call to the OS, which of course needs improving. Interestingly, it is the sorting that takes much of the time in my tests so far.

So, to answer your question, I don't have specific performance metrics and comparisons that I can share with you at this time. The principle of the library is that everything is templated (policy-based), so components can be swapped around and re-implemented to best suit the needs of the application. The supplied implementations provide the framework and a decent implementation of the policies, but they will not be optimal for all users. -- Craig