Re: [boost] RFC - Updated MapReduce library

9 Aug 2009


      Craig Henderson wrote:
...
This interface has changed several times and I can't decide which is most
appropriate. I have provided a base class to define the required types, hence the use of
a function object. However, it is dangerous to use a real functor with an
instance because the Map Tasks are independent of each other and run in
different threads. If they had instance data, then synchronization becomes
an issue, but more importantly, it breaks the programming model. In a true
distributed system, map tasks will run on separate machines, and are therefore
unable to share data. Support for this is intended for a later release of
the library, so I need to keep the design pure.
Why not give each thread a local copy of the functor? Ideally,
those are stateless anyway, so the copy is essentially free.
Another thing: why not allow the use of anything that acts as a
function and provides the correct interface? You'll hit the dreaded
legacy-code-reuse wall if your users can't take their years-old
sequential function and turn it into a mapper or reducer. Storing a
boost::function inside the implementation to get this genericity may
be a good idea.
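
To make the two suggestions concrete, here is a minimal sketch, not the library's actual interface. It assumes a hypothetical mapper signature (one text chunk in, one count out), uses std::function as a stand-in for boost::function so that it builds without Boost, and the names mapper_fn, count_words and run_map_tasks are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical mapper signature: one input chunk in, one partial result out.
// std::function stands in for boost::function here.
using mapper_fn = std::function<std::size_t(const std::string&)>;

// A plain sequential function, reused unchanged as a mapper.
std::size_t count_words(const std::string& text) {
    std::size_t words = 0;
    bool in_word = false;
    for (char c : text) {
        if (c == ' ' || c == '\n' || c == '\t') {
            in_word = false;
        } else if (!in_word) {
            in_word = true;
            ++words;
        }
    }
    return words;
}

// Runs one map task per chunk, each in its own thread. Every thread receives
// a private copy of the mapper, so any state the callable carries is never
// shared between tasks.
std::vector<std::size_t> run_map_tasks(const mapper_fn& mapper,
                                       const std::vector<std::string>& chunks) {
    assert(static_cast<bool>(mapper));  // an empty std::function cannot be called
    std::vector<std::size_t> results(chunks.size());
    std::vector<std::thread> workers;
    workers.reserve(chunks.size());
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        mapper_fn local = mapper;  // the per-thread copy
        // `local` and `i` are captured by value; each task writes only results[i].
        workers.emplace_back([local, &chunks, &results, i] {
            results[i] = local(chunks[i]);
        });
    }
    for (auto& worker : workers) worker.join();
    return results;
}
```

Calling run_map_tasks(count_words, chunks) accepts the free function directly; a lambda or a functor with the same call signature would work the same way. For a stateless callable the copy costs next to nothing.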
...
These stats are very useful for research and testing, but I agree are less
important in a production environment. The timings need to be built into the
library infrastructure because the library user does not have access to the
granularity of timing (without writing a bespoke schedule_policy). I can
look at making the timing another policy class, but I don't think the
overhead is really that significant, is it?
I like it when my phone phones and my toaster toasts. When I want a phone
that toasts, I like to be able to decide so explicitly ;)
Making it a policy is definitely better in my book.
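
A rough sketch of what timing as a policy could look like, under stated assumptions: the names no_timing, steady_clock_timing and job are hypothetical and not taken from the library, and the "job" here just sums integers to keep the example short.

```cpp
#include <cassert>
#include <chrono>
#include <numeric>
#include <vector>

// Hypothetical policy: records nothing, so an optimizing compiler can
// remove the calls entirely.
struct no_timing {
    void start() {}
    void stop() {}
    double elapsed_seconds() const { return 0.0; }
};

// Hypothetical policy: accumulates wall-clock time between start() and stop().
class steady_clock_timing {
public:
    void start() { begin_ = std::chrono::steady_clock::now(); }
    void stop() { total_ += std::chrono::steady_clock::now() - begin_; }
    double elapsed_seconds() const {
        return std::chrono::duration<double>(total_).count();
    }

private:
    std::chrono::steady_clock::time_point begin_;
    std::chrono::steady_clock::duration total_ =
        std::chrono::steady_clock::duration::zero();
};

// Toy stand-in for a job: the timing behaviour is chosen at compile time
// and defaults to "off".
template <typename TimingPolicy = no_timing>
class job {
public:
    long run(const std::vector<int>& input) {
        timer_.start();
        long sum = std::accumulate(input.begin(), input.end(), 0L);
        timer_.stop();
        return sum;
    }
    const TimingPolicy& timing() const { return timer_; }

private:
    TimingPolicy timer_;
};
```

With this shape, job<> is the plain phone, and job<steady_clock_timing> is the explicit request for the phone that also toasts.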
...
I'm disappointed you think this. I have worked really hard to make the
interface as light as possible. If you compare the library interface to
other implementations such as Phoenix, I hope you'll agree that this library
is quite light.
It's mainly the need to write type::other_type::stuff and to have
to check/remember which comes first.
A unified form like result_of<type(user type)>::type looks better.
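
For illustration only, a single trait spelled in the result_of style might be sketched as follows. The names map_result, word_length and apply_map are hypothetical, and the trait is written with decltype rather than boost::result_of so that the example needs no Boost headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Primary template left undefined: only the F(Arg) spelling is supported.
template <typename Signature>
struct map_result;

// Hypothetical trait: map_result<F(Arg)>::type is whatever calling an F
// with an Arg returns. One entry point instead of a chain of nested typedefs.
template <typename F, typename Arg>
struct map_result<F(Arg)> {
    typedef decltype(std::declval<F>()(std::declval<Arg>())) type;
};

// Example user mapper with no nested typedefs of its own.
struct word_length {
    std::size_t operator()(const std::string& word) const { return word.size(); }
};

// The result type is computed from the mapper and its input type.
static_assert(
    std::is_same<map_result<word_length(std::string)>::type, std::size_t>::value,
    "word_length applied to std::string yields std::size_t");

// Generic code can name the result type the same way for any mapper.
template <typename F, typename Arg>
typename map_result<F(Arg)>::type apply_map(F f, const Arg& arg) {
    return f(arg);
}
```

The user then only has to remember one spelling, map_result<mapper(input)>::type, whatever the mapper is.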
...
I am keen to make it lighter if you can be specific with some suggestions,
though?
This sample code should maybe be the FIRST thing shown to the user, really,
as it is far clearer about the intent and about how the library is structured.
...
Agreed on the performance figures, and I'll provide some comparisons in the
future. Jose on this list has helped with some comparison with Phoenix, and
the results are comparable with the WordCount example. You'll appreciate
that I am limited to the machines I have access to, and Phoenix isn't
available on my platform.
I fully understand that and ...
...
Only that I am not familiar with openMP, and haven't looked at it. It's
unlikely that I'll be able to do this, but if someone in the Boost community
would like to help out, I'd be delighted.
... this is something I can contribute.
...
In the documentation I did say that I am not providing a tutorial on
programming in MapReduce, but maybe I will one day. I do, however, recognize
that one example does not demonstrate the possibilities for the library, and
I will be providing more samples in the future.
A nice thing would be small-scale examples tied to tasks that one may have
to do in parallel, demonstrating that the MapReduce approach adds
value at some point (paraphrasing Murray Cole's statement, "show the
payback").


-- 
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35
