Re: [boost] RFC - Updated MapReduce library

9 Aug 2009

Hi Phil,
...
> Quoting from the start of your docs:
> "The Boost.MapReduce library is a MapReduce implementation across a
> plurality of CPU cores rather than machines."
> Isn't that rather missing the point of what MapReduce is supposed to be
> about?  If I'm limited to one machine, I can write parallel code using
> the full repertoire of techniques.
You can, and this is basically just another alternative technique. Writing
multithreaded applications can be difficult, and is often done badly, so
this library provides a framework to do the donkey-work and allow the
developer to concentrate on solving their problem.  Other libraries already
exist for single-machine map/reduce (google for "phoenix mapreduce"), and
there's an evaluation paper on it at
http://csl.stanford.edu/~christos/publications/2007.cmp_mapreduce.hpca.pdf
...
> By re-designing my application to
> fit into the MapReduce pattern I can potentially scale it over multiple
> machines.  But if I can't scale over multiple machines, why bother?
In that scenario, indeed, don't bother. But if you want an easy way to
implement low-lock-contention multithreaded processing, then you might take
a look.
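
To illustrate what "low-lock-contention" means here, the following is a
minimal sketch of single-machine map/reduce written with plain C++11
std::thread. It is not the Boost.MapReduce API; the names map_worker,
word_count and Counts are hypothetical and exist only for this example.
Each worker counts words into its own private map, so no mutex is needed
during the map phase, and the partial results are merged only after every
thread has been joined.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch: word -> number of occurrences.
using Counts = std::map<std::string, int>;

// Map phase: count the words of lines[begin, end) into a map owned by
// exactly one thread, so no locking is required while processing.
void map_worker(const std::vector<std::string>& lines,
                std::size_t begin, std::size_t end, Counts& local) {
    for (std::size_t i = begin; i < end; ++i) {
        std::istringstream in(lines[i]);
        std::string word;
        while (in >> word) ++local[word];
    }
}

// Splits the input into one contiguous slice per worker, runs the map
// phase in parallel, then reduces on the calling thread.
Counts word_count(const std::vector<std::string>& lines, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<Counts> partial(workers);   // one private map per thread
    std::vector<std::thread> pool;
    const std::size_t chunk = (lines.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(lines.size(), w * chunk);
        const std::size_t end = std::min(lines.size(), begin + chunk);
        pool.emplace_back(map_worker, std::cref(lines), begin, end,
                          std::ref(partial[w]));
    }
    for (auto& t : pool) t.join();

    // Reduce phase: single-threaded merge once all workers have finished.
    Counts total;
    for (const auto& p : partial)
        for (const auto& kv : p) total[kv.first] += kv.second;
    return total;
}
```

Calling word_count(lines, std::thread::hardware_concurrency()) would use
one worker per reported core. The only synchronisation point is join(),
which is the kind of donkey-work a framework can hide from the developer.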
...
> Are you planning to support scaling over multiple machines in the
> future?
Yes. I am designing and developing a distributed file system that aims
to achieve this (see
http://craighenderson.co.uk/blog/index.php/tag/distributed-file-system/);
integration with any other DFS could do the same.

The library is very much in its infancy, but I believe it is useful enough
to be a part of Boost in its single-machine state.

Regards
-- Craig

Craig Henderson