
On Mon, Aug 31, 2009 at 4:19 AM, Craig Henderson <cdm.henderson@googlemail.com> wrote:
I am thinking now about how to progress my MapReduce library that is in the Boost SandBox. I have completed the single-machine implementation; its performance is comparable to other libraries such as Phoenix (http://mapreduce.stanford.edu), and it has been tested by a few people on this list.
There has, however, been little interest in the library so far from Boost users/developers, which surprises me. I don't know if that is because of the single-machine limitation, with people not seeing any value that MR can bring to multi-threaded programming.
So where do I go next with the library? Options that I see are: ... 2. Continue to develop the library in the sandbox into a multi-machine implementation and work towards submitting that for formal review. Is there interest in this?

I would say 2 is the best option. People have been splitting tasks into multiple operations and combining the results on single machines for ages; MapReduce doesn't really offer any innovation there. The innovation, and the buzz about it, is that it offers a reliable, general-purpose, and large-scale distributed implementation of this very basic idea. If you can accomplish that in this library, I think there will be _a lot_ more interest.

I think a lot of the MapReduce buzz also has to do with the services tied to it that further ease common scalability bottlenecks, the big ones being Google File System and BigTable. It's really just part of the bigger ecosystem.

-- 
Cory Nelson
http://int64.org