
Cory Nelson said
People have been splitting tasks into multiple operations and combining the results on single machines for ages -- MapReduce doesn't really offer any innovation there.
Well, it provides a very easy framework for implementing parallel algorithms. Multithreading is hard and often done very badly - MR simplifies the task tremendously.
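To make the single-machine case concrete, here is a minimal sketch of the split/map/reduce pattern using only the C++ standard library. Every name in it (`Counts`, `map_range`, `reduce_into`, `word_count`) is hypothetical and chosen for illustration; it is not the interface of the library under discussion, and it makes no attempt at distribution or fault tolerance.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical names throughout; this is NOT the proposed library's API.
using Counts = std::map<std::string, std::size_t>;

// Map phase: count the words in lines[begin, end), independently of other ranges.
Counts map_range(const std::vector<std::string>& lines,
                 std::size_t begin, std::size_t end) {
    Counts counts;
    for (std::size_t i = begin; i < end; ++i) {
        std::istringstream in(lines[i]);
        std::string word;
        while (in >> word) ++counts[word];
    }
    return counts;
}

// Reduce phase: fold one partial result into the running total.
void reduce_into(Counts& total, const Counts& partial) {
    for (const auto& kv : partial) total[kv.first] += kv.second;
}

// Split the input into `workers` contiguous ranges, run one map task per
// range on its own thread, then reduce the partial results on the caller.
Counts word_count(const std::vector<std::string>& lines, std::size_t workers) {
    assert(workers > 0 && "need at least one worker");
    const std::size_t n = lines.size();
    std::vector<std::future<Counts>> tasks;
    tasks.reserve(workers);
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = n * w / workers;
        const std::size_t end = n * (w + 1) / workers;
        tasks.push_back(std::async(std::launch::async, [&lines, begin, end] {
            return map_range(lines, begin, end);
        }));
    }
    Counts total;
    for (auto& task : tasks) reduce_into(total, task.get());
    return total;
}
```

The caller only writes the map and reduce steps; the thread handling lives in one place. On some toolchains this needs to be linked with the platform's thread library (for example `-pthread`).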
The innovation, and the buzz about it, is that it offers a reliable, general-purpose, and large-scale distributed implementation of this very basic idea. If you can accomplish that in this library, I think there will be _a lot_ more interest.
I think a lot of the MapReduce buzz also has to do with the services tied to it that further ease common scalability bottlenecks, the big ones being Google File System and BigTable. It's really just part of the bigger ecosystem.
Agreed - the difficulty is in defining where a library ends and the infrastructure begins. This library cannot (and should not, IMO) explode into a distributed file system (an extension to Boost.Filesystem) and a communications library (based on Boost.MPI or Boost.Asio). It is the MapReduce algorithm, meant to sit on top of other infrastructure that together provides an overall solution. -- Craig