
1) Compare to the C Phoenix library (http://mapreduce.stanford.edu). The library has just been updated to v2 and supports Linux x86_64. It provides many datasets/examples and could be benchmarked against. There is also a great paper/video/slides describing the library.
I have seen Phoenix, but it is not available for my development platform (Windows). I think the primary difference in design is that my library uses C++ templates to build a type-specific MapReduce runtime, so minimal void* casting is required.
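To illustrate the type-specific point, here is a minimal single-threaded sketch. It is not the library's actual interface: the names job, word_count_map, word_count_reduce and count_words are all hypothetical, chosen only to show how template parameters let keys and values keep their static types from the map phase through to the results.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical names throughout; this is NOT the library's real API.
// The job is parameterised on the map and reduce task types, so keys and
// values keep their static types and no void* casts are needed.
template <typename MapTask, typename ReduceTask>
class job {
public:
    using key_type    = typename MapTask::key_type;
    using value_type  = typename MapTask::value_type;
    using result_type = std::map<key_type, value_type>;

    template <typename InputRange>
    result_type run(const InputRange& inputs) const {
        // Map phase: group the emitted values by key.
        std::map<key_type, std::vector<value_type>> intermediate;
        auto emit = [&intermediate](const key_type& k, const value_type& v) {
            intermediate[k].push_back(v);
        };
        for (const auto& input : inputs) {
            MapTask::map(input, emit);
        }
        // Reduce phase: one typed result per key.
        result_type results;
        for (const auto& kv : intermediate) {
            results[kv.first] = ReduceTask::reduce(kv.first, kv.second);
        }
        return results;
    }
};

// Map task: splits a line on whitespace and emits (word, 1).
struct word_count_map {
    using key_type   = std::string;
    using value_type = std::size_t;

    template <typename Emit>
    static void map(const std::string& line, Emit& emit) {
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            emit(word, 1);
        }
    }
};

// Reduce task: sums the counts collected for one word.
struct word_count_reduce {
    static std::size_t reduce(const std::string& /*word*/,
                              const std::vector<std::size_t>& counts) {
        std::size_t total = 0;
        for (std::size_t c : counts) {
            total += c;
        }
        return total;
    }
};

// Convenience wrapper that runs the word-count job over a set of lines.
std::map<std::string, std::size_t>
count_words(const std::vector<std::string>& lines) {
    job<word_count_map, word_count_reduce> j;
    return j.run(lines);
}
```

Calling count_words on a vector of lines returns a std::map<std::string, std::size_t>. A mismatch between the types the map task emits and the types the reduce task accepts is a compile error rather than a bad cast at run time.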
2) Have a plan so that the library can eventually scale from a single multi-core system to a distributed one.
This is the eventual goal, and I have designed the library to be extensible so that this can be done in the future. With the policy-based design, making use of the open source projects that you mention should be non-intrusive: the library allows the user to supply a handler for the intermediate files (to use kosmosfs or hypertable, for example).
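As a rough sketch of what a policy-based handler for intermediate data could look like (the names in_memory_store, counting_store and policy_job are hypothetical, not the library's real interface, and the sketch uses an in-memory stand-in rather than a real kosmosfs or hypertable client):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical default policy: keeps intermediate key/value pairs in memory.
struct in_memory_store {
    void insert(const std::string& key, int value) {
        data_[key].push_back(value);
    }
    const std::map<std::string, std::vector<int>>& partitions() const {
        return data_;
    }
private:
    std::map<std::string, std::vector<int>> data_;
};

// Hypothetical user-supplied policy with the same interface. A real one
// might forward to a distributed file system; this stand-in only counts
// writes and delegates the storage to the in-memory policy.
struct counting_store {
    void insert(const std::string& key, int value) {
        ++writes_;
        inner_.insert(key, value);
    }
    const std::map<std::string, std::vector<int>>& partitions() const {
        return inner_.partitions();
    }
    std::size_t writes() const { return writes_; }
private:
    in_memory_store inner_;
    std::size_t writes_ = 0;
};

// The job takes the store as a template (policy) parameter, so swapping
// the storage back end does not touch the job code itself.
template <typename IntermediateStore = in_memory_store>
class policy_job {
public:
    explicit policy_job(IntermediateStore store = IntermediateStore())
        : store_(std::move(store)) {}

    // Called by the map phase for every intermediate pair.
    void emit_intermediate(const std::string& key, int value) {
        store_.insert(key, value);
    }

    // Reduce phase: sums the values stored for each key.
    std::map<std::string, int> reduce_sum() const {
        std::map<std::string, int> results;
        for (const auto& kv : store_.partitions()) {
            int total = 0;
            for (int v : kv.second) {
                total += v;
            }
            results[kv.first] = total;
        }
        return results;
    }

    const IntermediateStore& store() const { return store_; }

private:
    IntermediateStore store_;
};
```

With this shape, policy_job<> uses the default in-memory store, while policy_job<counting_store> swaps in the alternative handler without any change to the job code. That is the sense in which a different storage back end would be non-intrusive.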
- http://kosmosfs.sf.net (C++ distributed file system)
- http://hypertable.org (C++ distributed db, which already
Thanks for the references.
Do you plan to set up a project/mailing list?
If there is enough interest, then I will. Regards -- Craig