
16 Jun
2009
6:27 a.m.
I'm running some tests and will update the site with performance comparisons shortly.
Great
I've posted metrics from three runs of WordCount on a ~10 GB dataset at http://www.craighenderson.co.uk/mapreduce/. As you would expect, scalability is not linear, because there is contention when 8 or 16 threads read the files simultaneously. This is where multi-machine MapReduce clearly comes into its own, assuming the data is distributed across a decent replicated filesystem.
-- Craig
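For anyone wanting a rough feel for why adding threads stops paying off, here is a minimal sketch of an Amdahl's-law style model in Python. It is not taken from the posted metrics: the function names and the 20% serialized fraction are hypothetical, chosen only to illustrate how a portion of the work that cannot overlap (such as contended file reads) caps the speedup.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Modelled speedup when `serial_fraction` of the work cannot run in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


def parallel_efficiency(t_single: float, t_parallel: float, threads: int) -> float:
    """Measured speedup divided by thread count; 1.0 would be perfectly linear scaling."""
    if t_single <= 0 or t_parallel <= 0 or threads < 1:
        raise ValueError("timings must be positive and threads at least 1")
    return (t_single / t_parallel) / threads


if __name__ == "__main__":
    # Hypothetical assumption: 20% of the run is serialized by I/O contention.
    assumed_serial_fraction = 0.2
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(assumed_serial_fraction, n), 2))
```

To compare against real runs, pass the single-thread and multi-thread wall-clock times to `parallel_efficiency`; a value well below 1.0 at 8 or 16 threads would be consistent with the read contention described above.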