
Hi Phil, I investigated VIA C3s a little further and downloaded your testcase, so I have more to add.

On Thu, Jul 3, 2008 at 9:36 AM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
> No, this is Linux. It's a VIA C3, so it probably has less cache than many other systems.
From my reading, it actually has a larger L1 cache (and a smaller L2 cache), so for that system a MAX_SPLITS as high as 13 might be better. Increasing LOG_MIN_SPLIT_COUNT might also speed up your test a little. Those are probably the only consts worth changing. std::sort performs very well as soon as the data is small enough to fit in the L1 cache. Spreadsort performs fewer iterations the larger MAX_SPLITS is, but if MAX_SPLITS is so large that the working set no longer fits in the cache, its performance drops by 50% or sometimes more (it falls off a cliff). The best usage of Spreadsort is usually to cut the data into pieces small enough for std::sort to finish sorting inside the L1 cache.
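As a rough illustration of that hybrid strategy (a sketch of the idea only, not the actual Spreadsort implementation; the constant values and the 64 KB L1 size below are assumptions), one MSD-radix pass can split on the top MAX_SPLITS bits and hand each chunk to std::sort once it fits in cache:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative tuning values only; the real Spreadsort constants differ.
static const int MAX_SPLITS = 11;               // at most 2^11 bins per pass
static const std::size_t L1_BYTES = 64 * 1024;  // assumed L1 data cache size

// Sketch of a hybrid MSD-radix / std::sort on 32-bit unsigned keys:
// split on the top bits until a chunk fits in L1, then let std::sort finish.
void hybrid_sort(std::vector<unsigned> &data, int high_bit = 31)
{
    if (data.size() * sizeof(unsigned) <= L1_BYTES || high_bit < 0) {
        std::sort(data.begin(), data.end());
        return;
    }
    int shift = high_bit + 1 > MAX_SPLITS ? high_bit + 1 - MAX_SPLITS : 0;
    std::size_t bin_count = std::size_t(1) << (high_bit + 1 - shift);
    std::vector<std::vector<unsigned> > bins(bin_count);
    for (std::size_t i = 0; i < data.size(); ++i)
        bins[(data[i] >> shift) & (bin_count - 1)].push_back(data[i]);
    std::size_t pos = 0;
    for (std::size_t b = 0; b < bin_count; ++b) {
        hybrid_sort(bins[b], shift - 1);         // recurse on the remaining low bits
        for (std::size_t i = 0; i < bins[b].size(); ++i)
            data[pos++] = bins[b][i];
    }
}
```

The real implementation works in place rather than copying into per-bin vectors, but the fallback-to-std::sort point is where the cache-size constants matter.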
> OK, I've put a 14 Mbyte file called placenames.dat.bz2 in the directory http://chezphil.org/tmp/ (I've not written the URL in full to stop search engines from wasting their time on it). When unbzipped you'll find a tab-delimited 3-column text file. I got basically the same results using the latitude and longitude columns.
What's the best way to read data like that from a file? fscanf("%f %f %s\n")? Won't that have problems if there is a space? I don't do much file parsing in C/C++.

Spreadsort's advantage over std::sort is often even larger when sorting key + data than when sorting just keys, because it performs fewer swaps than std::sort and, outside of swapping, only looks at the keys. It also looks like you could use multikey sorting, which is a logical next step once I have integer_sort, float_sort, and string_sort fully implemented.

Steve
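On the file-reading question above: one approach that copes with spaces inside the place-name column is to read whole lines with std::getline and split on tabs, rather than using fscanf with %s. A minimal sketch, assuming the decompressed file is named placenames.dat and the two numeric columns come before the name (adjust the field order if not):

```cpp
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

struct Place {
    float lat;
    float lon;
    std::string name;   // may contain spaces, so split the line on tabs only
};

// Order key + data records by the latitude key.
struct ByLat {
    bool operator()(const Place &a, const Place &b) const { return a.lat < b.lat; }
};

int main()
{
    std::ifstream in("placenames.dat");   // decompressed from placenames.dat.bz2
    std::vector<Place> places;
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type t1 = line.find('\t');
        std::string::size_type t2 = (t1 == std::string::npos)
                                        ? std::string::npos
                                        : line.find('\t', t1 + 1);
        if (t2 == std::string::npos)
            continue;                     // skip malformed lines
        Place p;
        p.lat  = static_cast<float>(std::atof(line.substr(0, t1).c_str()));
        p.lon  = static_cast<float>(std::atof(line.substr(t1 + 1, t2 - t1 - 1).c_str()));
        p.name = line.substr(t2 + 1);
        places.push_back(p);
    }
    std::sort(places.begin(), places.end(), ByLat());
    std::cout << "read and sorted " << places.size() << " rows\n";
    return 0;
}
```

Timing std::sort on this vector (key + data) versus sorting just the extracted latitude keys would show the swap-count effect described above.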