
Hi Phil, I investigated VIA C3s a little further and downloaded your testcase, so I have more to add.

On Thu, Jul 3, 2008 at 9:36 AM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
> No, this is Linux. It's a VIA C3, so it probably has less cache than many other systems.
From my reading, it actually has a larger L1 cache (and a smaller L2 cache), so for that system a MAX_SPLITS as high as 13 might be better. Increasing LOG_MIN_SPLIT_COUNT might also speed up your test a little. Those are probably the only consts worth changing. std::sort performs very well as soon as the data is small enough to fit in the L1 cache. Spreadsort performs fewer iterations the larger MAX_SPLITS is, but if MAX_SPLITS is so large that the working set no longer fits in the cache, its performance drops by 50% or sometimes more (it falls off a cliff). The best usage of Spreadsort is usually to cut the data into pieces small enough for std::sort to finish sorting inside the L1 cache.
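As a rough illustration of that hybrid strategy (a sketch of the idea only, not the actual Spreadsort implementation; the constant values and the 64 KB L1 size below are assumptions), one MSD-radix pass can split on the top MAX_SPLITS bits and hand each chunk to std::sort once it fits in cache:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative tuning values only; the real Spreadsort constants differ.
static const int MAX_SPLITS = 11;               // at most 2^11 bins per pass
static const std::size_t L1_BYTES = 64 * 1024;  // assumed L1 data cache size

// Sketch of a hybrid MSD-radix / std::sort on 32-bit unsigned keys:
// split on the top bits until a chunk fits in L1, then let std::sort finish.
void hybrid_sort(std::vector<unsigned> &data, int high_bit = 31)
{
    if (data.size() * sizeof(unsigned) <= L1_BYTES || high_bit < 0) {
        std::sort(data.begin(), data.end());
        return;
    }
    int shift = high_bit + 1 > MAX_SPLITS ? high_bit + 1 - MAX_SPLITS : 0;
    std::size_t bin_count = std::size_t(1) << (high_bit + 1 - shift);
    std::vector<std::vector<unsigned> > bins(bin_count);
    for (std::size_t i = 0; i < data.size(); ++i)
        bins[(data[i] >> shift) & (bin_count - 1)].push_back(data[i]);
    std::size_t pos = 0;
    for (std::size_t b = 0; b < bin_count; ++b) {
        hybrid_sort(bins[b], shift - 1);         // recurse on the remaining low bits
        for (std::size_t i = 0; i < bins[b].size(); ++i)
            data[pos++] = bins[b][i];
    }
}
```

The real implementation works in place rather than copying into per-bin vectors, but the fallback-to-std::sort point is where the cache-size constants matter.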
> OK, I've put a 14 Mbyte file called placenames.dat.bz2 in the directory http://chezphil.org/tmp/ (I've not written the URL in full to stop search engines from wasting their time on it). When unbzipped you'll find a tab-delimited 3-column text file. I got basically the same results using the latitude and longitude columns.
What's the best way to read data like that from a file? fscanf("%f %f %s\n")? Won't that have problems if there is a space? I don't do much file parsing in C/C++.

Spreadsort's advantage over std::sort is often even larger when sorting key + data than when sorting just keys, because it performs fewer swaps than std::sort and, outside of swapping, only looks at the keys. It also looks like you could use multikey sorting, which is a logical next step once I have integer_sort, float_sort, and string_sort fully implemented.

Steve
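On the file-reading question above: one approach that copes with spaces inside the place-name column is to read whole lines with std::getline and split on tabs, rather than using fscanf with %s. A minimal sketch, assuming the decompressed file is named placenames.dat and the two numeric columns come before the name (adjust the field order if not):

```cpp
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

struct Place {
    float lat;
    float lon;
    std::string name;   // may contain spaces, so split the line on tabs only
};

// Order key + data records by the latitude key.
struct ByLat {
    bool operator()(const Place &a, const Place &b) const { return a.lat < b.lat; }
};

int main()
{
    std::ifstream in("placenames.dat");   // decompressed from placenames.dat.bz2
    std::vector<Place> places;
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type t1 = line.find('\t');
        std::string::size_type t2 = (t1 == std::string::npos)
                                        ? std::string::npos
                                        : line.find('\t', t1 + 1);
        if (t2 == std::string::npos)
            continue;                     // skip malformed lines
        Place p;
        p.lat  = static_cast<float>(std::atof(line.substr(0, t1).c_str()));
        p.lon  = static_cast<float>(std::atof(line.substr(t1 + 1, t2 - t1 - 1).c_str()));
        p.name = line.substr(t2 + 1);
        places.push_back(p);
    }
    std::sort(places.begin(), places.end(), ByLat());
    std::cout << "read and sorted " << places.size() << " rows\n";
    return 0;
}
```

Timing std::sort on this vector (key + data) versus sorting just the extracted latitude keys would show the swap-count effect described above.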