On 10 Nov 2014 at 8:18, Steven Ross wrote:
I want to see scaling benchmarks comparing this implementation with the traditional STL containers for 0 to 10,000 items, for both single threaded and concurrent usage (even if that is simply a spinlock around the library).
If these scaling benchmarks are not provided, I automatically vote no.
Niall, I'm not sure precisely what you're asking for, and would like clarification:
Are you asking me to sort 0, 1, 2, 4, ... 2^14 elements, input in this form: vector<int> vector<float> vector<string>
I'd be happy with int and string only personally. Float is nice though.
And compare the speeds of std::sort vs integer_sort, float_sort, and string_sort, both sequentially, and doing multiple separate sorts in parallel?
Sounds good. You'd get a graph of CPU cycle scaling to N looking like: http://www.nedprod.com/programs/portable/nedtries/images/LogLogBitwise TreesScaling.png http://www.nedprod.com/programs/portable/nedtries/images/LogLogRedBlac kTreesScaling.png http://www.nedprod.com/programs/portable/nedtries/images/LogLogHashTab leScaling.png For your situation you just sort, so a combined graph showing a comparative to other algorithms makes much more sense. Regarding parallel sort, you don't need to necessarily sort the same dataset. Rather many parallel sorts shows how well an algorithm coexists with other work on the same CPU. As an example of what I mean, a single thread doing red-black tree modification is fast, but two threads doing red-black tree modification sees exponential slow down if the cache lines touch. That makes std::map extremely unsuited to multiple thread use, even when protected by a spinlock. Finally, and the most important, these sorts of scaling benchmarks demonstrate proof you've thought about this sort of stuff. That stamps a great big fat mark of quality on your library. Anyone can claim anything about some bit of code of theirs. Not everyone can empirically demonstrate its truth others can replicate easily for themselves.
What distribution do you want this to sort? Evenly distributed random, the variety of distributions (including evenly distributed random) used in tune.pl, or something else?
I think I remember seeing your algorithm makes good use of structure? If so, totally random, partially sorted and nearly sorted are all good distributions.
I'd like to note that if the input is <3000 elements, this library immediately falls back to std::sort, as that is the approximate crossover point at which hybrid radix sorting becomes faster. Differences for arrays smaller than that are likely to be due to the overhead of the size check + the increase in executable size, which is likely to be difficult to separate out from noise.
Now that is a very vital piece of new information. I'll be honest in saying that I am unsure how useful a > 3000 element algorithm can be. Maybe if you implement a no-alloc core loop like Antony suggested you can significantly improve on this? > 300 would be lovely. I also have to admit a curiosity. My non-allocating bitwise tries library nedtries could be viewed as a way of very quickly nearly sorting items, even though I wrote it with the intention of it being an indexing algorithm rather than sorting algorithm. An interesting side effect of the non-allocating in-place algorithm is that it is highly amenable to being multithreaded i.e. you can fire N threads at it, and get ~N speedup if your keys are very random. Reading back that index would give you an almost sorted array, one upon which bubble sort would surely perform excellently. As you are proposing a full on Boost.Sort library rather than just a single sort algorithm, I thought I should raise this perhaps even better performing sort algorithm which can make excellent use of multiple cores, something which is always hard to do for sort algorithms. I sadly have no need for such algorithms in my present work, and therefore can't allocate any time of my own on this sort of stuff. A shame. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/