I want to see scaling benchmarks comparing this implementation with the traditional STL containers for 0 to 10,000 items, for both single threaded and concurrent usage (even if that is simply a spinlock around the library).
If these scaling benchmarks are not provided, I automatically vote no.
Niall, I'm not sure precisely what you're asking for, and would like clarification.

Are you asking me to sort 0, 1, 2, 4, ... 2^14 elements, with input in these forms:

    vector<int>
    vector<float>
    vector<string>

and to compare the speeds of std::sort vs integer_sort, float_sort, and string_sort, both sequentially and when running multiple separate sorts in parallel?

What distribution do you want the input to follow? Evenly distributed random, the variety of distributions (including evenly distributed random) used in tune.pl, or something else?

I'd like to note that if the input is <3000 elements, this library immediately falls back to std::sort, as that is the approximate crossover point at which hybrid radix sorting becomes faster. Any differences for arrays smaller than that are likely to come from the overhead of the size check plus the increase in executable size, and will probably be difficult to separate from noise.
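
To make the question concrete, here is a minimal sketch of the kind of harness the benchmark described above might use. It covers only the vector<int> case with evenly distributed random input, and it is an illustration under stated assumptions, not the library's actual test code: the helper names (benchmark_sizes, random_ints, mean_sort_ns), the fixed seed, and the repeat count are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: sizes 0, 1, 2, 4, ..., 2^max_exp.
std::vector<std::size_t> benchmark_sizes(unsigned max_exp = 14) {
    std::vector<std::size_t> sizes;
    sizes.push_back(0);
    for (unsigned e = 0; e <= max_exp; ++e)
        sizes.push_back(std::size_t(1) << e);
    return sizes;
}

// Hypothetical helper: n evenly distributed random ints.
// The seed is fixed (an assumption) so that runs are repeatable.
std::vector<int> random_ints(std::size_t n, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist;  // full non-negative int range
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i) v[i] = dist(gen);
    return v;
}

// Hypothetical helper: mean wall-clock time in nanoseconds for one call of
// `sorter` on a fresh copy of `input`. The copy is made outside the timed
// region so that only the sort itself is measured.
template <class Sorter>
double mean_sort_ns(const std::vector<int>& input, Sorter sorter,
                    int repeats = 100) {
    typedef std::chrono::steady_clock clock;
    std::chrono::nanoseconds total(0);
    for (int r = 0; r < repeats; ++r) {
        std::vector<int> work = input;
        clock::time_point start = clock::now();
        sorter(work.begin(), work.end());
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock::now() - start);
        assert(std::is_sorted(work.begin(), work.end()));
    }
    return static_cast<double>(total.count()) / repeats;
}
```

A driver would loop over benchmark_sizes(), build the input with random_ints(n), and call mean_sort_ns once with a callable wrapping std::sort and once with one wrapping integer_sort. For inputs of only a few elements a single sort takes a time comparable to the clock resolution, so this per-call timing is noisy at the small end, which is the regime where separating real differences from noise is already expected to be hard.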