- What is your evaluation of the potential usefulness of the library?
I am not sure the library is useful. For very few users, sorting 100kb arrays of integers is a performance bottleneck. And if it's truly a performance bottleneck, one would probably ponder cache architecture, or use of GPU, or a zillion other tricks. Like this paper from 2010 appears to do, for example:
With regards to practicality: I applied the string_sort algorithm to bzip and saw a 40% reduction in total compression time. Would you like to see the source for that? There are problems where people sort billions of strings, integers, floats, or things that can be sorted like them, such as processing integrated circuit designs, where time is money. One quote from that article you cite: "Merge sort performs better than radix sort for sorting keys of large sizes - such keys will be required to accommodate the increasing cardinality of future databases" Merge sort is fundamentally less efficient than a smart hybrid-radix approach when CPU sorting long keys because it has to compare the entire length of the key up to the point of difference multiple times, where the string_sort in this library only passes through each character in the key once (or twice depending on how you count) during the radix sorting portion, and then only compares what hasn't already been passed through in the final comparison-sorting stage once the problem has been cut down to less than a specific constant size. Do you happen to have a copy of the full article? I'm happy to apply any helpful concept to improve the library.
And finally, every review should attempt to answer this question:
- Do you think the library should be accepted as a Boost library?
I think the library should not be accepted. There's no reason to give Boost approval to implementation of algorithm that is neither accepted classic nor an obvious breakthrough.
With regards to classic algorithms, this library provides an implementation similar to Adaptive Left Radix for integer_sort, which came after it but was more widely available, with the addition of better worst-case performance. It also provides an adaptation of American Flag Sort (a classic algorithm) to C++ strings and with better worst-case performance guarantees. Did you read the overview section of the documentation that came with the library?