Vladimir,
It's interesting, and would probably make a good addition to the documentation. Though bzip2 appears to be phased out these days, replaced by xz, which does not use the Burrows–Wheeler transform.
Will do (in the development branch).
One quote from the article you cite: "Merge sort performs better than radix sort for sorting keys of large sizes - such keys will be required to accommodate the increasing cardinality of future databases". Merge sort is fundamentally less efficient than a smart hybrid-radix approach when sorting long keys on a CPU, because it has to compare the entire length of each key, up to the point of difference, multiple times. The string_sort in this library passes through each character in the key only once (or twice, depending on how you count) during the radix-sorting portion; then, once the problem has been cut down to less than a specific constant size, the final comparison-sorting stage compares only what hasn't already been passed through.
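To make the long-key point concrete, here is a minimal sketch; the header and namespace (boost::sort::spreadsort) are assumptions based on the library's current distribution and may differ in the reviewed snapshot:

    // Keys sharing a 64-character prefix: a comparison sort re-reads that
    // prefix on every comparison, while an MSD radix pass visits each
    // character position once across the whole range.
    #include <boost/sort/spreadsort/string_sort.hpp>
    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> keys;
        const std::string prefix(64, 'x');
        for (int i = 0; i < 100000; ++i)
            keys.push_back(prefix + std::to_string(i));

        boost::sort::spreadsort::string_sort(keys.begin(), keys.end());
        return 0;
    }

std::sort on the same data would have to walk the 64 shared characters in nearly every one of its roughly n log n comparisons.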
The problem is that Boost does not seem to me to be a forum for discussing algorithm implementation. You propose a library that purports to do efficient sorting, based on a 2002 paper. Were there no papers with better algorithms in the subsequent 12 years?
I did an algorithm search in 2008 when preparing the Boost submission. ALR had come out, and it is fairly similar to my paper. I have not seen much attention paid in academia to hybrid radix sorting.
Do you happen to have a copy of the full article? I'm happy to apply any helpful concept to improve the library.
The "full text" link at the top of that page works for me.
I hit a paywall at home, but accessed it at work. It is a comparison of parallel LSD radix sort versus merge sort, and the conclusion is that parallel LSD radix sort is slower with long keys. This is not surprising: MSD hybrid radix sort (string_sort) does much better than merge sort or LSD radix sort with long keys. This library does not get into parallel implementation issues; it can be used as a subsort inside a parallel merge sort, or you can split up the problem by MSD to parallelize it and use this library to subsort.
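A hypothetical sketch of that MSD split (none of this is the library's API; std::sort stands in for the subsort, and one thread per bucket is deliberately crude):

    #include <algorithm>
    #include <array>
    #include <string>
    #include <thread>
    #include <vector>

    // Partition by the most significant byte, subsort each bucket in its
    // own thread, then concatenate the buckets in byte order.
    void parallel_msd_sort(std::vector<std::string>& data) {
        std::array<std::vector<std::string>, 256> buckets;
        for (auto& s : data)  // empty strings share bucket 0 with '\0'-led keys
            buckets[s.empty() ? 0 : static_cast<unsigned char>(s[0])]
                .push_back(std::move(s));

        std::vector<std::thread> workers;
        for (auto& b : buckets)
            if (!b.empty())  // the library's string_sort could replace std::sort here
                workers.emplace_back([&b] { std::sort(b.begin(), b.end()); });
        for (auto& t : workers)
            t.join();

        data.clear();
        for (auto& b : buckets)
            for (auto& s : b)
                data.push_back(std::move(s));
    }

A real driver would use a thread pool and balance bucket sizes, but the structure is the same: the MSD pass makes the buckets independent, so any subsort can run on them in parallel.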
With regard to classic algorithms: for integer_sort, this library provides an implementation similar to Adaptive Left Radix (which came after my paper but was more widely available), with the addition of better worst-case performance. For string_sort, it provides an adaptation of American Flag Sort (a classic algorithm) to C++ strings, again with better worst-case performance guarantees.
I'm worried about "similar" and "adaptation" above. If I see that a function whatever_sort implements an algorithm from paper N with 200 citations, then I can assume enough people have reviewed the algorithm. If that function implements an adaptation, then I'm not really sure what complexity I can expect.
Since you've read the overview, how about just looking through the code to see whether it works as described, or testing it? Can you ever be sure a concrete implementation correctly applies an abstract concept without looking at the implementation? These implementations are fast for the same reason ALR and AFS are fast. They have superior worst-case performance because they fall back to std::sort at a size that optimizes worst-case performance. The only other difference is that these algorithms scan each iteration to see whether they can shrink the range, which improves performance on non-worst-case distributions with little impact on worst-case distributions.
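Here is a minimal sketch of those two mechanisms, not the library's actual code; the cutoff constant and all names are illustrative:

    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <vector>

    void msd_sort(std::vector<std::string>& v, std::size_t pos = 0) {
        const std::size_t kCutoff = 256;  // arbitrary; the library tunes this
        if (v.size() < kCutoff) {
            // Fallback: strings here already agree on the first 'pos'
            // characters, so comparisons skip what radix passes covered.
            std::sort(v.begin(), v.end(),
                      [pos](const std::string& a, const std::string& b) {
                          return a.compare(pos, std::string::npos,
                                           b, pos, std::string::npos) < 0;
                      });
            return;
        }
        // Strings ending at 'pos' are mutually equal within this bucket
        // and sort before anything longer.
        auto mid = std::partition(v.begin(), v.end(),
            [pos](const std::string& s) { return s.size() <= pos; });
        if (mid == v.end())
            return;
        // The range-shrinking scan: find the min and max byte actually
        // present so empty edge buckets are never created.
        unsigned char lo = 255, hi = 0;
        for (auto it = mid; it != v.end(); ++it) {
            unsigned char c = static_cast<unsigned char>((*it)[pos]);
            if (c < lo) lo = c;
            if (c > hi) hi = c;
        }
        // Bucket on the current byte, then recurse one character deeper.
        std::vector<std::vector<std::string>> buckets(hi - lo + 1);
        for (auto it = mid; it != v.end(); ++it)
            buckets[static_cast<unsigned char>((*it)[pos]) - lo]
                .push_back(std::move(*it));
        auto out = mid;
        for (auto& b : buckets) {
            msd_sort(b, pos + 1);
            for (auto& s : b)
                *out++ = std::move(s);
        }
    }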