
Thanks for the replies. I finished a first pass of the documentation, tests, and other work of including my algorithm in boost and uploaded it here: https://github.com/skarupke/sort/tree/master I also re-ran all the single threaded benchmarks and updated the documentation. For the big O complexity: I am actually still not quite sure what it is. I used similar notation to what spreadsort uses. I think O(n*key_length) is correct for the worst case, but for long keys (like when sorting long strings) the algorithm may fall back to comparison based sorting, so I used the same thing as spreadsort: min(O(n log n), O(n key_length)). For speed: ska_sort should be the fastest sorting algorithm even for small numbers of items. I included three graphs of benchmarks, for sorting ints, floats and strings: https://github.com/skarupke/sort/blob/master/doc/images/sorting_ints.png https://github.com/skarupke/sort/blob/master/doc/images/sorting_strings.png https://github.com/skarupke/sort/blob/master/doc/images/sorting_floats.png For ints and strings ska_sort is always fastest, for floats pdqsort is briefly faster, but starting at N >= 128 ska_sort is faster. Part of the work of this new version of ska_sort was to make radix sort really fast even for small N. Before I would use std::sort for any small array of less than 128 items. Now that cutoff is at 32 items, so I skip straight to insertion sort and never do std::sort. The biggest effect of this change is that it removes the "waves" from the benchmark graph, like we still see for spreadsort in the above picture. I used to also have those when I gave a talk about this algorithm. Now the graph is much more smooth. (this also confuses my answer to the question "what is the big O complexity of this algorithm?" Because the O(n*b) answer that I gave in my talk was justified by the presence of the waves) The biggest questions I have at this point are around the interface. I reworked that completely so that even relatively advanced use cases are now pretty straightforward. I don't think that anyone else will ever need to use those advanced features, so I kept the documentation of them small, choosing instead to focus on the common use cases. And I made sure to give lots of examples of converting existing comparison-based sorting examples to my interface. Oh and for the big point of a parallel version of this algorithm: At this point I have no idea how to do that. I haven't even begun to look into parallel algorithms. The current version was a lot of work already. I think that'll have to be a project for another time. (it's definitely appealing though since currently the sorted algorithms are only 2.4 times as fast as ska_sort on a machine with 8 cores/16 hyper threads in benchmark_numbers.cpp, so even a little bit of parallelism would beat the existing algorithms...) Francisco, since you'll be busy in the near term it sounds like I can't expect too much response for another month, but I would really appreciate it if you could look at it a little just to see if anything obvious stands out to you that I can work on while I wait. (otherwise no biggie, I understand how hard it can be to find time for maintaining projects, since that is exactly one of the reasons why I want to get this into boost...) Thanks, Malte -- Malte Skarupke malteskarupke@fastmail.fm
Date: Sun, 20 Oct 2019 13:24:06 +0200 From: Francisco Jos? Tapia <fjtapia@gmail.com> To: boost@lists.boost.org Subject: Re: [boost] Adding my sorting algorithm, ska_sort to boost Message-ID: <CADBNGqQJpHux3AdRJON-nh76BaouRn1FRO4_LrMR76TywmCH+g@mail.gmail.com> Content-Type: text/plain; charset="UTF-8"
Hi Malte, I am Francisco Tapia, one of the co-authors of the Sort library. We will examine your algorithm. But we must examine the internal parts of the algorithm, and evaluate the correctness of the algorithm, how is implemented the worst case and a couple of details. After, we check the speed with the benchmarks used in the library, with several sets of data. Since the near sorted to the fully unsorted, and from the single numbers to structs with heavy comparison functions.
Now, I am overburdened by work, and I can't revise until December. But we will do and provide to you a reasoned response, and according to it, decide the inclusion or not in the library. As commented by Phil Endecott, the most interesting now is the parallel algorithms. Today, we don't have any parallel algorithm based on radix sort or any other hybrid algorithm. Because with small sets of data the differences are insignificant. But when you have big sets of data, use parallel algorithms, and then, the new single thread algorithms are in the middle, in the no man's land.
If you have any doubt or question, contact me, and I will try to answer you.
El mar., 15 oct. 2019 a las 23:06, Kostas Savvidis via Boost (< boost@lists.boost.org>) escribi?:
On Oct 13, 2019, at 03:06, Malte Skarupke via Boost < boost@lists.boost.org> wrote:
Hi, a couple of years ago I wrote a generalized version of radix sort, called ska_sort. I gave a talk about it at CppNow, available here: https://www.youtube.com/watch?v=zqs87a_7zxw
I have a new version of that algorithm that is both faster and more general. I can now sort almost anything. For example when I gave the talk I couldn't sort a std::vector<std::list<int>> because std::list doesn't have a random access interface, but now that works. (and it's faster than sorting the same thing with std::sort)
This seems like a product of a large investment of effort. The big advantage over any previously available radix_sort is the ability to sort almost any datatype if a key maps to an integer somehow. Plus, an advantage in measured practical speed, and apparently no claims about Big O improvement.
The average performance is O(n*b) where b is unknown and/or dependent on n ?
Is the theoretical worst-case O(n^2) ? Why? Is it because of working in-place? (yes, I understand you fall back to std::sort if you hit this case)
Is this software/algorithm fully documented? - I mean by another way than reading the source code.
Maybe too many questions at once, but it was not clear after watching your talk at cppcon and the blog posts.
Cheers, Kostas
<https://www.youtube.com/watch?v=Nz1KZXbghj8>
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
------------------------------
Subject: Digest Footer